Projects-Institute for Software Research - Carnegie Mellon University

CMU REU-SE: Projects

Design Notations for Mobile Privacy and Security

Mentor: Travis Breaux

Mobile devices and Internet of Things use context to deliver rich, personalized services to end users. However, these services often collect sensitive personal information about end users, which can expose end users to privacy threats and vulnerabilities. To help developers plan new and exciting applications that leverage our personal tastes, preferences and behaviors, the Requirements Engineering Lab at CMU has begun developing the Eddy language to express privacy requirements across distributed, service-oriented systems. An initial language syntax and semantics has been implemented in Description Logic and Java with tool support for detecting conflicts among different privacy policies. The REU-SE student will participate in the design and implementation of an extension to the language and will help identify opportunities to optimize the current tool suite for better performance.

The ideal student would have an interest in privacy, knowledge representation, multi-agent systems and logic. Experience writing language parsers or compilers is a plus.

Customizable Typechecking for Language Extensions in Wyvern

Mentor: Jonathan Aldrich

Wyvern is a language designed to have good support for implementing software frameworks in domains with significant security or dependability requirements. In Wyvern, a library or framework can define syntactic extensions to the language, allowing domain- or framework-specific concerns to be expressed in a natural way, and ensuring that domain- or framework-specific constraints can be checked at compile time. An initial syntax extension mechanism for Wyvern has been defined and should be implemented by the summer. The REU-SE student will participate in the design and implementation of a corresponding extension mechanism for the Wyvern typechecker.

The ideal student has taken a course or has prior industry or research experience in programming languages or compilers.

Energy Consumption in Large Configuration Spaces

Mentors: Christian Kästner and Yuvraj Agarwal

Both software and hardware configuration options can have significant impact on the energy consumption of a software system, such as a web server or video encoder. However, the huge configuration space with an exponential number of combinations of options makes finding the best configuration or just understanding the effect of a configuration option hard. In this project, you will measure the impact of software and hardware configuration options on energy consumption in common software systems (on measurement infrastructure we provide) and explore how results can be generalized and presented to users to make greener configuration decisions.

The ideal candidate has an interest in green computing and is fluent with Unix environments.

Analysis of Highly Configurable Open-Source Systems

Mentor: Christian Kästner

We are building tools that can analyze highly configurable software systems and software product lines, focusing on C code with preprocessor variability. Among others, we use them to find bugs in the Linux kernel (~10000 configuration options) or embedded systems software. Instead of analyzing every configuration independently (which does not scale, since there are easily more configurations than atoms in the universe), we look at all configurations at the same time, exploiting the similarity between configurations.

We have many ideas about further analyses, including type checking, static control-flow and data-flow analysis, pointer analysis, taint analysis, symbolic execution, testing, GLR parsing, and others that could be applied to either the source code or the build system. Also empirical projects are possible (e.g., mining software repositories to quantify the influence of configuration options, analyzing interactions in practice, or estimate the attack surface of different configurations). An interesting target to investigate, besides various open source systems, might be DARPA's hardened Android kernel or correlating bugs from NIST's CVE database with configuration complexity.

The ideal candidate has taken a course on compilers, static analysis, or mining software repositories and is familiar with C. Most tools in our infrastructure are written in Scala; knowledge of Scala or a similar functional programming language is a plus. When working on the project, you will also be exposed to research around software product lines. The work will be embedded in the larger TypeChef project, with collaborators at CMU, in Germany, Denmark, and Canada.

Specific goals will be matched depending on interests and background.

Foundations for Architecture-Based Distributed Systems

Mentor: Jonathan Aldrich

System designers often express and reason about the design of a distributed system using architecture diagrams, showing the nodes of computation in the system and the connections between them. However, when the system is implemented, this level of design is lost, and it can be difficult to see whether the implemented system matches the architecture or to trace parts of the source code to the archtectural features they implement. In this project, the REU-SE student will participate in the design and prototyping of an approach that makes the architectural description a "live" element of the system, so that adding a node or connection to a machine-readable architectural description results in the run-time creation of that node or that connection. A capability-based resource access model will be used to enforce that nodes only access the network through connections defined in the architecture.

The ideal student has taken a course or has prior industry or research experience in programming languages, software architecture, and/or compilers.

Impact Models for Self-Adaptive Systems

Mentor: Bradley Schmerl

Self-adaptive systems continually adapt to changing or erroneous conditions to attempt to provide optimal service. Deciding the optimal changes at run time is an active and challenging research area. One aspect of this problem is being able to specify the impact that changes may have on the quality of service of the system, so that potential changes can be compared and chosen from. The Rainbow system in the ABLE group at CMU wishes to extend the way that impact is expressed to be sensitive to the conditions of the system and the environment in which it runs. We have a proposed syntax and semantics for a system. The REU-SE student will participate in the implementation of these impact models into the Rainbow system.

The ideal student would have an interest in self-adaptive system, and experience with writing translators or compilers.

Self-Adaptive Models of Self-Adaptive Systems

Mentors: Bradley Schmerl and David Garlan

Self-adaptive systems continually adapt to changing or erroneous conditions to attempt to provide optimal service. Often, such systems used models of a system being managed to reason about whether and which changes to make and when. However, these systems rarely contain models of the self-adaptive, or managing, part of themselves to help guide the way in which that part of the system should adapt. Such an ability will greatly improve the adaptability of these systems to modern computing environments. Furthermore, models of the self-adaptive system would be helpful in initial configuration and deployment of these systems, which is currently quite challenging and error-prone. The REU-SE student will participate in the design and implementation of this capability in Rainbow, and in developing examples that can be used to demonstrate the improvements of this new capability.

The ideal student would have an interest in self-adaptive systems and systems development.

Analyzing Interactions and Isolation among Configuration Options

Mentors: Jürgen Pfeffer and Christian Kästner

In highly configurable systems, such as SELinux, the configuration space is simply too big for (re-) certifying every configuration in isolation. In this project, we will combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system as Linux and SELinux for every combination of configuration settings one by one (> 102000 even considering compile-time configurations only), we will analyze the effect of each configuration option once for the entire configuration space.

The ideal student is familiar with the C programming language, program analysis (e.g., control-flow graphs, pointer analysis), and/or the Linux kernel.

Semantic search for automatic program repair

Mentor: Claire Le Goues

Semantic code search is an emerging technique that allows programmers to find code snippets by specifying what the code should do, instead of words it should contain (e.g., searching for code that reverses a string, not code that contains the word "reverse"). We're working on ways to use semantic search to find code snippets to fix bugs in code. The REU student would work on implementing a simple dynamic analysis to help construct search queries ("based on test case behavior, the code should do this, but not that") to pass to a database of candidate code snippets.

The ideal candidate has taken a course or otherwise has prior experience in programming languages (exposure to functional languages is a plus), compilers, or static analysis; is comfortable in a Unix environment; and/or has experience in modifying open-source code.

We have several other projects in the area of automatic program repair that might be a good fit for an REU student, with the exact project to be decided upon based on the interest and background of the candidate, including building a tool to automatically correct code in the face of external API changes or developing an analysis that can explain or make guarantees about the effect a change will have on a program at runtime.

Requirements Discovery using Crowdsourcing and NLP

Mentor: Travis Breaux

Software requirements frequently appear in policies, mailing lists and forums, however, the shear volume of text describing these requirements exceeds the limits of human resources available to process this information. To help developers extract meaningful requirements from these sources, the Requirements Engineering Lab at CMU has begun developing hybridized tools that combine crowdsourced human language understanding with natural language processing (NLP) techniques to overcome technical challenges due to ambiguity and domain-specific terminology. Early versions of these tools have been written in Java and Python and have been applied to privacy policies. The REU-SE student will participate in the extension of these tools to new information sources and will help identify opportunities to optimize current tools for better performance. Along the way, they will learn to work with crowdsourcing platforms and advanced tools for natural language processing.

The ideal student would have an interest in natural language processing, crowdsourcing, and knowledge representation.

Architectural models for self-protecting mobile systems

Mentors: Bradley Schmerl and David Garlan

Self-protecting systems adapt themselves to respond the threats and maliciousness at run-time. Our group has a framework that can handle self-adaptivity more generally, but there are many challenges applying this to self-protection, especially of mobile platforms such as Android. In particular, we need to understand the architectural model of the platform, especially understanding how apps may interact and what vulnerabilities they could exploit, how to target monitoring of communication to discover when potential vulnerabilities are exploited, and what to do once an exploit is attempted. The REU-SE student will participate in the implementation of a system that can generate architectural models of Android from static analysis of APK's and code, and use these models to determine potential vulnerabilities, using third party analysis tools.

Immutability after initialization

Mentors: Brad A. MyersJosh Sunshine, and Jonathan Aldrich

Design guidelines commonly recommend the use of immutable objects—objects that cannot change after they are constructed—to improve the security, simplicity, and maintainability of the resulting software. However, many important real-world data structures are recursive (e.g. doubly linked lists, trees with nodes containing both parent and child references, etc. ) and therefore cannot be initialized completely during construction. In addition, Stylos and Clarke showed that programmers learn to use classes whose instance variables are initialized by setters much more quickly than classes whose instance variables are initialized in a constructor.

We plan to investigate a refinement of immutability, which we call immutability after initialization (IAI), which enables both recursive data structures and the exploratory programming advocated by Stylos and Clarke, while providing many of the security, simplicity, and maintainability benefits of immutability. IAI objects will exist in three distinct phases -- construction, initialization, and use. In the construction and initialization phases, instance variables can be initialized or modified at will. All instance variables will be initialized by the end of the initialization phase, and instance variables cannot be initialized or modified during the use phase. Sound verification tools will enforce these properties.

We intend to build two variants of the verification tool on top of Java: a static checker enabled by to-be-determined program annotations and a runtime monitor. We will evaluate these two tools against each other and vanilla Java in a within-subjects programming experiment. Programmers will come to the lab and perform a series of tasks using IAI APIs derived from security-critical domains (e.g. web servers, hypervisors, single-sign-on systems). We will evaluate the impact of the tools on the security/correctness of participant code and time-on-task (a proxy for programmer productivity).

Quality assurance in large-scale database systems

Mentor: Claire Le Goues and Andy Pavlo

The Carnegie Mellon Automated Software Quality group is teaming up with the CMU Database group to explore better ways to develop and test novel database systems. The Database group is building a new main-memory database system that supports fast transaction processing workloads simultaneously with on-line analytical operations. Given the complexity of such a project, they want to have confidence in the quality and correctness of their system. At the same time, quality assurance techniques in the construction of such systems is an interesting research problem in itself. There are a couple of projects within this larger effort that are of suitable scope for an REU student, including a large-scale empirical study of common defects and/or QA practices in database development and the initial construction of a differential testing framework for database systems.

The ideal candidate for this REU has taken a university-level database course or otherwise has experience with using relational databases. Likewise, they should have either significant experience interacting with or studying open source systems or has taken a university-level course on software engineering (especially involving testing).