CMU REU-SE: Projects
Mentor: Travis Breaux
Mobile devices and Internet of Things use context to deliver rich, personalized services to end users. However, these services often collect sensitive personal information about end users, which can expose end users to privacy threats and vulnerabilities. To help developers plan new and exciting applications that leverage our personal tastes, preferences and behaviors, the Requirements Engineering Lab at CMU has begun developing the Eddy language to express privacy requirements across distributed, service-oriented systems. An initial language syntax and semantics has been implemented in Description Logic and Java with tool support for detecting conflicts among different privacy policies. The REU-SE student will participate in the design and implementation of an extension to the language and will help identify opportunities to optimize the current tool suite for better performance.
The ideal student would have an interest in privacy, knowledge representation, multi-agent systems and logic. Experience writing language parsers or compilers is a plus.
Mentor: Jonathan Aldrich
Wyvern is a language designed to have good support for implementing software frameworks in domains with significant security or dependability requirements. In Wyvern, a library or framework can define syntactic extensions to the language, allowing domain- or framework-specific concerns to be expressed in a natural way, and ensuring that domain- or framework-specific constraints can be checked at compile time. An initial syntax extension mechanism for Wyvern has been defined and should be implemented by the summer. The REU-SE student will participate in the design and implementation of a corresponding extension mechanism for the Wyvern typechecker.
The ideal student has taken a course or has prior industry or research experience in programming languages or compilers.
Mentor: Brad A. Myers
Up and coming social coding platforms such as Github provide a transparent environment, where developers can follow other developers and individual projects to get informed about commits, forks, bug reports, and discussions on those projects. The transparency raises awareness and deepens engagement beyond just individual releases. However, when following many developers, quickly a massive amount of information accumulates, threatening to outweigh the benefit of transparency in the first place. Active developers perceive current notifications streams as noisy and overwhelming.
In this project, we want to help developers fight the information overload by identifying unusual commits that deserve additional attention. For example, a commit may be unusual when it is large and scattered in contrast to otherwise small and local changes, when developers make changes in an unfamiliar part of the code or in a language that does not fit their profile, when a change was committed at an unusual time, and so forth. Technically, we will explore anomaly detection, in essence statistical machine-learning approaches, to learn common patterns from the history of Github projects to then classify new commits as common or unusual. In this project, we will explore applicable techniques and develop and evaluate a prototype on real Github data.
The ideal candidate has experience with Github and has taken a course on machine-learning topics (possibly even fraud detection) and is interested in applying them in other fields. Specific goals will be matched depending on interests and background.
Both software and hardware configuration options can have significant impact on the energy consumption of a software system, such as a web server or video encoder. However, the huge configuration space with an exponential number of combinations of options makes finding the best configuration or just understanding the effect of a configuration option hard. In this project, you will measure the impact of software and hardware configuration options on energy consumption in common software systems (on measurement infrastructure we provide) and explore how results can be generalized and presented to users to make greener configuration decisions.
The ideal candidate has an interest in green computing and is fluent with Unix environments.
Mentor: Christian Kästner
We are building tools that can analyze highly configurable software systems and software product lines, focusing on C code with preprocessor variability. Among others, we use them to find bugs in the Linux kernel (~10000 configuration options) or embedded systems software. Instead of analyzing every configuration independently (which does not scale, since there are easily more configurations than atoms in the universe), we look at all configurations at the same time, exploiting the similarity between configurations.
We have many ideas about further analyses, including type checking, static control-flow and data-flow analysis, pointer analysis, taint analysis, symbolic execution, testing, GLR parsing, and others that could be applied to either the source code or the build system. Also empirical projects are possible (e.g., mining software repositories to quantify the influence of configuration options, analyzing interactions in practice, or estimate the attack surface of different configurations). An interesting target to investigate, besides various open source systems, might be DARPA's hardened Android kernel or correlating bugs from NIST's CVE database with configuration complexity.
The ideal candidate has taken a course on compilers, static analysis, or mining software repositories and is familiar with C. Most tools in our infrastructure are written in Scala; knowledge of Scala or a similar functional programming language is a plus. When working on the project, you will also be exposed to research around software product lines. The work will be embedded in the larger TypeChef project, with collaborators at CMU, in Germany, Denmark, and Canada.
Specific goals will be matched depending on interests and background.
Mentor: Jonathan Aldrich
System designers often express and reason about the design of a distributed system using architecture diagrams, showing the nodes of computation in the system and the connections between them. However, when the system is implemented, this level of design is lost, and it can be difficult to see whether the implemented system matches the architecture or to trace parts of the source code to the archtectural features they implement. In this project, the REU-SE student will participate in the design and prototyping of an approach that makes the architectural description a "live" element of the system, so that adding a node or connection to a machine-readable architectural description results in the run-time creation of that node or that connection. A capability-based resource access model will be used to enforce that nodes only access the network through connections defined in the architecture.
The ideal student has taken a course or has prior industry or research experience in programming languages, software architecture, and/or compilers.
Mentor: Bradley Schmerl
Self-adaptive systems continually adapt to changing or erroneous conditions to attempt to provide optimal service. Deciding the optimal changes at run time is an active and challenging research area. One aspect of this problem is being able to specify the impact that changes may have on the quality of service of the system, so that potential changes can be compared and chosen from. The Rainbow system in the ABLE group at CMU wishes to extend the way that impact is expressed to be sensitive to the conditions of the system and the environment in which it runs. We have a proposed syntax and semantics for a system. The REU-SE student will participate in the implementation of these impact models into the Rainbow system.
The ideal student would have an interest in self-adaptive system, and experience with writing translators or compilers.
Mentor: David Garlan
Non-computing professionals are increasingly called upon to assemble software components to support professional activities, especially in domains such as eScience and medical informatics. Many of these activities take some time and work on large datasets, but at times these professionals just want quick estimates of results. Understanding the ramifications of such choices on the time, accuracy, and quality of the results is an error which has not yet been explored in detail. The ABLE group at CMU is developing approaches that can help to communicate the consequences of these choices to end users by leverage formal model checking to make trade-offs and choose compositions that satisfy them. We use the Alloy model checker to generate candidate solutions and then rank them. The REU-SE student will participate in the design and implementation of a translator from composition description languages into an Alloy model, and extend the capabilities of the existing model to provide more nuanced trade-offs.
The ideal student would have an interest in formal modeling and model checking.
Self-adaptive systems continually adapt to changing or erroneous conditions to attempt to provide optimal service. Often, such systems used models of a system being managed to reason about whether and which changes to make and when. However, these systems rarely contain models of the self-adaptive, or managing, part of themselves to help guide the way in which that part of the system should adapt. Such an ability will greatly improve the adaptability of these systems to modern computing environments. Furthermore, models of the self-adaptive system would be helpful in initial configuration and deployment of these systems, which is currently quite challenging and error-prone. The REU-SE student will participate in the design and implementation of this capability in Rainbow, and in developing examples that can be used to demonstrate the improvements of this new capability.
The ideal student would have an interest in self-adaptive systems and systems development.
In highly configurable systems, such as SELinux, the configuration space is simply too big for (re-) certifying every configuration in isolation. In this project, we will combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system as Linux and SELinux for every combination of configuration settings one by one (> 102000 even considering compile-time configurations only), we will analyze the effect of each configuration option once for the entire configuration space.
The ideal student is familiar with the C programming language, program analysis (e.g., control-flow graphs, pointer analysis), and/or the Linux kernel.
Mentor: Claire Le Goues
Defects in existing, deployed software are a dominant cost in engineering practice. Companies lose measurable market share the day a vulnerability is announced in their code; developers spend up to half of their time debugging. Our research seeks to understand and then automatically improve software quality. We have developed a technique, GenProg, that uses randomized search to automatically and efficiently fix bugs in source code. While our evaluation has demonstrated that the technique is effective for approximately half of an indicative set of bugs in open-source software, many challenges remain to increase the complexity of defects that can be repaired and increase the understandability and utility of the results of the analysis.
We have several projects along this line of research that might be a good fit for an REU student, with the exact project to be decided upon based on the interest and background of the candidate: adapting semantic search and/or systematic change identification techniques to identify promising code for more complex fixes; building a tool to automatically correct code in the face of external API changes; or developing an analysis that can explain or make guarantees about the effect a change will have on a program at runtime.
The ideal candidate possesses fluency in Unix environments; has taken a course or otherwise has prior experience in programming languages (exposure to functional languages is a plus), compilers, or static analysis; and/or has experience in modifying open-source code.
Mentor: Travis Breaux
Software requirements frequently appear in policies, mailing lists and forums, however, the shear volume of text describing these requirements exceeds the limits of human resources available to process this information. To help developers extract meaningful requirements from these sources, the Requirements Engineering Lab at CMU has begun developing hybridized tools that combine crowdsourced human language understanding with natural language processing (NLP) techniques to overcome technical challenges due to ambiguity and domain-specific terminology. Early versions of these tools have been written in Java and Python and have been applied to privacy policies. The REU-SE student will participate in the extension of these tools to new information sources and will help identify opportunities to optimize current tools for better performance. Along the way, they will learn to work with crowdsourcing platforms and advanced tools for natural language processing.
The ideal student would have an interest in natural language processing, crowdsourcing, and knowledge representation.