Carnegie Mellon University
December 08, 2016

What you see may not be what you get

New Application Detects Code/Privacy Policy Misalignments for Mobile Apps

By Josh Quicksall

A new application by researchers at Carnegie Mellon University’s Institute for Software Research (ISR) and The University of Texas at San Antonio (UTSA) can detect misalignments between a mobile application’s privacy policy and how the app actually gathers and uses data.

While many applications rely on data, such as geo-location and social media information, to function as advertised, the researchers have found that apps may gather, use, and distribute that data in ways that are not reflected in their privacy policies.

Travis Breaux, an associate professor in the Institute for Software Research and a principal investigator on the project, notes that the work combines UTSA’s strength in static and dynamic code analysis with ISR’s groundbreaking work on privacy and policy research.

While the UTSA team was developing novel analytical approaches to determine how apps gather and handle data, the CMU team worked on creating a system to model privacy policies. “These models can then be used to reason over when data might be repurposed or when there are dangerous ambiguities in your policy,” Breaux explains.

Bringing together these lines of research, the team was able to compare the behavior of a wide range of applications against their stated policies, finding significant misalignments in many of them. “Some apps have no privacy policies or a very vague policy,” Breaux notes, “while others make a reasonable attempt but still have large gaps.”

This is often the case, Breaux notes, because privacy policies are frequently based on a template or outdated best practices. “Ideally, the privacy policy should mirror how the app actually operates and behaves.”

To address this misalignment, Breaux and his collaborators essentially needed to bring together two different “languages”: computer code and the written word. “In order to do this, we built an ontology that aligns the terminology that people use in written privacy policies with the function names used to write code.”
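
To make the idea concrete, the sketch below is purely illustrative rather than the researchers’ actual ontology or tool: it maps a few policy phrases to Android API identifiers that produce the corresponding data, then flags any API use the policy text never mentions. The specific terms, class names, and mappings are assumptions for demonstration only.

import java.util.List;
import java.util.Map;

// Illustrative sketch only: a toy "ontology" linking phrases found in privacy
// policies to the Android API methods that actually collect that data.
// The terms and API names below are assumptions, not the researchers' data.
public class PolicyOntologySketch {

    // Map from a policy-level data category to the code-level identifiers
    // (method or class names) that produce data in that category.
    static final Map<String, List<String>> ONTOLOGY = Map.of(
            "location information", List.of(
                    "android.location.LocationManager.getLastKnownLocation"),
            "contact information", List.of(
                    "android.provider.ContactsContract"),
            "device identifiers", List.of(
                    "android.telephony.TelephonyManager.getDeviceId"));

    // Report any API use that is not covered by some phrase in the policy text.
    static void checkAlignment(List<String> apiCallsFoundInCode, String policyText) {
        for (Map.Entry<String, List<String>> entry : ONTOLOGY.entrySet()) {
            boolean mentionedInPolicy = policyText.toLowerCase().contains(entry.getKey());
            for (String api : entry.getValue()) {
                boolean usedInCode = apiCallsFoundInCode.stream().anyMatch(c -> c.startsWith(api));
                if (usedInCode && !mentionedInPolicy) {
                    System.out.println("Possible misalignment: code uses " + api
                            + " but the policy never mentions \"" + entry.getKey() + "\"");
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> apiCalls = List.of(
                "android.location.LocationManager.getLastKnownLocation");
        String policy = "We collect contact information to provide the service.";
        checkAlignment(apiCalls, policy);
    }
}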

This approach, when applied, can serve not only users, allowing them to check whether an app’s privacy policy truthfully represents how the app will behave, but also developers, letting them check whether their code aligns with their stated privacy practices. “This has been one of the more overlooked aspects of privacy research thus far. Often we focus on how a shipped application impacts users without addressing the upstream development efforts that may lead to these misalignments in the first place.”

Plugged into a developer’s integrated development environment, the tool can flag issues in real time, as the code itself is written. “By offering this tool, we’re essentially saying to developers: ‘I know you mean to do the right thing, but maybe you don’t know what the issues are, which data is sensitive, or where users will want to see more preferences to control their data,’” Breaux explains.
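
As a rough illustration of what such real-time feedback could look like (this sketch does not reflect the actual Android Studio plugin or its API), a check like the one above might surface its findings as compiler-style warnings tied to a file and line; the file path, line number, and terms shown are hypothetical.

// Hypothetical sketch only: formats a misalignment as the kind of
// compiler-style diagnostic an IDE integration could show while code is written.
public class PolicyWarningSketch {

    // Format a diagnostic the way IDEs and build tools usually present them.
    static String formatWarning(String file, int line, String api, String policyTerm) {
        return file + ":" + line + ": warning: call to " + api
                + " is not covered by the policy term \"" + policyTerm + "\"";
    }

    public static void main(String[] args) {
        System.out.println(formatWarning(
                "app/src/main/java/com/example/TrackerService.java", 42,
                "LocationManager.getLastKnownLocation", "location information"));
    }
}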

And while developers can already access the tool via a plugin for Android Studio, this is only the first step for the collaboration. Breaux points out that discussions are underway about adapting the approach to run these analyses on applications as they are uploaded to distribution platforms, like the Google Play Store.

More generally, the collaboration – enabled by the Science of Security Lablet – has been a wonderful experience, says Breaux. “Often, at CMU, we work with what people call the top-ranked research institutions. While UTSA is not as visible, the team are some of the brightest, most creative people that I’ve worked with. They’re fearless when it comes to high risk research topics and new methods, which has enabled us to think well beyond our own comfort areas. I really look forward to what the future holds for our work together.”