Location, location, location: how movement traces become unique identifiers

By Charlie Cabot - December 06, 2017

“Modern communications mean most individuals today walk around with a beacon that transmits their location.” - the Electronic Frontier Foundation

That beacon, of course, is the smartphone in their pocket. Smartphones collect a rich and valuable data set of location traces and transmit it to several organisations: mobile phone providers, Wi-Fi providers, and any mobile app services (Facebook, Citymapper, Google Maps) that have permission to access location services.

But that’s just the start. Location data collection extends far beyond the mobile phone

In-person and online digital purchases (through credit cards or loyalty cards) leave a trail of movements as people shop. Transportation services such as Uber record journeys around town. Some cities are even rolling out facial recognition technology that records citizens as they walk around during certain events (e.g. the Champions League final in Cardiff in June 2017), and some airports are planning to roll out similar technologies.

With extensive data collection comes both better analytics and great privacy risk

Rich patterns in human location data yield insights that we can use to improve our society, by reducing traffic, improving public transportation, making emergency response plans more efficient, and so on. It is no wonder that organisations seek to collect and analyse such valuable data.

At the same time, location data is highly sensitive, and the way organisations use it can put personal information at risk. There are many things about their location traces that device owners would rather keep private: what time they made it home last night; how often they’ve been to the doctor this month; which church, temple, or mosque they visit regularly; and so on. It’s imperative that the organisations collecting and handling this location data treat it with respect and process it only in ways that preserve its privacy.

Implementing privacy-enhancing technologies and processes opens the door for safe analytics. The ultimate goal: to unlock valuable insight from location data without leaking sensitive information about individuals.

Can anonymisation protect location data?

Unfortunately, research suggests that location data cannot be anonymised effectively. Leading privacy expert and Privitar academic advisor Yves-Alexandre de Montjoye and his coauthors wrote previously that “with spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of individuals”. In a nutshell, this means: if you know approximately where someone was at four different points in time, that’s enough to effectively reverse location data anonymisation. Even if the locations are generalised and unique identifiers have been removed, someone with sufficient background knowledge about a user might be able to track down that user’s location trace in the supposedly anonymised data.
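
To make the idea concrete, here is a minimal sketch of how background knowledge narrows down candidates in a pseudonymised dataset. The pseudonyms, antenna IDs, and hours are all made up for illustration, and the helper name candidates is hypothetical; this is not the method used in the research quoted above, only a toy version of the same matching logic.

```python
# Toy "anonymised" dataset: pseudonym -> set of (antenna_id, hour) points.
# Every identifier and value here is invented for illustration.
traces = {
    "user_a": {("A1", 8), ("B4", 9), ("C2", 13), ("A1", 19), ("D7", 21)},
    "user_b": {("A1", 8), ("B4", 9), ("C2", 13), ("E3", 19), ("D7", 21)},
    "user_c": {("A1", 8), ("B4", 9), ("F5", 13), ("A1", 19), ("G9", 21)},
    "user_d": {("A1", 8), ("H2", 9), ("C2", 13), ("A1", 19), ("D7", 21)},
}


def candidates(traces, known_points):
    """Return the pseudonyms whose trace contains every known point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= points)


# Background knowledge about the target: home, work, a tagged photo, an event.
background = [("A1", 8), ("B4", 9), ("C2", 13), ("A1", 19)]

# Show how the candidate set shrinks as more points become known.
for k in range(1, len(background) + 1):
    print(k, candidates(traces, background[:k]))
```

In this toy data each extra point removes one candidate, so four points leave a single pseudonym. Real datasets are far larger, but the matching step is the same.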

Learning where someone was at four points is easy -- and getting easier

How easy is it to obtain background knowledge about someone? Our homes are part of public registries, and most of us reveal where we work on LinkedIn. That’s two location points. If we’re tagged in a public tweet or in an Instagram photo, then that could place us at a third location. A fourth and final point, such as a public appearance (speaking at a conference or being interviewed), would distinguish our location trace enough to identify us. It’s simple to obtain this background information for a typical person, and if you’re in the public eye, as a celebrity or politician, it becomes easier still.

For device users, this means: if your mobile phone, transport app, or GPS provider tells you they’re anonymising your location traces and sharing them, they’re likely overstating the privacy precautions they’ve taken. No technique yet exists to make full human location traces truly anonymous; a relatively small amount of background information is almost always enough to identify you and reverse location data anonymisation.

A better approach: releasing aggregates with privacy-preserving techniques

Using aggregation, organisations can achieve much better protection and still deliver useful analytics. Aggregation refers to the release of statistics about the data, rather than the data itself. For instance, one aggregate might be the number of people in each London borough at a specific time. This statistic doesn’t reveal personal information, but it is informative.
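
As a rough sketch of the borough example, the snippet below turns raw records into per-borough counts for a given hour. The record layout, pseudonyms, and the function name people_per_borough are assumptions made for illustration; only the counts would be released, never the records.

```python
from collections import Counter

# Hypothetical raw records: (pseudonym, borough, hour of day).
records = [
    ("u1", "Camden", 9), ("u2", "Camden", 9), ("u3", "Hackney", 9),
    ("u1", "Islington", 18), ("u2", "Camden", 18), ("u3", "Hackney", 18),
]


def people_per_borough(records, hour):
    """Count distinct people seen in each borough during the given hour."""
    # Deduplicate first so a person with several pings is counted once.
    seen = {(user, borough) for user, borough, h in records if h == hour}
    return Counter(borough for _, borough in seen)


print(people_per_borough(records, 9))
```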

Still, organisations should be careful in releasing aggregate statistics. Certain combinations of aggregate statistics can reveal personal information about an individual to a clever attacker. Here are two types of attacks on aggregate location data:

Differencing attacks

Consider the following aggregate statistics derived from location data, which may be useful for public health research:


[Image: table of aggregate statistics derived from location data]

At first glance, these statistics seem safe because they are aggregates over large groups of people. However, in combination, they reveal that the one person who moved out of MK3 in September had been frequently visiting the hospital. If I knew that my friend Alice moved out of MK3 in September (or learned this from her house sale record in the land registry), I could infer that she had been frequently visiting the hospital. This type of attack is called a differencing attack.
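
The arithmetic behind such an attack is just subtraction. The sketch below uses invented figures (they are not the numbers from the post's table) and a hypothetical helper named difference. It also assumes nobody else moved in or out of MK3 and nobody else's hospital visits changed between the two months.

```python
# Hypothetical published aggregates; the numbers are invented for illustration.
published = {
    ("MK3", "August"): {"residents": 1204, "frequent_hospital_visitors": 57},
    ("MK3", "September"): {"residents": 1203, "frequent_hospital_visitors": 56},
}


def difference(published, area, before, after):
    """Subtract the later statistics from the earlier ones, field by field."""
    earlier = published[(area, before)]
    later = published[(area, after)]
    return {key: earlier[key] - later[key] for key in earlier}


diff = difference(published, "MK3", "August", "September")

# One resident left, and the count of frequent visitors dropped by one too,
# so (under the assumptions above) the leaver was a frequent visitor.
if diff["residents"] == 1 and diff["frequent_hospital_visitors"] == 1:
    print("The person who left MK3 was a frequent hospital visitor.")
```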

Reconstruction attacks

Another type of privacy attack, the reconstruction attack, needs more published statistics to work, but it is also far more serious: it reconstructs a significant portion of the raw dataset. You can think of each aggregate statistic as an equation, where the variables represent the sensitive attributes. With enough equations, the system can be solved and all the sensitive attributes determined, a bit like Sudoku. Statistician John Abowd and his coauthors wrote recently about the seriousness of reconstruction attacks for organisations like the US Census Bureau.
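
The equation-solving idea can be sketched with a tiny brute-force search. Everything here is a toy assumption: five people, one secret yes/no attribute each, six published counts over groups whose membership the attacker is assumed to know, and a hypothetical function named reconstruct. Real attacks use proper solvers on far larger systems.

```python
from itertools import product

# Hidden data the attacker never sees: 1 = frequent hospital visitor.
secret = [1, 0, 1, 1, 0]

# Each published statistic is a count over a group of people (by index).
queries = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2, 4), (0, 1, 2, 3, 4)]
published = [sum(secret[i] for i in group) for group in queries]


def reconstruct(n, queries, published):
    """Return every 0/1 assignment consistent with all published counts."""
    solutions = []
    for candidate in product((0, 1), repeat=n):
        if all(sum(candidate[i] for i in group) == total
               for group, total in zip(queries, published)):
            solutions.append(candidate)
    return solutions


# If only one assignment fits, the attacker has recovered the raw data.
print(reconstruct(len(secret), queries, published))
```

With these six counts only one assignment satisfies every equation, so the published aggregates pin down each person's attribute exactly.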

Data privacy techniques can keep aggregate statistics safe. One example of such a technique is to add a small amount of uncertainty (i.e. random noise) to the statistics. When applied correctly, this preserves valuable patterns within the data but defends against differencing and reconstruction attacks.
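
One common way to add such noise is the Laplace mechanism from differential privacy. The post does not say which noise distribution is intended, so treat the following as one possible choice rather than a prescribed method. The function names, the epsilon parameter, and the example counts are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return the count perturbed with Laplace noise of scale 1/epsilon."""
    # Assumes one person can change the count by at most 1 (sensitivity 1).
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2017)  # seeded only to make the sketch repeatable

# Invented counts: frequent hospital visitors in two consecutive months.
august = noisy_count(57, epsilon=0.5, rng=rng)
september = noisy_count(56, epsilon=0.5, rng=rng)

# The difference no longer reliably shows whether one person dropped out.
print(round(august - september, 2))
```

A smaller epsilon means more noise and stronger protection; averaged over many people, the counts still track the true values closely.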

Vulnerable location data: what it means for data-processing organisations

As is so often the case with privacy, it comes down to the trade-off between safeguarding data and opening it up for useful analysis. The good news: data privacy technology allows organisations to reduce the considerable privacy risk while benefiting from the rich insights location data contains.

The differential privacy model is a natural place for organisations to start educating themselves on data privacy. If you’d like to know more, we’ve written an intro to it: find it here.