Privitar Labs

Safely analysing location data

Location Data

Many of the tools and applications we use today, in particular our mobile phones, record location data. This ranges from simple information, such as someone’s home or work address, to complex trajectories that pinpoint a person’s route through a city every few seconds.

Location data can also be implicit in other data sources. For example, by examining transactions on a customer’s payment card, it is possible to learn that they visited a particular coffee shop at 09:15 on Wednesday 25th August, and from there to begin constructing a trajectory of their movements that day.
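
To illustrate how little work this takes, the sketch below joins card transactions to merchant coordinates and sorts them by time. It is a minimal illustration, not a description of any real system: the record layout, the merchant identifiers, the coordinates, and the year in the timestamps are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical lookup from merchant ID to (latitude, longitude).
MERCHANT_LOCATIONS = {
    "coffee_shop_17": (51.5155, -0.0922),
    "sandwich_bar_03": (51.5140, -0.0890),
    "supermarket_42": (51.5290, -0.1010),
}

def trajectory(transactions, merchant_locations):
    """Turn card transactions into a time-ordered list of (time, location) points."""
    points = []
    for tx in transactions:
        location = merchant_locations.get(tx["merchant_id"])
        if location is None:
            continue  # e.g. an online purchase with no physical location
        points.append((datetime.fromisoformat(tx["timestamp"]), location))
    return sorted(points)

# Hypothetical transactions for one card on one day (the year is assumed).
transactions = [
    {"merchant_id": "supermarket_42", "timestamp": "2021-08-25T18:40:00"},
    {"merchant_id": "coffee_shop_17", "timestamp": "2021-08-25T09:15:00"},
    {"merchant_id": "online_store_99", "timestamp": "2021-08-25T11:02:00"},
    {"merchant_id": "sandwich_bar_03", "timestamp": "2021-08-25T12:55:00"},
]

for when, where in trajectory(transactions, MERCHANT_LOCATIONS):
    print(when.strftime("%H:%M"), where)
```

Nothing in the input is labelled as location data, yet the output is a location trace for the cardholder.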

Organisations and society can benefit from the rich information contained in location data. For example, during the coronavirus pandemic, the ability to reconstruct the movements of infected individuals and to trace those they came into contact with is an essential element in managing the spread. More generally, location data can be used to plan urban transport, or to join the home location of individuals with demographic data from the local area.

However, location data is highly sensitive and its use can put personal information at risk. There are many things about our location traces that we would rather keep private: what time we made it home last night; how often we’ve been to the doctor this month; which church, temple, or mosque we visit regularly, and so on. Additionally, revealing that we are absent from a particular location can pose dangers of its own.

Privacy challenges

Location traces form patterns of behaviour that are unique to individuals. It has been shown that even the coarse location resolution recorded by a mobile carrier’s antennas is very revealing; just four points in time and space are enough to uniquely identify 95% of individuals. This means that even in a dataset where primary identifiers have been removed and locations are generalised to a large area, an adversary with background knowledge of your approximate location at four points in time is likely to be able to isolate your location trace and learn all the other locations you visited.
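
One way to gauge how exposed a particular dataset is to this kind of attack is to measure uniqueness directly. The sketch below is an illustration only, and is not the method used to obtain the 95% figure above. It assumes each trace is a set of hypothetical (cell, hour) points, samples k points from each person's trace to stand in for the adversary's background knowledge, and counts how often those points match exactly one trace.

```python
import random

def fraction_unique(traces, k, seed=0):
    """Fraction of people whose trace is the only one containing
    k points sampled at random from it."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    eligible = 0
    unique = 0
    for points in traces.values():
        if len(points) < k:
            continue  # not enough points to sample from
        eligible += 1
        known = set(rng.sample(sorted(points), k))  # adversary's knowledge
        matches = sum(1 for other in traces.values() if known <= other)
        if matches == 1:  # only the person's own trace fits
            unique += 1
    return unique / eligible if eligible else 0.0

# Hypothetical traces: pseudonym -> set of (antenna cell, hour of day).
traces = {
    "user_1": {("cell_A", 8), ("cell_B", 9), ("cell_C", 13), ("cell_A", 19)},
    "user_2": {("cell_A", 8), ("cell_B", 9), ("cell_D", 13), ("cell_A", 19)},
    "user_3": {("cell_E", 8), ("cell_B", 9), ("cell_C", 13), ("cell_E", 19)},
}

for k in (1, 2, 4):
    print(k, fraction_unique(traces, k))
```

On real data the estimate would normally be averaged over many random samples per person; a single sample per person is used here to keep the sketch short.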

Linking location data is easy and dangerously revealing

To complicate matters further, it is very easy to obtain background knowledge to link to location traces: our homes are recorded in public registries, and most of us reveal where we work on LinkedIn. That’s two location points. If we’re tagged via a tweet or in an Instagram photo, then that could place us at a third location.

A fourth point might be a public appearance, such as speaking at a conference or being interviewed. It doesn’t take much to make our location trace distinctive enough to identify us. It’s already surprisingly simple to obtain this background information for a typical person, but for someone in the public eye, such as a celebrity or politician, location data becomes incredibly easy to find.

Social media makes this even easier. Individuals often report where they are, and many popular applications such as Twitter, Foursquare or Facebook Places expose such geo-tagged updates via their public APIs. In most cases, no skills beyond simple programming knowledge are required to collect this information.

Techniques and approaches

General solutions that provide strong privacy protection for location data are beyond what is currently possible. The data may consist of many time-location points for each individual, and we must assume that an adversary could have knowledge of any of them. Moreover, the data is sparse and highly specific to each individual, meaning there is little chance that an individual’s data can hide in the crowd.

Instead, we need to consider more specific solutions for groups of problems. For example, for use cases involving a person’s home location, it may be sufficient to generalise this to a local area rather than the exact address.
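
As a rough illustration of generalisation, the sketch below snaps a coordinate to the centre of the grid cell that contains it, so that every address in the same cell is reported as the same point. The function name and the default cell size of 0.01 degrees (roughly 1 km of latitude) are assumptions chosen for the example. A suitable cell size depends on the use case and on how many people live in each cell.

```python
import math

def generalise(lat, lon, cell_deg=0.01):
    """Replace a coordinate with the centre of its grid cell.

    cell_deg is the cell width in degrees (an assumed, tunable parameter).
    """
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    centre_lat = round((row + 0.5) * cell_deg, 6)
    centre_lon = round((col + 0.5) * cell_deg, 6)
    return centre_lat, centre_lon

# Two hypothetical home locations a few hundred metres apart.
home_a = (51.5074, -0.1278)
home_b = (51.5079, -0.1201)

print(generalise(*home_a))
print(generalise(*home_b))
```

Both homes map to the same generalised point, so the exact address can no longer be read from the output. In sparsely populated areas a cell may still contain very few homes, in which case a larger cell would be needed.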

For other business problems it may be sufficient to provide only query access to the raw data and apply differential privacy to protect the query results. For example, to understand the provisioning of amenities throughout a city, the area can be divided into a grid with each cell representing a small area for a given slice of time. The number of individuals in each cell can then be calculated using differential privacy.
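
The grid example can be sketched with the Laplace mechanism, one standard way of achieving differential privacy for counts. This is a minimal sketch under stated assumptions, not a production implementation: it covers a single time slice, it assumes each person may contribute to at most one cell (enforced here by keeping only their first observation), and the function name, cell labels, and epsilon value are hypothetical.

```python
import random
from collections import Counter

def dp_cell_counts(observations, all_cells, epsilon, rng=None):
    """Noisy count of individuals per grid cell for one time slice.

    observations: iterable of (person_id, cell) pairs.
    all_cells: every cell in the grid, including empty ones.
    epsilon: privacy parameter; smaller values mean more noise.
    """
    rng = rng or random.Random()
    # Keep one cell per person, so adding or removing one individual
    # changes exactly one count by 1 (sensitivity 1).
    first_cell = {}
    for person, cell in observations:
        first_cell.setdefault(person, cell)
    true_counts = Counter(first_cell.values())

    noisy = {}
    for cell in all_cells:  # empty cells get noise too
        # The difference of two exponential draws with rate epsilon
        # is Laplace noise with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        noisy[cell] = true_counts[cell] + noise
    return noisy

# Hypothetical observations for one time slice.
observations = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p1", "B")]
print(dp_cell_counts(observations, ["A", "B", "C"], epsilon=0.5))
```

Noise is added to every cell in the grid, including empty ones, because publishing only the occupied cells would itself reveal where people were. Releasing counts for many time slices consumes more of the privacy budget, which this sketch does not track.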

For sophisticated data science analyses, an alternative is to bring the analytics to the data: only approved feature extraction and analysis code is allowed to run against the data, in a secured environment, and no direct access to the data itself is granted. The same approach can be applied to other very rich data, such as free text or genomic data.
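
The idea of bringing the analytics to the data can be sketched as a small gateway that runs only pre-approved functions and returns only their results. All names here (`approved`, `SecureEnvironment`, `mean_daily_distance_km`) are hypothetical, and the sketch shows the control flow only. A real secured environment would also need process isolation, code review, and checks on what each analysis outputs.

```python
from statistics import mean

_APPROVED = {}  # analysis name -> reviewed function

def approved(func):
    """Mark a function as reviewed and approved to run on the raw data."""
    _APPROVED[func.__name__] = func
    return func

@approved
def mean_daily_distance_km(records):
    """An approved analysis that returns a single aggregate value."""
    return round(mean(r["distance_km"] for r in records), 1)

class SecureEnvironment:
    """Holds the raw records and exposes them only to approved analyses."""

    def __init__(self, records):
        self._records = records  # never handed back to the caller

    def run(self, analysis_name):
        if analysis_name not in _APPROVED:
            raise PermissionError(f"analysis {analysis_name!r} is not approved")
        return _APPROVED[analysis_name](self._records)

# Hypothetical raw data held inside the environment.
env = SecureEnvironment([
    {"person": "p1", "distance_km": 2.0},
    {"person": "p2", "distance_km": 4.0},
])

print(env.run("mean_daily_distance_km"))
try:
    env.run("export_all_records")
except PermissionError as err:
    print(err)
```

The analyst chooses which approved analysis to run and sees its result, but never receives the records themselves.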

Team up with Privitar Labs

Do contact us if you’d like our help in providing privacy protection to enable processing and analytics of location data.
