The UK Office for National Statistics (ONS) recently published a methodology report on behalf of the Government Statistical Service (GSS). Members of Privitar’s research and policy teams were invited to contribute a chapter on an area of privacy engineering we are particularly excited about: differential privacy [1].
Differential privacy provides a guarantee that no one can learn anything significant about any individual from their inclusion in a dataset. This helps private companies, as well as government agencies, to share and monetise their data while protecting the privacy of the people in their datasets. This post outlines the key facts about differential privacy, why you need it, and what it can offer you.
The need for differential privacy
Lesson 1: Do not assume aggregate statistics do not reveal anything about individuals.
Data about groups can reveal more than intended about specific individuals. This issue is increasingly serious in the modern world, where adversaries are in possession of powerful computers, sophisticated techniques, and large amounts of auxiliary data. In addition, regulatory demands on data holders, and public expectations of them, are rising. But what exactly are the risks private sector organisations should be trying to defend against?
Take a health insurer providing insurance to the employees of multiple companies. The insurer wishes to provide data about trends in certain health conditions to its clients. One approach is to prevent access to the dataset itself and only release aggregate statistics like sums, counts, and averages. The insurer might release to its clients the prevalence of different health issues amongst staff, broken down by demographic group, rather than the complete dataset itself. Even so, aggregates can leak: if two released counts differ only in whether one employee is included, subtracting one from the other reveals that employee’s data.
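A minimal sketch of how released counts alone can expose one person, often called a differencing attack. The records, names, and the `count_with_condition` helper are hypothetical and exist only for illustration; they are not taken from the GSS report.

```python
# Hypothetical staff records: (employee, department, has_condition).
records = [
    ("alice", "finance", True),
    ("bob", "finance", False),
    ("carol", "finance", False),
    ("dave", "finance", True),
]


def count_with_condition(rows):
    """Aggregate statistic: how many people in `rows` have the condition."""
    return sum(1 for _, _, has_condition in rows if has_condition)


# Release 1: a count over the whole department.
count_all = count_with_condition(records)

# Release 2: the same count over everyone except one person, for example a
# report produced just before that person joined.
count_without_dave = count_with_condition(
    [row for row in records if row[0] != "dave"]
)

# Neither release names anyone, yet their difference is Dave's own record.
dave_has_condition = (count_all - count_without_dave) == 1
print(count_all, count_without_dave, dave_has_condition)
```

Each count looks harmless on its own; the leak only appears when an adversary combines them, which is why reviewing statistics one at a time is not enough.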
What can differential privacy do for you?
Lesson 2: Differential privacy can allow you to safely use data which is otherwise inaccessible.
Differential privacy was developed to protect against attacks on aggregate statistics, as well as all other attacks on individual privacy, whether we know about them yet or not. No other existing privacy approach is capable of doing this. It treats the cause of attacks - information leaked about individuals - rather than the attacks themselves, meaning your defence won’t be broken by the discovery of a new attack. Using this robust approach, otherwise inaccessible data can be safely analysed.
Lesson 3: Consider who has, or might have, access to your data, and what they might be capable of. Differential privacy can mitigate risk in the absence of good environmental controls.
Differential privacy can be applied to anything from aggregate statistics to complex machine learning tasks.
When data sharing takes place within more controlled environments, such as under strict contracts with only a few individuals, the risk of privacy attacks is lower. Without a controlled data environment, businesses must expect an adversary with potentially unlimited background information, state-of-the-art techniques, and plenty of resources. Without differential privacy, it can be very hard to ensure that attacks on aggregate statistics are not possible in these situations.
Lesson 4: Differential privacy gives a mathematically provable guarantee about individual privacy.
An insurer using differential privacy to release aggregate statistics to clients about the health of their staff could give a provable mathematical guarantee that this reveals a limited amount about any individual employee. It is therefore safe for the staff to ‘opt in’, as they can be sure that doing so will not reveal enough information to determine their individual diagnosis.
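For readers who want the formal statement, the guarantee is usually written as below. This is the standard textbook definition, not something spelled out in this post, and the symbols are introduced here for illustration: M is the randomised release mechanism, D and D' are any two datasets that differ in one individual's record, S is any set of possible outputs, and ε is the privacy parameter.

```latex
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \cdot \Pr[\, M(D') \in S \,]
```

In words: whether or not a given employee's record is included, every possible output is almost equally likely, with "almost" bounded by the factor e^ε. A smaller ε means a tighter bound and so a stronger guarantee.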
Lesson 5: Differential privacy allows control of a privacy-utility trade-off which exists when using any data.
Differential privacy gives you direct control over the balance between privacy and utility of data analysis. It adds a precise amount of probabilistic noise to your statistics, creating controlled uncertainty, and allows you to tune the level of this noise. More noise improves privacy, as less can be learned about individuals, but will reduce accuracy. A balance between accuracy and privacy must always be found, but no other method makes this so directly accessible.
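One common way to add tunable noise is the Laplace mechanism, sketched below for a simple count. This is an illustrative sketch under stated assumptions rather than the method of any particular product: the function names, the `epsilon` values, and the true count of 120 are all hypothetical, and a production system should rely on a vetted library because naive floating-point noise sampling has known weaknesses.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    A count changes by at most 1 when one person is added or removed,
    so its sensitivity is assumed to be 1 here.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
true_count = 120        # hypothetical true value
errors = {}
for epsilon in (0.1, 1.0, 10.0):
    samples = [noisy_count(true_count, epsilon, rng) for _ in range(2000)]
    errors[epsilon] = sum(abs(s - true_count) for s in samples) / len(samples)
    print(f"epsilon={epsilon}: mean absolute error {errors[epsilon]:.2f}")
```

Smaller values of `epsilon` give stronger privacy and larger average error; larger values give the reverse. That single parameter is the dial the paragraph above describes.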
Lesson 6: Consider carefully the privacy cost of any data analysis. Can you reduce risk by releasing less?
Differential privacy protects multiple analyses of the same data by adding more noise to account for the accumulating risk. Each new analysis will be less accurate than the last. This is sometimes considered a negative point of differential privacy itself - a classic case of “shooting the messenger”. Privacy risk accumulates with each data release no matter what you do to protect the data, but only differential privacy allows you to explicitly detect and respond to accumulated risk. It’s an uncomfortable truth certainly, but one you should be aware of.
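Accumulated risk is often tracked with a "privacy budget". The sketch below assumes the simplest accounting rule, basic sequential composition, under which the epsilon values of successive releases add up. The `PrivacyBudget` class and the numbers are hypothetical; real systems may use tighter accounting methods.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def remaining(self):
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Record one release, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining():
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)    # first analysis
budget.spend(0.25)   # second analysis

try:
    budget.spend(0.5)  # third analysis would push the total past 1.0
    refused = False
except RuntimeError:
    refused = True

print(budget.remaining(), refused)
```

Making the accumulated cost explicit is what lets a data holder decide, before each release, whether the insight is worth the remaining budget.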
Lesson 7: Having a clear understanding of the insights you are trying to draw from your data, and the attacks you are trying to prevent, is essential for calibrating differential privacy.
Calibrating noise is part technical and part pragmatic. The aim is to prevent attacks without overly affecting insights drawn from the data. Be very wary of any vendor who offers you a differential privacy product without suggesting how to calibrate it. Be sceptical, too, of any differential privacy product which does not factor in the risk of multiple analyses. For further reading on this topic see our blog post.
Lesson 8: The openness of differential privacy can help to build public trust, prevent inaccurate conclusions, and facilitate collaboration on best practice.
Differential privacy does not depend on hiding any technical details. This means you can enjoy the benefits of being transparent without additional risk: downstream users can know exactly how noise was added and account for it, preventing false conclusions.
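As an illustration of how a downstream user might account for published noise, the sketch below assumes the publisher has disclosed that Laplace noise with scale sensitivity / epsilon was added to a statistic. The function name and the example figures are hypothetical. For Laplace noise with scale b, the probability that the noise exceeds t in absolute value is exp(-t / b), which gives the interval directly.

```python
import math


def laplace_interval(noisy_value, epsilon, sensitivity=1.0, confidence=0.95):
    """Interval containing the true value with the given probability,
    assuming Laplace noise of scale sensitivity / epsilon was added."""
    scale = sensitivity / epsilon
    # Solve exp(-t / scale) = 1 - confidence for the half-width t.
    half_width = scale * math.log(1.0 / (1.0 - confidence))
    return noisy_value - half_width, noisy_value + half_width


# Hypothetical published figure and disclosed epsilon.
low, high = laplace_interval(123.4, epsilon=0.5)
print(f"true value lies in [{low:.1f}, {high:.1f}] with 95% probability")
```

Because the noise distribution is public, an analyst can report this uncertainty alongside the figure rather than mistaking noise for a real effect.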
Consider differential privacy if you:
- Want to make better use of data that is currently off limits.
- Want to understand and control how privacy protection affects your data’s usefulness.
- Want strong protection in an uncontrolled data environment.
- Wish to guarantee to stakeholders that you are using robust, future-proof privacy protection.
To find out more about how differential privacy can help organisations preserve the privacy and utility of data, read our executive summary of the GSS report.
[1] This report was co-authored by Professor Kobbi Nissim of Georgetown, who is one of Privitar’s academic advisors and one of the creators of differential privacy. Along with his co-authors, Nissim was awarded the 2016 Test of Time award and the 2017 Gödel Prize for his work on differential privacy.