Jun 22, 2018
Differential privacy has been touted as the solution for accessing all your sensitive data with absolute privacy assurance. The privacy defence that lets you extract unlimited insights and train AI models while your customers' hospital visits or pharmacy purchases never get revealed. The reason your compliance team gets out of the way, your data analysis projects accelerate, and new customers line up with their chequebooks.
Unfortunately not. Like any technology innovation, differential privacy (DP) is susceptible to hype: fact and fiction get mixed together in all the excitement. The adoption of DP by technologists at Apple, Google, and the US Census Bureau has only amplified the buzz. The reality is that DP is in the earliest stages of real-world use, and practitioners are still figuring out when and how to apply it.
What are the questions that cut to the heart of any DP discussion, allowing you to distinguish real, immediate value from overhype? This article presents the big three.
Differential privacy is not a magic wand for all analytics use cases. First, it applies only to aggregate statistics and machine learning models; DP does not permit releasing information about individual records directly. Further, there is no general-purpose DP solution that enables every type of aggregate analysis on every type of data. Such all-purpose tools are impossible, due to some inescapable mathematical truths of information theory, which one can summarise as 'if you release too many statistics, you reveal all your source data.'
But DP has been applied to some well-defined, well-scoped analytics use cases, and there it has successfully unlocked new value. These use cases share some common traits: large datasets and a tolerance for approximate, rather than exact, statistics. And efficient DP algorithms for new use cases are being developed all the time by the research community, so DP's real-world applicability is always growing.
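For intuition, here is a minimal sketch of the kind of aggregate query DP handles well: a count over a large dataset, where a little noise is tolerable. It uses the standard Laplace mechanism; the dataset and the `dp_count` helper are invented for illustration, not Privitar's implementation.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) by inverting the CDF of a uniform draw."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon):
    """Epsilon-DP count via the Laplace mechanism.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical example data: 3,000 patient visits, 1,000 of them for flu.
visits = [{"patient": i, "diagnosis": "flu" if i % 3 == 0 else "other"}
          for i in range(3000)]
noisy = dp_count(visits, lambda r: r["diagnosis"] == "flu", epsilon=0.5)
print(f"Noisy flu count: {noisy:.1f} (true count is 1000)")
```

On a dataset this size the noise is a fraction of a percent of the true count, which is exactly the regime where DP works well; on a count of ten, the same noise would swamp the signal.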
If you have a use case like this and need strong privacy protection, adopting DP makes sense, which leads to the next question: how much privacy do you need?
Differential privacy is actually short for 'epsilon-differential privacy'. That parameter, epsilon, controls the strength of the protection DP provides. Epsilon, somewhat confusingly, is inverse to the amount of privacy: 0.01 is very private, while 100 is very un-private. The lower the epsilon, the more privacy.
Why can’t we just set epsilon to 0? Because privacy doesn’t come for free. The lower epsilon goes, the more the accuracy of the data is damaged. There’s a privacy-utility trade-off, and maximising privacy will give complete gibberish data, that is, no utility.
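The trade-off can be made concrete. For the Laplace mechanism (one common way to achieve DP), the noise added to a sensitivity-1 statistic such as a count has expected absolute error of exactly 1/epsilon, so each tenfold drop in epsilon buys privacy at the cost of a tenfold increase in error:

```python
# Expected absolute error of a Laplace-mechanism count (sensitivity 1):
# the noise scale is 1/epsilon, so stronger privacy (smaller epsilon)
# means proportionally larger error.
expected_error = {eps: 1.0 / eps for eps in [0.01, 0.1, 1.0, 10.0, 100.0]}

for eps, err in expected_error.items():
    print(f"epsilon={eps:>6}: expected |error| = {err}")
```

At epsilon 0.01 the count is off by around 100 on average, regardless of dataset size; pushing epsilon towards 0 pushes that error towards infinity, which is the 'complete gibberish' end of the trade-off.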
A better course of action is to determine what epsilon will reduce risk to an acceptable level, and select that. However, context matters: in some cases, you may want to nudge the dials more towards privacy, while in others more towards utility. It depends on what you're protecting, who you're showing the data to, and what other controls are in place. You want an appropriate epsilon for each context. And of course, your context may change over time.
These epsilon settings also need to be justified. If it becomes public that your customers are protected with an epsilon of 5.4, will customers be relieved? Terrified? Will regulators be angry? More likely, everyone will be unsure how to feel. You need a principled approach to setting epsilon, and a way to justify why the epsilon you've chosen is safe in practical terms.
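One way to translate an epsilon into practical terms is the core DP guarantee itself: any one person's data can change the probability of any released output by a factor of at most e^epsilon. The values below follow directly from that definition:

```python
import math

# epsilon-DP bounds the ratio of output probabilities, with or without
# any one individual's data, by e^epsilon. An epsilon of 5.4 therefore
# permits a roughly 221x shift in those probabilities, which is hard to
# call obviously safe without knowing the rest of the context.
shift_factor = {eps: math.exp(eps) for eps in [0.01, 1.0, 5.4]}

for eps, factor in shift_factor.items():
    print(f"epsilon={eps}: output probabilities can shift by {factor:.2f}x")
```

An epsilon of 0.01 bounds the shift at about 1.01x, close to no information leaked at all; 5.4 allows a factor in the hundreds. Whether that is acceptable depends on the other controls around the data, which is exactly why epsilon needs a reasoned justification rather than a default.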
As mentioned, privacy doesn't come for free. Adopting DP can yield the best defence against many privacy attacks, but it involves limiting the statistics you release and adding noise to them. The loss in utility needs to be worth the gain in privacy, and whether it is depends on context.
Exactly how much accuracy loss are you suffering, and does this matter? For instance, if you are calculating how much to bill your customers, any noise at all is probably unacceptable, because most people want accurate bills. This would be a bad place to use DP.
If you’re okay on the utility front, then consider: exactly how much privacy are you gaining, and does that matter? For instance, if you are showing statistics to a room full of IT staff who have access to the raw data anyway through another channel, you have not gained any real privacy by using DP on your statistics. When considering DP, ask how many people it would really prevent from getting sensitive information.
In some situations, differential privacy yields a needed increase in privacy with no real detriment to utility compared to alternative approaches. The Google RAPPOR project is a detailed, real-world example of this [1]. It's in these situations that differential privacy is most valuable.
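RAPPOR itself is fairly elaborate (it encodes values in Bloom filters with two layers of randomisation), but the 'flip of a coin' idea at its heart is classic randomised response, which can be sketched in a few lines. This is an illustrative sketch, not Google's implementation; the coin probabilities below give epsilon = ln(3), since a true 'yes' is reported as 'yes' with probability 3/4 versus 1/4 for a true 'no'.

```python
import random

def randomized_response(true_answer):
    """Flip a coin: heads, answer truthfully; tails, flip again and
    report that second coin instead. Satisfies epsilon-DP, epsilon=ln(3)."""
    if random.random() < 0.5:
        return true_answer
    return random.random() < 0.5

def estimate_true_rate(responses):
    """Debias the aggregate: observed 'yes' rate = 0.25 + 0.5 * true rate."""
    observed = sum(responses) / len(responses)
    return (observed - 0.25) / 0.5

# Simulate 100,000 respondents, 30% of whom truly answer 'yes'.
random.seed(0)
responses = [randomized_response(random.random() < 0.30)
             for _ in range(100_000)]
estimate = estimate_true_rate(responses)
print(f"Estimated true rate: {estimate:.3f}")
```

No individual answer can be trusted, since each respondent has plausible deniability, yet over a large population the aggregate rate is recovered accurately — privacy gained with little utility lost.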
For more information on Privitar’s differential privacy offerings, contact us.
[1] "Learning Statistics with Privacy, aided by the Flip of a Coin", https://ai.googleblog.com/2014/10/learning-statistics-with-privacy-aided.html