In our experience, organizations often consider anonymization as they seek to use more of the data they hold while managing compliance obligations. We can think about anonymization in terms of two main questions: a technical one and a legal one.
The technical question relates to the tools an organization can use to produce useful, de-identified data. These tools apply controls to the data, such as masking, tokenization, or generalization, to make it less likely that the data could be used to identify an individual.
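To make these controls concrete, here is a minimal, illustrative sketch of all three applied to a single record. The field names and formats are hypothetical, not taken from any particular product.

```python
import hashlib

# Hypothetical record with one direct identifier (name, email) per control.
record = {"name": "Ada Lovelace", "email": "ada@example.com", "age": 36}

# Masking: hide part of a value while keeping its shape recognizable.
local, domain = record["email"].split("@")
masked_email = local[0] + "***@" + domain

# Tokenization: replace a value with a consistent surrogate. A plain hash
# stands in here for illustration; real systems use a secret key or a vault.
name_token = hashlib.sha256(record["name"].encode()).hexdigest()[:12]

# Generalization: reduce precision so the value matches more individuals.
decade = (record["age"] // 10) * 10
age_band = f"{decade}-{decade + 9}"

deidentified = {"name_token": name_token, "email": masked_email, "age": age_band}
```

Each control trades some utility for a reduction in identifiability; which combination is appropriate depends on the use case.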
The legal question relates to the impact of reducing re-identification risk. Different legal regimes tolerate different levels of re-identification risk. Organizations seeking to anonymize data will often want to take it out of the scope of the relevant data protection regime, and what that requires is very different between, say, the GDPR and HIPAA.
This blog busts four common myths about anonymization.
Myth 1: Anonymization is a silver bullet. Anonymize your data and then you can do anything you want with it.
The reality? Sadly, it is not. Anonymization is a useful legal option, but it will be the right answer only some of the time. Effectively de-identifying data while preserving enough utility and usability for your use case, and accurately assessing re-identification risk, is hard, easy to get wrong, and not always possible.
Myth 2: Pseudonymization and anonymization are the same. We often hear the terms used interchangeably or imprecisely, for example saying that removing direct identifiers is the same as anonymizing data.
The reality? Pseudonymization means processing data so that it can no longer be attributed to a specific individual without the use of additional information. In practice, this often means removing direct identifiers from the data.
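A minimal sketch of this definition, assuming a keyed token scheme: the direct identifier is replaced with an HMAC-based surrogate, and the key is the "additional information" that would allow re-linking, so it must be held separately from the data. The field names and key handling are illustrative only.

```python
import hashlib
import hmac

# Hypothetical secret held apart from the dataset; whoever holds it can
# re-link tokens to identifiers, so the data is pseudonymized, not anonymous.
SECRET_KEY = b"store-this-separately"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a consistent keyed token."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

row = {"patient_id": "NHS-123-456", "diagnosis": "J45", "postcode": "BS1"}
pseudo_row = {**row, "patient_id": pseudonymize(row["patient_id"])}

# Note what remains: diagnosis and postcode are untouched. Without the key the
# row is no longer directly attributable, but such quasi-identifiers can still
# carry re-identification risk.
```

The same token is produced for the same input, which preserves the ability to join records while the identifier itself is hidden.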
Pseudonymized data will generally not be anonymous. In some cases, however, protecting pseudonymized data by placing it into a controlled environment can reduce re-identification risk to the extent that the data is considered anonymous. Variations in environmental controls mean that the same data may be considered pseudonymized in one environment but anonymous in another.
Myth 3: Anonymous data must carry zero re-identification risk. The term ‘anonymous’ is sometimes used outside the world of data protection, for example in computer science, to mean data where the risk of re-identification is zero. Under some definitions, that is impossible if the data is also to be useful.
The reality? Crucially, this is not what the UK GDPR requires. The law takes a risk-based approach rather than an absolute one: it accepts some residual potential for re-identification in anonymous data, which is what makes anonymization possible at all.
Myth 4: A particular technique always produces anonymous data. A common refrain is that applying a given approach, such as k-anonymity or differential privacy, will always result in anonymous data.
The reality? This is not the case. Both k-anonymity and differential privacy offer a way of thinking about privacy; they do not prescribe a given level of protection. Each has a parameter, k and epsilon respectively, that the organization must set. The level of re-identification risk depends on that parameter (and other factors). Evaluating whether the data is in fact anonymous requires assessing the re-identification risk for that dataset with that parameter.
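To illustrate why k-anonymity is a property you measure rather than a switch you flip, here is a hedged sketch: given a choice of quasi-identifiers, a dataset's k is the size of the smallest group of records sharing the same quasi-identifier values. The rows and column names are invented for illustration.

```python
from collections import Counter

# Toy dataset: age band and postcode are treated as quasi-identifiers.
rows = [
    {"age_band": "30-39", "postcode": "BS1", "diagnosis": "J45"},
    {"age_band": "30-39", "postcode": "BS1", "diagnosis": "E11"},
    {"age_band": "30-39", "postcode": "BS1", "diagnosis": "I10"},
    {"age_band": "40-49", "postcode": "BS1", "diagnosis": "J45"},
]

def k_for(rows, quasi_identifiers):
    """Smallest equivalence-class size over the chosen quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

# With both quasi-identifiers, one record stands alone, so k is only 1;
# generalizing or dropping a column changes k, and hence the risk.
k = k_for(rows, ["age_band", "postcode"])
```

Even once k is known, whether that level of protection amounts to anonymity still depends on the dataset, the environment it sits in, and the applicable legal regime.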
Anonymization is complex and not appropriate in all cases. Our new whitepaper, “Introduction to Anonymization,” is an in-depth analysis of the technical and legal questions surrounding anonymization. We co-authored the paper with Bristows, a highly respected data protection law firm.
The paper focuses on the legal regime in the UK, but its insights on understanding re-identification risk through attack-based assessment are relevant to anyone working with data under a risk-based legal regime.
Our team of data privacy experts is here to answer your questions and discuss how data privacy can fuel your business.