By Marcus Grazette, Europe Policy Lead at Privitar

Our Data Policy Network event on 1 December took a deep dive into anonymization policy, or de-identification as it’s called in the US. We focused on the situation in the UK, but many of the challenges apply globally.

UK policy on anonymization, and on data use more broadly, is changing: the ICO is consulting on updated guidance to replace its 2012 code, and DCMS is consulting on broader data protection reforms.

Changes in the UK are important, but the UK is not an island when it comes to anonymization. Changes in Europe also matter, particularly for organizations operating across both jurisdictions. The European Data Protection Board is also due to publish new guidance on anonymization and pseudonymization in 2022. We’ll have to wait to see whether the strict test in the 2014 Article 29 Working Party guidance will change, and what impact jurisprudence (for example, the European Court of Justice judgement in Breyer) will have.

This blog explores why anonymization matters and argues for a pragmatic approach, using an example from health research to illustrate the practical impact that approach could have.

Does anonymization matter?

It can be tempting to skirt around anonymization by arguing that data protection law offers a choice of routes to using data, allowing organizations to pick the route that suits their intended use case. Anonymization, that argument goes, is only one possible route, so it doesn’t matter that it can be difficult to use.

We disagree for three main reasons:

  1. Although some use cases may be possible within the perimeter of data protection law, it may be far more efficient to achieve them outside of that perimeter, without significant impact on individual rights and freedoms. The efficiency comes from removing the data governance requirements that would attach to personal data.
  2. Determining whether data is anonymous relies on a risk assessment. The organization must assess the re-identification risk to individuals from whose personal data the de-identified dataset was derived. Risk assessments also underpin Legitimate Interest Assessments and Data Protection Impact Assessments. So a clear, robust approach to risk assessment is important to ensure responsible data use.
  3. Anonymization defines the scope of data protection law. As commentators like Nadezhda Purtova argue: without clear boundaries, data protection risks becoming the “law of everything.”

We’ve seen a clear ambition from policymakers to encourage data use, including for secondary purposes like research and AI. The UK’s National Data Strategy and AI Strategy are both examples. We believe that a pragmatic approach to anonymization, supported by guidance, could offer an efficient way to achieve this ambition without a significant risk to individuals.

What does a pragmatic approach to anonymization mean?

A pragmatic approach requires two main elements: (1) taking into account the context in which data is used when assessing risk, not just the risk posed by the data itself, and (2) accepting a non-zero risk of re-identification.

The presentation from Guy Cohen, Privitar’s Head of Policy, on common re-identification techniques explained why these two elements are important:

  1. We’ve seen several successful re-identification attacks against publicly available data, ranging from Professor Sweeney’s linkage attack to model inversion, differencing and reconstruction attacks. Data transformations alone may help to defend against these attacks (for example, generalizing data to achieve k-anonymity, or adding noise); a short sketch of these transformations follows this list. However, the extent of the transformations needed can reduce utility, making some use cases (research, for example) difficult or impossible. Using data transformations in combination with controls on the context in which data is used is much more powerful. We’ll see some examples of these controls below.
  2. The fundamental law of information recovery essentially states that we cannot release novel, true data without changing an attacker’s ability to make inferences about individuals. If data that supports any inference is personal, and we can only assess risk on the data itself (not the context), then any data that is useful for research (because it’s novel and true) must also be personal.
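To make the first point concrete, here is a minimal sketch of the two transformations mentioned above: generalizing quasi-identifiers to achieve k-anonymity, and adding noise to an aggregate output. It is ours and purely illustrative, not a description of any particular product; the column names, bucket sizes and value of k are assumptions chosen for the example.

```python
# Illustrative sketch only: toy patient records with quasi-identifiers.
from collections import Counter
import random

records = [
    {"age": 34, "postcode": "CB1 2AB", "diagnosis": "asthma"},
    {"age": 36, "postcode": "CB1 9ZX", "diagnosis": "diabetes"},
    {"age": 35, "postcode": "CB1 4QQ", "diagnosis": "asthma"},
    {"age": 52, "postcode": "OX2 6TT", "diagnosis": "flu"},
    {"age": 58, "postcode": "OX2 1AA", "diagnosis": "asthma"},
    {"age": 55, "postcode": "OX2 7PP", "diagnosis": "diabetes"},
]

def generalize(record):
    """Coarsen quasi-identifiers: ten-year age bands, outward postcode only."""
    band = (record["age"] // 10) * 10
    return {
        "age_band": f"{band}-{band + 9}",
        "postcode_area": record["postcode"].split()[0],
        "diagnosis": record["diagnosis"],
    }

def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

generalized = [generalize(r) for r in records]
print(satisfies_k_anonymity(generalized, ["age_band", "postcode_area"], k=3))  # True

# Noise addition for an aggregate output: perturb a count so an attacker cannot
# pin down any one individual's exact contribution. Real systems use calibrated
# mechanisms (e.g. differential privacy); this is deliberately crude.
asthma_count = sum(r["diagnosis"] == "asthma" for r in records)
noisy_count = asthma_count + random.randint(-2, 2)
print(asthma_count, noisy_count)
```

The trade-off is visible even here: the coarser the age bands and postcode areas, the easier it is to reach a given k, but the less useful the data becomes for fine-grained analysis.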

How does this work in practice?

An example involving health research illustrates this point. Imagine a UK researcher working with patient data. Her research proposal has been scrutinized to ensure that it is of high scientific value, meets ethical requirements and uses robust methodologies. She has been trained in confidentiality and data protection, and is contractually barred from attempting re-identification. She may be working with de-identified data in a Trusted Research Environment (TRE) or in a managed IT environment.

In our example, a pragmatic approach to anonymization could mean that the data is anonymous in the researcher’s hands. The controls on the context, taken together with any data transformations, reduce the risk of re-identification. However, re-identification risk is not zero; the researcher can still make inferences about the cohort. But the risk is probably low enough to cross the legal threshold for the data to be considered anonymous.
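To show the shape of that reasoning, here is a purely hypothetical sketch. The multiplicative risk model, the factor values and the threshold are all invented for illustration; they do not come from the ICO, the UK GDPR or any Privitar assessment, and real assessments (such as a motivated intruder test) are qualitative and context-specific. The point is simply that transformations set a baseline, contextual controls reduce the residual risk further, and the result is compared against a non-zero threshold.

```python
# Hypothetical residual-risk illustration only; all numbers are invented.
BASELINE_RISK = 0.30  # assumed risk of the de-identified dataset with no contextual controls

# Each control modelled (as an assumption) as a multiplicative risk-reduction factor.
CONTROLS = {
    "trusted_research_environment": 0.10,
    "contractual_ban_on_reidentification": 0.50,
    "researcher_training_and_vetting": 0.60,
    "output_checking_before_release": 0.40,
}

THRESHOLD = 0.01  # the "sufficiently remote" risk the organization is prepared to accept


def residual_risk(baseline, applied_controls):
    """Multiply the baseline risk by the reduction factor of each applied control."""
    risk = baseline
    for name in applied_controls:
        risk *= CONTROLS[name]
    return risk


applied = [
    "trusted_research_environment",
    "contractual_ban_on_reidentification",
    "researcher_training_and_vetting",
]
risk = residual_risk(BASELINE_RISK, applied)
print(f"residual risk: {risk:.4f}")  # 0.0090, non-zero but below the threshold
print("treat as anonymous" if risk <= THRESHOLD else "treat as personal data")
```

Under these invented numbers the residual risk is non-zero (0.9%) but falls below the organization’s threshold, which is exactly the position the researcher in our example is in.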

However, uncertainty about how to apply this pragmatic approach in practice poses challenges for organizations and researchers, for example:

  1. A risk-averse organization may choose a “belt and braces” approach, in other words applying more controls than are actually needed. This matters because:
  • Applying controls restricts how the researcher can use the data (for example, allowing one method of analysis but not another) and can reduce utility. Both could have a significant impact: less research may be undertaken, and research using lower-utility data may be less insightful. Either could have consequences for health outcomes.
  • Applying controls also takes time. Agreeing contractual terms between legal teams or requiring background checks on researchers can add weeks or months to a project. In a research context, with Ph.D. deadlines or funding milestones, delay can cause significant disruption.
  2. The current “motivated intruder” test assumes a case-by-case approach, making it difficult to scale. The pandemic illustrates this challenge, with some research governance teams facing dramatic increases in the number of requests for access to data.

Clarity on anonymization can enable organizations to optimize their data governance overheads by applying only the controls necessary to ensure that data crosses the legal threshold from personal to anonymous.

We’ll publish further thoughts on what the guidance needs to provide in the coming weeks, as part of our engagement with the ICO on its consultation.

For more analysis and insights, sign up to our Data Policy Network newsletter.