Whether we call it the Second Machine Age or the Fourth Industrial Revolution, we are entering a period of rapid change and technological advancement. Just as the steam engine replaced human strength with mechanical power, digital technologies are now doing the same for human intellect. Fossil fuels drove the first industrial revolution, and only centuries later are we starting to mitigate the damage caused by their by-products. For digital technologies the fuel is data, and the harmful by-product is privacy risk. This time, though, we are equipped not just with the engines of growth, but also with the tools needed to mitigate the risks from the start.

This month saw the release of two very different reports. One was from the UK’s Chief Medical Officer (CMO), Professor Dame Sally Davies. The other was from the Flourish consortium, a multi-sector collaboration set up to advance Connected and Autonomous Vehicles (CAVs) in the UK.

How are genomics and CAVs connected?

Whilst one relates to genomics policy and the other to CAVs, the two reports are strikingly similar, sharing comparable triggers, opportunities and challenges. Both are examples of how the benefits of many 21st-century technological breakthroughs are contingent upon the sharing of personal data, and why it is therefore crucial to design these systems with privacy protection in mind.

Both genomics and autonomous vehicles have advanced rapidly in recent years, triggered in part by the leap in data processing capabilities that we’ve come to know as Big Data Analytics.

The Opportunity

CAVs have the potential to prevent accidents and avert avoidable deaths. Roughly 1.2m people are killed each year in road accidents, many caused by drivers being distracted, drunk or tired, all causes which won’t affect autonomous vehicles. In addition to potentially saving thousands of lives every day, autonomous vehicles will also be more efficient, reducing environmental damage and potentially boosting productivity.

‘The data collected can be used to improve the safety and ability of CAVs, to improve traffic flow and to provide services back to users’ – Flourish, Insurance and Legal Report, 2017

Genomics research presents the opportunity for staggering public benefits, such as diagnosing and treating illnesses. It is estimated that half of Britons will get cancer at some point in their lives. Additionally, there are some 8,000 rare diseases (diseases affecting fewer than one in 2,000 people), together affecting some 3m people, most of which have a genomic cause and no current cure. For cancer, rare diseases and a host of other health issues, genomic testing can enable early diagnosis, targeted treatments and advances in treatment.

The Challenge

Realising these opportunities requires the sharing of personal, potentially highly sensitive data. For CAVs this includes data from the sensors in modern cars, including geolocation data. For genomics it includes highly personal and sensitive genomics data.

‘Linking up of large data-systems containing personal identifiable data, on a scale not previously necessary (or possible), is a prerequisite for success’ – CMO Annual Report, Generation Genome, 2017

Privacy Risks

The geolocation data stored by autonomous vehicles is highly revealing: many sensitive aspects of our lives are tied to specific locations, and whom we have met can be derived by comparing location data. The GDPR singles out a few special categories of personal data which it considers especially sensitive, and places additional restrictions on their processing. Geolocation data can potentially reveal many of these special categories: characteristics such as religious belief, political affiliation and sexual orientation are all closely correlated with an individual’s attendance at specific locations at given times, such as a church on a Sunday morning. Geolocation data also reveals information that is not specifically protected but is potentially even more sensitive; for instance, comparing two individuals’ locations over time could reveal an affair (a toy sketch of this kind of inference appears at the end of this section).

Genomics data presents a particularly complex privacy risk, complicated by its connected nature: if I reveal my genomics data, I may also be revealing information about my parents, siblings and children. Privacy harms related to genomics data can manifest in many ways; for instance, many organisations and individuals treat those they know to be unwell, or believe are likely to become unwell, with prejudice.
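To illustrate how readily such inferences can be made, here is a toy sketch in Python (the traces, distance and time thresholds are hypothetical, and real inference techniques are considerably more sophisticated) that flags moments when two location traces nearly coincide:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def likely_meetings(trace_a, trace_b, max_km=0.05, max_secs=300):
    """Timestamps at which two (time, lat, lon) traces are close in both space and time."""
    return [
        t1 for (t1, la1, lo1) in trace_a for (t2, la2, lo2) in trace_b
        if abs(t1 - t2) <= max_secs and haversine_km(la1, lo1, la2, lo2) <= max_km
    ]

# Two hypothetical traces that coincide once, at t=1000.
alice = [(1000, 51.5014, -0.1419), (2000, 51.5033, -0.1195)]
bob = [(1020, 51.5015, -0.1418), (5000, 51.5081, -0.0759)]
print(likely_meetings(alice, bob))  # [1000]
```

A few lines of code and two short traces are enough to reveal a meeting; vehicle sensors generate such points continuously, which is why geolocation data deserves careful protection.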
The start of something

Whilst the CMO’s drive for greater use of genomics data and Flourish’s campaign for the sharing of CAV data may seem unrelated, we believe they are both symptoms of the emergence of the data economy. In the coming years, data-driven insights will unlock advances in AI, biotech, connected homes and many other fields, offering society life-changing new opportunities; but all of these advances will be built on the processing of large quantities of personal data, with potential privacy risks.

We should not cease innovating for fear that we may introduce new risks, but nor should we press on blindly and accept the collateral damage caused by privacy harms. Instead we should explore alternative ways of deploying these technologies so that privacy is built in by design and the opportunities can be realised whilst minimising privacy risks.

Driven by the explosion of personal data being captured, shared and analysed, recent years have seen rapid growth in privacy research, both in academia and at leading technology companies; the field is known as privacy engineering. Privitar’s founders recognised that technological controls could be used to mitigate privacy risks in an increasingly complex digital economy with ever higher volumes of sensitive data. Privitar provides software products that optimise data utility with an uncompromising approach to privacy.


How could privacy engineering help genomics research?

Medical research in the UK is often an exemplar of considered and responsible management of privacy risks. However, some of the techniques and approaches currently used rely heavily on environmental controls to protect privacy. For instance, the SAIL Databank achieves anonymisation through a combination of pseudonymisation (sketched in code below), careful vetting of researchers’ backgrounds and data usage, and other environmental controls. The problem with this approach is that environmental controls do not scale easily: background checks and expert peer reviews for every proposal may not be feasible for a large and dynamic research community. The CMO’s report leaves open how such privacy risks should be managed:

‘a key question becomes what complementary protections and controls need to be in place such that when people do give their valid (but inevitably imperfect) consent, they are not exploited, discriminated against, unfairly treated and have their privacy unacceptably encroached upon’
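To make the pseudonymisation step concrete, here is a minimal sketch using a keyed hash (HMAC-SHA256). The key, field names and record are assumptions for illustration, not a description of the SAIL Databank’s actual implementation:

```python
import hashlib
import hmac

# Assumed secret key, held by a trusted party; without it, a pseudonym
# cannot feasibly be reversed or regenerated.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable pseudonym via HMAC-SHA256."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative record: the identifier is replaced, but records remain linkable
# because the same input always maps to the same pseudonym.
record = {"patient_id": "A1234567", "diagnosis": "C50"}
record["patient_id"] = pseudonymise(record["patient_id"])
print(record)
```

Consistent pseudonyms preserve the ability to link records across datasets, but the result is still personal data, which is precisely why the surrounding environmental controls remain necessary.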

Statistical anonymisation, leveraging fields such as differential privacy, can scale with ease whilst also providing greater certainty of privacy protection than environmental controls alone. It also provides a way of matching the appropriate risk to the use case. For instance, if a healthcare practitioner wants to see their patients’ information then, provided appropriate governance and consent procedures are followed, they should be able to access the relevant raw data. However, the same level of access should not be given to healthcare researchers, especially if they work for a third party, such as a pharmaceutical company. As the report states:

‘Much progress is going to require the involvement of commercial and technology partners.’

Private companies have an important role to play in medical research but, understandably, citizens are usually more apprehensive of private sector research. This can be reflected by controlling what data different groups of researchers are able to access, with different privacy policies governing each group. For instance, some policies could require a formal privacy guarantee for access, such as that offered by differential privacy; the sketch below illustrates this tiered approach.
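As a minimal sketch of what such tiering might look like (the roles, epsilon value and cohort are illustrative assumptions, not a description of Privitar’s products), the example below gives clinicians exact answers while third-party researchers receive only differentially private aggregates, produced by adding calibrated Laplace noise to a count query:

```python
import random

# Hypothetical access tiers, matching privacy risk to use case.
ACCESS_POLICIES = {
    "treating_clinician": "raw",               # direct care, with consent and governance
    "internal_researcher": "pseudonymised",    # identifiers removed; environmental controls apply
    "third_party_researcher": "dp_aggregate",  # only formally protected statistics
}

def dp_count(records, predicate, epsilon=0.5):
    """Differentially private count via the Laplace mechanism.

    A count query has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so Laplace noise of scale 1/epsilon gives
    epsilon-differential privacy. The difference of two Exponential(epsilon)
    draws is distributed as Laplace(0, 1/epsilon).
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

def run_query(role, records, predicate):
    """Answer a count query at the level of detail the role's policy allows."""
    tier = ACCESS_POLICIES.get(role, "none")
    if tier in ("raw", "pseudonymised"):
        return sum(1 for r in records if predicate(r))  # exact answer
    if tier == "dp_aggregate":
        return dp_count(records, predicate)             # noisy, formally protected answer
    raise PermissionError(f"role {role!r} has no access")

# Illustrative cohort: which patients carry a hypothetical variant?
cohort = [{"variant": True}, {"variant": False}, {"variant": True}]
print(run_query("third_party_researcher", cohort, lambda r: r["variant"]))
```

The same query yields an exact count for a clinician and a noisy one for a third party, so the protection scales with the sensitivity of the use case rather than relying on case-by-case review.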

Identifying exactly what the best approach is would require a thorough data situation audit. But what can be said is that it is worth matching the advanced technologies that are creating these risks with the advanced technologies being designed to mitigate them.