Feb 15, 2017
As more personal information is collected about individuals, the threat to privacy becomes ever greater. New technology brings with it more sophisticated methods for obtaining sensitive information by malicious means.
Protecting your customers’ sensitive data can be daunting. To defend yourself effectively, you need to understand what you’re up against.
Often the most valuable insights in data science come from connecting different data sources. So it should come as no surprise that sensitive data can also be uncovered by linking data. This type of privacy violation is termed a linkage attack.
A linkage attack attempts to re-identify individuals in an anonymised dataset by combining that data with another dataset. The ‘linking’ uses indirect identifiers, also known as quasi-identifiers.
Quasi-identifiers are pieces of information that are not unique identifiers in themselves, but can become identifying when combined with other quasi-identifiers. For instance, an individual’s date of birth and postcode are quasi-identifiers: each one alone is not sufficient to identify an individual, but in combination they usually are. Salary, transaction history, overdraft limit and location data are further examples of quasi-identifiers.
Linkage attacks are powerful, because seemingly innocuous attributes often suffice to uniquely identify an individual.
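To make the mechanics concrete, here is a minimal sketch of a linkage attack in Python. Everything in it is an assumption made for illustration: the two toy datasets, the names, postcodes and diagnoses are invented, and the `link` function is a hypothetical helper rather than any particular tool. It joins an ‘anonymised’ dataset to a named public register on date of birth and postcode, and reports only the records that match exactly one person.

```python
# Hypothetical toy data: every name, postcode and diagnosis is invented.
anonymised_records = [
    {"date_of_birth": "1975-03-14", "postcode": "AB1 2CD", "diagnosis": "asthma"},
    {"date_of_birth": "1982-11-02", "postcode": "EF3 4GH", "diagnosis": "diabetes"},
    {"date_of_birth": "1975-03-14", "postcode": "IJ5 6KL", "diagnosis": "migraine"},
]
public_register = [
    {"name": "Alice Example", "date_of_birth": "1975-03-14", "postcode": "AB1 2CD"},
    {"name": "Bob Sample", "date_of_birth": "1982-11-02", "postcode": "EF3 4GH"},
    {"name": "Carol Placeholder", "date_of_birth": "1990-07-21", "postcode": "MN7 8OP"},
]

QUASI_IDENTIFIERS = ("date_of_birth", "postcode")


def link(anonymised, auxiliary, quasi_identifiers):
    """Return (name, record) pairs where the quasi-identifiers match one person."""
    # Index the auxiliary dataset by its quasi-identifier values.
    index = {}
    for person in auxiliary:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for record in anonymised:
        key = tuple(record[q] for q in quasi_identifiers)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record))
    return matches


reidentified = link(anonymised_records, public_register, QUASI_IDENTIFIERS)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

Neither dataset contains a direct identifier and a sensitive attribute together, yet the join recovers both for any record whose quasi-identifier combination is unique in the register.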
Linkage attacks first hit the headlines in 1997, when the Massachusetts Group Insurance Commission (GIC) released hospital visit data to researchers for the purpose of improving healthcare and controlling costs. William Weld, then Governor of Massachusetts, reassured the public that patient privacy was adequately protected because direct identifiers had been deleted.
In response, Latanya Sweeney, then an MIT graduate student in computer science, bought an electoral roll database for $20. By combining this data with the GIC records, she was able to find William Weld’s personal health records with ease.
To resist a linkage attack, the quasi-identifiers in a dataset must be transformed to achieve k-anonymity. This means that even when an attacker has auxiliary information, each record is still indistinguishable, on its quasi-identifiers, from at least k-1 other records.
This technique is appropriate for ‘rectangular’ or ‘model ready’ data. This form of data is typically used by data scientists and data modellers and is characterised by having one row per data subject with each row having a number of ‘features’ exposed as attributes.
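As a rough illustration of what k-anonymity means for rectangular data, the sketch below measures k for a small table and then generalises the quasi-identifiers. The data is invented, and the `k_anonymity` and `generalise` helpers are hypothetical. The specific generalisation (keeping only the birth year and the first part of the postcode) is an assumption chosen for the example, not a recommended or complete method.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("date_of_birth", "postcode")


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    identical values for all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalise(record):
    """Hypothetical generalisation: keep only the birth year and the
    first part of the postcode. Other attributes are left untouched."""
    return {
        **record,
        "date_of_birth": record["date_of_birth"][:4],
        "postcode": record["postcode"].split()[0],
    }


# Invented data, one row per data subject.
records = [
    {"date_of_birth": "1975-03-14", "postcode": "AB1 2CD", "diagnosis": "asthma"},
    {"date_of_birth": "1975-09-30", "postcode": "AB1 7XY", "diagnosis": "migraine"},
    {"date_of_birth": "1982-11-02", "postcode": "EF3 4GH", "diagnosis": "diabetes"},
    {"date_of_birth": "1982-01-19", "postcode": "EF3 9ZZ", "diagnosis": "asthma"},
]

k_before = k_anonymity(records, QUASI_IDENTIFIERS)
generalised = [generalise(r) for r in records]
k_after = k_anonymity(generalised, QUASI_IDENTIFIERS)

print("k before generalisation:", k_before)  # every row is unique, so k = 1
print("k after generalisation:", k_after)    # rows now fall into two groups of 2
```

In this toy table, coarsening the quasi-identifiers raises k from 1 to 2, at the cost of less precise dates and locations. Real datasets generally need the level of generalisation to be chosen per attribute to balance privacy against utility.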
One particular attack is where an adversary knows the details of a specific transaction for an unusual amount, such as £123.45. By finding this transaction in the data set, they can then identify the individual who made it. This attack relies on the adversary knowing the transaction’s value, date, and merchant or payee, and on the transaction being unusual.
To thwart these attacks, a data set is transformed such that the target values and dates are adjusted but are still accurate enough for useful analysis.
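One simple way to picture such an adjustment is to add a small amount of random noise to each transaction’s amount and date. The sketch below does this under stated assumptions: the `perturb_transaction` helper, the 5% amount tolerance, the three-day date window and the sample transaction are all hypothetical choices for illustration, and the text above does not specify which transformation is actually used.

```python
import random
from datetime import date, timedelta


def perturb_transaction(txn, rng, amount_noise=0.05, max_day_shift=3):
    """Return a copy of txn with its amount scaled by a random factor within
    +/- amount_noise and its date shifted by up to max_day_shift days.
    Both parameters are illustrative assumptions, not recommended settings."""
    factor = 1 + rng.uniform(-amount_noise, amount_noise)
    shift = rng.randint(-max_day_shift, max_day_shift)
    return {
        **txn,
        "amount": round(txn["amount"] * factor, 2),
        "date": txn["date"] + timedelta(days=shift),
    }


rng = random.Random(0)  # seeded only so the example is repeatable

# Invented transaction; the account and merchant names are placeholders.
original = {
    "account": "acct-001",
    "amount": 123.45,
    "date": date(2017, 2, 1),
    "merchant": "Example Store",
}
perturbed = perturb_transaction(original, rng)

print(original["amount"], original["date"], "->", perturbed["amount"], perturbed["date"])
```

An adversary searching for the exact amount and date is then unlikely to find a match, while aggregate figures such as monthly spend stay approximately correct. This sketch leaves the merchant unchanged, so in practice a rare merchant might also need to be generalised.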
Advances in privacy engineering can help protect against these attacks. Privitar enables companies to take a comprehensive approach to privacy, by understanding and minimising privacy risks, whilst maximising data utility.