You don't need a PhD to crack bad anonymisation

By Jason McFall - July 14, 2017

At Privitar we’re growing fast, and that means a lot of hiring. It also means that I get about 10 spam emails a week from recruitment agencies pitching for business. Now there are some excellent recruiters with high integrity, and we work with a small set of great ones, but the industry has earned a bad reputation. A common tactic is to send out ‘chair through the window’ emails with unsolicited CVs, sent without the candidate’s approval - I hate this, and think it is highly unethical.

Often the recruiter ‘anonymises’ the CV by stripping off names. As CTO of a Privacy Engineering company, I view this as a challenge!
 
I received two CVs last Thursday where the recruiter had taken more care than is usual - I’d like to believe this means he’s a conscientious and decent guy at heart and is being made to spam me by his evil boss, so I’m not going to name and shame him. Every name in the CV had been carefully replaced throughout the narrative - the candidate had studied at UCL, which was anonymised to ‘a top 10 UK university’, while poor University of Kent only made it to ‘UK university’. Similarly their past employers were renamed to ‘Innovative Startup’, ‘Global Bank’, and so on. Sadly the recruiter didn’t think - or know how - to generalise the candidate’s description of his PhD research, which covered a very specific aspect of the immune system. So all I had to do was google that sentence, find the research paper with that exact title (first result on Google), google the first author of the paper, and find a clear match on LinkedIn. Took maybe 30 seconds.
 
The other candidate hadn’t done a PhD, but had written a nice summary of their career interests. So all I had to do was pick out a pithy phrase from the description, google that, and bang, they’d used the same sentence in their LinkedIn profile. Another direct match. Took 10 seconds.
 
Both of these are very simple but real examples of how apparently innocent, non-identifying information can be linked to another data source, often one in the public domain, to reidentify an individual. These privacy breaches are called linkage attacks (rather pleasingly, since LinkedIn was the dataset I used in this case).
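
For the curious, here is roughly what that kind of linkage looks like in code. This is only a minimal sketch in Python, not the process I actually followed (I just used a search engine): the CV snippets, profile texts, names, and the `normalise` and `link` helpers are all invented for illustration, and it assumes the attacker already holds a small set of public profiles to search through.

```python
import re


def normalise(text):
    """Lowercase and collapse punctuation/whitespace so trivial edits don't hide a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link(anonymised_records, public_profiles, min_words=6):
    """Map each record id to the profile names that contain one of its sentences verbatim."""
    matches = {}
    for record_id, text in anonymised_records.items():
        # Split on sentence-ending punctuation and keep only sentences long
        # enough to be distinctive (short ones match too many people).
        sentences = [normalise(s) for s in re.split(r"[.!?]", text)]
        sentences = [s for s in sentences if len(s.split()) >= min_words]
        for name, profile in public_profiles.items():
            # Pad with spaces so we only match on whole-word boundaries.
            padded_profile = f" {normalise(profile)} "
            if any(f" {s} " in padded_profile for s in sentences):
                matches.setdefault(record_id, []).append(name)
    return matches


# Hypothetical 'anonymised' CVs: names, universities and employers are
# generalised, but the free-text descriptions are left untouched.
anonymised_cvs = {
    "cv-1": "PhD at a top 10 UK university. Regulation of hypothetical "
            "receptor X in example immune cells. Worked at Innovative Startup.",
    "cv-2": "I love turning messy data into decisions people can actually "
            "trust. Five years at Global Bank.",
}

# Hypothetical public dataset, standing in for search results or profiles.
public_profiles = {
    "Alice Example": "Research: Regulation of hypothetical receptor X in "
                     "example immune cells (2014).",
    "Bob Sample": "Data scientist. I love turning messy data into decisions "
                  "people can actually trust.",
    "Carol Placeholder": "Engineer interested in distributed systems.",
}

reidentified = link(anonymised_cvs, public_profiles)
print(reidentified)
```

The point of the sketch is that nothing clever is needed: the free text acts as the join key between the ‘anonymised’ record and the public one, so removing the name column achieves very little.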
 
Even if you’re not in the market for a job in advanced data science or privacy engineering, think hard about how much information about you is in the public domain and can be used in linkage attacks. And if your job is to protect data, think beyond page 1 - just stripping off primary identifiers is not enough.
 
I contacted both candidates on LinkedIn to let them know this is happening, and gave them the email address of the perpetrator. Who knows, maybe they’ll be interested in Privitar and apply to us directly. Wouldn’t that be funny? And it might help to change bad behaviour in the recruitment industry.