Jan 08, 2019
Here you can find a short video interview with Dr. Pierre-Andre Maugis, Research Scientist at Privitar, in which he talks about some of the key challenges connected to hashing as a privacy technique.
Why do people use hashing?
Many people in the industry suggest hashing as a reliable privacy solution. There are three main reasons for this:
So, how can hashing still leave data exposed?
The trick is that even if you can’t reconstruct the identifying information from the hash, you can build a dictionary. Building this dictionary is a two-step process. First, you make a list of all possible identifying items. Then you apply the hash function to every item in the list and record the corresponding hashes. The result is a dictionary: give me a hash, and I can look it up in my list and tell you the identifying information it came from.
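The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the identifiers were hashed with plain, unsalted SHA-256 and that the identifier space is a hypothetical set of four-digit codes; the names `sha256_hex` and `build_dictionary` are made up for this example.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hash a string with plain (unsalted, unkeyed) SHA-256."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_dictionary(candidates):
    """Step 2: hash every candidate and map each hash back to its input."""
    return {sha256_hex(candidate): candidate for candidate in candidates}


# Step 1: list every possible identifying item.
# Assumption for illustration: identifiers are four-digit codes, 0000 to 9999.
candidates = [f"{n:04d}" for n in range(10_000)]
dictionary = build_dictionary(candidates)

# A hash found in a "protected" data set can now simply be looked up.
leaked_hash = sha256_hex("4821")
recovered = dictionary.get(leaked_hash)
print(recovered)
```

The lookup never inverts the hash function. It only relies on the attacker being able to enumerate the inputs.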
There are two problems with building a dictionary. First, how do I know which hash function to use? Second, how long does it take? On the first point, there are only a few good secure hash functions available, so one can simply try them all. It is sometimes even possible to guess the hash function being used from the length and structure of the hashes.
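As a rough sketch of "trying them all", the snippet below narrows the candidates by digest length and then confirms the function against one known value. It assumes the attacker knows the original value behind at least one hash (for example, their own record), which the text above does not state; the candidate list and the function names are illustrative only.

```python
import hashlib

# Illustrative, non-exhaustive list of algorithms available in hashlib.
CANDIDATE_ALGORITHMS = [
    "sha1", "sha224", "sha256", "sha384", "sha512",
    "sha3_256", "sha3_512", "blake2b", "blake2s",
]


def candidates_by_length(observed_hex: str):
    """Keep only the algorithms whose digest length matches the observed hash."""
    n_bytes = len(observed_hex) // 2
    return [
        name for name in CANDIDATE_ALGORITHMS
        if hashlib.new(name).digest_size == n_bytes
    ]


def identify_hash_function(known_value: str, observed_hex: str):
    """Try each length-compatible algorithm on a known value until one matches."""
    for name in candidates_by_length(observed_hex):
        digest = hashlib.new(name, known_value.encode("utf-8")).hexdigest()
        if digest == observed_hex.lower():
            return name
    return None


observed = hashlib.sha256(b"alice@example.com").hexdigest()
print(candidates_by_length(observed))
print(identify_hash_function("alice@example.com", observed))
```

Length alone usually leaves several candidates, since different functions share the same output size, but the known-value check settles it in a handful of hash computations.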
As for the time it takes: the larger the range of values the identifying information can take, the longer it takes to build the dictionary. However, hash functions are fast, and unless there are at least billions of possible secret values, building the dictionary will not take an intractable amount of time.
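To get a feel for the timing, one can measure how many hashes a machine computes per second and scale that up. The sketch below is a back-of-the-envelope estimate only: it uses single-threaded Python and SHA-256, the identifier-space sizes are hypothetical, and dedicated hardware would be considerably faster.

```python
import hashlib
import time


def measure_hash_rate(sample_size: int = 100_000) -> float:
    """Return the measured SHA-256 hashes per second on this machine."""
    start = time.perf_counter()
    for i in range(sample_size):
        hashlib.sha256(str(i).encode("utf-8")).digest()
    elapsed = time.perf_counter() - start
    return sample_size / elapsed


def estimated_seconds(n_candidates: int, hashes_per_second: float) -> float:
    """Time to hash every candidate once at the given rate."""
    return n_candidates / hashes_per_second


rate = measure_hash_rate()
# Hypothetical identifier-space sizes, chosen only for illustration.
for n_candidates in (10**6, 10**9, 10**12):
    seconds = estimated_seconds(n_candidates, rate)
    print(f"{n_candidates:>16,} candidates: about {seconds:,.0f} s")
```

The estimate grows linearly with the number of candidates, which is why only very large identifier spaces offer any protection.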
What should people be doing instead?
All good solutions require a secure secret: a key, a salt, or a token vault (a map of identifiers to random values). This secret must itself be protected by strong security. As there is no way to avoid relying on a secret, we recommend encryption or tokenisation: either encrypt the identifying information itself, or keep a secret list that maps the identifying information to tokens. Doing this will give you better privacy and the same, if not more, utility than simple hashing.
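Both flavours of secret can be sketched briefly. The example below is not Privitar's implementation. It shows a keyed hash (HMAC-SHA256 from Python's standard library) as one way of making a key part of the computation, and a toy in-memory token vault. The names `keyed_token` and `TokenVault` are hypothetical, and a real vault would need persistent, access-controlled storage.

```python
import hashlib
import hmac
import secrets


def keyed_token(secret_key: bytes, identifier: str) -> str:
    """Derive a token with HMAC-SHA256; it cannot be recomputed without the key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy token vault: a secret map from identifiers to random values."""

    def __init__(self):
        self._forward = {}  # identifier -> token
        self._reverse = {}  # token -> identifier

    def tokenise(self, identifier: str) -> str:
        # The same identifier always gets the same random token,
        # so records can still be joined on the token.
        if identifier not in self._forward:
            token = secrets.token_hex(16)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def detokenise(self, token: str) -> str:
        # Only someone with access to the vault can reverse a token.
        return self._reverse[token]


secret_key = secrets.token_bytes(32)  # must be stored and protected separately
print(keyed_token(secret_key, "alice@example.com"))

vault = TokenVault()
token = vault.tokenise("alice@example.com")
print(token, vault.detokenise(token))
```

In both cases an attacker who can enumerate every possible identifier still cannot build the dictionary, because producing the tokens requires the key or the vault contents.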