What is the Difference Between Tokenization and Encryption?

January 29, 2021

by Nilesh Parmar, Senior Privacy Engineer at Privitar

Hang on, why are we talking about the differences between tokenization and encryption?  Don’t they both do the same thing?  At the end of the day, they both just “scramble” the data so it’s unusable, right?  Well, yes… and no. In this introductory post, we’ll be looking at what tokenization and encryption are, when to use them and why you would use one approach over the other. 

What is Encryption?

When you encrypt data, you are “locking” your sensitive data so that only authorized parties can “unlock” it and see the actual values. The only way to unlock that data is with a secret key. Whilst the data is encrypted, it’s basically unusable. The encrypted result is known as “ciphertext.” For example, if I were to encrypt the name “John Smith,” depending on the encryption method used, the result could look like this:

John Smith = kMgJDrggu5099aycVkUMh3Vk+v5KpR9Vlj5nFRQ2nZc=

You don’t have much control over what the output looks like, which means that you can’t use this encrypted value for anything meaningful. But that’s the whole point!

The thing about encryption is that it can be reversed. It is designed to be reversed, so that an authorized person can view and use the raw, sensitive data. In the wrong hands, that same property is a risk: anyone who obtains the key, or who manages to break a weak algorithm or a short key, can get to the sensitive data. That’s why there is a range of encryption algorithms and key lengths, from weaker to stronger, to suit your needs.
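To make the lock-and-unlock idea concrete, here is a minimal Python sketch. It uses a one-time pad (XOR with a random key as long as the message) purely because it fits in a few lines of standard-library code. It is an illustration chosen for this post, not the method behind the sample ciphertext above, and the function names are made up. Real systems should use a vetted library and a standard algorithm such as AES.

```python
import base64
import secrets
from typing import Tuple


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def encrypt(plaintext: str) -> Tuple[str, bytes]:
    """Return (Base64 ciphertext, secret key) for the given text."""
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))  # fresh random key, never reused
    return base64.b64encode(xor_bytes(data, key)).decode("ascii"), key


def decrypt(ciphertext_b64: str, key: bytes) -> str:
    """Reverse encrypt(); this only works for someone holding the key."""
    return xor_bytes(base64.b64decode(ciphertext_b64), key).decode("utf-8")


ciphertext, key = encrypt("John Smith")
print(ciphertext)                # unreadable Base64 text, different on every run
print(decrypt(ciphertext, key))  # John Smith
```

The ciphertext is useless on its own, but whoever holds the key gets the full original value back, which is exactly the reversibility described above.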

What is Tokenization?

When you tokenize your sensitive data, you are protecting it in a way that means you can still use it after the fact… and you have the ability to choose what that result looks like. For example, after tokenizing “John Smith,” it could look like this:

John Smith = J Smith

John Smith = Jxxx Sxxxx 

John Smith = Henry Ford

Tokenization can provide you with de-identified data that can still be used for analytics, machine learning, data sharing and many other use cases. The output can be completely disconnected from the original sensitive value.
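The three sample outputs above could be produced along the following lines. This is a rough Python sketch, not any product's implementation: the function names, the ConsistentSubstituter class and the pool of replacement names are all hypothetical.

```python
def abbreviate_first_name(full_name: str) -> str:
    """Keep only the initial of the first name: 'John Smith' -> 'J Smith'."""
    first, _, rest = full_name.partition(" ")
    return f"{first[:1]} {rest}" if rest else first[:1]


def mask_name(full_name: str, mask_char: str = "x") -> str:
    """Keep each word's first letter and mask the rest."""
    return " ".join(w[:1] + mask_char * (len(w) - 1) for w in full_name.split())


class ConsistentSubstituter:
    """Replaces each real value with a substitute and reuses it on repeat inputs."""

    def __init__(self, substitutes):
        self._unused = list(substitutes)  # hypothetical pool of replacement values
        self._token_for = {}              # real value -> assigned substitute

    def tokenize(self, value: str) -> str:
        if value not in self._token_for:
            if not self._unused:
                raise RuntimeError("no substitute values left")
            self._token_for[value] = self._unused.pop(0)
        return self._token_for[value]


substituter = ConsistentSubstituter(["Henry Ford", "Ada Lovelace"])
print(abbreviate_first_name("John Smith"))   # J Smith
print(mask_name("John Smith"))               # Jxxx Sxxxx
print(substituter.tokenize("John Smith"))    # Henry Ford
print(substituter.tokenize("John Smith"))    # Henry Ford (same input, same token)
```

Unlike the encryption case, you decide what the protected value looks like, and the substitute name has no mathematical link back to the original.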

But does that mean encryption is an inferior data protection method to tokenization?  No, it doesn’t. They are simply different methods for protecting your sensitive data, intended for different purposes.

When to Use Encryption

Data can be encrypted in motion or at rest. For example, if data is being sent outside of your network, you may want to encrypt it. If the data gets intercepted between the source and destination, the “hacker” won’t be able to view the contents of that data unless they have the right keys.

It’s also common to encrypt data at rest, for example when it is sitting in a data lake or some other storage medium. This ensures that only authorized personnel and systems with the right keys have access to that data.

It’s worth bearing in mind that encryption certainly has its place in helping to protect data, but once the data is decrypted, you are no longer protecting the privacy of the data subject. Decrypting data means that you get to see all the raw, sensitive data values in all their glory!  

When to Use Tokenization

If you want to be able to use your sensitive data for reporting or analytical purposes, or share data in a way that ensures that even if it is intercepted or leaked, the privacy of the data subject will still be respected, then tokenization may be the right method for you.

In order to derive insights out of your data, you need to share it! That may mean sharing it with other teams and departments who can analyze and make use of that data. Sharing a tokenized value will allow you to use that data, even though the data subject has been de-identified. This is illustrated in the example below:

Analytics, reporting, and machine learning are just some examples in an enterprise where you may need to share data to be able to use it. Those are also situations where the end user needs to use as much of the data as possible to do their day job, but they don’t necessarily need to be able to see the raw values themselves.

Tokenization preserves the feel and format of the data (keeping it statistically relevant), and also preserves the privacy of the data subject, allowing the data to be utilized more freely throughout your organization.
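As a small illustration of why tokenized data stays useful for analysis, the Python sketch below replaces customer names with consistent tokens and then compares totals computed on the raw and the tokenized records. It assumes a keyed hash (HMAC-SHA256) as the tokenization mechanism, which is just one common option and not something this post prescribes. The key, the "cust_" prefix and the purchase records are invented for the example, and this particular token does not preserve the format of a name.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"example-key-do-not-use"  # hypothetical; keep real keys in a key manager


def tokenize(value: str) -> str:
    """Deterministic keyed token: the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]


def totals(rows):
    """Sum the amount spent per customer identifier."""
    result = Counter()
    for customer, amount in rows:
        result[customer] += amount
    return result


# Hypothetical purchase records: (customer name, amount spent)
purchases = [
    ("John Smith", 40),
    ("Jane Doe", 15),
    ("John Smith", 25),
    ("Jane Doe", 5),
    ("John Smith", 10),
]

# The version that gets shared with analysts carries tokens instead of names.
shared = [(tokenize(name), amount) for name, amount in purchases]

raw_totals = totals(purchases)
tokenized_totals = totals(shared)

print(sorted(raw_totals.values()))        # [20, 75]
print(sorted(tokenized_totals.values()))  # [20, 75]
```

Because every occurrence of a customer maps to the same token, the analyst sees the same per-customer totals without ever seeing a real name.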

Summary

Threats to your data can come from inside or outside your organization. Security solutions address who gets in to begin with, while a data privacy solution provides a complementary layer of protection that gives you much greater control over what can be seen. Both are necessary to protect data and privacy, and to keep your data both safe and usable. Consider how you will be using your data before deciding which method to use in a given situation.

Interested in learning more? Check out these blog posts for deeper dives into encryption and tokenization or read Data Privacy 101 for a detailed guide to de-identification methods. 

 
