Oct 22, 2021
So how can an organization build out safe data pipelines, decide which methods of protecting data are right for their organization, and get started or advance in their use of safe data for analytics and insights?
How do you make data safe for analytics? This is one of the most common questions I hear, and at first glance it seems quite simple… but the answer is actually quite complex.
First of all, you have to make sure you understand the big challenge. Data analysts need to be able to receive and use data sets so that they can work their “analyst magic” on them and return results that their businesses can effectively use to make decisions.
For example, if you are working in healthcare and you need to analyze data to find potential cancer patients, you’d likely be looking at information on the patients, their procedures, complaints, and other sensitive, personal information. But providing raw data and allowing analysts direct access to personally identifiable information in that data set is a serious breach of ethics and regulations.
So how can we give the analysts the data they need without compromising the privacy of the subjects of that data? That’s where data privacy comes in.
With data privacy, we can make sure that the only things you can see are, in fact, the things you absolutely NEED for your job, for a particular analytical scenario. In the example above, you don’t necessarily need to see full patient names, as that is unlikely to affect what you’re trying to do. We need to create a set of data that has the information you require and nothing more. Everything else is removed, blurred, or transformed into a protected form so that the data is safe and keeps you compliant with regulations.
We’ve got to walk a fine line between utility and privacy: keeping enough information for the data to be useful for analysis, while removing or protecting the sensitive parts that expose the data’s subjects to risk.
But the trick is what do you keep? What do you blur or remove or redact from each data set to make sure that you can still do your job, as a data analyst?
There are many, many different methods to blur data, or to change it to prevent different types of attack and re-identification – from data masking to k-anonymization, which defends against linkage attacks. Each use case is going to be different, and the best method is going to change based on the use case.
Using the right method for a use case is critical to making sure you get the best results from the protected datasets you are going to share with your analyst.
Let’s go back to the example of the set of healthcare data. The problem is that if I gave you the raw unprotected data, you could identify actual people and know things about them that you shouldn’t. Some people may decide to use that information against them. We need to protect against that, so we would “de-identify” that data set – remove the personal identifiers from it.
There are different types of identifiers, including direct identifiers (e.g. a social security number or passport number) which are unique to an individual and directly identify them.
But there are also other types of information that may not directly identify you, but if you combine a few pieces of information, might start painting a picture of who you might be, which can then be narrowed down enough to identify a person. For example, if you had my first name, birthday, cars I drive, postal code, and the school I went to, suddenly you have narrowed down the number of people I could be. None of these pieces of data directly identify me, as there are others with my birthday and same first name and so on, but they help indirectly identify me when put together.
We need to manage both of these types of PII (personally identifiable information) and protect against re-identification.
If we can’t simply remove the data from the provided dataset because an analyst needs it for their work, we can blur the data in different ways – for example, by masking it, redacting part of it, or replacing it with consistent tokens.
And that’s just the simple stuff – the direct identifiers!
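As a rough sketch of what protecting direct identifiers might look like in practice, consider the following. The records, field names, and token scheme here are all hypothetical, not any particular product’s approach: the patient name is redacted outright because the analysis doesn’t need it, while the social security number is replaced with a consistent, irreversible token so that records can still be joined across tables.

```python
import hashlib

# Hypothetical patient records; "name" and "ssn" are direct identifiers.
records = [
    {"name": "Ada Smith", "ssn": "123-45-6789", "diagnosis": "C50.9", "age": 54},
    {"name": "Bob Jones", "ssn": "987-65-4321", "diagnosis": "C61", "age": 61},
]

def tokenize(value: str, secret: str = "per-project-secret") -> str:
    """Replace a direct identifier with a consistent, irreversible token."""
    return hashlib.sha256((secret + value).encode()).hexdigest()[:12]

def de_identify(record: dict) -> dict:
    safe = dict(record)
    safe["name"] = "[REDACTED]"            # remove: not needed for analysis
    safe["ssn"] = tokenize(record["ssn"])  # tokenize: hides value, preserves joins
    return safe

safe_records = [de_identify(r) for r in records]
```

Because the token is derived deterministically from the value (plus a secret), the same patient gets the same token everywhere, so the analyst can still count and link records without ever seeing the real identifier.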
For the indirect identifiers, we need to apply protection that stops an attacker from taking those three or four pieces of data and filtering them through the vast data sets that are publicly available (e.g. on LinkedIn and Facebook) to re-identify you. This is called linkage attack protection, and k-anonymization is a common technique for it.
So how to decide which are the right methods of data protection to use? And who should decide? Enter the data guardian.
In many companies, there is at least one person who is tasked with finding the right protection methods to allow the data to be used safely for analytics. We call that person the “data guardian.”
Data guardians need to determine the most appropriate sets of rules for individual use cases as efficiently as possible, to protect the identities within the data as well as the company. In addition, they need to understand the relevant regulations that apply to their industries and in their locations, to make sure that their company complies effectively.
They create policies, or sets of rules, that apply to specific use cases and regulations. Once saved, a policy becomes an easily reusable asset that can simply be applied to any similar data structure in the future. Policies allow you to become as compliant as possible, as quickly as possible. Using and reusing policies allows even advanced data protection techniques like k-anonymization and linkage attack protection to be applied both appropriately and quickly.
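One way to picture a policy as a reusable asset is as a mapping from column names to protection actions, so the same rules can be re-applied to any data set with a similar structure. This is only an illustrative sketch; the column names, action names, and transformations are assumptions, not a specific product’s API.

```python
import hashlib

# A hypothetical policy: each column is assigned a protection action.
HEALTHCARE_POLICY = {
    "name": "redact",         # drop entirely: analysts don't need it
    "ssn": "tokenize",        # consistent token: hides value, preserves joins
    "postcode": "generalize", # coarsen to a broader area
}

def apply_policy(record: dict, policy: dict) -> dict:
    """Apply a saved policy to one record; unlisted columns pass through."""
    out = {}
    for column, value in record.items():
        action = policy.get(column, "keep")
        if action == "redact":
            continue  # omit the column from the output
        elif action == "tokenize":
            out[column] = hashlib.sha256(value.encode()).hexdigest()[:12]
        elif action == "generalize":
            out[column] = value.split()[0]  # e.g. "SW1A 1AA" -> "SW1A"
        else:
            out[column] = value
    return out

record = {"name": "Ada Smith", "ssn": "123-45-6789",
          "postcode": "SW1A 1AA", "age": 54}
print(apply_policy(record, HEALTHCARE_POLICY))
```

The key design point is that the policy, not the data, carries the rules: the data guardian writes it once, and any future data set with the same columns gets the same protections automatically.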
Remember, this is all done so that you can achieve the maximum value from your data analysis without holding up access to that data unnecessarily. If you’ve got a system that allows the data guardian to find and apply even complicated data policies quickly, it all gets so much easier for everyone. Data can be leveraged safely, more broadly, more effectively, and more efficiently – a huge win for everyone involved!
Our team of data security and privacy experts is here to answer your questions and discuss how modern data provisioning can fuel business growth.