Privacy isn’t just about the data itself. It’s about the whole data process – what happens to the data before it’s delivered, and who accesses it afterwards. But these two parts of the process are often disconnected, and this means data isn’t traced throughout the entire data pipeline. At In:Confidence 2019, Privitar’s Lee Bonham showed us three ways to help bridge this gap in the pipeline.
Ensuring privacy isn’t just about applying the right techniques to the data itself. It’s about ensuring traceability throughout your whole data process, from who’s requesting what data, to where it’s being kept, who can access it and for how long.
But there’s often a huge gap in an organisation’s data pipeline, one that separates the initial request from the data provisioning.
A gap in the process
To help explain, let’s look at how data might flow through an organisation’s pipeline.
It starts with the initial data request. Following that, data governance assesses whether the request is appropriate, and whether the data is available. If it’s agreed, the data is then checked to see if data privacy laws apply to it – or whether a policy already exists for the data – and a specification is then sent to IT to provide it.
Here’s where the problem arises: the data has to go through a range of tools as part of the provisioning process. For example, you could be modelling data through Hive (a data warehouse system for Hadoop), using Pig as a scripting language to create the data policy, and then using another environment to generate the delivery.
But as the data goes through each tool, it’s usually not being tracked, so there’s no way to trace where the data has been. This means at the end of the process, there’s no audit trail to check that the data marries up with the requirement that was created at the beginning of the process.
So how can you bridge the gap?
Bridging the gap
There are three elements of a privacy platform that are essential in bridging the gap between initial request and data provisioning, ensuring you have full visibility across the data pipeline.
1. Zero-code privacy policies
Zero-code is a better way to design, store and audit privacy policies.
Using this method, those responsible for the data – whether it’s data governance, lawyers or legal teams – can create policies themselves. This removes the risk of miscommunication between business users and coders.
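To make the idea concrete, here is a minimal sketch of what a declarative policy could look like, assuming a policy is simply a mapping from column names to named rules. The rule names, the `apply_policy` function and the sample columns are all hypothetical illustrations, not Privitar’s actual policy format, and the hash-based tokenisation is for demonstration only.

```python
import hashlib

# Hypothetical declarative policy: column name -> rule name.
# A data owner edits this mapping; no transformation code is written per request.
POLICY = {
    "name": "drop",
    "email": "tokenize",
    "postcode": "truncate",
    "age": "keep",
}


def tokenize(value):
    # Illustrative only: an unsalted hash is not a robust privacy technique.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


# Named rules the policy can refer to (assumed set, for illustration).
RULES = {
    "keep": lambda value: value,
    "tokenize": tokenize,
    "truncate": lambda value: str(value)[:3],
}


def apply_policy(record, policy):
    """Return a protected copy of record; columns not in the policy are dropped."""
    protected = {}
    for column, value in record.items():
        rule = policy.get(column, "drop")  # deny by default
        if rule == "drop":
            continue
        protected[column] = RULES[rule](value)
    return protected


record = {
    "name": "Ada",
    "email": "ada@example.com",
    "postcode": "SW1A 1AA",
    "age": 36,
    "phone": "0123",
}
protected = apply_policy(record, POLICY)
```

Because the policy is plain data, it can be stored, versioned and audited separately from the code that enforces it.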
2. Protected Data Domains
Once you’ve applied the right policies, Protected Data Domains can help you define, track and control the data.
The great benefit of using Protected Data Domains is that all the metadata stays attached throughout the process. IT can see the original request, who made it, when it was authorised, what restrictions the data has, and where it’s being used.
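As a rough illustration of metadata that travels with the data, the sketch below models a domain as a small record. The class name, its fields and the `record_use` method are assumptions made for this example; they mirror the items listed above (request, requester, authorisation, restrictions, usage) rather than any real product schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProtectedDataDomain:
    """Hypothetical record of the metadata attached to one delivered dataset."""
    name: str
    request_id: str
    requested_by: str
    authorised_by: str
    authorised_at: datetime
    restrictions: list = field(default_factory=list)
    used_in: list = field(default_factory=list)

    def record_use(self, system):
        # Note each system that consumes the data, once, in order of first use.
        if system not in self.used_in:
            self.used_in.append(system)


domain = ProtectedDataDomain(
    name="marketing-analytics",
    request_id="REQ-001",
    requested_by="analyst@example.com",
    authorised_by="governance@example.com",
    authorised_at=datetime(2019, 5, 1, tzinfo=timezone.utc),
    restrictions=["no re-identification", "delete after 90 days"],
)
domain.record_use("reporting-dashboard")
domain.record_use("reporting-dashboard")
domain.record_use("model-training")
```

With a record like this attached to every delivery, the question "where did this dataset come from, and who approved it?" can be answered from the data itself.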
3. Process automation
Process automation is an easier way to control the data pipeline process and link policies to the data assets.
The source data is taken, the right policy is applied, and the data is delivered to the Protected Data Domain.
Together, these three elements give you not only a faster, more scalable process, but also a rich audit trail of who’s requesting the data, what data they’re requesting, who authorised it, and what restrictions it might have.
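The three steps just described (take the source data, apply the policy, deliver to the domain) can be sketched as one automated function that writes an audit entry at each stage. Everything here is an assumption for illustration: the `provision` function, the shape of the audit entries and the sample policy are hypothetical, not a description of a real platform.

```python
from datetime import datetime, timezone


def provision(source_rows, policy_name, policy_fn, domain_name, audit_log):
    """Apply policy_fn to every source row and log each step to audit_log."""

    def log(step, detail):
        audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "domain": domain_name,
            "step": step,
            "detail": detail,
        })

    log("read_source", f"{len(source_rows)} rows")
    protected_rows = [policy_fn(row) for row in source_rows]
    log("apply_policy", policy_name)
    log("deliver", f"{len(protected_rows)} rows")
    return protected_rows


def drop_email(row):
    # Hypothetical policy: remove the email column from each row.
    return {key: value for key, value in row.items() if key != "email"}


audit_log = []
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
delivered = provision(rows, "drop-email-v1", drop_email, "marketing-analytics", audit_log)
```

Since every run goes through the same function, the audit log records which policy produced which delivery, which is the link that tends to get lost when each step happens in a different tool.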
Visibility, accountability and control
Using these three techniques, it’s possible to effectively bridge the gap in a data pipeline process – and gain a whole new level of visibility.
That visibility is invaluable, not only for preventing privacy failures, but for diagnosing them and for responding quickly and appropriately in the event of a breach. What’s more, it makes it much easier to ensure the data you’re providing matches the data that was originally asked for.
Privacy isn’t just about the data. It’s about the process the data belongs to, so it’s time to take more control.