How to create a safe data pipeline for sensitive data in the cloud

Dec 04, 2020 by Tom Kennedy, Director of Cloud Partnerships at Privitar

If it wasn’t already, data is quickly becoming a strategic asset for every organisation. Data is coming from new sources, is increasingly diverse, and is growing at an exponential rate, and organisations are looking to that data to create more value. Creating business value from data requires organisations to convert it into actionable insights, typically through their analytics and artificial intelligence/machine learning (AI/ML) capabilities.

We frequently see organisations moving to cloud data lake architectures in order to ingest, store, and catalogue data at scale. There, data scientists and business analysts can access that data using their choice of tools and frameworks, including cloud analytics and ML services.

The benefits of these architectures presuppose that all valuable data within the organisation is available and accessible on demand to data users. Regrettably, this is rarely the case for organisations that handle sensitive data in large quantities.

What are CSPs responsible for?

Under the shared responsibility model that most cloud service providers (CSPs) use, the CSP ensures the security of the overall platform, but the customer remains responsible for the data, including the privacy of information relating to individuals within that data. Heavily regulated industries, such as financial services, insurance, and health care, tightly govern and constrain the use of this data to manage the risk and compliance obligations that come with handling personal information.
Any organisation seeking the benefits of a streamlined cloud data architecture for analytics and ML must take these challenges into account. The same applies to organisations looking to ingest sensitive data into the cloud, or to share sensitive data with external partners or with business units in different geographies. Because of these constraints, much value goes unrealised: data assets that are personal or otherwise sensitive cannot be easily, broadly, and efficiently accessed from a data lake and used by business analysts and data scientists.

A safe data pipeline to the cloud

To realise the promise of the data they are collecting using cloud technologies, organisations must implement and automate a “safe data pipeline” into and around the cloud. This gives data scientists and business lines quick access to safe, usable data.

Many organisations have large, complex, or siloed data estates. For them, an important first step is knowing what data is available and where the sensitive data is, which can be achieved with data discovery and cataloguing systems. Any raw data must be treated as high risk until the scope of any personal information it contains is understood. It is critical to maintain tight access controls on this data.

Once you have located and properly catalogued your sensitive data, it’s important to apply privacy transformations to the data itself. Only a contextual combination of pseudonymisation, minimisation, and generalisation techniques allows you to strike the right balance between the privacy and the utility of the data. These transformations can be applied to the data before migrating it to the cloud, on its way into the cloud, or once it is in the cloud. You can then make the data widely available for use from a “de-identified data lake” or any other cloud data repository.
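
To make the transformation step more concrete, here is a minimal Python sketch of how pseudonymisation, minimisation, and generalisation might be combined on a single record. It is an illustration only, not a description of any particular product. The field names, the `SECRET_KEY` value, the `DROP_FIELDS` policy, and the helper functions are all hypothetical, and a real deployment would need to choose techniques and parameters based on the context in which the data is used.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would come from a key
# management service and would never be hard-coded.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical minimisation policy: fields the analysis does not need.
DROP_FIELDS = {"name", "email"}


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a consistent keyed token (HMAC-SHA256).

    The same input always maps to the same token, so records can still be
    joined, but the original value is not visible in the output.
    The 16-character truncation is an arbitrary choice for readability.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise_postcode(postcode: str) -> str:
    """Keep only the first part of a UK-style postcode, e.g. 'SW1A'."""
    return postcode.strip().split(" ")[0].upper()


def deidentify(record: dict) -> dict:
    """Apply minimisation, pseudonymisation, and generalisation to a record."""
    # Minimisation: drop fields that are not needed downstream.
    safe = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    # Pseudonymisation: tokenise the direct identifier.
    safe["customer_id"] = pseudonymise(str(safe["customer_id"]))
    # Generalisation: reduce the precision of quasi-identifiers.
    safe["age"] = generalise_age(int(safe["age"]))
    safe["postcode"] = generalise_postcode(safe["postcode"])
    return safe
```

Calling `deidentify` on a hypothetical record with `customer_id`, `name`, `email`, `age`, `postcode`, and `balance` fields would remove the name and email, tokenise the customer ID, and coarsen the age and postcode, while leaving `balance` untouched for analysis. The same function could in principle run before migration, during ingestion, or inside the cloud, which is the flexibility described above.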

Key principles

In implementing your safe data pipeline, here are a few key principles your organisation should adhere to:

- The safe data pipeline should flow through all data sources and environments, from on-premises systems to the cloud and across cloud platforms, to enable hybrid and multi-cloud approaches.
- Privacy controls must be applied consistently across the entire architecture to enable safe data use at scale by multiple users simultaneously.
- The technology enabling the safe data pipeline should integrate seamlessly with cloud security tools.
- The whole process should be automated to enable immediate access to safe data.

These high-level requirements serve as building blocks to ensure that your organisation can move its sensitive data into the cloud safely and rapidly. Once there, you can use advanced cloud services while complying with data protection laws and protecting the privacy of your consumers, patients, or citizens.

Don’t leave value on the table: protect sensitive data

There’s no excuse for leaving value on the table by limiting the use of your organisation’s sensitive data. Put privacy at the heart of your approach to cloud adoption and you can unleash the power of your data, innovating and generating revenue safely and efficiently while remaining compliant.

Learn how we’re working with AWS to empower organisations to use sensitive data to gain valuable insights and support data-driven decisions.

A version of this post was originally published on techUK.