Organizations of all sizes have long struggled to accelerate analytics projects and extract analytical value without compromising privacy. Data scientists can’t get timely access to sensitive data, which slows their ability to extract insights from it. Yet it is imperative that organizations protect the privacy of the consumers represented in that data. Technology holds the key to protecting the data that is the lifeblood of innovation while maximizing its analytical value.
Most organizations are aware of how important it is to protect data in analytics, and the way they’ve tried to achieve that is by separating it into two broad groups:
Unfortunately, by leaving large portions of the data they collect unused, these organizations pay a real opportunity cost. Quite simply, organizations must create datasets that are safe to use and analyze without incurring significant privacy risk or losing data utility.
You don’t have to look far to find legislation related to data privacy and data protection: the General Data Protection Regulation (GDPR) dominates the European Union; the California Consumer Privacy Act (CCPA) not only applies in California but has been used as a model for other states in the United States; the Brazilian General Data Protection Law (Lei Geral de Proteção de Dados Pessoais, or LGPD) came into effect in 2020; and China has proposed changes to its privacy laws as well. The challenge is understanding all these regulations, how best to comply with them, and indeed, how to demonstrate compliance.
We’ve already touched on the issues of privacy risk and how organizations have handled them — essentially in the simplest way possible. But, of course, it’s not actually that simple. Many datasets sit in storage, unavailable for analysis. An even bigger challenge is knowing what data is allowed to be in use, what’s actually being used, and what is completely unused. To be truly effective in data analytics, data scientists and analysts need access to 100% of the data collected. To achieve that, organizations must build workflows that help them manage data governance and data de-identification at scale.
Studies have shown that data scientists and machine learning experts spend about 80% of their time generating, preparing, and labeling data. That leaves just 20% of their time for building and training models. While obtaining and preparing data is clearly part of the job and has significant implications for the performance of the final model, the imbalance between time spent preparing data and time spent building models is striking. Organizations can save time and money by making the data preparation stage as efficient as possible.
The analytical value of data is significant. Some believe that data is actually one of an organization’s most valuable assets. Data can help organizations make better business decisions, innovate more rapidly, and increase the effectiveness of their engagement with customers. Yet the barriers created by privacy risks, compliance concerns, inefficient workflows, and complex manual processes that slow access to data have long been a challenge for most organizations. The value of sensitive data can only be maximized when that data becomes more easily available.
By applying workflows and simplifying processes to de-identify data, your organization can make sensitive data available more widely. This data democratization enables you to distribute data to larger groups, who in turn can use that data to find new insights. This kind of data democratization can’t be achieved, however, until you adopt policies and automation that enable you to maintain consistency across datasets and to manage access levels to data.
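To make the idea of consistent de-identification concrete, here is a minimal sketch in Python. It pseudonymizes a sensitive field with a salted hash so that the same input always maps to the same token, which is what lets de-identified datasets remain joinable across an organization. The field names, records, and salt handling are all illustrative assumptions, not a production design — real deployments need proper secret management, key rotation, and a risk assessment of the chosen technique.

```python
import hashlib

SALT = "replace-with-a-secret-salt"  # illustrative; manage real secrets outside source code

def pseudonymize(value: str, salt: str = SALT) -> str:
    """Map a sensitive value to a stable token so records can still be joined."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

def de_identify(records: list[dict], sensitive_fields: list[str]) -> list[dict]:
    """Return copies of the records with sensitive fields replaced by tokens."""
    safe = []
    for rec in records:
        rec = dict(rec)  # copy so the original dataset is untouched
        for field in sensitive_fields:
            rec[field] = pseudonymize(rec[field])
        safe.append(rec)
    return safe

# Two hypothetical datasets that share a sensitive join key.
customers = [{"email": "ana@example.com", "spend": 120.0}]
orders = [{"email": "ana@example.com", "order_id": 1001}]

safe_customers = de_identify(customers, ["email"])
safe_orders = de_identify(orders, ["email"])

# Because the mapping is deterministic, both de-identified datasets
# carry the same token and can still be joined on it.
assert safe_customers[0]["email"] == safe_orders[0]["email"]
assert safe_customers[0]["email"] != "ana@example.com"
```

Applying one such policy consistently, rather than ad hoc masking per team, is what makes the access-level management described above tractable.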
Work meetings, social interactions, education, exercise classes, and shopping shifted rapidly and dramatically to online environments. While many originally thought this might last just a few weeks, the transition is likely to last well into 2021, and some changes may be here to stay. It has served as a dramatic change catalyst for individuals and businesses alike, pushing many organizations to accelerate their migration to and usage of the cloud, adopting a blend of on-premises, hybrid, public, and multi-cloud environments to meet the new needs of their customers. We’re each creating far more data in our everyday lives, and organizations that want to succeed in the future will scale their data-driven services. To do that successfully, safely, and ethically, they’re going to need to think carefully about how to protect sensitive data.