Why Data Privacy is Indispensable in the Move Towards Cloud Adoption

By Tom Kennedy, Director of Cloud & Technology Partnerships at Privitar

Amidst today’s general climate of uncertainty, one thing at least has become clear: the effects of the Covid-19 pandemic are accelerating a number of global tectonic shifts that were already underway, from de-globalisation to the automation of processes and robotics, and transformation enabled by new technologies.

These trends all pose existential questions for businesses in the immediate and long term. This post focuses on two important facets of accelerated technological adoption: data privacy and the move to the cloud. We explore how they are entwined, and the issues businesses must navigate to get on the front foot and position themselves for the longer term.

The first aspect of this broader trend concerns the expanded use of personal data by organisations to derive insights and develop products. This is not a new phenomenon; however, with the explosion of innovative energy directed at the development of technologies to fight COVID-19, we are seeing that many of these tools – from contact-tracing apps to mass healthcare data analytics – sit uneasily with the right to privacy. My colleague Guy Cohen has outlined these issues in detail, along with the sorts of questions we need to consider as we adapt to the demands of a new reality.

The second is the accelerating adoption of the cloud. Prior to COVID-19, businesses were already retiring their own data centres and self-managed IT infrastructure in favour of renting computing, storage and databases from Amazon (Web Services), Microsoft (Azure) and Google (Cloud Platform), taking advantage of the reduced costs, near-unlimited scalability and flexibility that the cloud offers. Beyond that, more forward-looking and disruptive companies are looking to utilise powerful cloud analytics and machine learning technologies to innovate and push boundaries.

Aside from the stark imperative to rationalise costs, the necessities that COVID-19 has imposed on business are forcing more flexible ways of working – from content and network streaming, to sharing and co-editing documents across distributed teams, to real-time access to analytical insights from extraordinary volumes of data – and all of these business-critical activities rely on cloud technology.

Underlying this remains the fact that the lifeblood of modern business is data, which is pumping into and around the cloud on an ever-increasing scale. And when businesses that collect, manage and use vast amounts of personal or otherwise sensitive data – for example in financial services, healthcare and pharmaceuticals, retail and telecommunications – want to leverage cloud technology fuelled by that data, privacy becomes a critical issue.

First, there’s the level of risk that businesses expose themselves to by using raw data containing information relating to individuals for analytics and other secondary purposes. The privacy risks inherent in sensitive or personal data exist whether the data resides in the cloud or not. However, the more data you’re dealing with, and the more you are looking to do with it, the greater the privacy risk you’re exposed to. The consequences, in terms of data breaches, regulatory fines and damage to brand trust and reputation, are already sufficiently serious to be a top boardroom concern, and don’t need repeating here.

The second issue around using sensitive data in the cloud relates to an important distinction between privacy and security, and a more nuanced understanding of where responsibility for data in the cloud lies. Since its inception, cloud computing has been dogged by the notion that it is less secure than on-premises systems, because control of it is somehow “out of your hands”. Thanks to huge investments in the depth and scale of security resources, the security of cloud platforms today is as good as, if not better than, that of most on-premises systems. Extensive capabilities around logging and auditing, identity and access management, network protection, compute protection, and encryption of data at rest in databases and storage all help achieve the data security goals of confidentiality, integrity and availability.

Securing data and preserving privacy, although complementary, are two different things. Both are necessary to access and use data safely. Properly securing data ensures that access is limited to authorized users. Unfortunately, however, most data misuse occurs when authorized access is used inappropriately or compromised, whether through an insider threat or stolen credentials. 

Preserving privacy, on the other hand, means protecting the data subject. For example, even if the authorised individual accessing the data misuses it, maliciously or by accident, the identity of those in the dataset is still protected. 

It is here – in data privacy – where responsibility passes back from the cloud providers to the customer. Each of the three main cloud providers, Amazon Web Services, Microsoft Azure, and Google Cloud Platform, employs a “shared responsibility model” to define the split of responsibilities between the provider and the customers using its services.

For all three, responsibility for the customer’s data sits with the customer. 

[Diagrams: the shared responsibility models published by AWS, Microsoft Azure and Google Cloud Platform]

To be clear – this isn’t because the cloud platforms are shirking responsibility. Rather, it relates to obligations set out under the GDPR and other data protection laws: whereas the cloud platforms act as “data processors”, a cloud customer will be viewed as a “data controller” if it determines the purposes for which, and the manner in which, the data is processed. This means the customer remains responsible for the processing of personal data, even if that processing takes place in the cloud. So although AWS, Microsoft and Google all provide extremely secure cloud platforms, customers must still take steps themselves to ensure the privacy of the data is protected.

So what should businesses do to manage this challenge as they speed towards cloud adoption? The issue is urgent and critical, but the way to manage it is not always obvious.

To realise the promise of safe, usable data leveraging cloud technologies, businesses need to implement and automate a “safe data pipeline” into and around the cloud. This will enable safe, quick access for the data scientists and business lines that need it.

For many organisations with large, complex or siloed data estates, an important first step is knowing what data they have and where their sensitive data is, which can be achieved with the help of data discovery and cataloguing systems. Raw data should be treated as high risk until the scope of any personal information it contains is understood, and it is critical to maintain tight access controls on it.
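As a very crude illustration of what discovery tooling does under the hood, consider scanning sampled values for patterns that suggest personal data. Real discovery systems are far more sophisticated; the detectors below are simplified assumptions for the sketch:

```python
import re

# Simplified detectors for two kinds of personal data (illustrative only).
DETECTORS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "uk_phone": re.compile(r"^(\+44|0)\d{10}$"),
}

def classify_values(sample_values: list) -> list:
    """Return the detector names that match a majority of sampled values."""
    hits = []
    for label, pattern in DETECTORS.items():
        matches = sum(1 for v in sample_values if pattern.match(str(v)))
        if matches > len(sample_values) / 2:
            hits.append(label)
    return hits

print(classify_values(["john@gmail.com", "jenny@hotmail.com"]))  # ['email']
print(classify_values(["59.45", "12.50"]))                       # []
```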

Once you know where your sensitive data is and have properly catalogued it, the next step is to apply privacy transformations to the data itself. Only a contextual combination of pseudonymisation, minimisation and generalisation techniques will allow the right balance to be struck between privacy and the continued utility of the data. These transformations can be applied to the data either before migrating it to the cloud, on its way into the cloud, or once it has landed in the cloud. The data can then be made widely available for use from a “de-identified data lake”, or from any other cloud data repository. 
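To make this concrete, here is a minimal sketch in Python of what one transformation step on the way into the cloud might look like. The field names, rules and keyed-hash approach are illustrative assumptions only, not a description of any particular product’s API:

```python
import hashlib
import hmac

# Illustrative only: in practice the key would come from a key management service.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymise(value: str) -> str:
    """Consistently replace an identifier with a keyed pseudonym."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

def generalise_age(age: int) -> str:
    """Coarsen an exact age into a ten-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def to_safe_record(record: dict) -> dict:
    """Apply privacy transformations before a record lands in the de-identified data lake."""
    return {
        "customer_id": pseudonymise(record["customer_id"]),  # pseudonymisation
        "age_band": generalise_age(record["age"]),           # generalisation
        "spend": record["spend"],
        # free-text notes are dropped entirely: minimisation
    }

print(to_safe_record({"customer_id": "C1001", "age": 37, "spend": 59.45, "notes": "..."}))
```

The same functions can run on-premises before migration, in a transfer job on the way in, or in the cloud once the data has landed, which is what makes the placement of this step so flexible.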

In thinking about how to implement this, there are a few key principles that businesses should adhere to:

  • The safe data pipeline should flow through all data sources and environments, from on-premises systems to the cloud, and across cloud platforms, to enable hybrid and multi-cloud approaches. 
  • Privacy controls must be applied consistently across the entire architecture, to enable safe data use at scale by multiple users simultaneously. 
  • The technology enabling the safe data pipeline should integrate seamlessly with cloud security tools.
  • The whole process should be automated, to enable immediate access to safe data.


These high-level requirements are the key building blocks for ensuring that organisations can safely and quickly shift their sensitive data into the cloud, utilise advanced cloud services while complying with data protection laws, and ultimately protect the privacy of their own consumers, patients or citizens.

In today’s environment there is no excuse for organisations to leave the value of their sensitive data on the table. By putting privacy at the heart of their approach to cloud adoption, businesses can unleash the power of their data to innovate and generate revenue safely, efficiently and compliantly.

N.b. Privitar proudly partners with Amazon Web Services, Microsoft Azure and Google Cloud Platform to help their customers maximise the value of their sensitive data utilising cloud services.

Interested in learning more about how to leverage your sensitive data in the cloud? Check out this webinar featuring experts from AWS and Privitar. 

Pseudonymization 101

by Shih Huei Tan, Solution Architect at Privitar

Editor’s Note: Privitar is launching a new series focused on demystifying some foundational, but often misunderstood, elements of data privacy. This week, we’ll explore pseudonymization. Each week, we’ll dig into a new topic, defining key terminology, explaining why it is important, and how you can implement it as part of your data privacy efforts. We’ll also provide some real-life examples to demonstrate the concept in action, and help readers think about use cases that they can put into practice.

What is Pseudonymization?
One of the techniques used to de-identify data is called pseudonymization. When using pseudonymization, sensitive data fields are replaced with pseudonyms to hide the identity of the individuals. Consistent pseudonymization assigns the same pseudonym to the same individual throughout the dataset. This is very useful in longitudinal studies, or for other purposes where it is necessary to link data collected at different times relating to the same data subject (the customer in this situation). Pseudonyms can also preserve the structure of the original data, so that its format is retained, which may be useful in some circumstances.
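As an illustration, here is a minimal Python sketch of consistent, format-preserving pseudonymization built on a keyed hash. The key handling and character mapping are simplified assumptions for demonstration, not a production design:

```python
import hashlib
import hmac
import string

KEY = b"demo-key"  # illustrative only; real deployments would use managed key material

def _pseudo_chars(value: str, length: int) -> str:
    """Derive a deterministic string of lowercase letters from a keyed hash of the value."""
    digest = hmac.new(KEY, value.encode(), hashlib.sha256).digest()
    return "".join(string.ascii_lowercase[b % 26] for b in digest[:length])

def pseudonymise_email(email: str) -> str:
    """Format-preserving: replace the local part but keep the '@domain' structure."""
    local, _, domain = email.partition("@")
    return f"{_pseudo_chars(local, 7)}@{domain}"

# Consistency: the same input always yields the same pseudonym,
# so records belonging to one person remain linkable across the dataset.
assert pseudonymise_email("john@gmail.com") == pseudonymise_email("john@gmail.com")
print(pseudonymise_email("john@gmail.com"))
```

Note the trade-off: keeping the real domain preserves the recognisable email format, but it also retains a little information about the individual, so whether to preserve it is itself a privacy decision.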

Why Pseudonymization Matters
Data is a valuable resource to many organizations and essential to many data-driven initiatives, ranging from improving customer service and driving more effective marketing campaigns to enhancing healthcare delivery and organizational excellence.

Often, data that is used for these purposes may contain personally identifiable information, or primary identifiers, of customers (e.g. names, email addresses, phone numbers, social security numbers, passport numbers). These are attributes that can directly identify a person due to the nature of the information. There may also be secondary identifiers within the data that may not reveal the identity of a person when used in isolation, but that can enable re-identification when coupled with other data points (e.g. birthdays, addresses, salary, age, job title and gender). For example, if you have an employee dataset which contains a person with a job title of Chief Executive Officer, that person’s identity will be quite obvious just based on that information, without even looking at the primary identifiers.
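One simple way to reason about that risk is to count how many records share each combination of secondary identifiers: any combination that occurs only once singles out an individual. A toy sketch, with invented records:

```python
from collections import Counter

# Toy employee records with no primary identifiers, yet still risky.
employees = [
    {"job_title": "Analyst", "gender": "F", "age_band": "30-39"},
    {"job_title": "Analyst", "gender": "M", "age_band": "30-39"},
    {"job_title": "Chief Executive Officer", "gender": "M", "age_band": "50-59"},
]

quasi_identifiers = ("job_title", "gender", "age_band")

# Count how many records share each combination of secondary identifiers.
counts = Counter(tuple(e[q] for q in quasi_identifiers) for e in employees)
for combo, k in counts.items():
    if k == 1:  # a unique combination identifies exactly one person
        print("Re-identification risk:", combo)
```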

So how do organizations use sensitive data while ensuring that sufficient safeguards are in place to protect privacy, and remain compliant with data protection regulations?

Putting Pseudonymization Into Practice
Let us take the example of a bank that wants to analyze customer spending patterns over the month of June to determine their high-value customers. In order to do this, they will need to use the customer transaction dataset. By looking at the dataset below, you will notice that it contains personally identifiable information such as names, account IDs and email IDs. The analysts working with this data do not need to view these sensitive customer details in order to perform their tasks, and sharing that information with them exposes the bank to unnecessary risks and compliance issues. This is where pseudonymization comes in.

S/NO | Name  | Account ID | Email ID          | Transaction Value | Transaction Date (DD/MM/YY)
-----|-------|------------|-------------------|-------------------|----------------------------
1    | John  | AC4481245  | john@gmail.com    | 59.45             | 05/06/20
2    | Jenny | AC1114455  | jenny@hotmail.com | 12.50             | 07/06/20
3    | Tom   | AC1214445  | tom@emal.com      | 9.50              | 11/06/20
4    | John  | AC4481245  | john@gmail.com    | 52.50             | 13/06/20
5    | Brian | AC4545553  | brian@outlook.com | 18.50             | 15/06/20
6    | John  | AC4481245  | john@gmail.com    | 34.50             | 18/06/20


De-identifying Data Through Pseudonymization
Below is the same dataset after de-identification. Customer names have been pseudonymized to strings of 7 random characters so that the original names are no longer visible. The Account ID and email fields have been pseudonymized consistently, so John (in records 1, 4 and 6) has the same pseudonymous values in every occurrence of his records. This allows the analysts to work out the total transactions made by each customer, because the data can be grouped and summarised by account or email ID. The format-preserving pseudonymized email addresses also make it easy to recognize that the column contains customer emails, without having to refer to the column headings.

S/NO | Name    | Account ID | Email ID            | Transaction Value | Transaction Date (DD/MM/YY)
-----|---------|------------|---------------------|-------------------|----------------------------
1    | DFJFSDF | X321343T   | idrshdy@gmail.com   | 59.45             | 05/06/20
2    | LKGJSHF | C125100C   | jfhstey@hotmail.com | 12.50             | 07/06/20
3    | LGKKGJD | F454587T   | kfjdhsh@emal.com    | 9.50              | 11/06/20
4    | FKDHWDD | X321343T   | idrshdy@gmail.com   | 52.50             | 13/06/20
5    | FKSJFJD | G776521K   | ofhstfj@outlook.com | 18.50             | 15/06/20
6    | HSYGJEX | X321343T   | idrshdy@gmail.com   | 34.50             | 18/06/20

Based on the scenario outlined above, we can see how personally identifiable information within the customer dataset has been de-identified through a process of pseudonymization. We have the option of applying it randomly or consistently, as well as making the pseudonyms retain the original format, as in the case of the email addresses.

Pseudonymization protects the privacy of the individuals within the dataset by obfuscating the identifiers, while ensuring that the information retains its utility and enabling the data analysts to extract the necessary insights for analytical use cases.
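To illustrate that retained utility, here is a small Python sketch that answers the bank’s original question – who are the high-value customers – directly from the de-identified table above, without ever touching a name or email:

```python
from collections import defaultdict

# De-identified transactions: the account pseudonyms are consistent,
# so per-customer aggregation still works.
transactions = [
    ("X321343T", 59.45),
    ("C125100C", 12.50),
    ("F454587T", 9.50),
    ("X321343T", 52.50),
    ("G776521K", 18.50),
    ("X321343T", 34.50),
]

totals = defaultdict(float)
for account, value in transactions:
    totals[account] += value

# Rank customers by total spend over the month.
for account, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(account, round(total, 2))
```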

Want to learn more about how pseudonymization and other forms of de-identification can help you keep your data safe and usable? Check out Privitar’s Complete Guide to Data De-Identification.

Training Data: A Focal Point for AI Regulation?

By Marcus Grazette, Europe Policy Lead at Privitar

AI and machine learning (ML) technologies are helping people do remarkable things and are becoming more widely used. That growing interest has been matched by growing regulatory interest in AI and ML, with regulators engaging proactively with industry to help shape thinking. The Information Commissioner’s Office (ICO)’s Project ExplAIn, on AI explainability under the GDPR, and the European Commission’s White Paper on AI, a proposal for the future of AI regulation, are recent examples.

Linked to Project ExplAIn, the ICO recently hosted a workshop on data minimisation and machine learning. Privitar participated alongside four leading technology companies including Google and Facebook. The challenges relating to data minimisation and machine learning are well documented and I’ve argued in a previous blog that applying effective data minimisation can improve the ML development process. 

But the recent workshop considered a different angle: how could an organisation demonstrate compliance with the data minimisation principle when running an ML project? The GDPR’s accountability principle requires that organisations be able to demonstrate compliance, meaning that it’s not enough to comply with data minimisation; you also have to be able to demonstrate that you have complied.

That brings us back to the data. A machine learning model looks for patterns in its input, or training, data and applies those patterns to new data in order to make a decision – which could be a prediction or a classification. A model will perform well when presented with new data that resembles the training data. That makes the training data a hugely important part of understanding the model’s decisions.
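A toy illustration of that point: below we fit a simple linear model on training data drawn from a narrow range, then ask it about inputs far outside that range. The quadratic “ground truth” is an invented stand-in for any real-world pattern the model has never seen:

```python
# Ground truth is quadratic, but we fit a linear model: a stand-in for any
# model that can only capture the patterns present in its training data.
def f(x):
    return 0.5 * x * x

# Training data drawn from a narrow range, x in [0, 10).
train = [(i / 10, f(i / 10)) for i in range(100)]

# Ordinary least squares fit of y = a*x + b.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# Inside the training range the fit is a fair approximation;
# on data unlike the training data, it fails badly.
for x in (5.0, 50.0):
    print(f"x={x}: predicted {a * x + b:.1f}, actual {f(x):.1f}")
```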

With that in mind, the ICO’s guidance encourages organisations to collect and process training data in an “explanation aware” manner. European regulators take a similar view. The European Commission’s White Paper on AI frames training data as core to an AI system’s performance. The Commission proposes three requirements for training data:

  • Safety, the data should be sufficiently broad to ensure that the AI system can avoid dangerous situations.
  • Non-discrimination, the training data should be sufficiently representative. 
  • Privacy and personal data protection, linking back to the GDPR.

It also proposes that organisations document the training dataset (i.e. its characteristics, what values were selected for inclusion, etc.) and in some cases retain a copy of the training data itself to allow issues with the model’s performance to be traced and understood.

The White Paper is a consultation document, so it’s too early to say whether these specific recommendations will make it into law. However, it’s clear that the trend is towards a greater focus on training data as a key element of building compliant machine learning systems. 

There are a number of practical steps that organisations can take to help ensure compliance. They include carefully documenting any pre-processing – including transformations to protect individual privacy, like pseudonymisation – and decisions about what data to include in the training dataset. Centralised privacy management can help.

Centralising privacy management offers a number of advantages. First, it fosters a consistent approach across an organisation by creating a central forum for decisions about pre-processing. In contrast, an ad hoc, project-specific approach can be slow, inconsistent and complicated to audit. Second, centralisation allows you to document the transformations applied to the data (e.g. tokenisation). That can help to speed up data preparation for an ML project, because decisions on how to construct the training dataset can be taken once and then applied consistently. Incidentally, documenting transformations supports compliance with the GDPR requirement to record processing (Article 30), and explainability in the context of the ICO’s guidance.
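As a rough sketch of what centrally documenting transformations could look like in code (the record fields and dataset names here are invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PreprocessingRecord:
    """One auditable entry describing a transformation applied to training data."""
    dataset: str
    column: str
    transformation: str
    rationale: str
    applied_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# A central log that every ML project appends to before training.
audit_log = [
    PreprocessingRecord("customers_v3", "name", "tokenisation",
                        "direct identifier not needed by the model"),
    PreprocessingRecord("customers_v3", "date_of_birth", "generalised to age band",
                        "reduces re-identification risk; bands suffice as a feature"),
]

for entry in audit_log:
    print(entry)
```

A log like this is exactly the kind of evidence an accountability review, or an Article 30 record of processing, can draw on.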

At a strategic level, a culture of accountability can help to drive innovation. Multidisciplinary teams of engineers, risk experts and business line leaders can work together on ML projects that use only the data they need in order to answer the most pressing business questions you face.

Interested in learning more on this topic? Check out this blog post on how data privacy can help data scientists.


Five Minutes With Steve Coplan, Sr. Director of Product and Solution Marketing at BigID

Last week Privitar and BigID announced a new partnership and integration focused on achieving greater value and faster insights from sensitive data. We had the opportunity to catch up with Steve Coplan, Sr. Director of Product and Solution Marketing at BigID, to discuss the partnership and hear his thoughts on the challenges facing the financial services industry, as well as on the impact of COVID-19 on the data privacy landscape. The transcript of that conversation follows:


BigID and Privitar just announced a new partnership and integration. Can you tell me more about that?

SC: This partnership is the outcome of our mutual customers bringing us together to constructively address one of the core challenges to their key business initiatives. BigID pioneered the ability to discover and classify personal data, based on not just what data it is, but also whose data it is. Privitar has led in helping enterprises better enforce how that personal data is utilized and protected through a set of sophisticated policies. So, at a high level, customers saw the neat fit between knowing your data for privacy, and knowing what you can – or should not – do with that data for privacy-aware analytics.

This value proposition has gained more resonance as enterprises realize that solving for privacy translates into more productive data analytics initiatives. And, more broadly, that a more automated approach to managing privacy risks means that they can better understand their customers while still maintaining their trust and accountability for how they generate those insights. 

The initial interest in a product integration from the market was to address specific GDPR concerns about identifying an individual’s sensitive data for de-identification, or the risk of re-identification through analytics. Over time we have seen demand broaden significantly beyond those relatively narrow use cases. Now that enterprises see data privacy protection not just as a set of compliance processes, but as a core tenet of data management and data analytics, we felt it was appropriate to move toward a formalized partnership.

How will the partnership and integration benefit businesses? 

SC: The integration reduces the number of manual steps that highly skilled data scientists need to take to create safe, privacy-aware pipelines – so they can focus their attention on delivering insights. There is no more need to manually tag data and confirm what can be used for what purpose. Through the orchestration of BigID discovery insights and Privitar protection policies, the integration also helps mitigate the risk that personal or sensitive data is misused or inadvertently processed for a purpose other than the one originally assigned.

As the principle of data accountability becomes standard operating procedure, we also anticipate that the combination of our classification, consent correlation and cataloging with the Privitar watermarking capabilities for provenance will support sustainable auditing for our joint customers.

As we continue down the path of integration, we anticipate that the investment we have made in our metadata inventory and metadata exchange will allow customers to leverage Privitar’s extensive data protection and de-identification capabilities to not only automate policy enforcement, but also to explore innovative ways to generate value from their data.   

You’re hosting a webinar with Privitar and AWS on June 11th. What can attendees expect to learn? 

SC: Attendees will get to see the integration to enable cloud analytics on de-identified sensitive data in action – and they’ll see how the integration can facilitate the adoption of innovative and cost effective cloud services while managing data privacy risks. 

AWS has made enormous strides in providing customers with cloud-based analytics services – running on Lambda, for example – that can reduce cost and accelerate time to insight. And AWS has consistently invested in both its platform and tools to ensure that customers can secure their environments and comply with privacy regulations.

The challenge that many companies in regulated industries, like financial services, have faced is in building data pipelines that can allow them to take advantage of cloud services. As two AWS Advanced Partners whose joint customers see AWS as a strategic technology provider, we’ll demonstrate how we can solve for the data privacy risks in an automated, orchestrated and scalable way. 

Are there unique challenges financial organizations face when automating data analytics? What about unique benefits? 

SC: It almost goes without saying that financial services firms are highly regulated and are traditionally more risk averse. Privacy risks can compound some of the lingering operational risks that financial services firms might still see in adopting cloud services. 

But even if financial services firms have been relatively slow to adopt cloud-based services because of regulatory concerns in the past, there is no shortage of competitive pressures. Maintaining customer trust and delivering personalized experiences are key areas where competitive differentiation can play out. 

Finding that balance, and putting the right data to work effectively, at scale and on accelerated timelines, can make the difference. This is where our joint support and integration with AWS services comes into play for delivering value to financial services firms.

Is there anything else that you want to add?

SC: We are all navigating through a major shift across the globe, as we contend with the impact and repercussions of the Covid-19 pandemic. But what has emerged loud and clear is that privacy concerns are front and center of any data-driven approach to containing the spread of the virus.

This is an indication to us that it’s not just compliant use of sensitive data that is going to define the landscape – it is responsible use of the data. We see our partnership with Privitar as fundamental to helping our joint customers achieve that aim.

Accelerating Access to Safe and Usable Data with Automation in Financial Services

By Sean Butler, Director of Product Marketing at Privitar

Having efficient access to data-driven insights can be critical for any business looking to reach its goals. This is especially important in the financial services industry, where the desire to implement data-driven strategies for growth and innovation has been challenged by cumbersome data management practices, decentralized privacy governance, and mounting global privacy regulations.

Organizations facing these challenges should look towards systematic and automated data privacy options to remove the friction between data sources and users while enforcing data privacy. Automating the process provides consistency, transparency and clear auditability, and reduces human error. Everything speeds up, but especially time to data.

Data privacy automation can be built into the process from the very beginning (like a vending machine for data), with a preset automation flow built right into the pipeline. This makes it easier to get data into the hands of data consumers faster. It also makes it easier for organizations to adhere to industry regulations, and to demonstrate that adherence.


How automation applies to businesses that want to keep their sensitive data safe and usable

Automation changes how companies address the data request process. Specifically, it allows organizations to broaden access to data, both in terms of who can access it and in terms of how much data they can access. Organizations should look to systems that allow them to standardize expensive parts of the data request process, like the provisioning of sensitive data. This will accelerate their time to data, enabling faster time to insights that drive value across the business.
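To sketch the “vending machine” idea in code: a standing policy table lets routine requests be served automatically, while anything without a preset policy is escalated for review. The roles, rules and function below are invented for illustration:

```python
# Illustrative policy table: which transformation each requester role gets.
POLICIES = {
    "marketing_analyst": {"email": "pseudonymise", "salary": "suppress"},
    "fraud_team":        {"email": "pseudonymise", "salary": "retain"},
}

def provision(role: str, dataset: str) -> str:
    """Handle a data request with a preset policy instead of a manual review."""
    policy = POLICIES.get(role)
    if policy is None:
        return f"ESCALATED: no standing policy for role '{role}'; refer to privacy team"
    # In a real pipeline this step would apply the transformations and
    # write the result somewhere the requester can access.
    rules = ", ".join(f"{col} -> {rule}" for col, rule in policy.items())
    return f"APPROVED: {dataset} provisioned for {role} with rules [{rules}]"

print(provision("marketing_analyst", "transactions_june"))
print(provision("contractor", "transactions_june"))
```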


Special considerations within the financial services industry

The financial services industry has a wealth of data about its customers. Leaders in the space constantly analyze that data to optimize their customer experience and product portfolio, assess risk, and improve many other aspects of the business. As organizations think about new ways to use their data, one of the biggest challenges is how to protect what is sensitive. Automating the execution of privacy policies and data lifecycle management will allow financial services organizations to better manage the associated risks, which continue to grow as regions around the world adopt privacy laws like the GDPR and CCPA. Automating key aspects of provisioning sensitive data for analytical use will ultimately allow enterprises to protect data consistently and broaden its use.


Benefits of automating key pieces of the provisioning process for sensitive data

As data initiatives scale, they require a streamlined provisioning process that can meet the volume and breadth of data usage and stand up to regulation and audit. Manual approaches to provisioning sensitive, personal data cannot keep up with demand over time. Plus, manual processes are subject to human error, easily circumvented, and ultimately fail under enterprise load. Automating key pieces of provisioning sensitive data allows organizations to accelerate request processing time, manage risk with increased efficiency, and broaden how data is used throughout the business. This is accomplished by reducing slow manual processes and increasing the amount of data that can be made available, as a result of better data privacy policy management and application.


Tips for organizations looking to leverage sensitive data faster:

  • Develop a working group that spans teams and focuses on the use of sensitive data inside the business. This group should focus on creating the policies, processes, and procedures that enable the organization to use sensitive data safely and consistently. 
  • Identify requests for sensitive data that allow for the application of standard policies. These requests should be frequent in nature, as this will allow you both to maximize the automation effort and to prove the value of the project along the way.
  • Evaluate new platforms and processes with a long-term perspective in mind. Access to sensitive data will be an ongoing problem that touches teams across the organization. It is imperative that you procure products and build processes that are capable of servicing a broad range of use cases. 
  • Consider integrating tools and service providers that can help you with this process. Data privacy software companies like Privitar not only provide the software that can be tuned to situations and contexts, but also the privacy expertise to convene stakeholders, build a common understanding, and define policies to meet the range of needs in an enterprise.

Interested in learning more about how data privacy automation can help your organization? Check out this webinar on June 4, 2020.

Why Data Privacy Must Be Prioritized as COVID-19 Accelerates the Shift to Digital

By Marcus Grazette, Europe Policy Lead at Privitar

Meetings, socialising, schooling, exercise classes, shopping. Adjusting to social distancing means shifting everyday activities online, where they are mediated by organisations operating digital platforms. These organisations will collect increasing volumes of data, much of it highly sensitive. This increases the risk of privacy harms for individuals, with corresponding risks of reputational damage and loss of trust for the organisations collecting that data. Building in privacy by design will help to protect individuals and maintain trust.

Covid-19 is pushing us to live digitally
The shift online, prompted by social distancing, means that organisations are collecting more data and using it in new ways. We can group those organisations into three broad categories. First, well-established digital platforms, including social media, whose customers are using their services more intensively. Second, organisations with some pre-existing digital footprint scaling up to meet unprecedented demand. Supermarkets with online delivery services fall into this category. Finally, organisations launching entirely new online services. 

For all three, building trust is paramount. But that may be more difficult for organisations without well-established practices for handling customer data. Data is powerful. Used well, it can provide the insights that fuel innovative organisations. But, in some cases, data use can feel ‘creepy’ for consumers, which could lead to a loss of trust and customer churn. 

This is often the case when data is used in ways that consumers do not expect, and worse when their information is exposed through breaches or inappropriate sharing. Predictability is crucial. NIST, the US National Institute of Standards and Technology, describes predictability as “core to building trust”.[1] Organisations have to make choices about how they use data, and what controls they apply.

Interacting online allows platforms to collect more data, and increases privacy risk
Shoshana Zuboff, author and Harvard professor, argues that we live in an age of surveillance capitalism. The surveillance element of her argument comes from the fact that some of the most valuable data is collected by observing how a user interacts with a platform. She calls this our ‘digital exhaust’. The times of day a user is active, the devices they use or the options they consider before adding an item to their shopping basket all provide rich insights. Zuboff points out that companies use information about us to build ever more detailed profiles and to nudge us towards actions that benefit the company, such as clicking on advertising. Collecting and using data carries two related risks. 

First, there is the risk of privacy harm to the individual. In simple terms, this could be where data is used in a way that the consumer did not expect. It contributes to the sense of ‘creepiness’ and can lead to a loss of trust in the organisation. This can be particularly challenging where consumers feel they have little choice but to use an online option (e.g. to comply with social distancing measures in place to fight COVID-19). In addition, the risk increases with scale, as more people use online services, and as more aspects of a consumer’s life shift online and become linkable data points in their digital profile.

Similarly, organisations responding to demand from users outside of their traditional target market will need to tread carefully. For example, services usually found in corporate settings now host all types of digital interactions, from birthday parties to Parliamentary debates. Are existing privacy policies and protections appropriate for new types of users? The change in context matters and can have significant implications for brand and reputation. This is particularly relevant now as consumers explore their options for shifting online.

Second, there is the risk to the organisation, including a loss of trust and reputational damage. Techlash, the FT’s ‘year in a word’ in 2018, partly describes public frustration with organisations using data in ways that feel creepy or manipulative.[2] But techlash is not an automatic consequence of data use. The Information Technology and Innovation Foundation (ITIF), a US think tank, draws a parallel with the introduction of the automobile. First there was excitement at new options for transport, then concern over issues like safety or the environment. In the automobile example, policy responses, including regulation, helped to address issues of concern.[3] In the data protection space, regulation also plays a key role in protecting privacy and building trust between individuals and organisations.

Platforms can build trust by focussing on privacy
As organisations respond to the shift to digital in unprecedented times, they should focus on building and maintaining customer trust. A clear, comprehensive approach to data privacy is key. Some of the new data being used will be highly sensitive. For example, the government has allowed online supermarkets some access to data on vulnerable people; using that data to prioritise deliveries to those who need them most is a useful outcome. Even when under pressure to move quickly, organisations should ensure that the core principles of data protection, including transparency and purpose limitation, continue to govern their data use. Building trust, including by protecting privacy, will help to ensure that organisations respond successfully and emerge stronger.

[1] NISTIR 8062 Privacy Engineering and Risk Management, January 2017
[2] FT, Year in a Word: Techlash, December 2018
[3] ITIF, A Policymaker’s Guide to the “Techlash”—What It Is and Why It’s a Threat to Growth and Progress, October 2019 


In:Confidence Digital: Day Two Preview from Sean Butler, Privitar’s Director of Product Marketing


By Crystal Woody, Senior Director of Strategic Communications at Privitar

The first day of In:Confidence Digital is in the books – and what an amazing day it was! Members of the data privacy and analytics community came together to participate in interactive sessions from some of the industry’s pioneering speakers. Bernardo Mariano Junior, CIO at the World Health Organization, stressed the importance of data in optimizing the global response to the Covid-19 pandemic, while Alex Gladstein, Chief Strategy Officer at the Human Rights Foundation, urged caution over the use of contact tracing. Industry leaders from AstraZeneca and BT gave their insights on safe and efficient data use, and we looked ahead to the future of the data privacy landscape.

On Thursday, May 21st, Privitar will host day two of In:Confidence Digital, with privacy experts delivering in-depth workshops and demonstrating how to put the lessons learned on day one into practice. Sean Butler, Privitar’s Director of Product Marketing and host for In:Confidence Digital day two, offered a sneak peek into what is in store. The transcript of our interview follows.

For more information or to register for free, visit: https://inconfidence.privitar.com/digital 

CW: You’ve had the opportunity to preview the content from the second day of In:Confidence Digital. What can attendees look forward to learning?
SB: Day two of In:Confidence Digital is all about the practitioners – the ones who are tasked with putting together a plan and executing the steps required to bring privacy into a modern, data-driven organization. The content is intriguing because it is designed to provide the audience with actionable takeaways that they can implement in their organization regardless of their current stage of privacy maturity. The topics will be brought to life with interesting examples that illustrate how important people, process, and technology are in achieving a best-in-class privacy organization.

CW: Now, let’s dig into the topics that will be covered during Day two. How does leveraging cloud-based technologies and automation impact the ability to maximize the value of data-driven insights?
SB: Cloud-based technologies and the automation of processes have allowed our customers to accelerate their time to data while also broadening their data access. They accomplish this through the systematic application of policies directly on the data that is being made available for consumption in their cloud environment. This allows them to take advantage of the advanced computing power available in the cloud after the data has been made safe for use.

CW: What should an organization consider when evaluating potential data privacy tools?
SB: The key thing for an organization to think about when evaluating a data privacy solution is its long-term data outlook – in other words, not just what its needs are today but what they will be over the next two to five years. This vantage point should allow companies to think about how many use cases they will have, how much data they will be consuming, and, perhaps most importantly, how sensitive that data will be. After this evaluation, they will be able to make an informed investment, as opposed to buying a solution that solves a single pain point but lacks the scalability to grow with their ongoing needs. This is something Mark Semenenko does an excellent job of covering on day two of In:Confidence Digital.

CW: What is a privacy center of excellence? Why would a business want to create one? What are the first things to consider when doing so? 
SB: A Privacy Center of Excellence (COE) is a team designed to elevate the overall privacy posture of your company. This group is tasked with defining privacy policies, and acts as a key stakeholder in any decision made about how to handle sensitive data. COEs are created as a way to implement privacy strategically across teams, standardizing the policies and technologies used across the organization. Our Senior Privacy Engineer Pat Bates does an excellent job of outlining how to get your own COE started on day two!

CW: What piece of advice would you offer to an organization that is trying to balance their data utilization and data protection?
SB: The best piece of advice I can offer is to be customer-centric in your approach to privacy. Privacy and managing sensitive data are about mitigating the risk associated with that data should it be found someplace it shouldn’t be. Consumer trust in your brand is constantly at risk if you aren’t implementing a plan to manage privacy. Companies now more than ever need to take steps to get their people, processes, and technology in order with respect to privacy. We are seeing consumers across the globe become less and less tolerant of companies that don’t protect the personal information of their customers, and I don’t see that trend changing.

In:Confidence Digital Sneak Preview: Insights from Christina Bechhold Russ, Director, Samsung NEXT


By Crystal Woody, Senior Director of Strategic Communications at Privitar

Recently, I had the opportunity to catch up with Christina Bechhold Russ, Director at Samsung NEXT, an early-stage venture capital fund investing in software and services. Christina also co-founded Empire Angels, a New York-based fund and angel network of young professionals investing in early-stage startups, with a focus on supporting millennial entrepreneurs. She is a regular contributor on startups and leadership for the Wall Street Journal, a mentor for Startup Sesame and the Entrepreneurial Refugee Network and sits on venture fund advisory boards in both the US and South America. Christina is also a TEDx speaker, and was recognized by the New York Business Journal as a 2016 Woman of Influence, by Business Insider as a Woman to Watch in Venture Capital in 2018 and by Management Today & The Daily Telegraph as one of Britain’s 35 Women Under 35 in 2019.

During our conversation, we discussed the balance of data utilization and consumer empowerment, how consumers can better protect their data, and how businesses can harness the power of technology to protect their customers. The transcript of our interview follows.

Christina will share additional insights on Data Privacy Technology and Consumer Empowerment on May 14th (5:30pm BST / 12:30pm EDT) during In:Confidence Digital. For more information about her session, or to register for free, visit: https://inconfidence.privitar.com/digital 

CW: How does Samsung NEXT define ‘consumer empowerment’ and what are you looking to invest in?
CBR: We believe in a not so distant future where consumers have the agency and control to determine how they interact with technology and how they leverage technology to interact with each other. In this regard, our Ventures team looks to invest in technologies and business models that give consumers more control of their data, their attention, their intention, and their time. 


CW: Can consumer empowerment and data utilization for businesses truly co-exist?
CBR: The short answer: yes. The reality is that today, too many companies wield extensive influence due to a primary business model built around personal data mining, tech addiction and surveillance advertising. We believe these companies are more vulnerable than they appear because their business model is under threat, from government regulation, antitrust scrutiny, and consumer backlash. As a result, a growing number of startups are emerging to take on these incumbents, and challenge their dominance. Last year, we invested in Scroll, which makes it easier and faster for consumers to navigate content on the web by partnering with publishers to show ad-free content. Instead of ad-blocking, Scroll employs a membership model, and measures the engaged time spent with that site to calculate how much that site should earn each month. It’s also peace of mind for the consumer to know that their data is never sold or given to anyone. 


CW: What can consumers do to better protect their privacy rights and data?
CBR: Individuals are realizing that the vast amounts of information being collected about them are not always used to their advantage. The expansive nature of this data collection, which originally made the problem difficult for consumers to grasp, has now instead engendered distrust and concern about how this information can be used against them. There are certainly different generational attitudes, though – my relationship with privacy as a consumer is very different from that of my parents; as a Millennial, I’m more likely to be comfortable trading my data in exchange for more personalization, for example.

In 2019, we announced the first cohort of the Samsung NEXT Stack Zero Grant program, a non-equity program to support early-stage teams building decentralized technologies. Grant recipients and a growing network of those concerned with privacy and data control gathered last summer where we tackled an array of topics, including this idea that one of the key problems with the things we build is that they might be used against us. And it’s because of this that many of us today choose to simply mitigate the amount of information about us that we put on the internet. It’s our job to consider how technology needs to be developed for the coming generations who will grow up in a world where living life in public is the norm—where trading privacy for convenience is all they know.


CW: How can businesses harness the power of technology to protect their customers?
CBR: In short, invest in data protection services. In the past months, we’ve seen an increase in enterprise companies viewing data as a liability and actually wanting to minimize how much user data they store. And it makes sense: banks, game developers and financial institutions topped the list of data breaches in 2019. The less data you hold, the less attractive you are as a target. It’s important for businesses to invest in solutions that help them comply with regulations while protecting their customers. In fact, it may even turn out to be less expensive than the alternative. 


CW: What is your favorite new privacy technology for businesses?
CBR: We’re most interested in solutions that favor a decentralized approach, especially on device. I’m quite interested in privacy preserving personalization—companies like Canopy, for example, that can use on-device machine learning to customize content recommendations rather than cookies that share your behavior with 30 affiliates.


CW: Anything else you’re paying attention to in the news or otherwise?
CBR: Consumer data privacy and decentralized solutions are front and center right now in the debate over COVID-19 contact tracing – Apple and Google have taken a privacy-first approach with their API, while several governments, including the UK, have said they want centralized solutions. The debate will be further complicated as public health authorities evaluate ideas around regular testing and immunity passports. What is a reasonable amount of personal data for a consumer to give up to their government in a health crisis? Who decides? What do businesses need to know to allow consumers access? Can it be architected in a way that ensures, post-pandemic, governments and businesses no longer have that same access, or does this become a regular way of life? It will be very interesting to see how the public and private sectors tackle this.

In:Confidence Digital Sneak Preview: Insights from Polly Sanderson, Policy Counsel at Future of Privacy Forum


By Crystal Woody, Senior Director of Strategic Communications at Privitar

Last week, I had the opportunity to catch up with Polly Sanderson, Policy Counsel at the Future of Privacy Forum, where she focuses on legislative outreach and analysis, and on privacy legislation at the federal and state level. FPF is a prominent D.C.-based think tank with expertise on emerging consumer privacy issues.

During our conversation, we discussed the current state of the data privacy landscape in the United States. We also talked about some tips and insights she wanted to share with businesses trying to navigate this changing regulatory landscape. The transcript of our interview follows.

Polly will share additional insights on the US Data Privacy Landscape on May 14th (5:00pm BST / 12:00pm EDT) during In:Confidence Digital. For more information about her session, or to register for free, visit: https://inconfidence.privitar.com/digital

CW: What is driving the momentum for new privacy legislation in the United States?
PS: Momentum for US privacy legislation comes from a number of places – grassroots movements, the states, and external pressure from other jurisdictions implementing their own laws. After a series of high-profile scandals and data breaches involving personal data, privacy has become a mainstream issue. Equifax, Cambridge Analytica, and more recently Clearview AI have put the spotlight on whether individuals can trust companies with their data. In part, the California Consumer Privacy Act (CCPA) is a manifestation of the desire of individuals to increase legal protection. Since the enactment of the CCPA, many other states have introduced similar bills to give their own constituents similar or stronger protections. To increase consumer trust and adoption of digital products and services, and to prevent the emergence of inconsistent state laws, industry is supportive of implementing a uniform set of federal rules. Moreover, many companies have also already implemented internal compliance programs to comply with the EU’s General Data Protection Regulation (GDPR).


CW: What are the major points of consensus and ongoing discussion in the US privacy debate?
PS: In principle, there is widespread agreement on the need for privacy legislation in the United States. Since the end of 2018, many proposals have been introduced to Congress from both Republicans and Democrats. At this stage of the privacy debate, the general legislative framework is fairly well-settled. It consists of a set of rights for individuals, obligations for covered entities, the Federal Trade Commission (FTC) as the primary regulator, and additional enforcement by State Attorneys General. The details vary between proposals, but although many of the issues are complex there is much room for compromise. At the crux of the debate are substantive processing limitations and issues involving automated decision-making, algorithmic bias and discrimination. These are hugely important aspects of the debate, with major privacy implications for individuals and groups, as well as commercial practices. Until these issues are worked out, some of the more political issues – preemption and private right of action – are unlikely to be resolved. I am optimistic that a nuanced and balanced solution is possible. 


CW: What are the biggest points of distinction between US data privacy legislation and international approaches to privacy protection? Is any country “getting it right?”
PS: What may be the “right” approach for one country can rarely be copied and pasted to another jurisdiction. Data privacy laws must be considered in the context of the cultural and constitutional backgrounds, values, and regulatory appetites from which they originate. One of the largest differences between US data privacy legislation and the European approach is that, under the GDPR, covered entities must have a “legal basis” to collect covered data. This requirement is anchored at the constitutional level in the EU. Meanwhile, the US’s constitutional protection of the freedom of speech has been interpreted by US courts to protect the free flow of information. It is therefore not surprising that most legislative proposals in the US do not include a requirement for covered entities to have a legal basis for the collection of personal information. Traditionally, the U.S. has regulated the processing of personal data in areas where there is a risk of harm. This has created fertile ground for data-driven innovation. However, the proliferation of data-driven innovation in modern society now calls for a general regulatory framework to promote consumer trust in tech products and services. Of course, an overly prescriptive law could have the unintended effect of benefitting large companies at the expense of small but innovative players and start-ups which lack the resources to hire large legal teams. In general, most US proposals take a more nimble, holistic regulatory approach than the GDPR.


CW: How do you expect COVID-19 to impact the US data privacy landscape?
PS: COVID-19 has underscored the need for a federal data privacy law in the United States. If there had been a law in place before the pandemic, then there would be less confusion among policymakers and companies about how to share and use data to combat the emergency and what safeguards to put in place. The US has been slower to act than the EU – there has been much guidance and clarity from EU DPAs. However, the pandemic has also put the consumer privacy debate on hold temporarily. Before the outbreak, over a dozen states were considering their own privacy laws. Now, the focus of legislators has moved toward formulating urgent economic and social responses to the pandemic. But without adequate data protection, citizens are less likely to trust technological solutions, and there is a greater risk that measures put in place to fight COVID-19 could have implications for surveillance both now and in the future. It is important for legislators and companies to be cautious, and to learn lessons from how 9/11 impacted the balance between surveillance and human rights.


CW: What piece of advice would you offer to businesses that are trying to navigate a rapidly evolving data privacy landscape?
PS: If I could give one piece of advice to businesses that are trying to navigate the rapidly evolving data privacy landscape, it would be that legislators are never going to be “done” dealing with the regulatory framework of consumer data and the issue of privacy. We are living in a new era. To remain competitive, to maintain the trust of consumers, and to continue to win contracts with other businesses, you need to “lean in” by demonstrating that your privacy and security practices are state-of-the-art. Where possible, businesses should employ Chief Privacy Officers to oversee the implementation of comprehensive privacy programs internally, even if it is not legally required. I cannot stress enough how important it is for there to be open lines of communication between your privacy team, IT team, and upper-level management. This is a boardroom-level issue, this is a reputational issue, and this is an issue that is not going away. Start-ups will benefit from practicing privacy-by-design from the outset, and throughout the design, development, and deployment of their products and services. Regulators will look kindly upon organizations that are able to demonstrate a good-faith effort to follow good data practices, even in a rapidly changing landscape.


Why Data Privacy is Critical for Helping Data Scientists

By Javier Abascal Carrasco, Engineer at Privitar


Being a data scientist is hard for many reasons, a significant one being the famous 80/20 dilemma: data scientists and machine learning experts spend about 80% of their time generating, preparing and labeling data, and only 20% of their time building and training models! Isn’t it crazy? You hire someone for their ability to build complex and sophisticated models, but they barely spend any time doing it.


Don’t get me wrong: obtaining, crunching and preparing data is part of the job and has huge implications for final model performance. At the end of the day, a learning model is only going to be as good as the data supporting it. It is crucial to pay attention to the data preparation stage and try to maximize the efficiency of the time spent there. In the rest of this post, I would like to highlight how privacy relates to the work of a data scientist, and how an organization can accelerate the time to realizing the value of its data.

As a Data Scientist, I Need to Explore and Understand What Is Inside These Tables

A model tends to start with an objective – what we want to achieve (e.g., predicting something or classifying a subset of a population). Once that is clear, we need to find relevant data sources that will help us realize those goals. In most of today’s cases, data sits in tables across a multitude of data warehouses, sometimes across several distinct environments. In the best-case scenario, you will have a data catalog in place that can be used to identify the data. If not, you must reach out to different teams to understand what is available. At the end of the day, you will end up doing two main activities:

  1. Accessing a multitude of data tables, including ones with highly sensitive information, which will force you to request access
  2. Querying tables, to visualize a few rows and get a sense of what information is stored there.

There are a couple of serious consequences for data scientists. First, there is friction in the access request process, which can easily take days, weeks, or even months, depending on the sensitivity of the data, the processes currently in place, technology limitations, and cross-departmental approvals. The data scientist will need to provide a justification for access, or even attend specific meetings with security and privacy staff in order to gain approval. And if the data will be used in the cloud, there is likely an additional process to ensure the data is adequately protected from breach, to minimize risk to the organization.


Second, data scientists will get access to sensitive data, including the ability to identify individuals, and could potentially harm the organization if they disclose certain details. Internal actors were responsible for 43% of data loss, according to a well-known 2015 Intel/McAfee report. Often data scientists don’t need the sensitive columns for their analysis, but they can access them because the sensitive data sits alongside the more useful pieces of information.


So, how can you mitigate these consequences? 


Very simple. With data privacy.

Data Privacy Helps Democratize Access to Data, Reducing Risk While Keeping Utility

De-identifying data using data privacy techniques addresses the friction and risk around using sensitive data, enabling data scientists to minimize the time spent collating data and allowing them to spend more time running and analyzing models. There are several capabilities that organizations should aim for when adopting data privacy to better empower data scientists:

  1. Discovery of sensitive data sources and personally identifiable information across the organization.
  2. Creation of a data catalogue, which accelerates and empowers the search for useful information.
  3. Availability of advanced privacy-enhancing techniques, in the form of rules to be applied to protect sensitive records, allowing the creation of privacy policies, mapped to dataset structures, that do not reveal sensitive information.
  4. Capacity to control data releases to specific domains, with traceability through data watermarking (see the sketch after this list).
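Here is a minimal sketch of the watermarking idea from point 4: embed a recipient-specific pattern of tiny perturbations so that a leaked copy can be traced back to its release. This is a simplified illustration of the concept, not how any production watermarking scheme works:

```python
import hashlib

def watermark_release(rows, recipient: str):
    """Mark a data release so a leaked copy can be traced to its recipient.

    Illustrative approach: hash the recipient name, then use one bit of
    the hash per row to decide whether to apply a one-cent perturbation.
    """
    fingerprint = hashlib.sha256(recipient.encode()).digest()
    marked = []
    for i, (account, value) in enumerate(rows):
        bit = (fingerprint[i // 8] >> (i % 8)) & 1  # one fingerprint bit per row
        marked.append((account, round(value + 0.01 * bit, 2)))
    return marked

def identify_leak(leaked, original, suspects):
    """Return the suspect whose watermark pattern matches the leaked copy."""
    for suspect in suspects:
        if watermark_release(original, suspect) == leaked:
            return suspect
    return None

data = [("X321343T", 59.45), ("C125100C", 12.50), ("F454587T", 9.50)]
leaked_copy = watermark_release(data, "analytics-team-a")
# With only a few rows the fingerprint is weak; real schemes embed far more capacity.
print(identify_leak(leaked_copy, data, ["analytics-team-a", "analytics-team-b"]))
```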


When well orchestrated, these capabilities will let the security and privacy departments accelerate the approval of data access, allowing scientists to explore and visualize data faster and without friction.


As a result, people accessing information won’t be working with raw data, reducing the overall risk to the organization. Moreover, the fact that the data is watermarked deters insider misuse and negligence, since leaked data can easily be traced back to its source and the information revealed won’t have value outside the organization.



Last but not least, the main reason data scientists are reluctant to work with protected data is that their past experiences used basic masking techniques that destroyed the utility of the data and hence reduced the performance of the models they trained. Applying advanced privacy policies gives data scientists the capacity to join data across tables, keep the value of categorical variables, and adjust the level of privacy they want for numerical variables (inserting controlled noise that preserves their statistical value). These policies give them full control and flexibility, significantly reducing the trade-off between model performance and risk mitigation. In short, the bias that exists among data scientists against the use of protected data comes from past experiences with basic privacy techniques.
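As an example of what controlled noise means, here is a minimal sketch: zero-mean Laplace noise obscures each individual value while leaving aggregates approximately intact. The scale parameter is an illustrative choice; a real system would calibrate it to the sensitivity of the data:

```python
import math
import random

random.seed(42)

def laplace_noise(scale: float) -> float:
    """Sample zero-mean Laplace noise via inverse transform sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

salaries = [52000, 61000, 48000, 75000, 58000]
noisy = [s + laplace_noise(1000.0) for s in salaries]

# Individual values are obscured, while the aggregate stays close
# (the larger the dataset, the closer it gets).
print("true mean:", sum(salaries) / len(salaries))
print("noisy mean:", round(sum(noisy) / len(noisy), 1))
```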


The use of a cutting-edge privacy platform, such as Privitar, will allow your organization to reduce the friction and risk of accessing sensitive data sources, enabling you to spend significantly less time organizing and collating data and more time gaining critical insights from the analysis.