Why It’s Essential to De-identify Chatlog Data Before Using It in Text Analytics

June 22, 2022

By Nilesh Parmar, Senior Sales Engineer at Privitar 

Digitalization and technological innovation are changing the way businesses communicate with their customers. It’s faster than ever for organizations to connect and engage with customers through tools like live chat, chatbots, social media, and text messaging. These interactions are highly personalized, and they are increasingly the preferred communication method for consumers. Businesses are taking notice: it is projected that by 2025, chatbots will handle 75-90% of customer queries.

These channels also offer a treasure trove of information that can be used for analytics.

Still, only 18% of organizations report being able to take advantage of this type of unstructured data.

Why are so many organizations leaving so much of their data behind? 

For many, the answer is simple: risk.

While there is no doubt that the data contained within these channels is valuable, it is also very likely to be sensitive personal data. Names. Addresses. Credit card and bank account numbers. Birth dates. The list goes on. This type of information is often essential to helping a customer via chatbot. But if it is not handled correctly and falls into the wrong hands, the business can face significant risk.

The good news is that most of this information does not need to be kept in its entirety to remain valuable.

The key to keeping this data safe is de-identification that preserves as much of its analytical utility as possible.
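As a rough illustration of keeping only the part of a value that analytics actually needs, the Python sketch below reduces a card number to its last four digits and a date of birth to the year. The regular expressions, the mask_partially function, and the replacement formats are hypothetical and chosen for this example only; they are not how Privitar detects or transforms data, and simple patterns like these will miss many real-world formats.

```python
import re

# Hypothetical, illustrative patterns only. Production tools rely on trained
# entity recognizers and configurable policies, not two regular expressions.
# Card number: 16 digits, optionally separated by spaces or hyphens.
CARD_NUMBER = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")
# Date of birth written as DD/MM/YYYY.
DATE_OF_BIRTH = re.compile(r"\b\d{2}/\d{2}/(\d{4})\b")


def mask_partially(text: str) -> str:
    """Keep only the parts of sensitive values that analysis usually needs."""
    # Card number -> last four digits (enough to tell cards apart).
    text = CARD_NUMBER.sub(lambda m: f"[CARD ending {m.group(1)}]", text)
    # Full birth date -> birth year (enough for age-band analysis).
    text = DATE_OF_BIRTH.sub(lambda m: f"[BORN {m.group(1)}]", text)
    return text


if __name__ == "__main__":
    msg = "My card 4111 1111 1111 1234 was declined, DOB 14/03/1987."
    print(mask_partially(msg))
    # My card [CARD ending 1234] was declined, DOB [BORN 1987].
```

The masked message no longer identifies the card or the exact birth date, yet it still supports questions such as "which age groups report declined cards?"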

De-identifying chatlog data before using it for text analytics is essential 

De-identifying sensitive data enables you to safely leverage the data for analytics. You can have the best of both worlds – data that is both safe and highly usable.

De-identifying the sensitive data found within chat text lets you analyze the valuable information those chats contain to better understand your customers and their behavior, while upholding customer privacy and ensuring compliance with relevant regulations.

Privitar recently introduced a new solution that helps make it easy to de-identify sensitive data within unstructured chat text: Privitar Data Privacy for Chat. While more basic chatlog solutions can only classify sensitive data, Privitar allows you to create policies around these classifications to enable more meaningful analytics. For example, rather than simply classifying both “London” and “New York” as LOCATION, Privitar applies a policy to the classifications, converting “London” and “New York” to “UK” and “US” respectively. This immediately allows you to start grouping the data and joining it with other de-identified datasets to gain a deeper understanding of the data you need to analyze.
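A minimal sketch of applying a policy on top of classifications, assuming an upstream classifier has already tagged non-overlapping character spans with labels such as LOCATION. The Entity class, the CITY_TO_COUNTRY lookup, the POLICY table, and apply_policy are hypothetical names for illustration; they are not Privitar’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entity:
    start: int  # offset of the first character of the span
    end: int    # offset just past the last character of the span
    label: str  # classification produced upstream, e.g. "LOCATION"


# Hypothetical lookup table; a real policy would draw on a full gazetteer.
CITY_TO_COUNTRY = {"London": "UK", "New York": "US"}

# Hypothetical policy: one transformation rule per classification label.
POLICY: Dict[str, Callable[[str], str]] = {
    "LOCATION": lambda city: CITY_TO_COUNTRY.get(city, "[LOCATION]"),
    "PERSON": lambda _name: "[PERSON]",
}


def apply_policy(text: str, entities: List[Entity]) -> str:
    """Replace each classified span with the output of its label's rule."""
    # Work right to left so replacements don't shift offsets of earlier spans.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        original = text[ent.start:ent.end]
        rule = POLICY.get(ent.label)
        # Labels without a rule fall back to a plain placeholder.
        replacement = rule(original) if rule else f"[{ent.label}]"
        text = text[:ent.start] + replacement + text[ent.end:]
    return text


if __name__ == "__main__":
    msg = "Alice flew from London to New York."
    spans = [("Alice", "PERSON"), ("London", "LOCATION"), ("New York", "LOCATION")]
    ents = [Entity(msg.index(w), msg.index(w) + len(w), lab) for w, lab in spans]
    print(apply_policy(msg, ents))  # [PERSON] flew from UK to US.
```

Because “London” becomes “UK” rather than a generic placeholder, the output can still be grouped by country or joined with other datasets that use the same country codes.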

Better yet, the policies you apply are not just standalone policies for protecting chat data. They are the same policies you can use to protect the structured data held in your data warehouses, data lakes, and other analytical systems, enabling you to take a holistic approach to data protection.

Additionally, because chat data is unstructured, certain words (slang, for example) may not be classified and protected on the first pass. Privitar has introduced a machine learning element to Data Privacy for Chat that lets you teach the solution to recognize those words, which in turn improves how your data is protected. So, with training, it actually gets better over time!
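The post does not say which learning strategy the product uses. One common way to keep this kind of training cheap is active learning with uncertainty sampling: a human is asked to label only the words the model is least sure about. The sketch below assumes the model exposes a probability that each token is sensitive; the scores, the oracle callback standing in for the human reviewer, and all function names are hypothetical.

```python
from typing import Callable, Dict, List


def uncertainty(p: float) -> float:
    """1.0 when the model is 50/50 on a token, 0.0 when it is certain."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_for_labeling(scores: Dict[str, float], budget: int) -> List[str]:
    """Uncertainty sampling: pick the `budget` tokens the model is least sure of."""
    return sorted(scores, key=lambda tok: uncertainty(scores[tok]), reverse=True)[:budget]


def active_learning_round(
    scores: Dict[str, float],
    oracle: Callable[[str], bool],
    budget: int,
) -> List[str]:
    """Ask the (hypothetical) human oracle about uncertain tokens; record answers."""
    queries = select_for_labeling(scores, budget)
    for tok in queries:
        # A human label is treated as certain: 1.0 = sensitive, 0.0 = not.
        scores[tok] = 1.0 if oracle(tok) else 0.0
    return queries


if __name__ == "__main__":
    # Hypothetical model scores, P(token is sensitive). Slang terms for money
    # sit near 0.5 because the model has not seen them labeled before.
    scores = {"London": 0.97, "hello": 0.02, "quid": 0.55, "tenner": 0.48, "thanks": 0.10}
    asked = active_learning_round(scores, oracle=lambda t: t in {"quid", "tenner"}, budget=2)
    print(asked)  # ['tenner', 'quid']
```

In a real system the new labels would be used to retrain the model so that its scores for similar, unseen words improve as well; this sketch only records the reviewer’s answers, which is where the saving in manual labeling comes from.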

We developed Privitar Data Privacy for Chat in partnership with our customers, based on real-world needs, and piloted it with great success over the last year. We’ve created in-depth case studies based on these pilots, which you can read here, and you can get a glimpse of both below:

ABN AMRO unlocks the power of customer interaction data

ABN AMRO, a leading Dutch bank with almost 7 million customers and offices throughout Europe, collects large volumes of unstructured data from customer interaction channels, including online chat logs, social media posts, and call center transcripts.

They wanted to perform analytics on this unstructured data to gain valuable Voice of the Customer (VOC) insights, make data-driven decisions to improve customer experience, and drive operational efficiencies. However, the use of this data was restricted due to privacy concerns. ABN AMRO partnered with Privitar Labs to investigate how innovations in privacy-enhancing technologies could meet the bank’s need to protect identifying information in unstructured text.

The collaboration focused on unstructured customer interaction data from online chat logs and social media messages. Our prototype was used to redact identifying information within this data, and was trained to detect new entity types specific to each customer’s requirements (for ABN AMRO, the requirement was to learn currency as a new entity).

The prototype exceeded ABN AMRO’s requirements and dramatically reduced the time a human needed to spend labeling data to train the machine learning model. Our active learning strategy delivered a twentyfold efficiency gain in manual labeling, and the pilot successfully unlocked access to customer interaction data for ABN AMRO, enabling the bank to provision unstructured data for analytics safely.

Detecting and redacting identifying information removed the privacy barriers around ABN AMRO’s unstructured data. Because the redacted data could be provisioned to a wider group of analysts while upholding customer privacy and ensuring regulatory compliance, the bank was able to unlock its value through analytics.

Read the full ABN AMRO case study here.

Discovery preserves privacy while optimizing Voice of Customer analytics

Discovery is a leading insurance and banking provider with a global reach, headquartered in South Africa and serving over five million customers. Driven by a mission to help customers lead healthier lives, its operations are underpinned by first-class customer support. Discovery uses data analytics to better understand customer behavior, motivations, attitudes, and opinions, and to optimize the services it offers.

For Discovery, safeguarding customer data is of paramount importance, and data privacy is non-negotiable.

Discovery wanted a way to protect sensitive information in unstructured data sources so that its data scientists could use unstructured text in Voice of the Customer (VOC) analysis without restriction, and drive value for customers safely and ethically. Discovery partnered with Privitar Labs to explore how innovative privacy-enhancing technologies could enable safe data provisioning to its analytic tools by protecting sensitive information in call center transcripts, chat logs, and social media, while preserving the analytical value of the data.

Privitar’s prototype demonstrated a significant reduction in the privacy risks arising from the use of unstructured data. In tests with real customer interaction data, 93% of identifying information was successfully captured. This high accuracy enabled Discovery to use unstructured data safely, with confidence that the associated privacy risks were significantly reduced. The analytical integrity of the data was not only preserved but improved by the removal of misleading values: only five percent of sentiment scores changed in the redacted dataset, and Discovery confirmed that these changes made the sentiment scores more accurate.

The partnership with Privitar removed privacy risk as a barrier and empowered Discovery to provision unstructured data on a much wider scale, greatly increasing the value it gets from its data and unlocking the voice of the customer.

Read the full Discovery case study here. 

Learn more about how Privitar’s Data Privacy for Chat can help your organization leverage conversational chat data while mitigating privacy risks.
