Jul 06, 2020
By David Bernstein, Data Privacy Engineer at Privitar
Data redaction is a data masking technique that enables you to mask (redact) data by removing or substituting all or part of the field value. This helps protect sensitive personally identifying data.
One of the first methods to protect sensitive information was column-based security. Column-based security ensures that a sensitive column is not exposed to a user without the proper privileges. This method, while effective, can cause problems for the calling application (such as a BI tool), which expects a certain number of columns to be returned from the query.
Redaction was one of the first methods to protect sensitive data while still returning a column value. Some redaction techniques are referred to as ‘simple masking’ because they are one-way substitution schemes. The most common use of redaction is to redact the entire column and replace it with a constant. With this method, the query returns the expected number of columns, but the actual column value is replaced with a constant. For example, when applying redaction to a Social Security Number (SSN), the result might be ‘N/A’ or ‘XXX-XX-XXXX.’
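Full-column redaction with a constant can be sketched in a few lines of Python. This is only an illustration: the `redact_column` helper, the row layout, and the sample values are assumptions made up for this example, not part of any particular product.

```python
# Hypothetical helper: replace one sensitive column with a constant in every row.
def redact_column(rows, column, constant="XXX-XX-XXXX"):
    """Return copies of the rows with `column` replaced by `constant`.

    The number and names of the columns are unchanged, so a caller
    that expects a fixed set of columns keeps working.
    """
    return [{**row, column: constant} for row in rows]


# Made-up sample data for illustration only.
rows = [
    {"name": "Susan", "ssn": "123-45-6789"},
    {"name": "Cathy", "ssn": "987-65-4321"},
]

redacted = redact_column(rows, "ssn")
for row in redacted:
    print(row)
```

Every row keeps its `ssn` column, so the calling application still receives the shape it expects, but the real value never leaves the function.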
Another redaction technique uses a ‘look-up’ to find a value to put in the resulting column, instead of a constant. For example, a column ‘FirstName’ might have the value “Susan,” and the look-up would pick a name from a random list and replace it with “Cathy.”
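A look-up substitution might look like the sketch below. The replacement list and the `substitute_name` function are hypothetical, and a real look-up could just as well be backed by a table or file.

```python
import random

# Hypothetical replacement list; any pool of plausible values would do.
REPLACEMENT_NAMES = ["Cathy", "Maria", "Joan", "Priya"]


def substitute_name(original, rng=random):
    """Return a randomly chosen replacement for `original`.

    The original value is deliberately ignored, so the substitution
    is one-way: the output carries no information about the input.
    """
    return rng.choice(REPLACEMENT_NAMES)


print(substitute_name("Susan"))
```

Because the choice is random, the same input can map to different outputs on different calls. If a stable mapping is required, a different technique is needed.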
Often, part of a sensitive column will have value. This part can be shown without exposing the entire column.
We’ve all been on the phone when we are asked to verify ourselves using the ‘last 4 digits’ of our SSN. In this case, it is likely that the person asking you that question can only see the last 4 digits of your SSN. So, the redacted SSN (last 4 digits) has value in the verification process, but has been redacted enough to not be a direct identifier. In data privacy terms, we have turned a direct identifier (SSN) into an indirect or quasi identifier (SSN last 4 digits).
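The ‘last 4 digits’ case can be sketched as follows. The `mask_all_but_last` helper and the sample SSN are assumptions for illustration; the function masks digits only and leaves separators such as dashes in place.

```python
# Hypothetical helper for partial redaction of digit strings.
def mask_all_but_last(value, keep=4, mask_char="X"):
    """Mask every digit except the last `keep` digits, preserving separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep, 0)  # how many leading digits to hide
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing digits pass through
    return "".join(out)


print(mask_all_but_last("123-45-6789"))  # XXX-XX-6789
```

The same function works for a phone number, since it counts digits rather than assuming a fixed format.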
Because the last 4 digits are not a direct identifier, we are often asked another question, such as the last 4 digits of our phone number, and the same methodology applies. This technique of partial redaction goes beyond the SSN and phone number. Some examples are below:
In the example above, the letter “S” designates that the diagnosis relates to “Injuries, poisoning and certain other consequences of external causes.” The first three characters of the ICD-10 code above (S86) would reveal “Injury of muscle, fascia and tendon at lower leg level.” This is more information but, again, not the full ICD-10 code, so the exact injury is not revealed.
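Keeping only a leading prefix of a code is a simple way to control how much detail is revealed. In the sketch below, `truncate_code` is a hypothetical helper and the full code string is an illustrative stand-in beginning with S86, since only that prefix appears in the text.

```python
# Hypothetical helper: keep only the first `keep` characters of a code.
def truncate_code(code, keep):
    """Return the leading `keep` characters, dropping the finer-grained detail."""
    return code[:keep]


sample_code = "S86.011A"  # illustrative stand-in, assumed for this example

print(truncate_code(sample_code, 1))  # S    (chapter level)
print(truncate_code(sample_code, 3))  # S86  (category level)
```

Choosing `keep` is a trade-off: a longer prefix is more useful for analysis but reveals more about the individual.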
When designing a data privacy strategy, data redaction is often considered as a first step. This entails reviewing your sensitive data, and determining:
While data redaction can be incredibly powerful, it is also important to note when it is not the correct de-identification technique.
Redaction is typically not reversible. For example, if you have redacted all but the last 4 digits of an SSN and, after some analysis, decide you want the full, actual SSN, it cannot be recovered. If reversal or re-identification to the real value is needed, other privacy techniques, such as tokenization, should be used.
If the redacted field is a unique or direct identifier, or a unique key in database terms, partial redaction can remove the ‘uniqueness.’ Depending on the use of the field, that can be problematic. For example, suppose a report rolls up transactions by 8-digit account number, and the account number is redacted to the first 4 digits. In this case, 12347777 and 12348888 both become 1234, so transactions for both accounts would roll up under the same number, which is not the desired behavior.
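The roll-up problem is easy to reproduce. The sketch below uses the two account numbers from the example; the transaction amounts and the `roll_up` helper are made up for illustration.

```python
from collections import defaultdict


# Hypothetical helper: sum transaction amounts per (possibly redacted) key.
def roll_up(transactions, key_fn=lambda account: account):
    totals = defaultdict(int)
    for account, amount in transactions:
        totals[key_fn(account)] += amount
    return dict(totals)


# Made-up amounts for the two accounts in the example.
transactions = [("12347777", 100), ("12348888", 250), ("12347777", 50)]

by_account = roll_up(transactions)
by_redacted = roll_up(transactions, key_fn=lambda account: account[:4])

print(by_account)   # {'12347777': 150, '12348888': 250}
print(by_redacted)  # {'1234': 400}
```

After redaction the two accounts are indistinguishable, so their totals merge into a single figure that corresponds to neither account.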
Data redaction should be a key component of any data privacy strategy.
Want to learn more about how data masking, redaction, and other forms of de-identification can help you keep your data safe and usable? Check out Privitar’s Complete Guide to Data De-Identification.