Data Generalization

Maintain control and maximize utility with Privitar’s data generalization capabilities


Why Generalization?

Tokenization, a de-identification technique, is an effective way to protect primary identifiers. On its own, however, it leaves datasets exposed to more sophisticated attacks, such as linkage attacks. In a linkage attack, quasi-identifiers (attributes such as age, date of birth, or location that are not unique on their own but can single out a person in combination) are used to join datasets and form a richer combined dataset that can re-identify individuals or reveal unintended private information. Data generalization replaces a data value with a less precise one through binning, reformatting, rounding, or truncation. The generalized value stays useful for analysis while making records harder to match across datasets, which protects against linkage attacks.
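A minimal sketch of a linkage attack, using made-up records: the medical table has tokenized names, but joining it to a hypothetical public voter roll on the quasi-identifiers age and ZIP code re-attaches names to diagnoses. The column names, values, and the `link` helper are illustrative assumptions, not part of any Privitar API.

```python
# Hypothetical "de-identified" medical table: names replaced by tokens,
# but quasi-identifiers (age, zip) left at full precision.
medical = [
    {"token": "a91f", "age": 33, "zip": "90012", "diagnosis": "asthma"},
    {"token": "c27b", "age": 58, "zip": "94110", "diagnosis": "diabetes"},
]

# Hypothetical public dataset that still carries names.
voter_roll = [
    {"name": "Alice", "age": 33, "zip": "90012"},
    {"name": "Bob", "age": 58, "zip": "94110"},
    {"name": "Carol", "age": 41, "zip": "90012"},
]

QUASI_IDENTIFIERS = ("age", "zip")


def link(left, right, keys):
    """Join two tables on the given quasi-identifier columns (an inner join)."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


linked = link(medical, voter_roll, QUASI_IDENTIFIERS)
for row in linked:
    print(row["name"], row["diagnosis"])  # each tokenized record is now tied to a name
```

Generalizing age and ZIP before release would make these exact-match joins fail or return several candidates per record.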

Data Generalization Examples

Age: 33 → 30 - 40
Date: 12/07/1978 → 1978
Location: Los Angeles → California
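The three examples can be reproduced with small generalization functions: binning for age, truncation for the date, and a lookup hierarchy for location. This is a sketch under stated assumptions: the 10-year bin width, the city-to-state table, and the function names are hypothetical choices for illustration, not Privitar's rules.

```python
def bin_age(age: int, width: int = 10) -> str:
    """Binning: map an exact age to a fixed-width range, e.g. 33 -> "30 - 40"."""
    low = (age // width) * width
    return f"{low} - {low + width}"


def truncate_date(value: str) -> str:
    """Truncation: keep only the year of a slash-separated date ending in YYYY."""
    return value.rsplit("/", 1)[1]


# Hypothetical generalization hierarchy: city -> state.
CITY_TO_STATE = {
    "Los Angeles": "California",
    "San Diego": "California",
    "Austin": "Texas",
}


def generalize_city(city: str) -> str:
    """Hierarchy lookup: replace a city with its state; unknown cities become "Unknown"."""
    return CITY_TO_STATE.get(city, "Unknown")


print(bin_age(33), truncate_date("12/07/1978"), generalize_city("Los Angeles"))
```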

Apply K-Anonymity to Protect Against Data Linkage

K-anonymity applies generalization to defend against linkage attacks. It ensures that every group of records sharing the same quasi-identifier values contains at least k individuals. That means every individual in a dataset is indistinguishable from at least k-1 others.
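The definition translates directly into a check: group records by their quasi-identifier values (the equivalence classes) and confirm that the smallest group has at least k members. A minimal sketch, assuming records are plain dictionaries and the quasi-identifier column names are known; `is_k_anonymous` is a hypothetical helper name.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k rows."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return all(size >= k for size in sizes.values())


QIS = ("age", "state")
rows = [
    {"age": "30 - 40", "state": "California", "diagnosis": "asthma"},
    {"age": "30 - 40", "state": "California", "diagnosis": "flu"},
    {"age": "50 - 60", "state": "California", "diagnosis": "diabetes"},
]

print(is_k_anonymous(rows, QIS, 2))  # the "50 - 60" class has only one member
```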

Maintain Control with Manual Data Generalization

With Manual Generalization, you define the data generalization rules. You can set the k-anonymity threshold to match your tolerance for linkage and re-identification risk in each specific data use. The Privitar Platform then automatically drops any rows that fall in clusters smaller than the threshold, so k-anonymity is guaranteed in the output.
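Manual generalization followed by suppression can be sketched in two steps: apply user-defined rules per column, then drop every row whose quasi-identifier combination occurs fewer than k times. The rules, column names, and helper functions below are hypothetical and only illustrate the idea; they are not the Privitar Platform's interface.

```python
from collections import Counter


def bin_age(age):
    """Hypothetical rule: 10-year age bands, e.g. 33 -> "30 - 40"."""
    low = age // 10 * 10
    return f"{low} - {low + 10}"


# Hypothetical rule: generalize cities to their state.
CITY_TO_STATE = {"Los Angeles": "California", "San Diego": "California", "Austin": "Texas"}


def city_to_state(city):
    return CITY_TO_STATE[city]


RULES = {"age": bin_age, "city": city_to_state}


def apply_rules(rows, rules):
    """Apply column -> function rules; columns without a rule pass through unchanged."""
    return [{col: rules[col](val) if col in rules else val for col, val in row.items()}
            for row in rows]


def suppress_small_clusters(rows, quasi_identifiers, k):
    """Drop rows whose quasi-identifier combination occurs fewer than k times."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    sizes = Counter(key(row) for row in rows)
    return [row for row in rows if sizes[key(row)] >= k]


rows = [
    {"age": 33, "city": "Los Angeles", "diagnosis": "asthma"},
    {"age": 37, "city": "San Diego", "diagnosis": "flu"},
    {"age": 52, "city": "Austin", "diagnosis": "diabetes"},
]

released = suppress_small_clusters(apply_rules(rows, RULES), ("age", "city"), k=2)
for row in released:
    print(row)  # the lone ("50 - 60", "Texas") row has been dropped
```

The trade-off is visible in the output: the rules stay exactly as specified, at the cost of losing the records that could not be hidden in a large enough group.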

Maximize Utility with Automatic Data Generalization

With Automatic Generalization, you define the k-anonymity cluster size and let the Privitar Platform dynamically determine the data generalization rules. Privitar adjusts how much each quasi-identifier is blurred until k-anonymity is achieved, without the need to drop records. You keep as much precision as possible while retaining all of the data, for maximum data utility.
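The page does not describe how Privitar chooses its rules, so the following is only an illustration of the general idea using a swapped-in, well-known greedy strategy (in the spirit of the Datafly heuristic): each quasi-identifier has a hierarchy of increasingly coarse levels, and the column with the most distinct values is coarsened one step at a time until every equivalence class has at least k rows. The hierarchies, column names, and `auto_generalize` function are hypothetical, and a greedy search does not guarantee the least possible loss of precision.

```python
from collections import Counter

# Hypothetical hierarchies: level 0 keeps the original value, and the last
# level collapses everything to "*", so a k-anonymous result always exists.
HIERARCHIES = {
    "age": [
        lambda a: str(a),
        lambda a: f"{a // 10 * 10} - {a // 10 * 10 + 10}",
        lambda a: f"{a // 20 * 20} - {a // 20 * 20 + 20}",
        lambda a: "*",
    ],
    "zip": [
        lambda z: z,
        lambda z: z[:3] + "**",
        lambda z: "*",
    ],
}


def generalize(rows, levels):
    """Apply the chosen hierarchy level to each quasi-identifier column."""
    return [{col: HIERARCHIES[col][levels[col]](val) if col in levels else val
             for col, val in row.items()}
            for row in rows]


def min_class_size(rows, qis):
    """Size of the smallest equivalence class over the quasi-identifiers."""
    return min(Counter(tuple(r[q] for q in qis) for r in rows).values())


def auto_generalize(rows, k):
    """Greedily coarsen quasi-identifiers until k-anonymity holds; no rows are dropped."""
    if len(rows) < k:
        raise ValueError("fewer than k rows: k-anonymity is impossible without suppression")
    qis = tuple(HIERARCHIES)
    levels = {q: 0 for q in qis}
    while True:
        candidate = generalize(rows, levels)
        if min_class_size(candidate, qis) >= k:
            return candidate, levels
        # Coarsen the column that still has the most distinct values among
        # those not yet at their top level.
        open_qis = [q for q in qis if levels[q] < len(HIERARCHIES[q]) - 1]
        busiest = max(open_qis, key=lambda q: len({r[q] for r in candidate}))
        levels[busiest] += 1


rows = [
    {"age": 33, "zip": "90012", "diagnosis": "asthma"},
    {"age": 37, "zip": "90015", "diagnosis": "flu"},
    {"age": 31, "zip": "90017", "diagnosis": "migraine"},
    {"age": 52, "zip": "94110", "diagnosis": "diabetes"},
    {"age": 58, "zip": "94114", "diagnosis": "asthma"},
]

released, levels = auto_generalize(rows, k=2)
print(levels)  # chosen hierarchy level per quasi-identifier
```

Unlike the suppression approach, all five records survive; the price is that both age and ZIP code are blurred.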
