k-anonymity is a property of a de-identified dataset: the dataset is k-anonymous if every combination of indirect-identifier values that occurs in it is shared by at least k records. It is often used to assess de-identified datasets in which indirect identifiers have been obfuscated to make them less specific.

k-anonymity is useful in a de-identification process because it shows that, based on the indirect identifiers alone, each person's information cannot be distinguished from that of at least k-1 other individuals whose information appears in the dataset.

For example, suppose a de-identified dataset contains two indirect identifiers for each individual: year of birth and two-character zip code. The dataset is k-anonymous with k=5 if every combination of year of birth and two-character zip code that appears in it is shared by at least five individuals.
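
The check described in this example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `k_anonymity`, the field names `birth_year` and `zip2`, and the sample records are all hypothetical, and returning 0 for an empty dataset is an assumed convention.

```python
from collections import Counter


def k_anonymity(records, indirect_identifiers):
    """Return the largest k for which the records are k-anonymous.

    Records are grouped by their combination of indirect-identifier
    values; k is the size of the smallest group. An empty dataset
    returns 0 (an assumed convention for this sketch).
    """
    groups = Counter(
        tuple(record[name] for name in indirect_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical data: five people born in 1980 in zip area "90",
# six people born in 1975 in zip area "10".
records = (
    [{"birth_year": 1980, "zip2": "90"} for _ in range(5)]
    + [{"birth_year": 1975, "zip2": "10"} for _ in range(6)]
)

k = k_anonymity(records, ["birth_year", "zip2"])
print(k)  # 5: the smallest group has five records
```

Adding a single record with a combination that no other record shares would drop the result to 1, which is why indirect identifiers are usually made less specific before the property is checked.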
