Data masking refers to a family of techniques, such as tokenization, perturbation, encryption, and redaction, that replace or obscure original values with random characters or surrogate data. The result is a realistic stand-in for the real data that can be used, for example, in software development and testing or for training ML models. Masking generally maintains good data utility, since typically only the identifying fields are altered and the rest of each record is left intact. When masking data, it's usually important to retain the structure and patterns of the data (formats, value distributions, relationships between fields) while hiding the sensitive values themselves.
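To make three of these techniques concrete, here is a minimal Python sketch using only the standard library. All function names and the `SECRET_KEY` value are hypothetical and chosen for illustration. "Tokenization" is approximated here with a keyed HMAC, which gives deterministic, opaque tokens; production tokenization systems often use a token vault instead. Encryption is omitted because it would need a third-party library.

```python
import hashlib
import hmac
import random
from typing import Optional

# Hypothetical key for the tokenization sketch; a real system would load it
# from a secrets manager instead of hard-coding it.
SECRET_KEY = b"example-key"


def redact(value: str, keep_last: int = 4) -> str:
    """Replace alphanumeric characters with 'X', except the last `keep_last`
    characters, leaving separators such as '-' or '@' so the format survives."""
    cutoff = len(value) - keep_last
    return "".join(
        "X" if ch.isalnum() and i < cutoff else ch
        for i, ch in enumerate(value)
    )


def tokenize(value: str) -> str:
    """Map a value to an opaque token with a keyed HMAC (an assumption, not a
    vault). The same input always yields the same token, so joins still work."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def perturb(amount: float, scale: float = 0.05,
            rng: Optional[random.Random] = None) -> float:
    """Apply multiplicative noise of up to +/- `scale`, so individual values
    change while aggregates stay roughly the same."""
    rng = rng or random.Random()
    return round(amount * (1 + rng.uniform(-scale, scale)), 2)


# Hypothetical record: mask the identifiers, keep the non-identifying field intact.
record = {"name": "Jane Doe", "account": "1234-5678-9012", "balance": 1520.40}
masked = {
    "name": tokenize(record["name"]),
    "account": redact(record["account"]),  # "XXXX-XXXX-9012"
    "balance": record["balance"],
}
print(masked)
```

Redaction keeps the account number's shape, and deterministic tokens preserve the ability to link records about the same person across tables. `perturb` would be applied only when a numeric value is itself sensitive, since it deliberately changes the data.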
Masking is one of the most commonly used protection mechanisms for sensitive data in organizations. It protects the privacy of individuals by obscuring direct identifiers (such as name, address, account number, email, or phone number), which makes it impossible to look anyone up in the dataset by those identifiers. However, masking direct identifiers alone isn't enough to protect data against more sophisticated attacks. In a linkage attack, for example, an attacker combines the attributes left unmasked, known as quasi-identifiers (such as ZIP code, date of birth, and sex), with background information from another source to re-identify individuals.
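The following toy Python sketch illustrates why masking direct identifiers is not sufficient on its own. Both datasets are hypothetical and invented for illustration: a "masked" dataset whose names were replaced by tokens but whose quasi-identifiers were left as-is, and a public dataset (for instance, something like a voter roll) that still carries names. Joining the two on the quasi-identifiers re-identifies anyone with a unique combination.

```python
# Hypothetical masked dataset: names replaced by tokens, quasi-identifiers kept.
masked_records = [
    {"patient": "tok_3f9a", "zip": "02138", "birth_date": "1960-07-31",
     "sex": "F", "diagnosis": "hypertension"},
    {"patient": "tok_81c2", "zip": "02139", "birth_date": "1985-02-14",
     "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public background data that still includes names.
public_records = [
    {"name": "Alice Smith", "zip": "02138", "birth_date": "1960-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02140", "birth_date": "1972-11-03", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def linkage_attack(masked, public, keys=QUASI_IDENTIFIERS):
    """Join the two datasets on quasi-identifiers and return (name, diagnosis)
    pairs for masked records that match exactly one public record."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in masked:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique candidate means the record is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


print(linkage_attack(masked_records, public_records))
# [('Alice Smith', 'hypertension')]
```

The token on the first record never had to be reversed: the unmasked ZIP code, birth date, and sex were enough to single the person out.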