Here you can find a short video interview with Theresa Stadler, Data Scientist at Privitar, in which she talks about some of the risks posed by reconstruction attacks.
[0:10] Is aggregated data safer than row-level data?
Usually, yes: it is harder to learn sensitive information about an individual from aggregate data. However, one type of attack, a differencing attack, can still expose data about specific individuals. There is also a second type, called a reconstruction attack, which poses an even greater risk.
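A differencing attack can be sketched in a few lines of Python. The four-person salary table and the `total_salary` helper below are hypothetical illustrations, not taken from the interview. Two aggregates that each cover several people differ by exactly one person, so subtracting one from the other reveals that person's exact value.

```python
# Hypothetical toy dataset: (name, salary). Names and values are illustrative only.
records = [
    ("Alice", 52_000),
    ("Bob", 61_000),
    ("Carol", 58_000),
    ("Dan", 75_000),
]

def total_salary(rows):
    """An aggregate query: the sum of salaries over a group of rows."""
    return sum(salary for _, salary in rows)

# Two innocent-looking aggregates: everyone, and everyone except Dan
all_staff = total_salary(records)
without_dan = total_salary([r for r in records if r[0] != "Dan"])

# Subtracting one aggregate from the other isolates a single individual's value
dan_salary = all_staff - without_dan
print(dan_salary)  # 75000
```

Neither aggregate on its own names Dan's salary; the leak comes entirely from releasing both.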
[0:33] Reconstruction attacks take a large number of released statistics, such as summary statistics, histograms, or charts, and use them to reconstruct the original raw data set. For example, the 2010 US census was based on a raw data set of about 300 million people, and the US Census Bureau then released 5 billion summary statistics about this data.
By turning each statistic into a linear equation and solving all of them together, an attacker can accurately recover all or most of the individual sensitive values.
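The linear-equation step can be illustrated with a minimal Python sketch. It assumes a hypothetical table of four people's ages and four published statistics, each the sum of ages over a group of three people; these numbers and the `solve_linear_system` helper are illustrative, not from the interview. Solving the system exactly with Gauss-Jordan elimination recovers every age. Real attacks on releases the size of a census work at vastly larger scale and rely on dedicated solvers rather than hand-written elimination.

```python
from fractions import Fraction

def solve_linear_system(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals.

    Returns the unique solution, or None if the statistics do not pin down
    every value (fewer independent equations than unknowns).
    """
    n_rows, n_cols = len(A), len(A[0])
    # Augmented matrix [A | b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivot_row = 0
    pivot_cols = []
    for col in range(n_cols):
        # Find a row at or below pivot_row with a non-zero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # Normalise the pivot row, then eliminate this column from all other rows
        p = M[pivot_row][col]
        M[pivot_row] = [v / p for v in M[pivot_row]]
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
    if len(pivot_cols) < n_cols:
        return None  # the released statistics leave some values undetermined
    return [M[i][n_cols] for i in range(n_cols)]

# Published statistics: sum of ages over each group of three people.
# Each row marks which of the four (hypothetical) people a statistic includes.
queries = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]
published_sums = [104, 127, 132, 120]

recovered = solve_linear_system(queries, published_sums)
print([int(x) for x in recovered])  # [34, 29, 41, 57]
```

Each published statistic contributes one row of `queries`; once there are as many independent rows as unknown values, the exact statistics determine the data completely.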
Intuitively, you can think of this process as solving a Sudoku: by reasoning about all of these constraints together, one can determine the missing values.
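The Sudoku analogy can be made concrete with a brute-force Python sketch under hypothetical assumptions: four people each have a secret yes/no attribute, and three counts over overlapping groups have been published. Three equations in four unknowns do not fix the values through linear algebra alone, but combined with the constraint that every value is 0 or 1, only one dataset fits. Enumerating every candidate only works for toy examples; attacks at census scale rely on dedicated constraint or integer-programming solvers.

```python
from itertools import product

# Hypothetical release about four people, each with a secret yes/no attribute
# (1 = has a sensitive condition). Each statistic counts the "yes" answers
# within one group of people, identified by their positions.
published_counts = {
    (0, 1): 1,
    (1, 2): 2,
    (2, 3): 1,
}
N_PEOPLE = 4

def consistent(candidate):
    """True if a candidate dataset reproduces every published count."""
    return all(sum(candidate[i] for i in group) == count
               for group, count in published_counts.items())

# Like filling in a Sudoku: try every possible dataset and keep only those
# that satisfy all constraints simultaneously.
solutions = [c for c in product((0, 1), repeat=N_PEOPLE) if consistent(c)]
print(solutions)  # [(0, 1, 1, 0)]
```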
[1:20] How can you prevent reconstruction attacks?
To prevent this, one can determine how many statistics can be released safely before reconstruction becomes a risk. One can also add noise to the statistics, which makes it possible to release more statistics about the same data safely.
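The interview does not say which noise-addition scheme is meant. As one common choice, the Python sketch below swaps in the Laplace mechanism from differential privacy. It assumes count statistics (adding or removing one person changes a count by at most 1) and a hypothetical total privacy budget `total_epsilon` split evenly across the released statistics; `release_counts` and the example counts are illustrative. It also shows the trade-off in the answer: for a fixed budget, releasing more statistics means adding more noise to each one.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def release_counts(true_counts, total_epsilon, rng):
    """Release several counts under a fixed total privacy budget.

    Each count has sensitivity 1. Under basic sequential composition,
    splitting total_epsilon evenly gives each count epsilon = total / k,
    i.e. Laplace noise of scale k / total_epsilon: the more statistics
    released, the noisier each one must be.
    """
    k = len(true_counts)
    scale = k / total_epsilon
    return [c + laplace_noise(scale, rng) for c in true_counts]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical counts: people with a condition, with and without one person
true_counts = [120, 119]
noisy_all, noisy_without = release_counts(true_counts, total_epsilon=1.0, rng=rng)
# The differencing attack now yields a noisy estimate rather than the exact value 1
print(round(noisy_all - noisy_without, 2))
```

With noise of this scale, the difference between the two released counts no longer reliably reveals whether the excluded person has the condition.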
This allows organisations to release insights about their data safely, without having to worry about the serious privacy breaches that reconstruction attacks can cause.