Anonymizing Personal Data “Not Enough to Protect Privacy”: Study

Times journalists exposed Donald Trump’s 1985-94 tax returns in May 2019.

The research demonstrates for the first time how easily and accurately re-identification can be done – even with incomplete datasets.

A Demonstration
Alongside the paper, the researchers published a machine learning tool that estimates how likely it is that an individual’s characteristics are precise enough to describe only one person in a population of billions.

They also developed an online tool, which doesn’t save data and is for demonstration purposes only, to help people see which characteristics make them unique in datasets.

The tool first asks you to enter the first part of your postcode (UK) or ZIP code (US), your gender, and your date of birth, before giving you a probability that your profile could be re-identified in any anonymized dataset.

It then asks for your marital status, number of vehicles, house ownership status, and employment status, before recalculating. Adding more characteristics dramatically increases the likelihood that a match is correct.
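
To make the idea concrete, here is a minimal sketch of why extra characteristics matter. It is not the researchers' model or the online tool: it simply counts, in a small made-up dataset, how many records are the only one with their combination of attributes. The records, the field order, and the function name `unique_fraction` are all invented for illustration.

```python
from collections import Counter

# Toy records, invented for illustration only. Field order (an assumption):
# (postcode prefix, gender, birth year, marital status, number of vehicles)
RECORDS = [
    ("SW7", "F", 1985, "single", 0),
    ("SW7", "F", 1985, "married", 1),
    ("SW7", "M", 1990, "single", 1),
    ("SW7", "M", 1990, "single", 2),
    ("N1", "F", 1985, "married", 1),
    ("N1", "F", 1985, "married", 1),
    ("N1", "M", 1972, "single", 0),
    ("N1", "M", 1972, "married", 0),
]


def unique_fraction(records, n_attributes):
    """Return the share of records that are the only one with their
    combination of the first `n_attributes` fields."""
    keys = [record[:n_attributes] for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Compare uniqueness when 3, 4, and then 5 attributes are known.
for n in (3, 4, 5):
    print(n, "attributes:", unique_fraction(RECORDS, n))
```

In this toy data, no record is unique on postcode prefix, gender, and birth year alone, half become unique once marital status is added, and three quarters are unique with all five fields. A real estimate would also have to account for the dataset being only a sample of the population, which this simple count does not attempt.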

Senior author Dr. Yves-Alexandre de Montjoye, of Imperial’s Department of Computing and Data Science Institute, said: “This is pretty standard information for companies to ask for. Although they are bound by GDPR guidelines, they’re free to sell the data to anyone once it’s anonymized. Our research shows just how easily – and how accurately – individuals can be traced once this happens.”

He added: “Companies and governments downplay the risk of re-identification by arguing that the datasets they sell are always incomplete. Our findings show this might not help.

“The results demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.”

Co-author Professor Julien Hendrickx from UCLouvain said: “We’re often assured that anonymization will keep our personal information safe. Our paper shows that de-identification is nowhere near enough to protect the privacy of people’s data.”

The researchers say policymakers must do more to protect individuals from such attacks, which could have serious ramifications for careers as well as personal and financial lives.

Professor Hendrickx added: “It is essential for anonymization standards to be robust and account for new threats like the one demonstrated in this paper.”

Dr. de Montjoye said: “The goal of anonymization is to help use data to benefit society. This is extremely important but should not and does not have to happen at the expense of people’s privacy.”
