Anonymizing Personal Data “Not Enough to Protect Privacy”: Study

Published 24 July 2019

Current methods for anonymizing data leave individuals at risk of being re-identified, according to new research. The researchers demonstrated that allowing data to be used (to train AI algorithms, for example) while preserving people’s privacy requires much more than simply adding noise, sampling datasets, or applying other de-identification techniques.

Upgraded procedures may not provide sufficient anonymity // Source: navy.mil

The paper, published in Nature Communications, is accompanied by a demonstration tool that lets people see how likely they are to be traced, even if the dataset they are in is anonymized and only a small fraction of it is shared.

The researchers say their findings should be a wake-up call for policymakers on tightening the rules for what constitutes truly anonymous data.

Wake-up Call
Companies and governments routinely collect and use our personal data. How that data may be used is governed by laws such as the EU’s General Data Protection Regulation (GDPR) and, in the US, the California Consumer Privacy Act (CCPA).

Imperial College London notes that data is ‘sampled’ and anonymized, which includes stripping it of identifying characteristics like names and email addresses, so that individuals cannot, in theory, be identified. After this process, the data is no longer subject to data protection regulations, so it can be freely used and sold to third parties such as advertising companies and data brokers.

The new research shows that, once bought, the data can often be reverse-engineered using machine learning to re-identify individuals, despite the anonymization techniques.
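To see why stripping names and email addresses is not enough, consider a deliberately simple linkage attack. The study itself used machine learning; the Python sketch below is not its method but a toy exact-match join, and every record, field name and helper in it is hypothetical. A ‘released’ table with names removed is matched against an auxiliary table that still carries names, using attributes the two share (ZIP code, birth date, gender). Whenever a combination picks out exactly one released row, that row is re-identified.

```python
# Toy linkage attack on hypothetical records (not data from the study).

# "Anonymized" release: names and emails stripped, other attributes kept.
released = [
    {"zip": "10001", "birth": "1989-01-05", "gender": "M", "diagnosis": "asthma"},
    {"zip": "10001", "birth": "1990-03-12", "gender": "F", "diagnosis": "diabetes"},
    {"zip": "10002", "birth": "1989-01-05", "gender": "M", "diagnosis": "flu"},
]

# Auxiliary data an attacker might hold (hypothetical), still carrying names.
auxiliary = [
    {"name": "Alice", "zip": "10001", "birth": "1990-03-12", "gender": "F"},
    {"name": "Bob", "zip": "10002", "birth": "1989-01-05", "gender": "M"},
]

# Attributes present in both tables; together they act as a fingerprint.
QUASI_IDENTIFIERS = ("zip", "birth", "gender")


def key(record):
    """Tuple of the quasi-identifier values for one record."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(released, auxiliary):
    """Map each named person to the sensitive value of the single released
    row sharing their quasi-identifiers; ambiguous matches are skipped."""
    by_key = {}
    for row in released:
        by_key.setdefault(key(row), []).append(row)
    matches = {}
    for person in auxiliary:
        candidates = by_key.get(key(person), [])
        if len(candidates) == 1:  # a unique match exposes the "anonymous" row
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


print(link(released, auxiliary))
```

Here both named people match exactly one released row, so their diagnoses are exposed even though the released table contains no names at all. Requiring a unique match keeps the sketch simple; real attacks can also work probabilistically when matches are ambiguous.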

The paper finds that 99.98 per cent of Americans would be correctly re-identified in any ‘anonymized’ dataset using just 15 demographic characteristics, including age, gender, and marital status.

Co-author Dr Luc Rocher of UCLouvain said: “While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them were also born on 5 January, are driving a red sports car, and live with two kids (both girls) and one dog.”
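Rocher’s point can be illustrated by measuring how many records become unique as attributes are combined. The sketch below counts, for each prefix of an attribute list, the share of records whose combination of values appears only once. It runs on a hypothetical five-person table; the attributes, values and the helper `fraction_unique` are invented for illustration. It simply counts uniques within the data at hand, whereas the paper used a generative model to estimate how likely a record is to be unique in the whole population, which is what lets it reason about datasets that are only small samples.

```python
from collections import Counter

# Hypothetical toy population; attribute names and values are illustrative.
people = [
    {"age_band": "30-39", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "red sports"},
    {"age_band": "30-39", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "blue sedan"},
    {"age_band": "30-39", "gender": "M", "city": "NYC", "birthday": "07-22", "car": "red sports"},
    {"age_band": "30-39", "gender": "M", "city": "NYC", "birthday": "11-30", "car": "grey SUV"},
    {"age_band": "30-39", "gender": "F", "city": "NYC", "birthday": "01-05", "car": "red sports"},
]


def fraction_unique(records, attributes):
    """Share of records whose combination of `attributes` appears exactly once."""
    combos = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)


attrs = ["age_band", "gender", "city", "birthday", "car"]
# Add attributes one at a time and report how many people stand out.
for n in range(1, len(attrs) + 1):
    print(attrs[:n], fraction_unique(people, attrs[:n]))
```

On this toy table the unique share rises from 0.0 with age band alone to 1.0 once all five attributes are combined, mirroring the quote: each extra detail splits the remaining crowd further.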

Re-identification could expose sensitive information about personally identified individuals and allow buyers to build increasingly comprehensive personal profiles of them.

For example, re-identifying anonymized data is how New York