AI Could Help Solve the Privacy Problems It Has Created

By Zhiyuan Chen and Aryya Gangopadhyay

Published 22 June 2020

The stunning successes of artificial intelligence would not have happened without the availability of massive amounts of data, whether it's smart speakers in the home or personalized book recommendations. And the spread of AI into new areas of the economy, such as AI-driven marketing and self-driving vehicles, has been driving the collection of ever more data. These large databases are amassing a wide variety of information, some of it sensitive and personally identifiable. All that data in one place makes such databases tempting targets, ratcheting up the risk of privacy breaches.

The general public is largely wary of AI's data-hungry ways. According to a survey by Brookings, 49 percent of people think AI will reduce privacy. Only 12 percent think it will have no effect, and a mere 5 percent think it may make privacy better.

As cybersecurity and privacy researchers, we believe that the relationship between AI and data privacy is more nuanced. The spread of AI raises a number of privacy concerns, many of which people may not even be aware of. But in a twist, AI can also help mitigate many of these privacy problems.

Revealing Models
Privacy risks from AI stem not just from the mass collection of personal data, but also from the deep neural network models that power most of today's artificial intelligence. Data isn't vulnerable only to database breaches; it can also "leak" from the models themselves, revealing the data on which they were trained.

Deep neural networks, which are collections of algorithms designed to spot patterns in data, consist of many layers. In those layers are a large number of nodes called neurons, and neurons in adjacent layers are interconnected. Each node, and each link between nodes, encodes certain bits of information. These bits of information are created during training, when a learning algorithm scans large amounts of data and repeatedly adjusts the nodes and the links between them.
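To make that structure concrete, here is a minimal sketch in Python using PyTorch (our choice of framework for illustration; no particular one is required) of a tiny network whose layers of neurons are connected by weighted links, along with a single training step. The layer sizes and the data are hypothetical stand-ins.

    import torch
    import torch.nn as nn

    # A small feed-forward network: an input layer, two hidden layers,
    # and an output layer. Every weight inside the nn.Linear layers is
    # one of the "links" between neurons in adjacent layers.
    model = nn.Sequential(
        nn.Linear(64, 32),  # input layer -> first hidden layer
        nn.ReLU(),
        nn.Linear(32, 16),  # first hidden -> second hidden layer
        nn.ReLU(),
        nn.Linear(16, 2),   # second hidden -> output layer (2 classes)
    )

    # One training step on stand-in data: scanning data and nudging the
    # weights is how "bits of information" end up stored in the model.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    inputs = torch.randn(8, 64)         # hypothetical batch of training data
    labels = torch.randint(0, 2, (8,))  # hypothetical labels

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()   # compute how each weight should change
    optimizer.step()  # update the weights, encoding information about the data

Repeating that last step over millions of examples is what makes the model accurate, and it is also what gives the weights the chance to absorb details about individual records.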

For example, a facial recognition algorithm may be trained on a series of selfies so it can more accurately predict a person's gender. Such models are very accurate, but they may also store too much information, actually remembering certain faces from the training data. In fact, that's exactly what researchers at Cornell University discovered: attackers could identify people in the training data by probing the deep neural networks that classified the gender of facial images.
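The Cornell attack itself is more elaborate, but the core idea behind this kind of "membership inference" can be sketched in a few lines: models are often unusually confident on examples they memorized during training, and an attacker can exploit that gap. The function name and the confidence threshold below are hypothetical, chosen only for illustration.

    import torch
    import torch.nn.functional as F

    def membership_inference_guess(model, image, threshold=0.95):
        """Guess whether `image` was in the model's training set.

        An illustrative sketch, not the Cornell researchers' exact
        method: a classifier tends to assign unusually high confidence
        to faces it memorized, so confidence above `threshold` (a
        hypothetical cutoff) is taken as evidence of membership.
        """
        model.eval()
        with torch.no_grad():
            logits = model(image.unsqueeze(0))  # add a batch dimension
            confidence = F.softmax(logits, dim=1).max().item()
        return confidence >= threshold

Notice that the attacker needs only ordinary query access to the model, not the database it was trained on, which is what makes this kind of leak so hard to guard against.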