PrivacyHow to Protect Privacy When Aggregating Location Data to Fight COVID-19

By Jacob Hoffman-Andrews and Andrew Crocker

Published 7 April 2020

As governments, the private sector, NGOs, and others mobilize to fight the COVID-19 pandemic, we’ve seen calls to use location information—typically drawn from GPS and cell tower data—to inform public health efforts. Compared to using individualized location data for contact tracing—as many governments around the world are already doing—deriving public health insights from aggregated location data poses far fewer privacy and other civil liberties risks such as restrictions on freedom of expression and association. However, even “aggregated” location data comes with potential pitfalls.

As governments, the private sector, NGOs, and others mobilize to fight the COVID-19 pandemic, we’ve seen calls to use location information—typically drawn from GPS and cell tower data—to inform public health efforts. Among the proposed uses of location data, one of the most widely discussed is analyzing aggregated data about which locations people are visiting, whether they are traveling less, and other collective measurements of individuals’ movement. This analysis might be used to inform judgments about the effectiveness of shelter-in-place orders and other social distancing measures. Projects making use of aggregated location data have graded residents of each state on their social distancing and visualized the travel patterns of people on returning from spring break. Most recently, Google announced that it would publish ongoing “COVID-19 Community Mobility Reports,” which draw on the company’s store of location data to report on changes at a community level in people’s travel to various locations such as grocery stores, parks, and mass transit stations.

Compared to using individualized location data for contact tracing—as many governments around the world are already doing—deriving public health insights from aggregated location data poses far fewer privacy and other civil liberties risks such as restrictions on freedom of expression and association. However, even “aggregated” location data comes with potential pitfalls. This post discusses those pitfalls and describes some high-level best practices for those who seek to use aggregated location data in the fight against COVID-19.

What Does “Aggregated” Mean?

At the most basic level, there’s a difference between “aggregated” location data and “anonymized” or “deidentified” location data. Practically speaking, there is no way to deidentify individual location data. Information about where a person is and has been itself is usually enough to reidentify them. Someone who travels frequently between a given office building and a single family home is probably unique in those habits and therefore identifiable from other readily identifiable sources. One widely cited study from 2013 even found that researchers could uniquely characterize 50% of people using only two randomly chosen time and location data points.