How to Protect Privacy When Aggregating Location Data to Fight COVID-19
Aggregation, on the other hand, can potentially be useful while preserving individual privacy. Aggregating location data involves producing counts of behaviors instead of detailed timelines of individual location history. For instance, an aggregation might tell you how many people’s phones reported their location as being in a certain city within the last month. Or it might tell you, for a given area in a city, how many people traveled to that area during each hour of the last month. Whether a given scheme for aggregating location data actually improves privacy depends deeply on the details: On what timescale is the data aggregated? How large an area does each count cover? When is a count considered too low and dropped from the data set?
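To make those questions concrete, here is a minimal sketch (in Python) of what count-based aggregation with a minimum-count threshold might look like. The grid size, time bucket, and threshold are illustrative assumptions, not values drawn from any company’s actual pipeline.

```python
# A minimal sketch of count-based aggregation with a minimum-count threshold.
# GRID_DEGREES, the hourly time bucket, and MIN_COUNT are illustrative
# assumptions, not parameters used by any particular company.
from collections import defaultdict

GRID_DEGREES = 0.01   # roughly 1 km cells at mid-latitudes (illustrative)
MIN_COUNT = 50        # suppress any cell/hour bucket with fewer distinct people

def aggregate(pings):
    """pings: iterable of (user_id, lat, lon, timestamp) records, where
    timestamp is a datetime. Returns {(cell, hour): count}, counting each
    person at most once per bucket and dropping buckets below MIN_COUNT."""
    users_per_bucket = defaultdict(set)
    for user_id, lat, lon, ts in pings:
        cell = (int(lat // GRID_DEGREES), int(lon // GRID_DEGREES))
        hour = ts.replace(minute=0, second=0, microsecond=0)
        users_per_bucket[(cell, hour)].add(user_id)
    return {bucket: len(users)
            for bucket, users in users_per_bucket.items()
            if len(users) >= MIN_COUNT}
```

Coarser cells, longer time buckets, and higher thresholds all trade detail for privacy, which is exactly the tradeoff the questions above are probing.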
For example, Facebook uses differential privacy techniques such as injecting statistical noise into the dataset as part of the methodology of its “Data for Good” project. This project aggregates Facebook users’ location data and shares it with various NGOs, academics, and governments engaged in responding to natural disasters and fighting the spread of disease, including COVID-19.
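For illustration, the core of such a noise-injection step can be sketched with the Laplace mechanism, a standard differential-privacy technique. This is not Facebook’s published methodology; the epsilon and sensitivity values below are assumptions chosen only to show the shape of the approach.

```python
# A minimal sketch of the Laplace mechanism for differentially private counts.
# This is NOT Facebook's actual methodology; epsilon and sensitivity here are
# illustrative assumptions.
import math
import random

def laplace_noise(scale):
    """Sample from a Laplace(0, scale) distribution via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_counts(counts, epsilon=1.0, sensitivity=1.0):
    """counts: {bucket: true_count}. Adds noise scaled to sensitivity/epsilon
    so each released count only loosely depends on any one person's data."""
    scale = sensitivity / epsilon
    return {bucket: max(0, round(count + laplace_noise(scale)))
            for bucket, count in counts.items()}
```

A smaller epsilon means more noise and stronger privacy; noise injection and the aggregation described above are complementary protections, not substitutes for one another.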
There is no single magic formula for aggregating individual location data so that it yields insights useful for some decisions yet cannot be used to re-identify the individuals it describes. Instead, it’s a question of tradeoffs. As a matter of public policy, it is critical that user privacy not be sacrificed when creating aggregated location datasets to inform decisions about COVID-19 or anything else.
How Do We Evaluate the Use of Aggregated Location Data to Fight COVID-19?
Because aggregation reduces the risk of revealing intimate information about individuals’ lives, we are less concerned about this use of location data to fight COVID-19 than about individualized tracking. Of course, the choice of aggregation parameters generally needs to be made by domain experts. As in the Facebook and Google examples above, these experts will often be working within private companies with proprietary access to the data. Even if they make all the right choices, the public needs to be able to review those choices, because the companies are sharing the public’s data. The experts doing the aggregation often face pressure to weaken the privacy protections in order to produce a more granular dataset that a particular decision-maker claims they need for it to be meaningful. Ideally, companies would also consult outside experts before moving forward with plans to aggregate and share location data. Getting public input on whether a given data-sharing scheme sufficiently preserves privacy can help counteract the bias that such pressure creates.
As a result, companies like Google that produce reports based on aggregated location data from users should release their full methodology as well as information about who these reports are shared with and for what purpose. To the extent they only share certain data with selected “partners,” these groups should agree not to use the data for other purposes or attempt to re-identify individuals whose data is included in the aggregation. And, as Google has already done, companies should pledge to end the use of this data when the need to fight COVID-19 subsides.
For any data sharing plan, consent is critical: Did each person consent to the method of data collection, and did they consent to the use? Consent must be specific, informed, opt-in, and voluntary. Ordinarily, users should have the choice of whether to opt in to every new use of their data, but we recognize that obtaining consent to aggregate previously acquired location data to fight COVID-19 may be difficult to do quickly enough to address the public health need. That’s why it’s especially important that users be able to review and delete their data at any time. The same should be true for anyone who genuinely consents to the collection of this information. Many entities that hold location information, like data brokers that collect location from ads and hidden tracking in apps, can’t meet these consent standards. Yet many of the uses of aggregated location data that we’ve seen in response to COVID-19 draw from these tainted sources. At the very least, data brokers should not profit from public health insights derived from their stores of location data, including through free advertising. Nor should they be allowed to “COVID wash” their business practices: the existence of these data stores is unethical, and should be addressed with new consumer data privacy laws.
Finally, we should remember that location data collected from smartphones has limitations and biases. Smartphone ownership remains a proxy for relative wealth, even in regions like the United States where 80% of adults have a smartphone. People without smartphones tend to already be marginalized, so making public policy based on aggregate location data can wind up disregarding the needs of those who simply don’t show up in the data, and who may need services the most. Even among people with smartphones, the seeming authoritativeness and comprehensiveness of large-scale data can lead decision-makers to erroneous conclusions that overlook the needs of people with fewer resources. For example, data showing that people in one region are traveling more than people in another region might not mean, as first appears, that they are failing to take social distancing seriously. It might mean, instead, that they live in an underserved area and must travel longer distances for essential services like groceries and pharmacies.
In general, our advice to organizations considering sharing aggregate location data is this: Get consent from the users who supply the data. Be cautious about the details. Aggregate at the highest level of generality that will still be useful. Share your plans with the public before you release the data. And avoid sharing “deidentified” or “anonymized” location data that is not aggregated: it doesn’t work.
Jacob Hoffman-Andrews is senior staff technologist, and Andrew Crocker is senior staff attorney, at the Electronic Frontier Foundation (EFF). This article is published courtesy of the Electronic Frontier Foundation (EFF).