We know where you live

They were also recruited in Oxford to eliminate any bias that might result from familiarity with Boston geography. Similarly, they had no information about the content of the tweets.

The data were presented in three different forms. One was a static Google map, in which tweet locations were marked with virtual pins; one was an animated version of the same map, in which the pins appeared on-screen in chronological order; and the third — the resolutely low-tech version — was a table listing geographical coordinates, street names, and times of day.

The maps featured only street names, with no names of businesses, parks, schools, or other landmarks. Pins and table rows were, however, color coded to indicate general time of day — morning, afternoon, or evening.

The researchers also varied the volume of data that the participants were asked to consider: one day’s, three days’, or five days’ worth. To avoid bias, there was no overlap between data sets of different sizes.

Bottom line
Predictably, participants fared better with map-based representations, correctly identifying Twitter users’ homes roughly 65 percent of the time and their workplaces closer to 70 percent of the time. Even the tabular representation was informative, however, with accuracy rates of just under 50 percent for homes and a surprisingly high 70 percent for workplaces.

In general, participants also fared better with five days’ worth of data than with three or one. Across all three representations, participants with five days’ worth of data could correctly identify workplaces, for example, with more than 85 percent accuracy.

Interestingly, the participants’ performance with three days’ worth of data was generally worse than it was with only one. It could be that, while a single day’s data is likely to be representative of a user’s typical patterns of movement, three days’ worth introduces the possibility of confounding variations, which are ironed out over five days.

“We want to investigate that,” Liccardi says. “When we asked participants ‘Which amount of data do you prefer?’ most of them said ‘medium,’ even though it was the one that they got the least right. So you never know about perceptions.”

“Ilaria’s new paper puts two significant bricks in the wall of our privacy understanding,” says Latanya Sweeney, professor of government and technology in residence at Harvard University and a former chief technology officer of the U.S. Federal Trade Commission. “First, her survey shows how people can learn sensitive information from seemingly innocuous facts, and, second, people will easily share information they believe is innocuous.”

Reprinted with permission of MIT News