Privacy: New tool reveals which online personal data is being used by advertisers

Published 19 August 2014

The Web can be an opaque black box: it leverages our personal information without our knowledge or control. When, for instance, a user sees an ad about depression online, she may not realize that she is seeing it because she recently sent an e-mail about being sad. A new tool reveals which data in a Web account, such as e-mails, searches, or viewed products, are being used to target which outputs, such as ads, recommended products, or prices.

Roxana Geambasu and Augustin Chaintreau, both assistant professors of computer science at Columbia Engineering, are seeking to change that opacity and, in doing so, bring more transparency to the Web.

Along with their PhD student Mathias Lecuyer, the researchers have developed XRay, a new tool that reveals which data in a Web account, such as e-mails, searches, or viewed products, are being used to target which outputs, such as ads, recommended products, or prices. A Columbia University release reports that they will present the prototype, which is designed to make the online use of personal data more transparent, at USENIX Security on 20 August. The researchers have posted the open source system and their findings online, so that other researchers interested in studying how Web services use personal data can leverage and extend the work.

“Today we have a problem: the Web is not transparent. We see XRay as an important first step in exposing how Web sites are using your personal data,” says Geambasu, who is also a member of Columbia’s Institute for Data Sciences and Engineering’s Cybersecurity Center.

We live in a “big data” world, where staggering amounts of personal data — our locations, search histories, e-mails, posts, photos, and more — are constantly being collected and analyzed by Google, Amazon, Facebook, and many other Web services. While harnessing big data can certainly improve our daily lives (Amazon offerings, Netflix suggestions, emergency response Tweets, etc.), these beneficial uses have also generated a big data frenzy, with Web services aggressively pursuing new ways to acquire and commercialize the information.

“It’s critical, now more than ever, to reconcile our privacy needs with the exponential progress in leveraging this big data,” says Chaintreau, a member of the Institute for Data Sciences and Engineering’s New Media Center. Geambasu adds, “If we leave it unchecked, big data’s exciting potential could become a breeding ground for data abuses, privacy vulnerabilities, and unfair or deceptive business practices.”