Surveillance

Telephony metadata: Matching numbers to names

Published 30 December 2013

Explaining why Americans should not be worried about the NSA’s collection of telephony metadata, President Obama, in a PBS interview, said: “You have my telephone number connecting with your telephone number…. [T]here are no names … in that database.” Two Stanford graduate students set out to discover just how much effort it would take to identify the names of phone number owners. Their answer: a trivial amount. Querying the Yelp, Google Places, and Facebook directories, running Google searches, and running their sample numbers through Intelius, a cheap consumer-oriented service, they matched 91 percent of the sample numbers with the numbers’ owners. “If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers,” they write.

Explaining why Americans should not be worried about the NSA’s collection of telephony metadata, President Obama, in a PBS interview, said: “You have my telephone number connecting with your telephone number…. [T]here are no names … in that database.”

Stanford grad student Jonathan Mayer writes (with coauthor Patrick Mutchler) on his Web Policy blog that versions of this argument have appeared frequently in debates over the NSA’s domestic phone metadata program. The factual premise is that the NSA only compels disclosure of numbers, not names. One might conclude, then, that there is not much cause for privacy concern.

This line of reasoning has drawn sharp criticism. In a declaration for the ACLU, Ed Felten noted:

 …Although officials have insisted that the orders issued under the telephony metadata program do not compel the production of customers’ names, it would be trivial for the government to correlate many telephone numbers with subscriber names using publicly available sources. The government also has available to it a number of legal tools to compel service providers to produce their customer’s information, including their names.

When Judge Richard Leon granted a preliminary injunction against the program last week, he expressed a similar view:

…The Government maintains that the metadata the NSA collects does not contain personal identifying information associated with each phone number, and in order to get that information the FBI must issue a national security letter (“NSL”) to the phone company… . Of course, NSLs do not require any judicial oversight … meaning they are hardly a check on potential abuses of the metadata collection. There is also nothing stopping the Government from skipping the NSL step altogether and using public databases or any of its other vast resources to match phone numbers with subscribers.

(Senator Dianne Feinstein issued a statement in response, reiterating that “no names” are coerced from the phone companies in bulk.)

So, Mayer and Mutchler ask, just how easy is it to identify a phone number?

Trivial, they found. (In earlier posts, they reported how automated analysis of call and text activity can reveal private relationships, as well as how phone subscribers are closely interconnected.)

They randomly sampled 5,000 numbers from their crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources, all free and public, they matched 1,356 (27.1 percent) of the numbers. Specifically, there were 378 hits (7.6 percent) on Yelp, 684 (13.7 percent) on Google Places, and 618 (12.3 percent) on Facebook.
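The per-source hits add up to more than the combined total (378 + 684 + 618 = 1,680, against 1,356 matched numbers), which suggests that some numbers turned up in more than one directory. The short Python sketch below illustrates how such a tally might count each number only once. It is not the researchers' code: the lookup tables, phone numbers, and function names are invented for illustration, and no real directory service is queried.

```python
# Hypothetical stand-ins for directory lookups: each maps a phone number to a name.
# Real directories would be queried over the network; these dicts are invented
# purely for illustration.
SOURCES = {
    "yelp": {"5550001": "Cafe A", "5550002": "Shop B"},
    "google_places": {"5550002": "Shop B", "5550003": "Clinic C"},
    "facebook": {"5550003": "Clinic C", "5550004": "Person D"},
}


def tally_matches(sample, sources):
    """Return per-source hit counts and the set of numbers matched by any source."""
    per_source = {}
    matched = set()
    for name, directory in sources.items():
        hits = {number for number in sample if number in directory}
        per_source[name] = len(hits)
        matched |= hits  # set union, so a number found twice is counted once
    return per_source, matched


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


sample = ["5550001", "5550002", "5550003", "5550004", "5550005"]
per_source, matched = tally_matches(sample, SOURCES)
print(per_source, len(matched), percent(len(matched), len(sample)))

# The same arithmetic applied to the figures reported in the article:
print(percent(1356, 5000))     # combined match rate across the three sources
print(378 + 684 + 618 - 1356)  # surplus of per-source hits over distinct matches
```

In the toy data each directory scores two hits, yet only four distinct numbers are matched, the same pattern of overlap that the reported figures imply.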

What if an organization were willing to put in some manpower? To conservatively approximate human analysis, they randomly sampled 100 numbers from their dataset, and then ran Google searches on each.

In under an hour, they were able to associate an individual or a business with 60 of the 100 numbers.

When they added in their three initial sources, they were up to 73.

What if money were no object? They write that they do not have the budget or credentials to access a premium data aggregator, so they ran their 100 numbers through Intelius, a cheap consumer-oriented service. They matched 74 of the numbers (they note that the results they obtained from Intelius were seemingly spottier than those from Yelp, Google Places, and Facebook).

Between Intelius, Google search, and their three initial sources, they associated a name with 91 of the 100 numbers.

“If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers,” Mayer and Mutchler write.