MetaPhone: The sensitivity of telephone metadata

Methodology
We began by identifying the MetaPhone participants’ contacts. We used the same approach as in our prior work on number identifiability, matching phone numbers against the public Yelp and Google Places directories. In total, our 546 participants contacted 33,688 unique numbers. Of those, 6,107 (18 percent) resolved to an identity.
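The matching step can be sketched roughly as follows. This is a minimal illustration, not the study's code. It assumes the directory data has already been reduced to a local dictionary from phone number to business name, and it assumes US numbers only. The function names `normalize` and `resolve_numbers` are hypothetical, and the Yelp and Google Places APIs themselves are not modeled here.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def normalize(number: str) -> str:
    """Reduce a dialed number to bare digits for matching.

    Assumption: US numbers only, so a leading country code "1" on an
    11-digit string is dropped.
    """
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def resolve_numbers(
    contacted: Iterable[str],
    directory: Dict[str, str],
) -> Tuple[Dict[str, Optional[str]], float]:
    """Match each unique contacted number against a directory snapshot.

    `directory` maps phone number -> business name (a hypothetical local
    copy of listing data). Returns number -> name (None if unmatched) and
    the fraction of unique numbers that resolved to an identity.
    """
    index = {normalize(k): v for k, v in directory.items()}
    unique = {normalize(n) for n in contacted}
    resolved = {n: index.get(n) for n in unique}
    hits = sum(1 for name in resolved.values() if name is not None)
    fraction = hits / len(unique) if unique else 0.0
    return resolved, fraction


# Made-up numbers: two unique numbers after normalization, one of them listed.
directory = {"650-555-0100": "Example Dental Group"}
calls = ["+1 (650) 555-0100", "650-555-0199", "6505550100"]
resolved, fraction = resolve_numbers(calls, directory)
print(fraction)  # 0.5
```

Applied to the study's totals, the same ratio is 6,107 / 33,688 ≈ 0.181, the 18 percent reported above.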

Next, we labeled the contacts that appeared related to a sensitive activity or trait. In most instances, an organization’s line of business was apparent from its name. Where there was ambiguity, we used simple Google queries to learn more.
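The study labeled contacts by reading names and searching when a name was ambiguous. A crude automated stand-in for the "apparent from its name" case might look like the following sketch. The keyword lists and the `label_contact` helper are hypothetical and purely illustrative. Returning `None` marks the ambiguous names that would still need a manual lookup.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists for illustration only; the study's labels were
# assigned by reading each organization's name (and searching when unclear).
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Pharmacies": ["pharmacy", "drugstore"],
    "Veterinary Services": ["veterinary", "animal hospital"],
    "Legal Services": ["law office", "attorney"],
    "Religious Organizations": ["church", "synagogue", "mosque"],
}


def label_contact(name: str) -> Optional[str]:
    """Return the first category with a keyword in the business name.

    None means no keyword matched, i.e. the name is ambiguous and would be
    set aside for a manual lookup.
    """
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


print(label_contact("Main Street Animal Hospital"))  # Veterinary Services
print(label_contact("Acme Holdings LLC"))            # None
```

Keyword matching is brittle (a "Temple" keyword, for instance, would also match a university), which is one reason a human pass over ambiguous names remains necessary.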

We present two sets of results. First, we provide an analysis of individual calls to sensitive numbers. Second, we relate several patterns of calls to emphasize the detail available in telephone metadata.

Individual Call Results
Many organizations have a narrow purpose, so a single call can support a sensitive inference. If a person reaches out to a political campaign, for example, it is highly probable that the person supports the candidate. Similarly, if a person speaks at length with a religious institution, the person is likely of that faith. A further inference is also possible: that the person worships at that particular institution.

We found numerous calls within our dataset that give rise to these sorts of straightforward inferences. The following table presents the proportion of participants who had at least one call with each category of sensitive organization.

Category                                Participants with ≥ 1 Call
Health Services                         57%
Financial Services                      40%
Pharmacies                              30%
Veterinary Services                     18%
Legal Services                          10%
Recruiting and Job Placement            10%
Religious Organizations                 8%
Firearm Sales and Repair                7%
Political Officeholders and Campaigns   4%
Adult Establishments                    2%
Marijuana Dispensaries                  0.4%
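The percentages above count people, not calls: a participant appears once per category no matter how many calls they placed. A minimal sketch of that computation follows. It assumes call records are available as (participant, number) pairs and that labeled numbers map to a category. The names `category_reach`, `calls`, and `labels` are hypothetical. The denominator is every participant (546 in the study), including those with no sensitive calls.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def category_reach(
    calls: Iterable[Tuple[str, str]],
    labels: Dict[str, Optional[str]],
    n_participants: int,
) -> Dict[str, float]:
    """Fraction of all participants with at least one call per category.

    `calls` holds (participant_id, number) pairs; `labels` maps a resolved
    number to its category, or None if it is not sensitive. The denominator
    is every participant, including those with no sensitive calls.
    """
    reached: Dict[str, Set[str]] = defaultdict(set)
    for participant, number in calls:
        category = labels.get(number)
        if category is not None:
            reached[category].add(participant)  # sets count each person once
    return {cat: len(people) / n_participants for cat, people in reached.items()}


calls = [("p1", "A"), ("p1", "A"), ("p2", "B"), ("p3", "C")]
labels = {"A": "Pharmacies", "B": "Pharmacies", "C": None}
print(category_reach(calls, labels, n_participants=4))  # {'Pharmacies': 0.5}
```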

The case of religious organizations gave us an opportunity to check the precision of our inferences. Since the MetaPhone app collects a user’s religion from his or her Facebook profile, we could compare phone metadata inferences against ground truth. There were 15 participants with both a well-defined religious status on Facebook (including atheism) and phone contact with a religious organization. Using only the naïve assumption that a person’s most-called religion is his or her own, we correctly identified the religious status of 11 of the 15 (73 percent).
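The naïve rule and the accuracy check can be expressed in a few lines. This is a sketch under stated assumptions, not the study's code: it assumes each call to a religious organization has already been tagged with that organization's religion, and the helpers `infer_religion` and `accuracy` are hypothetical. Note that such a rule can never predict a status, such as atheism, that does not appear among the calls.

```python
from collections import Counter
from typing import Dict, List, Optional


def infer_religion(called: List[str]) -> Optional[str]:
    """Naive rule: guess the religion whose organizations were called most.

    `called` lists the religion of each religious organization a participant
    called (one entry per call). Counter.most_common breaks ties by first
    occurrence. The rule cannot output a status (such as atheism) that never
    appears among the calls.
    """
    if not called:
        return None
    return Counter(called).most_common(1)[0][0]


def accuracy(predicted: Dict[str, Optional[str]], truth: Dict[str, str]) -> float:
    """Share of participants with ground truth whose prediction matches it."""
    scored = [p for p in truth if p in predicted]
    correct = sum(1 for p in scored if predicted[p] == truth[p])
    return correct / len(scored) if scored else 0.0


calls = {"p1": ["Catholic", "Jewish", "Catholic"], "p2": ["Methodist"]}
truth = {"p1": "Catholic", "p2": "Atheist"}
predicted = {p: infer_religion(c) for p, c in calls.items()}
print(accuracy(predicted, truth))  # 0.5
```

With the study's counts, the same ratio is 11 / 15 ≈ 0.733, or 73 percent.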

Many numbers were associated with specialized products or services, particularly within professional fields. In medicine, for example, we were able to easily categorize phone numbers by specialty practice area.

Category                                Participants with ≥ 1 Call
Dentistry and Oral Health               18%
Mental Health and Family Services       8%
Ophthalmology and Optometry             6%
Sexual and Reproductive Health          6%
Pediatrics                              5%
Orthopedics                             4%
Chiropractic Care                       3%
Rehabilitation and Physical Therapy     3%
Medical Laboratories                    2%
Emergency or Urgent Care                2%
Cardiology                              2%
Dermatology                             1%
Ear, Nose, and Throat                   1%
Neurology                               1%
Oncology                                1%
Substance Abuse                         1%
Cosmetic Surgery                        1%

The degree of sensitivity among contacts took us aback. Participants had calls with Alcoholics Anonymous, gun stores, NARAL Pro-Choice, labor unions, divorce lawyers, sexually transmitted disease clinics, a Canadian import pharmacy, strip clubs, and much more. This was not a hypothetical parade of horribles. These were simple inferences, about real phone users, that could trivially be made on a large scale.

Pattern Results
A pattern of calls will, of course, often reveal more than individual call records. During our analysis, we encountered a number of patterns that were highly indicative of sensitive activities or traits. The following examples are drawn directly from our dataset, with numbers identified through public resources. Though most MetaPhone participants consented to having their identity disclosed, we use pseudonyms in this report to protect participant privacy.

  • Participant A communicated with multiple local neurology groups, a specialty pharmacy, a rare condition management service, and a hotline for a pharmaceutical used solely to treat relapsing multiple sclerosis.
  • Participant B spoke at length with cardiologists at a major medical center, talked briefly with a medical laboratory, received calls from a pharmacy, and placed short calls to a home reporting hotline for a medical device used to monitor cardiac arrhythmia.
  • Participant C made a number of calls to a firearm store that specializes in the AR semiautomatic rifle platform. They also spoke at length with customer service for a firearm manufacturer that produces an AR line.
  • In a span of three weeks, Participant D contacted a home improvement store, locksmiths, a hydroponics dealer, and a head shop.
  • Participant E had a long, early morning call with her sister. Two days later, she placed a series of calls to the local Planned Parenthood location. She placed brief additional calls two weeks later, and made a final call a month after.
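Patterns like Participant D's, several categories contacted within a few weeks, could in principle be flagged automatically rather than found by inspection. The sketch below is one assumed approach, a windowed co-occurrence check, and not how the examples above were found. The function `pattern_in_window`, the category labels, and the dates in the usage example are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple


def pattern_in_window(
    events: List[Tuple[datetime, str]],
    required: Set[str],
    window: timedelta,
) -> bool:
    """True if every category in `required` occurs within one span of `window`.

    `events` holds (timestamp, category) pairs for one participant's labeled
    calls. Any qualifying span can be slid to start at its own earliest
    event, so it suffices to try each event as a start point.
    """
    ordered = sorted(events)
    for i, (start, _) in enumerate(ordered):
        seen: Set[str] = set()
        for ts, category in ordered[i:]:
            if ts - start > window:
                break  # later events fall outside this window
            if category in required:
                seen.add(category)
            if seen == required:
                return True
    return False


day0 = datetime(2014, 1, 1)  # arbitrary, made-up start date
events = [
    (day0, "Home Improvement"),
    (day0 + timedelta(days=3), "Locksmith"),
    (day0 + timedelta(days=10), "Hydroponics"),
    (day0 + timedelta(days=20), "Head Shop"),
]
required = {"Home Improvement", "Locksmith", "Hydroponics", "Head Shop"}
print(pattern_in_window(events, required, timedelta(weeks=3)))  # True
```

The quadratic scan is chosen for clarity; per-participant call volumes in a study of this size are small enough that a sliding-window optimization is unnecessary.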

We were able to corroborate Participant B’s medical condition and Participant C’s firearm ownership using public information sources. Owing to the sensitivity of these matters, we elected not to contact Participants A, D, or E for confirmation.

Conclusion
The dataset that we analyzed in this report spanned hundreds of users over several months. Phone records held by the NSA and telecoms span millions of Americans over multiple years. Reasonable minds can disagree about the policy and legal constraints that should be imposed on those databases. The science, however, is clear: phone metadata is highly sensitive.

Jonathan Mayer is a doctoral student in computer science and a cybersecurity fellow at the Center for International Security and Cooperation at the Freeman Spogli Institute for International Studies. Patrick Mutchler is a doctoral student in computer science at Stanford. This story is published courtesy of Web Policy (under Creative Commons Attribution 3.0 Unported License).