Privacy

Facebook’s Likes expose intimate details, personality traits of millions

Published 13 March 2013

Research shows that intimate personal attributes can be predicted with high levels of accuracy from “traces” left by seemingly innocuous digital behavior, in this case Facebook Likes. Study raises important questions about personalized marketing and online privacy.

New research, published today in the journal PNAS, shows that surprisingly accurate estimates of Facebook users’ race, age, IQ, sexuality, personality, substance use, and political views can be inferred from automated analysis of only their Facebook Likes — information currently publicly available by default.

In the study, researchers describe Facebook Likes as a “generic class” of digital record — similar to Web search queries and browsing histories — and suggest that such techniques could be used to extract sensitive information for almost anyone regularly online.

A University of Cambridge release reports that researchers at Cambridge’s Psychometrics Center, in collaboration with Microsoft Research Cambridge, analyzed a dataset of over 58,000 U.S. Facebook users, who volunteered their Likes, demographic profiles, and psychometric testing results through the myPersonality application.

Users opted in to provide data and gave consent to have profile information recorded for analysis. Facebook Likes were fed into algorithms and corroborated with information from profiles and personality tests.

Researchers created statistical models able to predict personal details using Facebook Likes alone. Models proved 88 percent accurate for determining male sexuality, 95 percent accurate distinguishing African-American from Caucasian American and 85 percent accurate differentiating Republican from Democrat. Christians and Muslims were correctly classified in 82 percent of cases, and good prediction accuracy was achieved for relationship status and substance abuse — between 65 percent and 73 percent.

Few users, however, clicked Likes explicitly revealing these attributes. For example, fewer than 5 percent of gay users clicked obvious Likes such as Gay Marriage. Accurate predictions relied on “inference”: aggregating huge amounts of less informative but more popular Likes, such as music and TV shows, to produce incisive personal profiles.
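To make the idea of aggregation concrete, here is a minimal sketch in Python. It is not the researchers’ model: the data, the Like names, and the function names are all invented for illustration, and the scoring is a simple naive-Bayes-style sum of log-odds weights. It only shows how several weakly informative Likes can add up to a more confident guess than any one of them alone.

```python
import math

# Hypothetical toy data: (set of Likes, binary attribute label) per user.
# Like names and labels are invented; they do not come from the study.
TRAIN = [
    ({"band_a", "show_x"}, 1),
    ({"band_a", "show_y"}, 1),
    ({"band_a", "show_x", "show_z"}, 1),
    ({"band_b", "show_x"}, 0),
    ({"band_b", "show_z"}, 0),
    ({"band_b", "show_y", "show_z"}, 0),
]


def fit_like_weights(users, smoothing=1.0):
    """Return one log-odds weight per Like from smoothed frequencies."""
    pos = sum(label for _, label in users)
    neg = len(users) - pos
    all_likes = set().union(*(likes for likes, _ in users))
    weights = {}
    for like in all_likes:
        pos_count = sum(1 for likes, label in users if label == 1 and like in likes)
        neg_count = sum(1 for likes, label in users if label == 0 and like in likes)
        # Smoothing keeps a Like seen in only one group from getting an infinite weight.
        p_pos = (pos_count + smoothing) / (pos + 2 * smoothing)
        p_neg = (neg_count + smoothing) / (neg + 2 * smoothing)
        weights[like] = math.log(p_pos / p_neg)
    return weights


def predict_probability(likes, weights):
    """Sum the weights of a user's Likes and map the total into (0, 1)."""
    score = sum(weights.get(like, 0.0) for like in likes)
    return 1.0 / (1.0 + math.exp(-score))


weights = fit_like_weights(TRAIN)
one_like = predict_probability({"show_x"}, weights)
two_likes = predict_probability({"band_a", "show_x"}, weights)
print(f"one weak Like: {one_like:.2f}, two Likes combined: {two_likes:.2f}")
```

In this toy setup a single Like shifts the estimate only slightly away from an even guess, while two Likes together shift it further. The sketch assumes the two groups are equally common and ignores Likes a user has not clicked, which a real model would not do.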

Even seemingly opaque personal details, such as whether users’ parents separated before the user reached the age of 21, were predicted with 60 percent accuracy, enough to make the information “worthwhile for advertisers,” suggest the researchers.

While they highlight the potential for personalized marketing to improve online services using predictive models, the researchers also warn of the threats posed to users’ privacy. They argue that many online consumers might feel such levels of digital exposure exceed acceptable limits — as corporations, governments, and even individuals could use predictive software to accurately infer highly sensitive information from Facebook Likes and other digital “traces.”

The researchers also tested for personality traits including intelligence, emotional stability, openness, and extraversion. While such latent traits are far more difficult to gauge, the accuracy