Real or Fake Text? We Can Learn to Spot the Difference

In standard detection studies, participants are asked to indicate, yes or no, whether a machine produced a given text. The task amounts to classifying a text as real or fake, and responses are scored as correct or incorrect.

The Penn model refines the standard detection study into an effective training task by showing examples that all begin as human-written. Each example then transitions into generated text, and participants are asked to mark where they believe the transition begins. Trainees then identify and describe the features of the text that indicate error, and they receive a score.
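For readers who think in code, the contrast between the two tasks can be sketched roughly as follows. This is a minimal illustration, not the study's actual scoring: the function names, the five-point maximum, and the rule of losing one point per sentence past the true transition are all assumptions made for the example.

```python
def score_binary(guess_is_generated: bool, is_generated: bool) -> int:
    """Standard yes-or-no task: 1 point for a correct label, 0 otherwise."""
    return int(guess_is_generated == is_generated)


def score_boundary(guessed_index: int, true_index: int, max_points: int = 5) -> int:
    """Boundary task: the reader marks the sentence where generated text begins.

    Hypothetical rule (an assumption, not taken from the study):
    - a guess before the true transition earns 0 points,
    - an exact guess earns max_points,
    - each sentence past the transition costs one point, floored at 0.
    """
    if guessed_index < true_index:
        return 0
    return max(0, max_points - (guessed_index - true_index))


# Example: the text switches from human-written to generated at sentence 3.
exact = score_boundary(guessed_index=3, true_index=3)   # full points
late = score_boundary(guessed_index=5, true_index=3)    # two sentences late
early = score_boundary(guessed_index=2, true_index=3)   # guessed too soon
```

The point of the sketch is that a boundary guess carries more information than a yes-or-no label: it records not just whether a reader noticed generated text, but how quickly.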

The study results show that participants scored significantly better than random chance, providing evidence that AI-created text is, to some extent, detectable.

“Our method not only gamifies the task, making it more engaging, it also provides a more realistic context for training,” says Dugan. “Generated texts, like those produced by ChatGPT, begin with human-provided prompts.”

The study not only speaks to artificial intelligence today, but also outlines a reassuring, even exciting, future for our relationship to this technology.

“Five years ago,” says Dugan, “models couldn’t stay on topic or produce a fluent sentence. Now, they rarely make a grammar mistake. Our study identifies the kind of errors that characterize AI chatbots, but it’s important to keep in mind that these errors have evolved and will continue to evolve. The shift to be concerned about is not that AI-written text is undetectable. It’s that people will need to continue training themselves to recognize the difference and work with detection software as a supplement.”

“People are anxious about AI for valid reasons,” says Callison-Burch. “Our study gives points of evidence to allay these anxieties. Once we can harness our optimism about AI text generators, we will be able to devote attention to these tools’ capacity for helping us write more imaginative, more interesting texts.”

Ippolito, the Penn study’s co-leader and current Research Scientist at Google, complements Dugan’s focus on detection with her work’s emphasis on exploring the most effective use cases for these tools. She contributed, for example, to Wordcraft, an AI creative writing tool developed in tandem with published writers. None of the writers or researchers found that AI was a compelling replacement for a fiction writer, but they did find significant value in its ability to support the creative process.

“My feeling at the moment is that these technologies are best suited for creative writing,” says Callison-Burch. “News stories, term papers, or legal advice are bad use cases because there’s no guarantee of factuality.”

“There are exciting positive directions that you can push this technology in,” says Dugan. “People are fixated on the worrisome examples, like plagiarism and fake news, but we know now that we can be training ourselves to be better readers and writers.”

Learn to spot generated text and contribute to this ongoing research by playing Real or Fake Text.

Devorah Fischler is Senior Science Writer at Penn Engineering. The article was originally posted to the website Penn Engineering Today.