Beyond Watermarks: Content Integrity Through Tiered Defense

By Kevin Klyman and Renée DiResta

Published 16 May 2024

Watermarking is often discussed as a solution to the problems posed by AI-generated content. On its own, however, watermarking is inadequate; other methods of detecting and triaging AI-generated content are still needed.

The United States, European Union, and China have all taken major steps towards requiring that developers of large AI models watermark their outputs, that is, embed invisible signatures indicating that the content is AI-generated. The momentum for watermarking requirements has grown in large part due to concerns about information integrity and disinformation campaigns in an election year: sixty-four countries are holding elections in 2024, even as digital platforms have cut funding for content moderation and reduced the headcount of their integrity teams. In response, companies like OpenAI, Meta, and Google have committed to label AI-generated images and to contribute to common standards. But the uncomfortable reality is that watermarking does not solve content provenance: government-mandated watermarking will not be the thing that prevents AI-generated deepfakes from having a significant effect on elections this year.
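To make the idea of an invisible signature concrete, the toy sketch below hides a short, fixed bit pattern in an image's least-significant bits and then checks for it. This is only an illustration of the concept, not any production scheme; real systems such as Google DeepMind's SynthID or the C2PA provenance standard work very differently and are far more robust. The tag value and helper names are hypothetical, and the sketch assumes numpy and Pillow are available.

```python
# Minimal, hypothetical sketch of an "invisible signature": write a fixed bit
# pattern into the least-significant red bits of an image, then check for it.
# Toy illustration only; production watermarks are statistical and more robust.
import numpy as np
from PIL import Image

TAG = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # hypothetical 8-bit mark

def embed(img: Image.Image) -> Image.Image:
    """Write TAG into the least-significant red bits of the first 8 pixels."""
    px = np.array(img.convert("RGB"))
    red = px[..., 0].reshape(-1)
    red[: len(TAG)] = (red[: len(TAG)] & 0xFE) | TAG
    px[..., 0] = red.reshape(px.shape[:2])
    return Image.fromarray(px)

def detect(img: Image.Image) -> bool:
    """Return True if TAG is present, i.e. the image 'claims' to be AI-generated."""
    red = np.array(img.convert("RGB"))[..., 0].reshape(-1)
    return bool(np.array_equal(red[: len(TAG)] & 1, TAG))

if __name__ == "__main__":
    synthetic = Image.new("RGB", (64, 64), color=(120, 180, 200))  # stand-in "AI output"
    print(detect(embed(synthetic)))  # True: signature found
    print(detect(synthetic))         # False: no signature
```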

There are three gaps that make watermarking an inadequate remedy for AI-generated content intended to manipulate audiences. First, watermarking methods that are embedded into models can usually be removed by downstream developers. Second, bad actors can create fake watermarks and trick many watermark detectors. Third, some open-source models will continue to lack watermarks even after the adoption of watermarking requirements.
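The first gap is easy to see even in the toy sketch above: a signature written into pixel values does not survive routine transformations, let alone deliberate removal. The hypothetical snippet below, reusing embed and detect from the earlier sketch, shows a single lossy JPEG re-encode, the kind of thing that happens automatically when an image is reuploaded or screenshotted, wiping out the tag. Production watermarks resist this far better, but downstream developers with access to model weights can modify or fine-tune open models so that the watermark is never applied in the first place.

```python
# Continuing the toy sketch above (reuses embed/detect): a naive pixel-level
# tag does not survive ordinary lossy re-encoding.
import io
from PIL import Image

def reencode_jpeg(img: Image.Image, quality: int = 85) -> Image.Image:
    """Round-trip through JPEG compression, as a reupload or screenshot might."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

tagged = embed(Image.new("RGB", (64, 64), color=(120, 180, 200)))
print(detect(tagged))                 # True: signature present
print(detect(reencode_jpeg(tagged)))  # Very likely False: signature destroyed
```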

Watermarking is an appealing solution to policymakers seeking to safeguard democratic elections and restore online trust. It sounds like a quick fix: AI content-generation tools will simply indicate that their outputs are AI-generated, and social media companies will be able to surface this fact within the feeds where users are likely to encounter the content. Unfortunately, things are not quite so simple: building up public trust in watermarks as the determinant of “real” versus “fake” content will likely spur the adversarial creation of fake watermarks (so the real will be contested), even as unwatermarked AI-generated content from open-source models continues to proliferate (suggesting that the fake is real). A better approach is to regulate the harms from AI models, such as non-consensual intimate imagery, and to adopt a tiered defense against inauthentic content that does not rely primarily on technological means to promote trust.