Deepfake myths: Common misconceptions about synthetic media

Myth 1: “Deepfakes” currently allow people to easily create fake videos of anyone doing anything
This is not quite accurate. Current technology enables very specific types of operations on video, and only if you have particular kinds of data. One technique is face swapping, in which a target’s face is used to replace a face in an existing video. Of course, this requires a video of someone else doing whatever you want the target to do, and it currently produces unrealistic results if the body type and hair of the source and target don’t match. This was the process behind the first widely known instance of deepfakes, in which Reddit users swapped the faces of celebrities onto the bodies of pornographic actors.

Another operation is called facial reenactment. This means using an actor or existing footage as the source of facial expressions, then applying those expressions to video of an unwitting target. Until fairly recently, most implementations of facial reenactment needed significant footage of the person being targeted, and they still generally require a static background and many trial runs to get decent results. Reenactment of the whole body, and similar operations for audio, are also possible. As with video, the best audio results come from targets such as professional readers, for whom there are many hours of recordings with no background noise. New techniques can also be used to realistically add objects to a video scene, change weather conditions, and even generate fairly realistic text.

This field is rapidly evolving, and these limitations are quickly becoming outdated. But it’s crucial to understand that none of these operations allow you to create a specific video from whole cloth. There is no way (short of a Hollywood-style studio) to create a video of Nancy Pelosi or Mitch McConnell doing backflips on an elephant without building on existing, potentially traceable footage. This will likely remain true for at least the next few years.

That said, significant harm can still come from operations that are possible with current technology. And as more limitations are overcome, it will become possible to produce a very convincing synthetic video with no more than a cell phone. We need to be ready.

Myth 2: Image editing like Photoshop didn’t cause any harm, so synthetic media won’t either
Even Adobe Photoshop, and the democratization of image editing more generally, has had significant negative impacts in the hands of malicious actors. In 2017, a photo of American football player Michael Bennett was edited to make it appear as if he were burning the American flag after he knelt during the national anthem, and fact-checkers around the world spend much of their time addressing such simply edited images. Manipulated static images have been a significant boon to purveyors of misinformation and hate around the world.

Synthetic media is very different in potential scale, scope, and psychological impact. Video and audio are often more persuasive than still images and have a bigger impact on memory and emotion. Worse, synthetic media tools could become far easier to use than Photoshop as the technology becomes more accessible. This doesn’t mean that Photoshop should be outlawed, but it’s critical to understand that even “cheap fakes,” created with simple image editing tools, can have significant societal impact. The best defenses against deepfakes are also defenses against cheap fakes, such as ensuring that platforms like Facebook and YouTube avoid rewarding any form of “outrage bait” fakery with attention and revenue.

Myth 3: The most significant harm of synthetic media is that people will be tricked by fakes
There are many potential impacts of synthetic media, both good and bad, and the direct impact of fakery is only one side of the coin. Perhaps even more worrying is that people are becoming less willing to believe real media. If any video might be the result of manipulation, there is nothing to stop a politician from disavowing a legitimate but damaging video, for example.

More generally, synthetic media is a challenge to our epistemic capacity: our ability to make sense of the world and make competent decisions. Especially concerning is the growth of reality apathy, where people give up on distinguishing real from fake, and of reality sharding, where people selectively choose what to believe, forming ever deeper like-minded clusters. These are much broader societal issues, and they could be supercharged by a growing ability to manipulate audio and video.

Just like image editing technology, synthetic media technology holds immense promise, from helping us train safe autonomous cars to bringing history to life for a new generation of students. But there are a number of crucial next steps we can take to minimize the negative impacts and maximize the positive ones.

We need to build a solid foundation if we want to preserve the epistemic capacity needed to run our democracy.

Aviv Ovadya is a non-resident fellow at GMF’s Alliance for Securing Democracy. The article, originally posted to the website of the German Marshall Fund of the United States, is published here courtesy of the GMFUS.