Detecting “Deepfake” Videos by Checking for the Pulse

With video editing software becoming increasingly sophisticated, it’s sometimes difficult to believe our own eyes. Did that actor really appear in that movie? Did that politician really say that offensive thing?

Some so-called “deepfakes” are harmless fun, but others are made with a more sinister purpose. But how do we know when a video has been manipulated?

Researchers from Binghamton University’s Thomas J. Watson College of Engineering and Applied Science have teamed up with Intel Corp. to develop a tool called FakeCatcher, which boasts an accuracy rate above 90 percent.

FakeCatcher works by analyzing the subtle differences in skin color caused by the human heartbeat. Photoplethysmography (abbreviated as PPG) is the same technique used in the pulse oximeter clipped to your fingertip at a doctor’s office, as well as in Apple Watches and other wearable fitness trackers that measure your heart rate during exercise.
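To make the idea concrete, here is a minimal, illustrative sketch of how a PPG-like signal can be read out of video: average the green channel (most sensitive to blood-volume changes) over a patch of skin in each frame. The function name, region format, and synthetic demo below are assumptions for illustration, not Intel's actual FakeCatcher code.

```python
import numpy as np

def extract_ppg_signal(frames, region):
    """Average green-channel intensity over a skin region, per frame.

    frames: sequence of HxWx3 arrays; region: (y0, y1, x0, x1) bounds.
    (Hypothetical helper for illustration only.)
    """
    y0, y1, x0, x1 = region
    return np.array([f[y0:y1, x0:x1, 1].mean() for f in frames])

# Synthetic demo: a ~72 bpm (1.2 Hz) pulse faintly modulating skin
# brightness in a 10-second clip at 30 fps.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
pulse = 1.5 * np.sin(2 * np.pi * 1.2 * t)
frames = [np.full((64, 64, 3), 128.0) + p for p in pulse]

signal = extract_ppg_signal(frames, (16, 48, 16, 48))
print(signal.shape)  # (300,) — one intensity sample per frame
```

In a real pipeline the face region would come from a face detector, and the raw signal would be bandpass-filtered to the plausible heart-rate range before any analysis.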

“We extract several PPG signals from different parts of the face and look at the spatial and temporal consistency of those signals,” said Ilke Demir, a senior research scientist at Intel. “In deepfakes, there is no consistency for heartbeats and there is no pulse information. For real videos, the blood flow in someone’s left cheek and right cheek — to oversimplify it — agree that they have the same pulse.”
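The consistency check Demir describes can be sketched as comparing the dominant pulse frequency of signals from two facial regions: in a real video the left and right cheeks should agree, while a deepfake shows no coherent shared pulse. This is a simplified illustration under assumed signals, not the published FakeCatcher algorithm, and the tolerance value is an arbitrary choice.

```python
import numpy as np

def dominant_freq(signal, fps):
    """Dominant frequency (Hz) of a signal, via the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[spectrum.argmax()]

def pulses_consistent(left_cheek, right_cheek, fps, tol_hz=0.2):
    """Heuristic: do two facial regions show the same pulse rate?"""
    diff = abs(dominant_freq(left_cheek, fps) - dominant_freq(right_cheek, fps))
    return diff <= tol_hz

fps = 30
t = np.arange(300) / fps  # 10 seconds of samples

# Real video: both cheeks carry the same 1.2 Hz pulse (phase may differ).
real_left = np.sin(2 * np.pi * 1.2 * t)
real_right = np.sin(2 * np.pi * 1.2 * t + 0.3)

# Deepfake stand-in: a spurious 3.0 Hz flicker with no shared pulse.
fake_right = np.sin(2 * np.pi * 3.0 * t)

print(pulses_consistent(real_left, real_right, fps))  # True
print(pulses_consistent(real_left, fake_right, fps))  # False
```

The actual system extracts many such signals across the face and evaluates both spatial and temporal consistency, typically with a learned classifier rather than a fixed threshold.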

Working with Demir on the project is Umur A. Ciftci, a PhD student in Watson College’s Department of Computer Science, under Professor Lijun Yin’s supervision at the Graphics and Image Computing Laboratory, part of the Seymour Kunis Media Core funded by donor Gary Kunis ‘73, LHD ‘02. The project builds on Yin’s 15 years of work creating multiple 3D databases of human faces and emotional expressions. Hollywood filmmakers, video game creators and others have utilized the databases for their creative projects.

At Yin’s lab in the Innovative Technologies Complex, Ciftci has helped to build what may be the most advanced physiological capture setup in the United States, with 18 cameras, including infrared. A device strapped around a subject’s chest also monitors breathing and heart rate. So much data is acquired in a 30-minute session that it requires 12 hours of computer processing to render.

“Umur has done a lot of physiology data analysis, and signal processing research started with our first multimodal database,” Yin said. “We capture data not just with 2D and 3D visible images but also thermal cameras and physiology sensors. The idea of using