Voice recognition

Automated voice imitation can defeat voice-recognition security

Published 30 September 2015

Voice biometrics is based on the assumption that each person has a unique voice that depends not only on the physiological features of his or her vocal cords but also on his or her entire body shape and on the way sound is formed and articulated. Researchers have found that both automated and human verification in voice-based user authentication systems are vulnerable to voice impersonation attacks. Using an off-the-shelf voice-morphing tool, the researchers developed a voice impersonation attack to attempt to penetrate automated and human verification systems.

University of Alabama at Birmingham researchers have found that both automated and human verification in voice-based user authentication systems are vulnerable to voice impersonation attacks. This new research is being presented at the European Symposium on Research in Computer Security, or ESORICS, in Vienna, Austria.

A person’s voice is an integral part of daily life. It enables people to communicate in physical proximity, as well as in remote locations using phones or radios, or over the Internet using digital media.

“Because people rely on the use of their voices all the time, it becomes a comfortable practice,” said Nitesh Saxena, Ph.D., the director of the Security and Privacy In Emerging computing and networking Systems (SPIES) lab and associate professor of computer and information sciences at UAB. “What they may not realize is that level of comfort lends itself to making the voice a vulnerable commodity. People often leave traces of their voices in many different scenarios. They may talk out loud while socializing in restaurants, giving public presentations or making phone calls, or leave voice samples online.”

Someone with potentially malicious intentions can record a person’s voice by being physically close to the speaker, by making a spam call, by searching for and mining audiovisual clips online, or even by compromising cloud servers that store audio information.

UAB notes that this study, from researchers within the UAB College of Arts and Sciences Department of Computer and Information Sciences and the Center for Information Assurance and Joint Forensics Research, explores how an attacker in possession of audio samples of a victim’s voice could compromise the victim’s security, safety, and privacy.

Advances in technology, specifically those that automate speech synthesis such as voice morphing, allow an attacker to build a very close model of a victim’s voice from a limited number of samples. Voice morphing can then be used to transform the attacker’s voice so that it speaks an arbitrary message in the victim’s voice.
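
To make the idea concrete, the following toy sketch illustrates why a close enough imitation can pass a threshold-based voice check. It is not the researchers' attack, the morphing tool they used, or any real verification product. The three-number "feature vectors", the function names (`enroll`, `verify`), and the 0.95 threshold are all assumptions invented for illustration; real systems use far richer acoustic features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(samples):
    """Average several feature vectors into one stored 'voiceprint' (hypothetical)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def verify(voiceprint, sample, threshold=0.95):
    """Accept the sample if it is similar enough to the voiceprint.

    The threshold value is an arbitrary assumption for this sketch.
    """
    return cosine_similarity(voiceprint, sample) >= threshold


# Made-up feature vectors standing in for the victim's enrollment recordings.
victim_samples = [[0.9, 0.1, 0.4], [1.0, 0.2, 0.5], [0.8, 0.1, 0.3]]
voiceprint = enroll(victim_samples)

# Made-up vectors for the attacker's natural voice and for the same attacker
# after a hypothetical morphing step has pushed the features toward the victim's.
attacker_natural = [0.2, 0.9, 0.1]
attacker_morphed = [0.85, 0.15, 0.45]

print("victim accepted:          ", verify(voiceprint, victim_samples[1]))
print("attacker (natural) passes:", verify(voiceprint, attacker_natural))
print("attacker (morphed) passes:", verify(voiceprint, attacker_morphed))
```

In this sketch the check only measures how close a sample is to the stored voiceprint, so any input that lands near it is accepted, regardless of who actually produced it.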