Voice recognition capabilities at the FBI -- from the 1960s to the present
a long sampling period ranging from tens of seconds to a few minutes to create a statistically meaningful speaker model. Voice is more time-intensive than other biometric modalities such as fingerprint, iris, face, DNA, and SMT.
Challenge #2 is the fragility of human speech, which is susceptible to differences in the recording environment and the equipment used for capture. Challenge #3 is the susceptibility of human speech to different emotional states and speaking styles. Researchers in the speaker recognition community are well aware of these challenges and have resolved some of them.
Despite these challenges, voice recognition has undeniable strengths as well. Automatic speaker recognition is a highly scientific forensic process built on solid mathematical and statistical foundations. In that sense, I do not foresee any serious issue in fusing voice with other modalities to create a multimodal biometrics database. The FBI’s BCOE at the Criminal Justice Information Services Division has sponsored such projects to collect multimodal biometric data including face, iris, fingerprint, and voice.
Archer: What plans are in place for capturing voice recordings in line with the FBI’s Next Generation Identification (NGI) project? What are the short- and long-term aims?
Nakasone: U.S. government inter-agency collaborators began to consider implementing voice data collection for voice biometric applications about three years ago, in March of 2009, by establishing the Symposium for Investigatory Voice Biometrics (SIVB). The SIVB activity over the past three years culminated in the drafting of the ANSI/NIST ITL Type-11 Voice Record, which is meant to enable the interoperability of voice records among laboratories, field offices, and government agencies for investigative and intelligence purposes. Type-11 still needs to go through the open vetting process by the ANSI/NIST Standards office before it becomes an A/N standard. It is anticipated that Type-11 will be ratified within six months to a year. The short-term and long-term aims that follow are based on the presumption that Type-11 is successfully completed.
As short-term aims, we will (1) complete the Type-11 Voice Record; (2) develop best practices for voice data transactions using Type-11, other ANSI/NIST Types, and the Electronic Biometric Transmission Specification (EBTS); (3) build community consensus; (4) implement an interoperability concept of operations within a scaled-down pilot study; and (5) establish a new Scientific Working Group for Voice to guide engineers and scientists in creating consensus standards for voice biometrics for the intelligence and law enforcement communities. I envision it will take three to five years before we see the fruits of this work.
As long-term aims, we plan to (1) design and study voice collection from individuals at booking stations, during criminal interviews, in prisons, etc.; and (2) maintain those voice databases in a centralized repository (the FBI’s NGI) for future search purposes.
Archer: How do you foresee voice biometrics being used within the military and commercially in the next decade? Do you think voice will be more or less popular than other biometric indicators?
Nakasone: Military applications can be seen in a variety of voice biometrics initiatives and projects in many U.S. DoD agencies, such as the Army, Air Force, and Navy, for identity management, force protection, or counterterrorism purposes. I foresee a similar growth rate of voice biometric exploitation in the military as in non-DoD government agencies. In the commercial domain, I foresee more applications of speech recognition technology than of voice biometrics technology. I tend to think that voice biometrics will find its best use within the government and military as an investigative or intelligence tool, and will be less popular in the commercial world, where privacy issues of non-criminal, innocent citizens are more often involved. Voice is currently not as popular as other modalities (like fingerprint, DNA, iris, and face), but due to increasing interest and other factors, such as the voice transaction record and the incremental maturity associated with rigorous voice biometric research, the gap with other modalities is expected to narrow.
Hirotaka Nakasone, a senior scientist in the FBI’s Voice Recognition Program; Chris Archer, the online content editor at IDGA (the Institute for Defense & Government Advancement)