British researchers marry lip-reading to video analytics

Published 23 January 2007

The British government hopes the software will enable police to solve crimes based on conversations gleaned from CCTV; tracking the head and lips remains a challenge, but progress is being made; Asian and African languages present difficulties

Here is a technology for which Gallaudet University will have no use: a researcher at the University of East Anglia (using British government funds) is developing CCTV software that will permit automated lip-reading in a variety of languages. Although there are many non-security applications — for example, a camera built into a mobile phone could assist in cleaning up garbled voices — the main purpose at present is to help police identify speech captured only on video. “Lip-reading from surveillance footage, for example, has been used to solve crimes. In some situations it may not be safe or feasible to place a microphone close enough to hear voices, but a long-range camera might still be able to see faces,” said Professor Richard Harvey.

Not that it is easy. “We have to track the head accurately over a variety of poses, then extract numbers, or features, that describe the lips, and then learn what features correspond to what text,” Harvey explained. Once this is accomplished, additional software will have to be developed to watch and recognize the lips’ fast and subtle movements. Two approaches, called ‘active shape model’ and ‘grey-scale sieving’, will be combined to track the changes in the shape of the lips. Nevertheless, some languages are beyond the software’s ken. “There’s no way we could do African click and whistle languages or Chinese and Japanese,” says Harvey. “But we hope to do a selection of European languages and standard modern Arabic.”
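The pipeline Harvey describes (track the head, extract lip-shape features, then learn which features correspond to which text) can be illustrated with a toy sketch of the last step. Everything here is hypothetical: the feature vectors stand in for the lip-contour parameters an active shape model might produce, and the three “viseme” centroids and their values are invented for illustration, not taken from the East Anglia system.

```python
import math

# Hypothetical viseme centroids: each viseme (a visually distinct
# mouth shape) is summarised by a small feature vector, standing in
# for the lip-contour parameters an active shape model would output.
VISEME_CENTROIDS = {
    "closed": (0.1, 0.1),   # lips together, e.g. /p/, /b/, /m/
    "open":   (0.9, 0.6),   # wide-open mouth, e.g. /a/
    "round":  (0.4, 0.9),   # rounded lips, e.g. /u/, /w/
}

def classify_frame(features):
    """Assign one video frame's lip features to the nearest viseme."""
    return min(
        VISEME_CENTROIDS,
        key=lambda v: math.dist(features, VISEME_CENTROIDS[v]),
    )

def decode_sequence(frames):
    """Map per-frame features to a viseme sequence, collapsing
    consecutive repeats (the frame rate far exceeds the rate at
    which mouth shapes change)."""
    labels = [classify_frame(f) for f in frames]
    collapsed = [labels[0]]
    for lab in labels[1:]:
        if lab != collapsed[-1]:
            collapsed.append(lab)
    return collapsed
```

A real system would replace the nearest-centroid rule with a statistical model trained on labelled footage, and would still face the hard part Harvey flags: tracking the head and lips robustly enough, across poses, to produce usable features in the first place.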

Read more in Max Glaskin’s report in The Engineer.