MIT researchers develop powerful object recognition system

Published 14 October 2009

The new object recognition system could one day allow computers to search automatically through hours of video footage for a particular two-minute scene; intelligence analysts should be happy

MIT researchers have developed an object recognition system based on a new algorithmic technique. It could one day allow computers to search automatically through hours of video footage for a particular two-minute scene.

The researchers also envision smart-phone applications where a tourist walking down a street in a strange city could take a photo of an unmarked monument and immediately find out what it was.

It could also improve Internet image searches. For example, a search engine told to look for “Shakespeare” would pull up pictures of Shakespeare rather than pictures of Gwyneth Paltrow in the movie Shakespeare in Love.

Typically, object recognition algorithms need to be trained using digital images of objects that have been outlined and labeled. By looking at a million pictures of cars labeled “car,” an algorithm can learn to recognize features shared by images of cars. The problem is that for every new class of objects (trees, buildings, telephone poles) the algorithm has to be trained all over again.
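To make the contrast concrete, here is a minimal sketch of that conventional supervised pipeline. It is an illustration only, not the method of any particular system: the classifier choice (scikit-learn logistic regression), the random stand-in features, and the class names are all assumptions for brevity.

```python
# Sketch of the conventional supervised approach: every class needs its
# own labeled examples, and a new class means retraining from scratch.
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: one flattened image per row; y: labels such as "car" or "tree".
# Random numbers stand in for real image features in this toy example.
rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))
y = np.array(["car"] * 100 + ["tree"] * 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Recognising a new class (say, "telephone pole") would mean collecting
# and labeling fresh examples and fitting the model all over again.
print(clf.predict(X[:1]))
```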

The object recognition system developed by the MIT researchers requires no such training, yet it is claimed to identify objects with 50 per cent greater accuracy than any previous algorithm.

The system uses a modified version of what is known as a motion estimation algorithm, a type of algorithm common in video processing.

Since consecutive frames of video usually change very little, data compression schemes often store the unchanging aspects of a scene once, updating only the positions of moving objects.

The motion estimation algorithm determines which objects have moved from one frame to the next. In a video, that is a relatively easy task because most objects don’t move very far in one-30th of a second.

The algorithm also does not need to know what the object is. It just has to recognize, say, corners and edges, and how their appearance typically changes under different perspectives.
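A common way to estimate motion between frames is block matching: take a small patch of one frame and search a neighbourhood of the next frame for the patch that looks most similar. The sketch below illustrates that general technique; the block size, search radius, and sum-of-absolute-differences cost are illustrative assumptions, not the MIT researchers' implementation.

```python
# Minimal block-matching motion estimation: find where a patch of the
# previous frame moved to in the next frame.
import numpy as np

def match_block(prev_frame, next_frame, top, left, block=16, radius=8):
    """Return the (dy, dx) displacement that minimises the sum of
    absolute differences (SAD) over a small search window."""
    ref = prev_frame[top:top + block, left:left + block]
    h, w = next_frame.shape
    best_cost, best_motion = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            candidate = next_frame[y:y + block, x:x + block]
            cost = np.abs(ref.astype(int) - candidate.astype(int)).sum()
            if cost < best_cost:
                best_cost, best_motion = cost, (dy, dx)
    return best_motion

# Example: a bright square that shifts by 2 rows and 3 columns between frames.
prev = np.zeros((64, 64), dtype=np.uint8)
prev[20:36, 20:36] = 255
nxt = np.roll(np.roll(prev, 2, axis=0), 3, axis=1)
print(match_block(prev, nxt, 20, 20))   # -> (2, 3)
```

Because consecutive video frames differ so little, this search only has to cover a small neighbourhood, which is what makes the task relatively easy within a video sequence.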

The MIT researchers’ new system essentially treats unrelated images as if they were consecutive frames in a video sequence. When the modified motion estimation algorithm tries to determine which objects have moved between one image and the next, it usually picks out objects of the same type. It will guess, for instance, that the 2006 Infiniti in image two is the same object as the 1965 Chevy in image one.

If the first image comes from the type of database used to train computer vision systems, the Infiniti will already be labeled car. The new system will transfer the label to the Chevy.
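The label-transfer step itself is simple to sketch: find the labeled database image that best matches the query, then copy its label. In the sketch below, a tiny grey thumbnail and a nearest-neighbour search stand in for the dense, motion-estimation-style matching the article describes; both are simplifying assumptions.

```python
# Toy label transfer: copy the label of the most similar database image.
import numpy as np

def descriptor(image, size=32):
    """Down-sample a greyscale image to a small thumbnail and flatten it
    (a crude stand-in for the dense matching the real system performs)."""
    h, w = image.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return image[np.ix_(ys, xs)].astype(float).ravel()

def transfer_label(unlabeled, labeled_db):
    """labeled_db is a list of (image, label) pairs. Return the label of
    the database image whose descriptor is nearest in Euclidean distance."""
    query = descriptor(unlabeled)
    best_label, best_dist = None, np.inf
    for image, label in labeled_db:
        d = np.linalg.norm(query - descriptor(image))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

In this toy version, an unlabeled photo of the Chevy would simply inherit the “car” label from whichever database image, perhaps the Infiniti, it matches most closely.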

The greater the resemblance of the labeled and unlabeled images, the better the algorithm works.

The MIT team has developed a demonstration web-based system called LabelMe that lets online volunteers tag objects in digital images. They also created a website called 80 Million Tiny Images that sorts the images according to subject matter.

It is claimed that, when confronted with an unlabeled image, the new object recognition algorithm is likely to find something similar in the database, and that likelihood increases as the database grows.