Smart camera to describe what it sees -- and reason about what it cannot see

Published 30 October 2012

Army scouts are commonly tasked with covertly entering uncontrolled areas, setting up a temporary observation post, and then performing persistent surveillance for twenty-four hours or longer. What if, instead of sending scouts on high-risk missions, the military could deploy taskable smart cameras?

A truly “smart” camera would be able to describe in words everything it sees and reason about what it cannot see.  These devices could be instructed to report only on activities of interest, increasing the relevance of incoming data to users.  Smart cameras could thus permit a single scout to monitor multiple observation posts from a safe location.
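The tasking idea above can be pictured as a simple filter over a stream of detected events. The sketch below is purely illustrative -- the event format, the verb labels, and the watchlist are invented for this example and are not part of the Mind's Eye program:

```python
# Hypothetical sketch: a taskable camera reports only activities of interest.
# The detection feed and verb labels are illustrative, not from Mind's Eye.

WATCHLIST = {"dig", "carry", "exchange"}  # verbs the operator asked to monitor

def report_of_interest(detections):
    """Filter a stream of (timestamp, actor, verb) detections to the watchlist."""
    return [d for d in detections if d[2] in WATCHLIST]

feed = [
    (101, "person_1", "walk"),
    (145, "person_2", "dig"),
    (170, "person_1", "carry"),
]

# Only the 'dig' and 'carry' events survive the filter.
print(report_of_interest(feed))
```

Filtering at the camera, rather than at the operator's screen, is what would let one scout supervise many posts: only watchlisted events generate traffic.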

The enabling technology for such a smart camera is machine-based visual intelligence.  A DARPA release reports that its Mind’s Eye program seeks to develop this capability by automating the learning of generally applicable and generative representations of actions between objects in a scene, directly from visual inputs, and the ability to reason over those learned representations.

A key distinction between this research and the state of the art in machine vision is that the latter has made continual progress in recognizing a wide range of objects and their properties — what might be thought of as the nouns in the description of a scene.  The focus of Mind’s Eye is to add the perceptual and cognitive underpinnings for recognizing and reasoning about the verbs in those scenes, enabling a more complete narrative of action in the visual experience.
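The noun/verb division of labor described above can be sketched as two stages: object recognition supplies the nouns, while an action recognizer infers a verb from how an object moves, and the two combine into a sentence. Everything here -- the motion rule, the thresholds, the labels -- is an invented toy, not the program's actual method:

```python
# Hypothetical sketch of the noun/verb distinction: a toy action recognizer
# infers a verb from an object track's net motion, and describe() pairs it
# with a recognized noun. All rules and thresholds are illustrative only.

def recognize_verb(track):
    """Infer a verb from a track of (x, y) positions using net displacement."""
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if dx == 0 and dy == 0:
        return "stands"
    return "runs" if abs(dx) + abs(dy) > 10 else "walks"

def describe(noun, track):
    """Combine a recognized noun with an inferred verb into a short narrative."""
    return f"{noun} {recognize_verb(track)}"

print(describe("person", [(0, 0), (2, 1), (4, 2)]))   # small motion -> "person walks"
print(describe("person", [(0, 0), (8, 6), (15, 9)]))  # large motion -> "person runs"
```

The point of the toy is the architecture, not the rule: state-of-the-art machine vision handles the `noun` argument well, and Mind's Eye targets the `recognize_verb` half.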

DARPA says that in the first eighteen months of the program, Mind’s Eye demonstrated fundamentally new capabilities in visual intelligence, including the ability of automated systems to recognize actions they had never seen, describe observed events using simple text messages, and flag anomalous behaviors.  

Additional work involves improving precision and accuracy, filling temporal gaps (answering “What just happened?” and “What might happen next?”), and answering questions about events in a scene. 
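One way to picture the temporal-gap questions is a transition model over action labels: given the verb just observed, look up which verbs tend to follow it. The table and its counts below are invented for illustration and are not the program's approach:

```python
# Hypothetical sketch of "What might happen next?": a verb-to-verb transition
# table (counts invented) ranks likely successors of an observed action.
from collections import Counter

TRANSITIONS = {
    "approach": Counter({"enter": 5, "stop": 2, "leave": 1}),
    "dig": Counter({"bury": 4, "leave": 3}),
}

def predict_next(verb):
    """Return the most frequent successor of a verb, or None if unknown."""
    counts = TRANSITIONS.get(verb)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("approach"))  # -> "enter"
print(predict_next("dig"))       # -> "bury"
```

Running the same table backwards from a currently observed verb would correspond to the "What just happened?" question.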

The program also seeks to lower the computational requirements of visual intelligence to address operational use constraints, such as power requirements for unmanned ground vehicles.

Mind’s Eye is sponsored by DARPA’s Information Innovation Office.

— Read more in Alessandro Oltramari and Christian Lebiere, “Using Ontologies in a Cognitive-Grounded System: Automatic Action Recognition in Video Surveillance”