Extracting useful insights from a flood of data is hard to do

Published 7 June 2017

A mantra of these data-rife times is that within the vast and growing volumes of diverse data types, such as sensor feeds, economic indicators, and scientific and environmental measurements, are dots of significance that can tell important stories, if only those dots could be identified and connected in authentically meaningful ways. Getting good at that exercise of data synthesis and interpretation ought to open new, quicker routes to identifying threats, tracking disease outbreaks, and otherwise answering questions and solving problems that previously were intractable.

Now for a reality check. “Today’s hardware is ill-suited to handle such data challenges, and these challenges are only going to get harder as the amount of data continues to grow exponentially,” said Trung Tran, a program manager in DARPA’s Microsystems Technology Office (MTO). To take on that technology shortfall, MTO last summer unveiled its Hierarchical Identify Verify Exploit (HIVE) program, which has now signed on five performers to carry out HIVE’s mandate: to develop a powerful new data-handling and computing platform specialized for analyzing and interpreting huge amounts of data with unprecedented deftness. “It will be a privilege to work with this innovative team of performers to develop a new category of server processors specifically designed to handle the data workloads of today and tomorrow,” said Tran, who is overseeing HIVE.

The quintet of performers includes a mix of large commercial electronics firms, a national laboratory, a university, and a veteran defense-industry company: Intel Corporation (Santa Clara, California), Qualcomm Intelligent Solutions (San Diego, California), Pacific Northwest National Laboratory (Richland, Washington), Georgia Tech (Atlanta, Georgia), and Northrop Grumman (Falls Church, Virginia).

“The HIVE program is an exemplary prototype for how to engage the U.S. commercial industry, leverage their design expertise, and enhance U.S. competitiveness, while also enhancing national security,” said William Chappell, director of MTO. “By forming a team with members in both the commercial and defense sectors, we hope to forge new R&D pathways that can deliver unprecedented levels of hardware specialization. That can be a boost for commercial players but also can advance our military electronics supply to make sure the national defense infrastructure is empowered with the best capabilities in the world.”

Central to HIVE is the creation of a “graph analytics processor,” a chip designed to represent and process relationships in a network as graphs far more efficiently than traditional data formats and processing techniques allow. Examples of these relationships among data elements and categories include person-to-person interactions as well as seemingly disparate links between, say, geography and changes in doctor visit trends, or social media and regional strife. In combination with emerging machine learning and other artificial intelligence techniques that can categorize raw data elements, and by updating the graph as new data becomes available, a powerful graph analytics processor could discern otherwise hidden causal relationships and stories among the data elements it represents.
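To make that concrete, here is a minimal, hypothetical Python sketch (not HIVE hardware or software; every name in it is invented for illustration) of a graph that is updated as new observations arrive and then queried for the kind of indirect relationship described above:

```python
from collections import defaultdict

# Illustrative sketch only: a toy in-memory relationship graph that is
# updated incrementally and queried for indirect (two-hop) connections.
class RelationGraph:
    def __init__(self):
        # Adjacency list: each element maps to the set of elements
        # it has been directly linked to so far.
        self.adj = defaultdict(set)

    def observe(self, a, b):
        """Record a newly arrived link between two data elements."""
        self.adj[a].add(b)
        self.adj[b].add(a)

    def two_hop_links(self, node):
        """Return elements related to `node` only through an
        intermediary -- the non-obvious connections graph
        analytics is meant to surface."""
        direct = self.adj[node]
        indirect = set()
        for mid in direct:
            indirect |= self.adj[mid]
        return indirect - direct - {node}

g = RelationGraph()
g.observe("clinic_visits_region_A", "flu_keyword_searches")
g.observe("flu_keyword_searches", "school_absences")
print(g.two_hop_links("clinic_visits_region_A"))  # {'school_absences'}
```

Even this toy makes the hardware problem visible: answering a graph query means chasing links across scattered memory locations, an irregular access pattern that conventional processors, optimized for sequential data, handle poorly.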

If HIVE is successful, it could deliver a graph analytics processor that achieves a thousand-fold improvement in processing efficiency over today’s best processors, enabling the real-time identification of strategically important relationships as they unfold in the field rather than relying on after-the-fact analyses in data centers. “This should empower data scientists to make associations previously thought impractical due to the amount of processing required,” said Tran. Those associations could include, for example, early signs of an Ebola outbreak, the first digital missives of a cyberattack, or even the plans to carry out such an attack before it happens.

The words in the program’s name, Hierarchical Identify Verify Exploit, indicate a sequence that begins with multi-layer graphical representations of data. This opens the way for graph analytic processing to identify relationships among data within, and perhaps across, the layers. A key next step is the application of verification filters and tests that can distinguish relationships reflecting genuine causal connections from meaningless correlations in the data. “Taken together, these elements should allow us to exploit the enormous amount of data being generated today, to make better decisions about if, when, and how to act in furtherance of the public good and national security,” Tran said.
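As a rough illustration of that identify-then-verify sequence, here is a hypothetical Python sketch in which candidate relationships are first generated from raw correlations and then winnowed by a verification test; the per-window consistency check below is only a stand-in for the far more rigorous causal filters the program envisions:

```python
from statistics import correlation  # Python 3.10+

def identify(series):
    """Identify: flag pairs of signals whose overall correlation is strong."""
    names = list(series)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(correlation(series[a], series[b])) > 0.8]

def verify(pair, series, window=4):
    """Verify: keep a pair only if the relationship holds in every
    time window, rejecting correlations that are global coincidences."""
    a, b = pair
    for start in range(0, len(series[a]) - window + 1, window):
        if abs(correlation(series[a][start:start + window],
                           series[b][start:start + window])) < 0.8:
            return False
    return True

signals = {
    "sensor_1": [1, 2, 3, 4, 5, 6, 7, 8],
    "sensor_2": [2, 4, 6, 8, 10, 12, 14, 16],  # tracks sensor_1 throughout
    "sensor_3": [2, 1, 4, 3, 6, 5, 8, 7],      # trends up, but only loosely
    "noise":    [5, 1, 4, 2, 9, 9, 1, 3],
}
candidates = identify(signals)   # three pairs pass the raw-correlation screen
verified = [p for p in candidates if verify(p, signals)]
print(verified)  # [('sensor_1', 'sensor_2')] -- the exploit step acts on these
```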

Read more about the HIVE program in the Broad Agency Announcement: DARPA-BAA-16-52.