Addressing the Thin Data Problem in National Security

“A problem is that neural networks trained on data gathered in the presence of a background tend to make predictions that are only valid when the background is the same. The network often chooses ways of identifying the signal that don’t work when the background is different,” said Grimes.

Explainable AI: Sorting Significant from Spurious
Grimes’ solution is to teach a machine learning system to identify the signals of interest in ways that are not affected by changes in the background. Instead of supplying more and more data to cover every possible scenario, he teaches the network to recognize unwanted patterns and exclude them from its decision-making process.
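
To make the idea concrete, here is a minimal sketch of the general approach, not PNNL’s actual method: drop input features whose values track a known background condition, then train only on the features that remain. The synthetic data, feature layout, and threshold are all illustrative assumptions.

```python
# Illustrative sketch only: discard features correlated with a known
# background (nuisance) label, then train on what remains.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: 1,000 samples x 20 features (assumed layout).
n, d = 1000, 20
background = rng.integers(0, 2, size=n)   # nuisance condition that changes across environments
signal = rng.integers(0, 2, size=n)       # the quantity we actually want to detect

X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * signal                   # feature 0: tracks the true signal
X[:, 1] += 2.0 * background               # feature 1: tracks only the background
X[:, 2] += signal + background            # feature 2: confounded with both

# Score each feature by its correlation with the background label and
# mask out the ones that would not generalize when the background changes.
bg_corr = np.array([abs(np.corrcoef(X[:, j], background)[0, 1]) for j in range(d)])
keep = bg_corr < 0.3                      # illustrative threshold

clf = LogisticRegression().fit(X[:, keep], signal)
print(f"kept {keep.sum()} of {d} features; "
      f"training accuracy = {clf.score(X[:, keep], signal):.2f}")
```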

“You want to ignore the irrelevant information and focus on what’s relevant. This approach works well for finding really small signals buried in large backgrounds,” Grimes said.

The approach came about because of a renewed focus at PNNL on explainable AI, an effort to understand and explain how an AI system makes decisions. The work dovetails with NNSA’s Advanced Data Analytics for Proliferation Detection (ADAPD) program, which aims to detect very faint signals of nuclear proliferation activity. Both ADAPD and PNNL’s work on explainable AI are supported by DNN R&D.

Analyzing how such systems sift information allowed Grimes to identify which information should be used for decision-making and which can be discarded without penalty.

To understand the need to differentiate meaningful signals from less important background signals, consider the challenge a nuclear inspector faces when confronting thousands upon thousands of inputs. It’s a challenge that PNNL scientist Ben Wilson, who leads the Laboratory’s efforts on ADAPD, wrestled with during 12 years as a nuclear safeguards inspector for the International Atomic Energy Agency.

Inspectors synthesize information from detectors, photographs, shipping manifests, publicly available information, and myriad other sources. The ability to sift reams of complex data and key in on the most relevant details is central to grasping the status of a nuclear program. For example, two tiny data points, perhaps a seemingly unimportant piece of equipment observed during an inspection coupled with the slightest discrepancy in declared nuclear material inventory, might point to significant inconsistencies in a country’s nuclear activities. Understanding how streams of information add up to a consistent, comprehensive view is what inspectors do, and what aficionados of explainable AI seek to understand and emulate.

Grimes and colleagues are exploring the benefits of explainable AI for verifying that nuclear materials are used only for peaceful purposes. In one experiment, the team challenged an AI network to determine specific properties of an instrument’s use in different settings on the PNNL campus: an isolated environment, a laboratory one door over, or an adjacent building. The different environments translated to highly varied, constantly changing background conditions.

Researchers trained the network to automatically identify and rely on patterns in the data that indicated the instrument, while ignoring patterns that also reflected the background. The scientists distinguished the two kinds of patterns largely by the timing and consistency of their appearance during a week’s worth of experiments.
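
The sketch below illustrates one way such a separation could work, using an assumed operating schedule and synthetic hourly measurements rather than the experiment’s actual data or pipeline: patterns that are strong only when the instrument is known to be running are kept, while patterns present regardless are treated as background.

```python
# Illustrative sketch only: separate candidate patterns by how consistently
# they coincide with a known (assumed) instrument operating schedule over a
# week of hourly measurements.
import numpy as np

rng = np.random.default_rng(1)
hours = 24 * 7                                  # one week of hourly samples
instrument_on = (np.arange(hours) % 24 < 8)     # assumed operating schedule

# Three candidate patterns: instrument-linked, steady background, and noise.
features = np.stack([
    instrument_on * 1.0 + 0.1 * rng.normal(size=hours),   # instrument-linked
    0.8 + 0.1 * rng.normal(size=hours),                    # steady background
    rng.normal(size=hours),                                # noise
], axis=1)

on_mean = features[instrument_on].mean(axis=0)
off_mean = features[~instrument_on].mean(axis=0)

# Keep patterns that are strong when the instrument runs and weak otherwise.
keep = (on_mean - off_mean) > 0.5               # illustrative threshold
print("patterns kept for decision-making:", np.where(keep)[0])
```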

The team found, not surprisingly, that the system’s performance on the training data was poorer when the ambiguous patterns, those that indicated both instrument activity and background activity, were removed from consideration.

But importantly, even though the system did not perform as well on the training data, it actually performed better when presented with real data for analysis. When multiple runs on the data were taken into account, the system trained to ignore background signals reduced the error rate of a conventional AI system by about 25 percent.

“While this improvement seems surprising, it’s actually predicted by theory,” added Grimes. “Focusing more precisely on the true signal in the data you have is key.”

And that, said Sheffield, is crucial.

“The national security mission demands the next generation in artificial intelligence—techniques like these are exactly what we need to capitalize on AI to transform national security,” she added. “We need ways to build good models from noisy and insufficient data. That’s not science fiction; we do it by better understanding and exploiting the data we have.”