Scavenger hunt for simulated nuclear materials

“We had a pretty healthy lead on the public scoreboard, but it turned out to be incredibly close,” said Joshi. The Berkeley Lab team had submitted about 204 entries over the course of the competition, and their final submission – which they sent in the final 20 minutes of the competition – put them ahead of the second-place team by just 1.3 points.

“It was down to the wire,” said Brian Quiter, an applied nuclear physicist in Berkeley Lab’s Nuclear Science Division who managed the competition and led the development of the platform with Shreyas Cholia, who heads up a software systems group in Berkeley Lab’s Computational Research Division.

Statistical sciences teams from Los Alamos and Lawrence Livermore national labs placed second and third in the competition, respectively.

The competition data was divided into public and private sets, and competitors did not know which data fell into which category. Rankings on the public scoreboard were updated instantly, based only on the public portion of the data sets. Meanwhile, the portions of participants’ submissions covered by the private data set were scored only at the completion of the competition.

By separating the data into public and private sets and by limiting the number of algorithm submissions per team to 1,000, the competition was designed to prevent teams from “gaming” the competition. For example, if there were no cap on submissions, one team might gain an advantage by submitting a huge volume of slightly varied algorithms until one of them, by chance, earned a top ranking. No team neared the submission limit, and by the end of the competition the teams had sent a total of 1,024 submissions.
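The public/private split and the submission cap can be sketched in a few lines of Python. This is only an illustrative toy, not the competition platform's code: the `Leaderboard` class, the helper names, the 50/50 split, the "fraction of runs answered correctly" metric, and the choice to score each team's latest submission on the private set are all assumptions made for the example.

```python
import random


def split_public_private(run_ids, public_fraction=0.5, seed=0):
    """Assign each run to the public or private set (hypothetical helper).

    Competitors never see this assignment; the 50/50 split is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(run_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * public_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def score(submission, truth, run_ids):
    """Toy metric (assumed): fraction of the given runs answered correctly."""
    run_ids = list(run_ids)
    if not run_ids:
        return 0.0
    correct = sum(1 for r in run_ids if submission.get(r) == truth[r])
    return correct / len(run_ids)


class Leaderboard:
    """Shows public scores immediately; private scores only at the end."""

    def __init__(self, truth, public_ids, private_ids, max_submissions=1000):
        self.truth = truth
        self.public_ids = public_ids
        self.private_ids = private_ids
        self.max_submissions = max_submissions
        self.counts = {}   # team -> number of submissions so far
        self.latest = {}   # team -> most recent submission

    def submit(self, team, submission):
        """Record a submission and return its public score."""
        count = self.counts.get(team, 0)
        if count >= self.max_submissions:
            raise ValueError(f"{team} has reached the submission cap")
        self.counts[team] = count + 1
        self.latest[team] = submission
        return score(submission, self.truth, self.public_ids)

    def final_scores(self):
        """Score each team's latest submission on the private set."""
        return {
            team: score(sub, self.truth, self.private_ids)
            for team, sub in self.latest.items()
        }
```

Because the private score is computed on data that never influenced the public rankings, a submission tuned by trial and error against the public scoreboard gains no guaranteed advantage in the final standings.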

“The teams were graded on false positives – whether they reported environmental ‘background’ radiation as human-made sources, for example – and also on the likelihood of detecting the sought-after radiation sources, the precision in time at which a particular source was reported, and on whether they could specifically identify a particular type of radioactive source based on a list of six possible sources, from weapons-grade plutonium to materials used for nuclear medicine,” Quiter said.

To further complicate the challenge, there was no GPS or location-based information to inform participants about the layout of buildings, for example, and participants also had very limited information about the speed of the vehicle and the length of each of the paths traveled. The virtual streetscape used in the challenge was loosely based on real streets, and the travel time of the detector along each path in the simulated data sets varied from 45 seconds to just over 12 minutes.

“We were told that the speeds were variable, from 1 to 13 meters per second,” Joshi said. “We were also told that the radiation sources could be shielded in certain situations, which changes the energy spectrum that you can measure. How you handled that in the algorithm was an important part of doing well.”

Joshi said that members of his team had already been working on similar algorithms prior to the challenge. “We have been building this capability up over the last year,” he said. “We have collected a lot of data from radiological measurements in urban areas and have been investigating the relationships between these measurements and the radiological compositions of our surroundings.”

While teams were not limited to using a single algorithm, Joshi said that ultimately the team developed one algorithm – they named it Berkeley Anomaly Detection – Factorized Matrices, or BAD-FM – to handle most of the data, with some refinements toward handling particular phenomena such as different speeds of the vehicle within the simulations.

The model was informed by data for naturally occurring radiation – which can vary based on local geography and building materials, for example, and can fluctuate over time even at the same location – as well as data from test sources of human-made nuclear materials.
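As a rough illustration of how a model of natural background can be used to flag human-made sources, the Python sketch below compares each measured energy spectrum against an average background shape using a Poisson deviance statistic. This is a deliberately simplified stand-in and not the team's BAD-FM algorithm, whose details are not given here; the function names, the four-bin spectra, and the threshold are all hypothetical.

```python
import math


def background_shape(training_spectra):
    """Average spectral shape of background-only data, normalized to sum to 1."""
    n_bins = len(training_spectra[0])
    totals = [sum(s[i] for s in training_spectra) for i in range(n_bins)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def poisson_deviance(spectrum, shape):
    """Poisson deviance between a spectrum and the scaled background shape.

    The background is scaled to the spectrum's total counts, so the statistic
    responds to changes in spectral shape rather than overall count rate.
    """
    total = sum(spectrum)
    deviance = 0.0
    for counts, frac in zip(spectrum, shape):
        expected = max(total * frac, 1e-9)  # floor avoids division by zero
        if counts > 0:
            deviance += counts * math.log(counts / expected)
        deviance -= counts - expected
    return 2.0 * deviance


def is_anomalous(spectrum, shape, threshold):
    """Flag a spectrum whose deviance exceeds an (assumed) tuned threshold."""
    return poisson_deviance(spectrum, shape) > threshold


# Made-up example counts in four energy bins.
training = [[10, 20, 30, 40], [12, 18, 33, 37]]
shape = background_shape(training)
print(is_anomalous([11, 19, 31, 39], shape, threshold=10.0))  # background-like
print(is_anomalous([11, 19, 90, 39], shape, threshold=10.0))  # excess in bin 3
```

In practice the threshold would have to be tuned against the false-positive rate, since natural background itself varies with location and over time.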

Joshi said that the computing power of his desktop computer was adequate for initial prototyping of the algorithm; for the more advanced work, the team used a single node of a Lab supercomputer. They used visualization tools, which color-coded the simulated radiation sources based on the detector’s energy readings, to help interpret and analyze the data.

The top three teams in the competition – with the second- and third-place teams composed of data scientists from Los Alamos and Lawrence Livermore national labs, respectively – will be recognized at a July 11 meeting and will receive follow-up funding, Quiter noted. There will also likely be a public competition.

“The next big plan is to repeat this challenge on an established open platform where we can ideally tap into a large community of data scientists,” Quiter said.