Expanding the AI Toolbox of Cybersecurity Defenders

It’s similar to how people learn many tasks. A child who does their chores might receive positive reinforcement, such as a desired playdate; a child who doesn’t do their work faces a penalty, such as losing access to a digital device.

“It’s the same concept in reinforcement learning,” Chatterjee said. “The agent can choose from a set of actions. With each action comes feedback, good or bad, that becomes part of its memory. There’s an interplay between exploring new opportunities and exploiting past experiences. The goal is to create an agent that learns to make good decisions.”
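That loop—choose an action, receive feedback, fold it into memory—maps directly onto code. Below is a minimal tabular Q-learning sketch of the idea; the defender actions, reward handling, and parameter values are hypothetical illustrations, not anything from the study.

```python
import random

# Minimal tabular Q-learning sketch of the explore/exploit tradeoff.
# The actions, parameters, and state encoding are hypothetical
# illustrations, not taken from the PNNL study.

ACTIONS = ["block_ip", "patch_host", "monitor", "do_nothing"]
EPSILON = 0.1   # chance of exploring a new, random action
ALPHA = 0.5     # learning rate: how strongly feedback updates memory
GAMMA = 0.9     # discount factor: how much future rewards matter

q_table = {}    # the agent's "memory": (state, action) -> learned value

def choose_action(state):
    """Explore with probability EPSILON; otherwise exploit experience."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))

def learn(state, action, reward, next_state):
    """Fold the feedback, good or bad, into the agent's memory."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

With each call to `learn`, good outcomes pull the stored value for that state-action pair up and bad outcomes pull it down, which is exactly the interplay of exploration and exploitation Chatterjee describes.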

OpenAI Gym and MITRE ATT&CK
The team used the open-source software toolkit OpenAI Gym as the basis for a custom, controlled simulation environment to evaluate the strengths and weaknesses of four deep reinforcement learning (DRL) algorithms.

The team used the MITRE ATT&CK framework, developed by MITRE Corp., and incorporated seven tactics and 15 techniques deployed by three distinct adversaries. Defenders were equipped with 23 mitigation actions to try to halt or prevent the progression of an attack.
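In a simulation like this, the threat model typically becomes a data structure the environment can query. The sketch below shows one hypothetical way to encode it; the article gives the counts (seven tactics, 15 techniques, 23 mitigations) but not the actual mapping, so every entry is a placeholder.

```python
# Hypothetical encoding of the threat model. The article specifies the
# counts (seven tactics, 15 techniques, 23 mitigations) but not the
# mapping itself, so these entries are illustrative placeholders.
ATTACK_MODEL = {
    "reconnaissance": {
        "techniques": ["active_scanning", "phishing_for_info"],
        "mitigations": ["limit_exposed_services", "user_training"],
    },
    "execution": {
        "techniques": ["malicious_script"],
        "mitigations": ["application_allowlisting"],
    },
    # ...the remaining five tactics follow the same shape
}
```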

The stages of the attack corresponded to the tactics of reconnaissance, execution, persistence, defense evasion, command and control, collection, and exfiltration (when data is transferred out of the system). An attack was recorded as a win for the adversary if they successfully reached the final exfiltration stage.
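Put together, the core loop of such an environment might look like the skeleton below. Everything here—the class name, the toy adversary, the reward values—is a hedged illustration of the structure described above, not the team’s published code.

```python
import random

import gym
from gym import spaces

class CyberDefenseEnv(gym.Env):
    """Skeleton of a custom OpenAI Gym environment in the spirit of the
    setup described above. Names, dynamics, and rewards are assumptions
    for illustration, not the study's published environment."""

    STAGES = ["reconnaissance", "execution", "persistence",
              "defense_evasion", "command_and_control",
              "collection", "exfiltration"]

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(23)      # one of 23 mitigation actions
        self.observation_space = spaces.Discrete(len(self.STAGES))
        self.stage = 0

    def reset(self):
        self.stage = 0                               # adversary starts at reconnaissance
        return self.stage

    def step(self, action):
        # Toy adversary: a real model would decide whether the chosen
        # mitigation counters the technique in play at this stage.
        if random.random() < 0.5:
            self.stage += 1
        done = self.stage == len(self.STAGES) - 1    # exfiltration reached: adversary wins
        reward = -10.0 if done else 1.0              # breach penalized, containment rewarded
        return self.stage, reward, done, {}
```

An episode ends when the adversary reaches exfiltration, mirroring the win condition described above; the defender’s job is to learn which of its 23 actions keep that from happening.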

“Our algorithms operate in a competitive environment—a contest with an adversary intent on breaching the system,” said Chatterjee. “It’s a multistage attack, where the adversary can pursue multiple attack paths that can change over time as they try to go from reconnaissance to exploitation. Our challenge is to show how defenses based on deep reinforcement learning can stop such an attack.”

DQN Outpaces Other Approaches
The team trained defensive agents based on four DRL algorithms: DQN (Deep Q-Network) and three variations of what’s known as the actor-critic approach. The agents were trained on simulated cyberattack data, then tested against attacks they had not observed in training.

DQN performed the best. Attack sophistication reflected varying levels of adversary skill and persistence:

·  Least sophisticated attacks: DQN stopped 79 percent of attacks midway through the attack stages and 93 percent by the final stage.

·  Moderately sophisticated attacks: DQN stopped 82 percent of attacks midway and 95 percent by the final stage.

·  Most sophisticated attacks: DQN stopped 57 percent of attacks midway and 84 percent by the final stage—far higher than the other three algorithms.
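For readers curious what training such a DQN defender involves, here is a generic sketch using PyTorch and the hypothetical CyberDefenseEnv from above. It illustrates the standard DQN recipe—a Q-network learning from replayed experience—in simplified form (no separate target network); it is not the team’s actual training code or hyperparameters.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Generic, simplified DQN training sketch. Network size, hyperparameters,
# and the environment are illustrative assumptions, not the study's settings.

env = CyberDefenseEnv()          # hypothetical environment sketched earlier
n_actions = env.action_space.n
n_stages = env.observation_space.n

def encode(stage):
    """One-hot encode the current attack stage for the network."""
    x = torch.zeros(n_stages)
    x[stage] = 1.0
    return x

q_net = nn.Sequential(nn.Linear(n_stages, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)    # experience replay memory
epsilon, gamma = 0.1, 0.99

for episode in range(500):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection: explore vs. exploit.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = int(q_net(encode(state)).argmax())
        next_state, reward, done, _ = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = next_state

        # Learn from a random minibatch of past experience.
        if len(replay) >= 32:
            batch = random.sample(replay, 32)
            states = torch.stack([encode(s) for s, *_ in batch])
            actions = torch.tensor([a for _, a, *_ in batch])
            rewards = torch.tensor([r for _, _, r, *_ in batch])
            nexts = torch.stack([encode(ns) for *_, ns, _ in batch])
            dones = torch.tensor([float(d) for *_, d in batch])

            q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                q_target = rewards + gamma * (1 - dones) * q_net(nexts).max(1).values
            loss = nn.functional.mse_loss(q_pred, q_target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Evaluating a trained agent against attack behaviors held out of the replay memory is what lets the team measure performance on attacks the agent has never observed.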

“Our goal is to create an autonomous defense agent that can learn the most likely next step of an adversary, plan for it, and then respond in the best way to protect the system,” Chatterjee said.

Despite the progress, no one is ready to entrust cyber defense entirely to an AI system. Instead, a DRL-based cybersecurity system would need to work in concert with humans, said coauthor Arnab Bhattacharya, formerly of PNNL.

“AI can be good at defending against a specific strategy but isn’t as good at understanding all the approaches an adversary might take,” Bhattacharya said. “We are nowhere near the stage where AI can replace human cyber analysts. Human feedback and guidance are important.”

Tom Rickey is a Senior Science Writer at the Pacific Northwest National Laboratory.