AI3 Questions: Modeling Adversarial Intelligence to Exploit AI’s Security Vulnerabilities

By Alex Shipps

Published 30 January 2025

MIT Principal Research Scientist Una-May O’Reilly discusses how she develops agents that reveal AI models’ security weaknesses before hackers do.

If you’ve watched cartoons like Tom and Jerry, you’ll recognize a common theme: An elusive target evades its formidable adversary. This game of “cat-and-mouse” — whether literal or otherwise — involves pursuing something that ever-so-narrowly escapes you on every attempt.

In a similar way, evading persistent hackers is a continuous challenge for cybersecurity teams. To keep attackers chasing what’s just out of reach, MIT researchers are developing an AI approach called “artificial adversarial intelligence” that mimics the attackers of a device or network in order to test network defenses before real attacks happen. Other AI-based defensive measures help engineers further fortify their systems against ransomware, data theft, and other hacks.

Here, Una-May O’Reilly, an MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) principal investigator who leads the Anyscale Learning For All Group (ALFA), discusses how artificial adversarial intelligence protects us from cyber threats.

Q: In what ways can artificial adversarial intelligence play the role of a cyber attacker, and how does artificial adversarial intelligence portray a cyber defender?
A: Cyber attackers exist along a competence spectrum. At the lowest end, there are so-called script-kiddies, or threat actors who spray well-known exploits and malware in the hopes of finding some network or device that hasn’t practiced good cyber hygiene. In the middle are cyber mercenaries who are better-resourced and organized to prey upon enterprises with ransomware or extortion. And, at the high end, there are groups that are sometimes state-supported, which can launch the most difficult-to-detect “advanced persistent threats” (or APTs).

Think of the specialized, nefarious intelligence that these attackers marshal — that’s adversarial intelligence. Attackers build highly technical tools that let them hack into code, they choose the right tool for their target, and their attacks unfold in multiple steps. At each step, they learn something, integrate it into their situational awareness, and then decide what to do next. Sophisticated APT groups may strategically pick their target and devise a slow, low-visibility plan so subtle that its implementation escapes our defensive shields. They can even plant deceptive evidence pointing to another hacker!
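That multi-step loop — observe, update situational awareness, decide — can be sketched as a simple agent. This is a purely hypothetical illustration (the environment, action names, and decision rule are all invented for this example, not ALFA’s actual models):

```python
# Hypothetical sketch of the multi-step attack loop described above:
# at each step the agent learns something, integrates it into its
# situational awareness, and decides what to do next.

def run_attack(environment, max_steps=5):
    """Simulate an attacker that learns at each step and picks its next action."""
    awareness = {}       # the attacker's accumulated situational awareness
    actions_taken = []
    for step in range(max_steps):
        observation = environment.get(step, "nothing")  # learn something this step
        awareness[step] = observation                   # integrate it
        # decide: escalate if something useful was found, otherwise keep probing
        action = "escalate" if observation != "nothing" else "probe"
        actions_taken.append(action)
    return actions_taken

# Toy environment: the attacker discovers an open port at step 2.
env = {2: "open_port"}
print(run_attack(env))  # ['probe', 'probe', 'escalate', 'probe', 'probe']
```

Real attack campaigns are of course far richer — the point here is only the structure: each step feeds new information back into the agent’s state before the next decision is made.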

My research goal is to replicate this specific kind of offensive, attacking intelligence: intelligence that is adversarially oriented, the kind that human threat actors rely upon. I use AI and machine learning to design cyber agents that model the adversarial behavior of human attackers. I also model the learning and adaptation that characterize cyber arms races.
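The arms-race dynamic — each side adapting against the other’s current strategy — can be illustrated with a minimal coevolutionary sketch. Everything here is a toy assumption for illustration (integer “strategies,” the payoff function, the mutation rule); it is not ALFA’s actual methodology:

```python
# Minimal, hypothetical coevolutionary "arms race": attacker and defender
# strategies are plain integers, and each side keeps a mutation only if it
# improves its standing against the opponent's current strategy.

import random

random.seed(0)  # make the toy run repeatable

def attack_success(attacker, defender):
    # Toy payoff: the attack succeeds when attacker strength exceeds defense.
    return attacker > defender

attacker, defender = 1, 1
history = []
for generation in range(10):
    # Attacker mutates; keeps the change only if it now beats the defender.
    candidate = attacker + random.choice([-1, 1, 2])
    if attack_success(candidate, defender):
        attacker = candidate
    # Defender mutates; keeps the change only if it now blocks the attacker.
    candidate = defender + random.choice([-1, 1, 2])
    if not attack_success(attacker, candidate):
        defender = candidate
    history.append((attacker, defender))

print(history[-1])  # both sides tend to ratchet upward over generations
```

The design choice worth noting is that neither side optimizes against a fixed target: each evaluates its candidate strategy against the opponent’s latest one, which is what produces the escalating, cat-and-mouse behavior the article describes.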