AI Model Aims to Plug Key Gap in Cybersecurity Readiness

From CVE to CWE to CAPEC: a Path to Better Cybersecurity
The new AI model uses natural language processing and supervised learning to bridge information in three separate cybersecurity databases:

·  Vulnerabilities: specific pieces of computer code that could serve as an opening for an attack. More than 200,000 of these “common vulnerabilities and exposures,” or CVEs, are listed in the National Vulnerability Database maintained by the Information Technology Laboratory at the National Institute of Standards and Technology (NIST).

·  Weaknesses: a slimmer set of definitions that classify the vulnerabilities into categories based on what could happen if the vulnerabilities were exploited. There are about 1,000 “common weakness enumerations,” or CWEs, listed in the Common Weakness Enumeration database maintained by MITRE Corp.

·  Attacks: what an actual attack exploiting vulnerabilities and weaknesses might look like. More than 500 potential attack routes or “vectors,” known as “CAPECs,” are included in the Common Attack Pattern Enumeration and Classification resource maintained by MITRE.

While all three databases have information crucial for cyber defenders, there have been few attempts to knit all three together so that a user can quickly detect and understand possible threats and their origins, and then weaken or prevent these threats and attacks.
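To make the idea of knitting the three resources together concrete, here is a minimal sketch in Python. It is not the team's code: the identifiers are placeholders rather than real database entries, and the two lookup tables and both function names are assumptions made for illustration. It shows how a CVE could be followed through its weakness categories to candidate attack patterns, and how a single weakness category can cover several vulnerabilities at once.

```python
from typing import Dict, List

# Placeholder identifiers for illustration only; these are NOT real
# CVE, CWE, or CAPEC entries.
CVE_TO_CWE: Dict[str, List[str]] = {
    "CVE-EXAMPLE-1": ["CWE-EXAMPLE-A"],
    "CVE-EXAMPLE-2": ["CWE-EXAMPLE-A", "CWE-EXAMPLE-B"],
    "CVE-EXAMPLE-3": [],  # a vulnerability with no weakness assigned yet
}
CWE_TO_CAPEC: Dict[str, List[str]] = {
    "CWE-EXAMPLE-A": ["CAPEC-EXAMPLE-1", "CAPEC-EXAMPLE-2"],
    "CWE-EXAMPLE-B": ["CAPEC-EXAMPLE-2"],
}


def attack_patterns_for(cve_id: str) -> List[str]:
    """Follow CVE -> CWE -> CAPEC and return the reachable attack patterns."""
    patterns = set()
    for cwe_id in CVE_TO_CWE.get(cve_id, []):
        patterns.update(CWE_TO_CAPEC.get(cwe_id, []))
    return sorted(patterns)


def vulnerabilities_covered_by(cwe_id: str) -> List[str]:
    """Return every CVE classified under the given weakness category."""
    return sorted(cve for cve, cwes in CVE_TO_CWE.items() if cwe_id in cwes)


if __name__ == "__main__":
    print(attack_patterns_for("CVE-EXAMPLE-2"))
    print(vulnerabilities_covered_by("CWE-EXAMPLE-A"))
```

In this toy setup, a defense aimed at one weakness category addresses every vulnerability grouped under it, which is the efficiency the researchers describe.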

“If we can classify the vulnerabilities into general categories, and we know exactly how an attack might proceed, we could neutralize threats much more efficiently,” said Halappanavar. “The higher you go in classifying the bugs, the more threats you can stop with one action. An ideal goal is to prevent all possible exploitations.”

The work received the best paper award at the IEEE International Symposium on Technologies for Homeland Security in November. The work was funded by DOE’s Office of Science and by PNNL’s Data-Model Convergence Initiative.

In addition to Halappanavar, the team includes first author Siddhartha Shankar Das of Purdue University, who was an intern at PNNL; former PNNL scientist Ashutosh Dutta, now at Amazon; Sumit Purohit of PNNL; Edoardo Serra of Boise State University and a joint appointee at PNNL; and Alex Pothen of Purdue.

In previous work, the team used AI to link two of the resources, vulnerabilities and weaknesses. That work, resulting in the model V2W-BERT, earned the team—Das, Pothen, Halappanavar, Serra and Ehab Al-Shaer from Carnegie Mellon University—a best application paper award at the 2021 IEEE International Conference on Data Science and Advanced Analytics.

AI Links Computer Bugs to Potential Cyberattacks Automatically
The new model, VWC-MAP, extends the project to a third category, attack actions.

“There are thousands upon thousands of bugs or vulnerabilities out there, and new ones are created and discovered every day,” said Das, a doctoral student at Purdue who has led development of the work since his internship at PNNL in 2019. “And more are coming. We need to develop ways to stay ahead of these vulnerabilities, not only the ones that are known but the ones that haven’t been discovered yet.”

The team’s model automatically links vulnerabilities to the appropriate weaknesses with up to 87 percent accuracy, and links weaknesses to appropriate attack patterns with up to 80 percent accuracy. Those numbers are much better than what today’s tools provide, but the scientists caution that their new methods need to be tested more widely.

One hurdle is the dearth of labeled data for training. Currently, fewer than 1 percent of vulnerabilities are linked to specific attacks, which leaves very little data to train on.

To overcome the lack of data and perform the work, the team fine-tuned pretrained natural language models, using both an auto-encoder (BERT) and a sequence-to-sequence model (T5). The first approach used a language model to associate CVEs with CWEs and then CWEs with CAPECs through a binary link prediction approach. The second approach used sequence-to-sequence techniques to translate CWEs to CAPECs with intuitive prompts for ranking the associations. The two approaches generated very similar results, which were then validated by the cybersecurity expert on the team.
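The binary link prediction idea can be sketched roughly as follows: score each candidate pair of descriptions with a "linked or not" score, then rank the candidates. This is only an illustration under stated assumptions, not the VWC-MAP implementation. In place of a fine-tuned BERT model, the sketch swaps in a simple word-overlap (Jaccard) score, and the descriptions, identifiers, and function names are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_link_score(source_text: str, target_text: str) -> float:
    """Stand-in for a model's link probability: Jaccard overlap of word sets.

    A real system would replace this with a fine-tuned language model.
    """
    a, b = tokenize(source_text), tokenize(target_text)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_links(
    source_text: str,
    candidates: Dict[str, str],
    score: Callable[[str, str], float] = toy_link_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (source, candidate) pair and return the top_k candidates."""
    scored = [(cid, score(source_text, text)) for cid, text in candidates.items()]
    # Highest score first; ties are broken by identifier for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical descriptions, written for this example only.
candidates = {
    "WEAKNESS-A": "improper neutralization of input during web page generation",
    "WEAKNESS-B": "out of bounds write to a memory buffer",
}
report = "crafted input is not neutralized during web page generation"

if __name__ == "__main__":
    for candidate_id, link_score in rank_links(report, candidates):
        print(candidate_id, round(link_score, 3))
```

Because the scorer is passed in as a parameter, the same ranking step could in principle be applied to both hops, vulnerabilities to weaknesses and weaknesses to attack patterns, with a learned model substituted for the toy score.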

“We’re putting this out there for others to test, to go through the vulnerabilities and make sure the model bins them appropriately,” said Halappanavar. “We really hope that cybersecurity experts can put this open-source platform to the test.”

Tom Rickey covers science at the Pacific Northwest National Laboratory (PNNL). The article was first published on the PNNL website.