Using Machine Learning to Hunt Down Cybercriminals

In 1998, the U.S. Senate’s first-ever cybersecurity hearing included a team of hackers who claimed they could use IP hijacking to take down the internet in under 30 minutes. “More than 20 years later, the lack of deployment of security mechanisms in BGP is still a serious concern,” said CAIDA’s Dainotti.

To better pinpoint serial attacks, the group first pulled data from several years’ worth of network operator mailing lists, as well as historical BGP data taken every five minutes from the global routing table. From that data, they identified particular qualities of malicious actors and then trained a machine learning model to automatically identify such behaviors.

The system flagged networks that had several key characteristics, particularly with respect to the nature of the specific blocks of IP addresses they use:

·  Volatile changes in activity: hijackers’ address blocks seem to disappear much faster than those of legitimate networks. The average duration of a flagged network’s prefix was under 50 days, compared to almost two years for legitimate networks.

·  Multiple address blocks: serial hijackers tend to advertise many more blocks of IP addresses, also known as “network prefixes.” 

·  IP addresses in multiple countries: most networks don’t have foreign IP addresses. In contrast, the address blocks that serial hijackers advertised as their own were much more likely to be registered in different countries and continents.
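
The three characteristics above can be turned into per-network numbers that a classifier could consume. The following Python sketch is only an illustration and is not the researchers' pipeline: the `Announcement` record, its field names, and the `profile_network` function are hypothetical, and the sample data uses reserved documentation prefixes with made-up dates and countries.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Announcement:
    """Hypothetical record: one period during which a network originated a prefix."""
    prefix: str       # address block, e.g. "192.0.2.0/24"
    first_seen: date  # first day the prefix was visible in the routing table
    last_seen: date   # last day the prefix was visible
    country: str      # country where the address block is registered


def profile_network(announcements):
    """Summarize one network's announcements as three illustrative features."""
    durations = [(a.last_seen - a.first_seen).days for a in announcements]
    return {
        # Volatility: how long prefixes stay announced, on average.
        "mean_prefix_days": mean(durations),
        # Breadth: how many distinct address blocks the network advertised.
        "num_prefixes": len({a.prefix for a in announcements}),
        # Geographic spread: how many registration countries those blocks span.
        "num_countries": len({a.country for a in announcements}),
    }


# Made-up sample data using documentation prefixes (RFC 5737).
sample = [
    Announcement("192.0.2.0/24", date(2018, 1, 1), date(2018, 1, 31), "US"),
    Announcement("198.51.100.0/24", date(2018, 3, 1), date(2018, 3, 11), "BR"),
    Announcement("203.0.113.0/24", date(2018, 5, 1), date(2018, 5, 21), "RU"),
]
features = profile_network(sample)
print(features)
```

In this toy example the network advertises three short-lived blocks registered in three countries, the kind of profile the article describes as suspicious. A real system would derive such records from years of routing-table snapshots and feed many more features to a trained model.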

Testart said that one challenge in developing the system was that events that look like IP hijacks can often be the result of human error, or otherwise legitimate. For example, a network operator might use BGP to defend against distributed denial-of-service (DDoS) attacks, in which huge amounts of traffic flood their network. Modifying the route is a legitimate way to shut down the attack, but it looks virtually identical to an actual hijack.

Because of this issue, the team often had to manually jump in to identify false positives, which accounted for roughly 20 percent of the cases identified by their classifier. Moving forward, the researchers are hopeful that future iterations will require minimal human supervision and could eventually be deployed in production environments.

“The authors’ results show that past behaviors are clearly not being used to limit bad behaviors and prevent subsequent attacks,” according to David Plonka, a senior research scientist at Akamai Technologies who was not involved in the work. “One implication of this work is that network operators can take a step back and examine global Internet routing across years, rather than just myopically focusing on individual incidents.”

As people increasingly rely on the internet for critical transactions, Testart expects IP hijacking’s potential for damage to only get worse. But she’s also hopeful that it could be made more difficult by new security measures. In particular, large backbone networks such as AT&T have recently announced the adoption of resource public key infrastructure (RPKI), a mechanism that uses cryptographic certificates to ensure that a network announces only its legitimate IP addresses.
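
To make the RPKI idea concrete, here is a minimal Python sketch of route origin validation, the check a router can run once RPKI certificates have been verified. It follows the valid / invalid / not-found outcomes defined in RFC 6811, but skips the cryptographic verification entirely and assumes the authorizations are already trusted. The `ROA` tuple and `validate_origin` function are illustrative names, and the prefixes and AS numbers are reserved documentation values.

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    """Illustrative route origin authorization: which AS may announce a prefix."""
    prefix: str      # authorized address block
    max_length: int  # longest prefix length the AS may announce within it
    asn: int         # authorized origin AS number


def validate_origin(prefix, origin_asn, roas):
    """Return "valid", "invalid", or "not-found" for an announced route."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs for the other IP version or for unrelated address space.
        if announced.version != roa_net.version or not announced.subnet_of(roa_net):
            continue
        covered = True
        # A match needs the right origin AS and a prefix that is not too specific.
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: a likely hijack or misconfiguration.
    return "invalid" if covered else "not-found"


# Made-up ROA using a documentation prefix and documentation AS numbers.
roas = [ROA("192.0.2.0/24", 24, 64500)]

legit = validate_origin("192.0.2.0/24", 64500, roas)
wrong_origin = validate_origin("192.0.2.0/24", 64501, roas)
too_specific = validate_origin("192.0.2.0/25", 64500, roas)
unknown = validate_origin("198.51.100.0/24", 64500, roas)
print(legit, wrong_origin, too_specific, unknown)
```

An announcement from an unauthorized AS comes back "invalid", which a network can then choose to reject. Address space with no authorization at all returns "not-found", which is why the mechanism helps only as far as address holders actually publish authorizations.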

“This project could nicely complement the existing best solutions to prevent such abuse that include filtering, anti-spoofing, coordination via contact databases, and sharing routing policies so that other networks can validate it,” said Plonka. “It remains to be seen whether misbehaving networks will continue to be able to game their way to a good reputation. But this work is a great way to either validate or redirect the network operator community’s efforts to put an end to these present dangers.”

The paper is titled “Profiling BGP Serial Hijackers: Capturing Persistent Misbehavior in the Global Routing Table.”