Keeping Web-Browsing Data Safe from Hackers

A Side-Channel Surprise
Cook launched the project while taking Yan’s advanced seminar course. For a class assignment, he tried to replicate a machine-learning-assisted side-channel attack from the literature. Past work had concluded that this attack counts how many times the computer accesses memory as it loads a website and then uses machine learning to identify the website. This is known as a website-fingerprinting attack.

He showed that prior work relied on a flawed machine-learning-based analysis to incorrectly pinpoint the source of the attack. Machine learning can’t prove causality in these types of attacks, Cook says.

“All I did was remove the memory access and the attack still worked just as well, or even better. So, then I wondered, what actually opens up the side channel?” he says.

This led to a research project in which Cook and his collaborators embarked on a careful analysis of the attack. They designed an almost identical attack, but without memory accesses, and studied it in detail.

They found that the attack actually records a computer’s timer values at fixed intervals and uses that information to infer what website is being accessed. Essentially, the attack measures how busy the computer is over time.

A fluctuation in the timer value means the computer is processing a different amount of information in that interval. This is due to system interrupts. A system interrupt occurs when the computer’s processes are interrupted by requests from hardware devices; the computer must pause what it is doing to handle the new request.

When a website is loading, it sends instructions to a web browser to run scripts, render graphics, load videos, etc. Each of these can trigger many system interrupts.

An attacker monitoring the timer can use machine learning to infer high-level information from these system interrupts to determine what website a user is visiting. This is possible because interrupt activity generated by one website, like CNN.com, is very similar each time it loads, but very different from other websites, like Wikipedia.com, Cook explains.

“One of the really scary things about this attack is that we wrote it in JavaScript, so you don’t have to download or install any code. All you have to do is open a website. Someone could embed this into a website and then theoretically be able to snoop on other activity on your computer,” he says.

The attack is extremely successful. For instance, when a computer is running Chrome on the macOS operating system, the attack was able to identify websites with 94 percent accuracy. All commercial browsers and operating systems they tested resulted in an attack with more than 91 percent accuracy.

There are many factors that can affect a computer’s timer, so determining what led to an attack with such high accuracy was akin to finding a needle in a haystack, Cook says. They ran many controlled experiments, removing one variable at a time, until they realized the signal must be coming for system interrupts, which often can’t be processed separately from the attacker’s code.

Fighting Back
Once the researchers understood the attack, they crafted security strategies to prevent it.

First, they created a browser extension that generates frequent interrupts, like pinging random websites to create bursts of activity. The added noise makes it much more difficult for the attacker to decode signals. This dropped the attack’s accuracy from 96 percent to 62 percent, but it slowed the computer’s performance.

For their second countermeasure, they modified the timer to return values that are close to, but not the actual time. This makes it much harder for an attacker to measure the computer’s activity over an interval, Cook explains. This mitigation cut the attack’s accuracy from 96 percent down to just 1 percent.

“I was surprised by how such a small mitigation like adding randomness to the timer could be so effective. This mitigation strategy could really be put in use today. It doesn’t affect how you use most websites,” he says.

Building off this work, the researchers plan to develop a systematic analysis framework for machine-learning-assisted side-channel attacks. This could help the researchers get to the root cause of more attacks, Yan says. They also want to see how they can use machine learning to discover other types of vulnerabilities.

“This paper presents a new interrupt-based side channel attack and demonstrates that it can be effectively used for website fingerprinting attacks, while previously, such attacks were believed to be possible due to cache side channels,” says Yanjing Li, assistant professor in the Department of Computer Science at the University of Chicago, who was not involved with this research. “I liked this paper immediately after I first read it, not only because the new attack is interesting and successfully challenges existing notions, but also because it points out a key limitation of ML-assisted side-channel attacks — blindly relying on machine-learning models without careful analysis cannot provide any understanding on the actual causes/sources of an attack, and can even be misleading. This is very insightful and I believe will inspire many future works in this direction.” 

This research was funded, in part, by the National Science Foundation, the Air Force Office of Scientific Research, and the MIT-IBM Watson AI Lab.

Paper: “There’s Always a Bigger Fish: A Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack”

Adam Zewe is a writer at Massachusetts Institute of TechnologyThis stroy  is reprinted with permission of MIT News.