Cloud security reaches silicon

In earlier work, researchers in Devadas’ group were able to prove that pulling data from a single path was as confounding to an adversary as if the chip had pulled data from every single memory address in use — every node of the tree.

Breaking the logjam
After reading data from a path, however, the chip also has to write data to the whole path; otherwise, an adversary could determine which node was the one of interest. But the chip rarely stores data in the same node that it read it from.

 Nodes often lie on multiple paths: To take the most basic example, the single node at the top, or root, of the tree lies on every path. When the chip writes a block of data to memory, it pushes it as far down the tree as it can, which means finding the last vacancy before the block’s assigned path branches off from path that was just read.

“The root of the tree is a lot smaller than the bottom of tree,” says Albert Kwon, an MIT graduate student in electrical engineering and computer science and one of the papers’ co-authors. “So intuitively, you want to push down as far as you can toward the bottom, so that there’s no congestion at the top.”

In writing data, the chip still has to follow the sequence of nodes in the path; otherwise, again, an adversary might be able to infer something about the data being stored. In previous attempts at similar systems, that meant sorting the memory addresses according to their ultimate locations in the tree.

“Sort is not easy to do in hardware,” says Chris Fletcher, another graduate student in Devadas’ group and first author on the new paper. “So by the time you’ve sorted everything, you’ve taken a real performance hit.”

In the chip described in their latest paper, Fletcher, Devadas, Kwon, and their co-authors — Ling Ren, also an MIT graduate student in electrical engineering and computer science, and colleagues at the University of Connecticut, the University of California at Berkeley, and the Qatar Computing Research Institute — took a different approach. They gave their chip an extra memory circuit, with storage slots that can be mapped onto the sequence of nodes in any path through the tree. Once a data block’s final location is determined, it’s simply stored at the corresponding slot in the circuit. All of the blocks are then read out in order.

Stockpiled secrets
The new chip features another trick to improve efficiency: Rather than writing data out every time it reads data in, it writes only on every fifth read. On the other reads, it simply discards all of the decoy data. When it finally does write data back out, it will have, on average, five extra blocks of data to store on the last path it read. But there are generally enough vacancies in the tree to accommodate the extra blocks. And when there aren’t, the system’s ordinary protocols for pushing data as far down the tree as possible can handle the occasional logjam at the top.

Today’s chips have small, local memory banks called caches in which they store frequently used data; for applications that use caching efficiently, all that extra reading and writing generally increases computation time by only about 20 percent. For applications that don’t use caching efficiently, computation time can increase fivefold, or even more.

But according to the researchers, one of the advantages of their scheme is that the circuits that implement it can simply be added to existing chip designs, without much retooling. The extra layer of security can then be switched on and off as needed. Some cloud applications may use it all the time; others may opt against it entirely; still others may activate it only when handling sensitive information, such as credit card numbers.

“This is groundbreaking work,” says Elaine Shi, an assistant professor of computer science at the University of Maryland who has studied similar security schemes. “For many years, this kind of secure algorithm has been prohibitive. This is basically the first time they’ve shown that you can achieve this 2x overhead. Previously, the overhead would be ridiculous, maybe in the tens of thousands. They built this thing and show that for, a class of benchmarks, the average slowdown is only 2x.”

“If you think about Java versus C, the slowdown is probably more than 2x,” she says. “Two-x is nothing. Plus, you only have to incur it for part of the code that touches sensitive data, like credit card numbers or genomic data.”

Reprinted with permission of MIT News