Protecting web users’ privacy

“We see a shift toward people wanting private queries,” Wang says. “We can imagine a model in which other services scrape a travel site, and maybe they volunteer to host the information for you, or maybe you subscribe to them. Or maybe in the future, travel sites realize that these services are becoming more popular and they volunteer the data. But right now, we’re trusting that third-party sites have adequate protections, and with Splinter we try to make that more of a guarantee.”

Division of labor
Splinter uses a technique called function secret sharing, which was first described in a 2015 paper by a trio of Israeli computer scientists. One of them, Elette Boyle, earned her PhD at MIT studying with RSA Professor of Computer Science and Engineering Shafi Goldwasser, a 2013 recipient of the Turing Award, the highest award in computer science. Goldwasser, in turn, is one of Wang’s co-authors on the new paper, along with Vinod Vaikuntanathan, an MIT associate professor of electrical engineering and computer science (EECS); Catherine Yun, an EECS graduate student; and Matei Zaharia, an assistant professor of computer science at Stanford.

Systems for disguising database queries have been proposed in the past, but function secret sharing could make them as much as 10 times faster. In experiments, the MIT and Stanford researchers found that Splinter could return a result from a database with millions of entries — including a duplicate of the Yelp database for selected cities — in about a second.

With function secret sharing, a database query is converted into a set of complementary mathematical functions, each of which is sent to a different database server. On each server, the function must be applied to every record in the database; otherwise, a spy could determine what data the user is interested in. Every time the function is applied to a new record, it updates a value stored in memory. After it’s been applied to the last record, the final value is returned to the user. But that value is meaningless until it’s combined with the values reported by the other servers.

Splinter represents several key elaborations on previous work on function secret sharing. Whereas earlier research focused on concealing simple binary-comparison and addition operations, Splinter executes more complex operations typical of database queries, such as finding a specified number of records with the highest or lowest values for some variable — such as the 10 lowest fares for a particular flight itinerary. The MIT and Stanford researchers had to devise cryptographic functions that could perform all the comparing and sorting required for ranking results without betraying any information.

Practical considerations
Splinter has also been engineered to run efficiently on real database systems. Most modern computer chips, for instance, are hardwired to implement the encryption scheme known as AES. Hardwiring makes AES hundreds of times faster than it would be if it were implemented in software, but AES has some idiosyncrasies that make it less than ideal for function secret sharing. Through a clever combination of software processes and AES encryption, the MIT and Stanford researchers were able to make Splinter 2.5 times as efficient as it would be if it used the AES circuits alone.

“There’s always this gap between something being proposed on paper and actually implementing it,” Wang says. “We do a lot of optimization to get it to work, and we have to do a lot of tricks to get it to support actual database queries.”

“When you look at a lot of these systems that purport to provide various security properties, they work very nicely in theory, but user experience often comes down to performance, and the performance is not there,” says James Mickens, an associate professor of computer science at Harvard University. “What’s nice about Splinter is that they use these realistic applications and realistic workloads to show that, yeah, users would probably interact with this system. The system isn’t quite as a fast as a normal, non-privacy-preserving system, but there’s no free lunch. I think that the system does quite a good job of providing that additional privacy protection while still being reasonably performant.”

Reprinted with permission of MITNews