More efficient security for cloud-based machine learning

By Rob Matheson

Published 20 August 2018

A novel encryption method devised by MIT researchers secures data used in online neural networks without dramatically slowing their runtimes. The approach, which combines two established encryption techniques, holds promise for using cloud-based neural networks for medical-image analysis and other applications that handle sensitive data.


Outsourcing machine learning is a rising trend in industry. Major tech firms have launched cloud platforms that handle computation-heavy tasks, such as running data through a convolutional neural network (CNN) for image classification. Resource-strapped small businesses and other users can upload data to those services for a fee and get results back within hours.

But what if private data leaks? In recent years, researchers have explored various secure-computation techniques to protect such sensitive data, but those methods carry performance penalties that make neural network evaluation (testing and validation) sluggish, sometimes as much as a million times slower, which has limited their wider adoption.

In a paper presented at this week’s USENIX Security Symposium, MIT researchers describe a system that blends two established techniques, homomorphic encryption and garbled circuits, in a way that runs the networks orders of magnitude faster than existing secure-computation approaches.
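To give a flavor of the homomorphic half of that blend, the sketch below performs additions and scalar multiplications directly on ciphertexts, using the Paillier cryptosystem via the python-paillier (phe) package. This is only an illustration of the property; GAZELLE itself relies on a lattice-based packed additively homomorphic scheme, not Paillier.

    # pip install phe  (python-paillier), used here only to illustrate
    # additive homomorphism; GAZELLE uses a different, lattice-based scheme.
    from phe import paillier

    # Client generates a keypair and encrypts its private inputs.
    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
    enc_a = public_key.encrypt(7)
    enc_b = public_key.encrypt(35)

    # Server side: operate on ciphertexts without ever decrypting them.
    enc_sum = enc_a + enc_b        # Enc(7) + Enc(35) -> Enc(42)
    enc_scaled = enc_a * 3         # Enc(7) * 3       -> Enc(21)

    # Only the client, holding the private key, can read the results.
    print(private_key.decrypt(enc_sum))     # 42
    print(private_key.decrypt(enc_scaled))  # 21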

The researchers tested the system, called GAZELLE, on two-party image-classification tasks: a user sends encrypted image data to an online server evaluating a CNN running on GAZELLE, and the two parties then exchange encrypted information back and forth to classify the user’s image. Throughout the process, the system ensures that the server never learns any of the uploaded data and the user never learns anything about the network parameters. GAZELLE ran 20 to 30 times faster than state-of-the-art secure-computation systems, while reducing the required network bandwidth by an order of magnitude.
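The sketch below is a toy, single-neuron version of that exchange, again with Paillier standing in for GAZELLE’s actual encryption layer: the server computes a weighted sum on the client’s ciphertexts and blinds the result with a random mask, so the client never sees the weights and the server never sees the pixels. The values and layer shape are made up for illustration.

    import random
    from phe import paillier

    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

    # Client side: encrypt a tiny 4-pixel "image" and send the ciphertexts.
    pixels = [3, 1, 4, 1]
    enc_pixels = [public_key.encrypt(p) for p in pixels]

    # Server side: holds private weights and a bias. It computes the weighted
    # sum directly on the ciphertexts, then adds a random mask so the client
    # only learns a blinded share of the activation, not the raw value.
    weights, bias = [2, -1, 0, 5], 7
    mask = random.randrange(1, 10**6)
    enc_result = public_key.encrypt(bias + mask)
    for w, c in zip(weights, enc_pixels):
        enc_result = enc_result + (c * w)

    # Client side: decrypts its share. Neither party alone sees both the
    # image and the weights; the true activation is the sum of the shares.
    client_share = private_key.decrypt(enc_result)   # activation + mask
    server_share = -mask
    print(client_share + server_share)               # 2*3 - 1 + 0 + 5 + 7 = 17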

One promising application for the system is training CNNs to diagnose diseases. Hospitals could, for instance, train a CNN to learn the characteristics of certain medical conditions from magnetic resonance images (MRIs) and then identify those characteristics in uploaded scans. A hospital could make the model available in the cloud for other hospitals to use, but the model is trained on, and subsequently processes, private patient data. Because there have been no efficient encryption methods for this setting, the application isn’t quite ready for prime time.
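For concreteness, here is a minimal, hypothetical PyTorch sketch of the kind of CNN a hospital might train on single-channel MRI slices; the layer sizes and the two-condition output are illustrative assumptions, not details from the paper or the GAZELLE experiments.

    import torch
    import torch.nn as nn

    class TinyMRIClassifier(nn.Module):
        """Hypothetical minimal CNN for grayscale 224x224 MRI slices."""
        def __init__(self, num_conditions: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 input channel
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 56 * 56, num_conditions)

        def forward(self, x):                    # x: (batch, 1, 224, 224)
            return self.classifier(self.features(x).flatten(1))

    model = TinyMRIClassifier()
    scores = model(torch.randn(4, 1, 224, 224))  # dummy batch of 4 slices
    print(scores.shape)                          # torch.Size([4, 2])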