Wednesday, June 06, 2012

Verification in the Cloud [Guest Post by Justin Thaler]

[Justin talks about his upcoming work, to be presented at HotCloud.]

For the past few years, Michael and I, along with our awesome collaborators, have worked towards developing practical protocols for verifying outsourced computations. In roughly chronological order, the relevant papers are here , here , here , here , and most recently here . In this last paper (joint work with Mike Roberts and Hanspeter Pfister ), we really tried to push these protocols into practice, largely by taking advantage of their inherent parallelizability: we'll be presenting it at HotCloud next week, and it is the main impetus for this blog post. My hope here is to give a (somewhat) brief, unified overview of what we've accomplished with this line of work, and how it relates to some exciting parallel lines of inquiry by other researchers.

Our main motivation is that of Alice, who stores a large data set on the cloud, and asks the cloud to perform a computation on the data (say, to compute the shortest path between two nodes in a large graph, or to solve a linear program defined over the data). Of course, Alice may be a company or an organization, rather than an individual. The goal is to provide Alice with a guarantee that the server performed the requested computation correctly, without requiring Alice to perform the requested computations herself, or even to maintain a local copy of the data (since Alice may have resorted to the cloud in the first place because she has more data than she can store). In short, we want to save Alice as much time and space as possible, while also minimizing the amount of extra bookkeeping that the cloud has to do to prove the integrity of the computation.

Alice may want such integrity guarantees because she is concerned about simple errors, like dropped data, hardware faults, or a buggy algorithm, or she may be more paranoid and fear that the cloud is deliberately deceptive or has been externally compromised. So ideally we'd like our protocols to be secure against arbitrarily malicious clouds, but sufficiently lightweight for use in more benign settings. This is an ambitious goal, but achieving it could go a long way toward mitigating trust issues that hinder the adoption of cloud computing solutions.

Surprisingly powerful protocols for verifiable computation were famously discovered within the theory community several decades ago, in the form of  interactive proofs , PCPs , and the like. These results are some of the brightest gems of complexity theory, but as of a few years ago they were mainly theoretical curiosities, far too inefficient for actual deployment (with the notable exception of certain zero-knowledge proofs).

We've been focusing on interactive proof methods, and have made substantial strides in improving their efficiency. One direction we've focused on is the development of highly optimized protocols for specific important problems, like reporting queries (what value is stored in memory location x of my database?), matrix multiplication, graph problems like perfect matching, and certain kinds of linear programs. Many of these are provably optimal in terms of space and communication costs, consist of a single message from the cloud to Alice (which can be sent as an email attachment or posted on a website), and already save Alice considerable time and space while imposing minimal burden on the cloud, both in theory and experimentally. But for the rest of this post I will focus on *general-purpose* methods, which are capable of verifying arbitrary computations.

The high-order insights of this line of work are as follows. The statements below have precise theoretical formulations, but I'm referring to actual experimental results with a full-blown implementation. Note that a lot of engineering work went into making our implementations fast, like choosing the "right" finite field to work over, and working with the right kinds of circuits.

1) We can save Alice substantial amounts of space essentially for free. The reason is that existing interactive proof protocols (such as Interactive Proofs for Muggles by Goldwasser, Kalai, and Rothblum, which is the protocol underlying our implementation) only require Alice to store a fingerprint of the data. This fingerprint can be computed in a single, light-weight streaming pass over the input (say, while Alice uploads her data to the cloud), and serves as a sort of "secret" that Alice can use to catch the cloud in a lie. The fingerprint doesn't even depend on the computation being outsourced, so Alice doesn't need to know what computation she's interested in until well after she's seen the input, and she never needs to store the input locally.

2) Our implementation already saves Alice a lot of time relative to doing the computation herself. For example, when multiplying two 512x512 matrices, Alice requires roughly a tenth of a second to process the input, while naive matrix multiplication takes about seven times longer. And the savings increase substantially at larger input sizes (as well as when applying our implementation to more time-intensive computations than matrix multiplication) since Alice's runtime in the protocol grows roughly linearly with the input size. So I'd argue that verifiable computation is essentially a solved problem in settings where the main focus is saving Alice time, and the runtime of the cloud is of secondary importance. At least this the case for problems solvable by reasonably small-depth circuits, for which our implementation is most efficient.

3) We've come a long way in making the prover more efficient. Theoretically speaking, in our ITCS paper with Graham Cormode , we brought the runtime of the cloud down from polynomial in the size of a circuit computing the function of interest, to quasilinear in the size of the circuit. Practically speaking, a lot of work remains to be done on this aspect (for example, our single-threaded cloud implementation takes about 30 minutes to multiply two 256 x 256 matrices, and matrix multiplication is a problem well-suited to these sorts of protocols), but we are in much better shape than we were just a few years ago.

4) All of the protocols (special-purpose and general-purpose alike) are extremely amenable to parallelization on existing hardware. This holds for both Alice and the cloud (although Alice runs extremely quickly even without parallelization, see Point 1). For example, using GPUs we can bring the runtime of the cloud to < 40 seconds when multiplying two 256 x 256 matrices. Obviously this is still much slower than matrix multiplication without integrity guarantees, but we're now just one or two orders of magnitude away from undeniable usefulness.

The extended abstract  appearing in HotCloud (which should be viewed largely as an advertisement for the arxiv version) can be found here . We tried hard to give an accessible, if very high level, overview of the powerful ideas underlying interactive proofs, which I hope will be useful for researchers who are encountering verifiable computation for the first time.  Slides describing the entirety of this line of work in more detail can be found here .

I want to close by mentioning two exciting lines of work occurring in parallel with our own. First, Ben-Sasson, Chiesa, Genkin, and Tromer are working toward developing practical PCPs, or probabilistically-checkable proofs (see their new paper here ). The PCP setting is much more challenging than the interactive proof setting we have been working in above: in a PCP, there is no interaction to leverage (i.e. the cloud sends a single message to Alice), and moreover Alice is only permitted to look at *a few bits* of the proof. The latter may seem like an artificial constraint that doesn't matter in real outsourcing scenarios, but it turns out that building a practical PCP system would buy you quite a bit. This is because one can throw certain cryptographic primitives on top of a PCP system (like collision-resistant hash functions) and get a wide variety of powerful protocols, such as succinct arguments for all of NP (i.e., protocols requiring very little communication, which are secure against computationally bounded adversaries). The work of BSCGT appears to still be in the theoretical stage, but is very exciting nonetheless. Check out their paper for more details.

Second is work of Setty, McPherson, Blumberg, and Walfish, from NDSS earlier this year (see their project page here ). They implemented an argument system originally due Ishai, Kushilevitz, and Ostrovsky, and bring the runtime of the cloud down by a factor of 10^20  relative to a naive implementation (yes, I said 10^20; this again highlights the considerable engineering work that needs to be done on top of the theory to make proof or argument systems useful). Our protocols have several advantages not shared by SMBW (like security against computationally unbounded adversaries, and the ability to save the verifier time even when outsourcing a single computation), but this is another big step toward a practical implementation for verified computation. It looks like related work by Setty, Vu, Panpalia, Braun, Blumberg, and Walfish will be presented at USENIX Security 2012 as well (see the conference page here ).

The golden age for negative applications of interactive proofs and PCPs (such as hardness of approximation results) arrived over 15 years ago, and continues to this day. Perhaps the time for positive applications is now.


Anonymous said...

Is that HotCould conference pronounced "hot-cold"?

Michael Mitzenmacher said...

Ha ha. Thanks! (Fixed.)

Michael Walfish said...

Justin (and Michael), congratulations on these advances, and thanks so much for the mention! This made my day! As you say, it is indeed exciting to see these different approaches to verified outsourced computation start to gain traction!

Also, thanks for noticing our upcoming Usenix Security paper. We're really excited about it. It'll report on improvements to our base scheme and also general-purpose work (which should apply to any of these base protocols) to handle a broader class of computations under the arithmetic circuit (really constraint) model. For instance, we can handle rational numbers and compile a subset of C (a la Fairplay) to our constraints.

In any case, good luck at your talk next week, and thanks again!