Friday, January 13, 2017

RWC 2017 - Secure MPC at Google

This talk was given by Ben Kreuter and its focus was on the apparent disparity between what we research in academia versus what is required in the real world, specifically in the field of multi-party computation (MPC). MPC is the idea of allowing multiple parties to compute some function on their combined input without any party revealing anything about their input to the other parties (other than what can be learnt from the output alone).

While significant work has been done on making MPC efficient in practice (for example, the work of Yehuda Lindell et al. on high-throughput MPC which was presented by Lindell in the preceding talk), the focus tends to be on generic protocols (e.g. general logic circuits) with strong security guarantees (e.g. malicious security), which invariably leads to large computational overhead. In practice, we usually require only specific protocols, which can therefore be optimised, and comparatively weak security guarantees.

In the real world, network cost is the salient factor, rather than the speed of the protocol: the parties involved in a computation often have to use networks (such as the Internet) which are shared with many other users, so they cannot make use of the network's full capabilities. The MPC at Google involves computation amongst, for example, mobile phones, laptops and servers, which introduces issues like battery constraints and the possibility of the computation not completing. These considerations, firmly grounded in the real world, are important when developing MPC techniques in research.

Business applications

A large portion of Google's revenue is generated by advertising: the tech giant, well-known for its aptitude for accurately determining users' desired search results even when queries are expressed ineloquently, specialises in creating personalised adverts for its wide spectrum of users. The efficacy of an advert is generally measured by the proportion of its viewers who later become customers. Clearly this could be measured by a business comparing its database of customers' transactions with Google's databases of who has been shown which adverts. This, however, would be an invasion of privacy. Instead, Google and the business can use MPC, and more specifically a private set intersection protocol.

In a private set intersection protocol, the parties involved compute how large the intersection is amongst the sets input by each party, or even some function on those elements in the intersection. So if the business and Google run a private set intersection protocol on their data, they can determine how well the advertising went.

Roughly speaking, the MPC Google does in the real world is as follows: Google has a set $\{g_1,g_2,...,g_n\}$ of field elements which encodes a set of people who have been shown an advert for a certain product, and a business has a set $\{b_1,b_2,...,b_m\}$ of field elements which encodes a set of people who have been sold the product in question. Google raises each of its elements to a secret power $G$ and sends the set $\{g_1^G,g_2^G,...,g_n^G\}$ to the business. The business does the same with its own elements for some secret exponent $B$ to get $\{b_1^B,b_2^B,...,b_m^B\}$, encrypts a set of binary vectors $v_1,...,v_m$ under Paillier encryption (which is additively homomorphic), one corresponding to each element in its set and encoding some other property of the sale (like the amount paid), and also raises each element it received from Google to the power $B$ to obtain the set $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. The business sends Google the set of pairs $\{(b_1^B,P(v_1)),(b_2^B,P(v_2)),...,(b_m^B,P(v_m))\}$ along with $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. Google then computes $\{b_1^{GB},b_2^{GB},...,b_m^{GB}\}$ and adds together all encrypted vectors $P(v_j)$ for which there exists some $i$ such that $g_i^{GB}=b_j^{GB}$. It sends the resulting ciphertext back to the business, which decrypts and interprets the result.
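The message flow described above can be sketched in a few lines of Python. This is only a toy illustration under several assumptions that are not from the talk: identifiers are hashed into the integers modulo a Mersenne prime (the `encode` helper is hypothetical), the Paillier primes are tiny, each sale carries a single integer amount rather than a binary vector, and both parties run inside one function. None of the parameters are secure.

```python
import hashlib
import math
import secrets

# Toy parameters (assumptions for illustration only; far too small to be secure).
P = 2**127 - 1                                   # prime modulus for blinding
PAILLIER_P, PAILLIER_Q = 2**31 - 1, 2**61 - 1    # two known Mersenne primes

def encode(identifier):
    """Hash an identifier to a nonzero element mod P (hypothetical encoding)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_exponent():
    """Secret exponent coprime to P - 1, so that x -> x^e is a permutation."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e

class Paillier:
    """Minimal textbook Paillier with generator n + 1."""
    def __init__(self, p, q):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # private
        self.mu = pow(self.lam, -1, self.n)                      # private

    def encrypt(self, m):
        # Uses only the public modulus n.
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return (1 + m * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

def psi_sum(google_ids, sales):
    """Return the total of sales[u] over users u that also appear in google_ids."""
    G, B = random_exponent(), random_exponent()   # each party's secret exponent
    paillier = Paillier(PAILLIER_P, PAILLIER_Q)   # key pair held by the business

    # Google -> business: g_i^G
    g_blinded = [pow(encode(u), G, P) for u in google_ids]

    # Business -> Google: the set of g_i^{GB}, and pairs (b_j^B, P(v_j))
    g_double = {pow(x, B, P) for x in g_blinded}
    pairs = [(pow(encode(u), B, P), paillier.encrypt(v)) for u, v in sales.items()]

    # Google: compute b_j^{GB} and homomorphically add the matching ciphertexts
    # (multiplying Paillier ciphertexts adds the underlying plaintexts).
    total = paillier.encrypt(0)
    for b_blinded, ciphertext in pairs:
        if pow(b_blinded, G, P) in g_double:
            total = total * ciphertext % paillier.n2

    # Google -> business: the business decrypts the aggregate.
    return paillier.decrypt(total)

shown = ["alice", "bob", "carol", "dave"]
sold = {"bob": 30, "dave": 12, "erin": 99}
print(psi_sum(shown, sold))  # sum of the amounts for bob and dave
```

The matching step works because exponentiation commutes: $(g^G)^B = (g^B)^G$, so doubly blinded values are equal exactly when the underlying elements are equal, while neither party sees the other's unblinded set.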

This protocol is very simple, and it is only passively secure (that is, players are assumed to execute the protocol faithfully but may try to learn things by inspecting their communication transcripts). An interesting concern, perhaps somewhat orthogonal to how we approach research from an academic point of view, is that we must be able to convey the security and efficiency of our protocols to the lawyers, managers and software engineers who will eventually be sanctioning, authorising or implementing them. "The lawyers are interesting because you can show them a proof, and two plus two equals four is a negotiable statement here... managers usually trust your expertise...and software engineers are the worst because they already assume [the protocol] is impossible."

An alternative solution using garbled circuits was explored in the recent past, but it turned out that their use required some subtle assumptions regarding the computation and communication which would have made the protocol impractical.

Future work would involve getting a (not too much more expensive) maliciously secure protocol and developing the use of the homomorphic encryption to allow different functions to be computed on the data in the intersection.

Consumer applications

The Android keyboard app by Google, Gboard, logs what a user types so that it can guess words for auto-completing in the future. This data could be used for training machine learning models, and merging results from many local models would enable the formation of guessing algorithms that work well for everyone. However, to do this, the server would need to receive a large dataset of words typed by a user from each phone so that this processing could be done. Clearly there is an issue of privacy here; moreover, there is also potentially a differential privacy issue.

This is clearly a good situation in which to use MPC. Each party masks their data using a basic additive secret-sharing scheme: if each party has a vector to input then, for every coordinate, every pair of parties agrees on some random field element, which one of the pair adds to that coordinate of their vector and the other subtracts. When the parties send their masked vectors to Google, the masks therefore cancel when everything is added together.
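As a minimal sketch of why the masks cancel, the following Python simulates all parties in one process. The field modulus `Q` and the function names are assumptions chosen for illustration, and the pairwise agreement on each random element is simulated by drawing it once and applying it to both parties.

```python
import secrets

Q = 2**61 - 1  # field modulus (an assumption; any sufficiently large prime works)

def mask_inputs(inputs):
    """Apply pairwise additive masks to a list of equal-length integer vectors."""
    n, dim = len(inputs), len(inputs[0])
    masked = [list(v) for v in inputs]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                r = secrets.randbelow(Q)               # element agreed by parties i and j
                masked[i][k] = (masked[i][k] + r) % Q  # party i adds the mask
                masked[j][k] = (masked[j][k] - r) % Q  # party j subtracts it
    return masked

def aggregate(masked):
    """What the server computes: the coordinate-wise sum of the masked vectors."""
    return [sum(column) % Q for column in zip(*masked)]

inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(aggregate(mask_inputs(inputs)))  # [111, 222, 333]
```

Each individual masked vector looks uniformly random to the server, but every mask appears once with a plus sign and once with a minus sign, so only the sum of the true inputs survives aggregation.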

In practice, they use a PRG and perform a key exchange at the beginning (so that every pair of parties shares one key) to achieve the same effect but with much smaller communication overhead. They also have a trick for dealing with device failures (which is important given the application).
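The PRG variant might look roughly like the sketch below. It makes several assumptions that are not from the talk: SHAKE-256 stands in for the PRG, fresh random bytes stand in for the output of the pairwise key exchange, and the handling of device failures is omitted entirely.

```python
import hashlib
import secrets

Q = 2**61 - 1  # field modulus (an assumption)

def prg(seed, dim):
    """Expand a short seed into dim field elements (SHAKE-256 as a stand-in PRG)."""
    stream = hashlib.shake_256(seed).digest(16 * dim)
    return [int.from_bytes(stream[16 * k:16 * k + 16], "big") % Q for k in range(dim)]

def masked_input(i, vector, pair_keys):
    """Party i masks its vector using the keys it shares with every other party."""
    out = list(vector)
    for (a, b), key in pair_keys.items():
        if i not in (a, b):
            continue
        sign = 1 if i == a else -1  # lower-indexed party adds, the other subtracts
        for k, r in enumerate(prg(key, len(vector))):
            out[k] = (out[k] + sign * r) % Q
    return out

inputs = [[1, 2], [10, 20], [100, 200]]
n = len(inputs)
# Stand-in for the key exchange: one shared 32-byte key per pair of parties.
pair_keys = {(a, b): secrets.token_bytes(32) for a in range(n) for b in range(a + 1, n)}

masked = [masked_input(i, v, pair_keys) for i, v in enumerate(inputs)]
print([sum(column) % Q for column in zip(*masked)])  # [111, 222]
```

The saving is that each pair exchanges only a short key once, and both sides then derive the full-length mask vector locally instead of agreeing on every coordinate separately.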

This talk provided helpful and relevant insight into the importance of matching what we research with what we require in the real world, which is, after all, one of the main reasons for having conferences such as Real World Crypto. Many of the talks are available to watch online here, and I would highly recommend doing so if interested.
