Thursday, April 5, 2012

TCC Invited Talk II

The second invited talk at TCC 2012 was given by Dr. Sergey Yekhanin from the MIcrosoft Research. The talk was on Locally Decodable Codes (LCD), an important research topic in Coding theory as well in Cryptography. In the talk the speaker brilliantly explained the concept of LCD and its relation with one of the fundamental cryptographic primitives, namely Private Information Retrieval (PIR). This was followed by some technical constructions of the LCD codes.

To see the motivation of LCD codes, consider the well known problem of data storage, where we want to reliably store some data and at the same time, we would like the data to be readily available for the users. One simple and common approach would be to replicate the data at several places. This approach provides us with local recovery, as the data can be locally accessed by the users from any one of the several places, where the data is stored. But this will be an overkill for large sized data. An alternative approach would be to use erasure coding, where we encode the data using an erasure correcting code and store the various components of the codeword at various places. Though this approach is cheaper than the idea of replicating the whole data, it no longer provides us the local recovery property. This is because now to access the data, the users have to retrieve several components of the codeword to perform the decoding. What we would like now is to have an erasure code which can provide us with local decoding.

Consider the following example: let x be the message, consisting of 3 bits x1, xand x3. Suppose we encode the message into the 7 bit codeword C = (x1, x2, x3, x1 ⊕ x2, x1 ⊕ x3, x2 ⊕ x3, x1 ⊕ x2 ⊕ x3). This is nothing but the well known Hadamard code. An interesting feature of this code is that any message bit can be recovered with locality 2. For example, suppose we want to recover the bit x1. We can recover x1 from the codeword C if at least one of the following 4 pairs of components of the codeword is not erased:

  • {x1}
  • {x1 ⊕ x2 , x2} 
  • {x1 ⊕ x3 , x3}
  • {x2 ⊕ x3 , x1 ⊕ x2 ⊕ x3}
In other words, even if 3 components of the codeword are erased, still we will have one decoding tuple to recover x1. Notice that the same holds for every other message bit. For example, the decoding tuples for x2 are:
  • {x2}
  • {x1 ⊕ x2 , x1}
  • {x3 ⊕ x2 , x3}
  • {x1 ⊕ x3 , x1 ⊕ x2 ⊕ x3}
And the decoding tuples for x3 are:
  •  {x3}
  •  {x1 ⊕ x3 , x1}
  •  {x3 ⊕ x2 , x2}
  •  {x1 ⊕ x2 , x1 ⊕ x2 ⊕ x3}
The above code can tolerate 3 erasures and we say that the code is an LCD code with query complexity/locality 2. Here query complexity means how many components of the codeword need to be queried to retrieve a particular message bit. In the above example, for every message bit, there are 4 decoding tuples, where the size of each tuple is at most 2.

Another interesting property of the above LCD is the following: each component of the codeword occurs uniformly in the decoding tuples, corresponding to each message bit. That is, each component of the codeword occurs same number of times in the decoding tuples of the message bits. For example, consider the fourth component of the codeword, namely x1 ⊕ x2 . It occurs in the decoding tuples of x1 once, in the decoding tuples of x2 once and in the decoding tuples of x3 once. The same holds for every other component of the codeword. This property of the LCD codes, called as the smoothness property makes it an important tool for cryptographic primitives like PIR.

The important parameters of the LCD codes are the codeword length n and the query complexity r. It is desirable to have both these parameters as small as possible. However, there is a trade-off between these parameters and determining the true shape of this trade-off is a major open problem in this area.

Now lets see how the LCD codes can be used to design information theoretically secure PIR scheme in the multiple server settings. We will again consider the above example. Before that, let me informally state the problem of PIR: there is a public database, replicated across several servers and available for the user access. A user can access any data in the database by querying the server. However, we would like to preserve the user privacy in that each server should
be completely oblivious of the data accessed by the user (we assume that the servers do not collude with each other). This problem can be considered as a weaker form of another important cryptographic primitive, namely Oblivious Transfer (OT). In OT, we also require that the user should not get any information about the other data items which he has not accessed; where as in a PIR scheme, the user can retrieve other items along with the item that he wants to access.

We can design a 2 server PIR scheme using the above example: let x be the data base, containing 3 bits and let S1 and S2 be the servers. Each server applies the LCD encoding on x and stores the codeword. Now suppose a user wants to access a particular data bit, say the bit x3 . The user randomly selects one of the 4 decoding tuples for retrieving x3. For example, suppose he selects the tuple {x3 ⊕ x2 , x2}. Then he queries the server S1 for the component x3 ⊕ x2 and the server S2 for the component x2. Once he obtains them, he can obtain x3 (notice that in this process he also obtains x2 but that is allowed in PIR). On the other hand, each server has no information about the data bit that the user has retrieved. From the view point of S1, the component x3 ⊕ x2 might have been queried for x1 or x2 or x3 with the same probability. Similarly, from the view point of S2, the component x2 might have been queried for x1 or x2 or x3 with the same probability.

There are several interesting open problems in the area of LCD. The interested readers can refer to the PhD thesis of Sergey (recently published as a book).

No comments:

Post a Comment