Saturday, July 16, 2016

ECRYPT-NET Workshop on Crypto Design for IoT: talk on physical security

IoT stands for "Internet of Things": thousands of interconnected devices sharing (sensitive) information and personal data. Many of them are small and embedded (though not all: during a summary talk, Florian Böhl pointed out the existence of connected Caterpillars, for instance; the construction machines, not the insects), which directly translates into the need for a careful evaluation of the threats posed by side-channel attacks.

Benedikt Gierlichs gave a talk about such a crucial aspect of IoT deployment. He introduced the subject by means of possible applications and use-cases (many of which were the main focus of other talks) and by explaining common issues when securing IoT nodes. In particular, he gave a nice "equation" that succinctly describes them.

IoT device = embedded device + network

Although a rather simplistic representation of a node, the equation highlights an interesting peculiarity of IoT devices from a security standpoint: the possible points of failure are both more numerous and more dangerous than in usual non-connected devices. As Ingrid Verbauwhede also remarked during the discussion phase, many of these devices are secure by themselves; it is being part of a network that raises security issues. Indeed, network security adds a non-trivial challenge to the already tough job of securing an embedded device. Since such a discussion is prohibitively broad for a single workshop talk (in fact it spanned the whole workshop), Benedikt outlined three essential components of IoT security. The nodes need:

  1. good cryptography: self-explanatory;
  2. secure interfaces: nodes need to communicate with each other and with hubs, the cloud, servers and smartphones. Each of these exchanges of information must happen in a formatted and standardised way, using protocols for instance. In this regard, Johan Stokking, co-founder of The Things Network, said in his talk that many devices cannot even support the IP protocol because it is too complicated. On top of this, at some point all the data should reach the end users, for which secure GUIs and access points are needed;
  3. secure implementations.

Taking the first two points for granted, the remainder of the talk focused on the third by providing introductory notions of side-channel analysis, in particular an overview of possible attacks (active/passive, invasive/non-invasive). The speaker remarked on the number of things that can possibly go wrong even when good crypto and secure interfaces are deployed. If that assumption is dropped, the scenario is even scarier. In the end the moral was that, within the framework of IoT, "protecting against physical attacks is perhaps not the most pressing issue". Arguably the most pressing issue is depicted by the following picture.


The graph is not based on real data, making it somewhat informal (and accordingly, it has been drawn with an informal graphic tool). The x-axis represents the number of IoT devices and the y-axis carries an extremely informal notion of "percentage of security". The story it tells is that the majority of devices come with almost no security, while a very small fraction delivers very strong security. A lot of effort has been put into the left-most part of the graph: developing really secure protocols and algorithms to make those (perhaps already reasonably robust) devices even better. What should be done (more) in order to ship secure products into every home is to push the overall "percentage of security" up for (almost) all devices.

Thursday, July 7, 2016

WHEAT 2016: The Pre-history of Lattice-based Cryptanalysis

On the second day of the Workshop HEAT, Antoine Joux gave an invited talk on the history of cryptanalytic applications of  lattice-basis reduction algorithms.

A lattice is a discrete subgroup of $\mathbb{R}^n$. Equivalently, it is the set of all integer-linear combinations of a collection of $\mathbb{R}$-linearly independent vectors in $\mathbb{R}^n$. Finding a short basis in $\mathbb{R}^2$ is relatively easy, and Gauss' lattice-basis reduction algorithm is effective for this purpose. As the speaker emphasised, things become more complicated when we move to higher dimensions. He then went on to discuss the details of the famous LLL algorithm, invented by Lenstra, Lenstra and Lovász in 1982, for lattice-basis reduction in arbitrary dimensions. LLL is a polynomial-time algorithm, but it only returns a short vector within an exponential approximation factor of the length of the shortest vector in the lattice. This algorithm, which performs well in practice, may be viewed as a combination of Gauss' lattice reduction and the Gram-Schmidt orthogonalisation method.
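
To make the two-dimensional case concrete, here is a minimal sketch of Gauss (Lagrange) reduction in Python; the function names and the integer-basis assumption are mine, not from the talk:

```python
def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a 2D lattice basis (u, v) with integer entries.
    On termination the first vector returned is a shortest non-zero lattice vector."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    while True:
        if dot(u, u) > dot(v, v):
            u, v = v, u                       # keep u as the shorter basis vector
        m = round(dot(u, v) / dot(u, u))      # nearest-integer Gram-Schmidt coefficient
        if m == 0:                            # |<u, v>| <= |u|^2 / 2: basis is reduced
            return u, v
        v = (v[0] - m * u[0], v[1] - m * u[1])

print(gauss_reduce((1, 0), (3, 1)))           # ((1, 0), (0, 1))
```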

An early application of lattice-basis reduction is the cryptanalysis of knapsack-based cryptosystems. The knapsack problem, a.k.a. the subset-sum problem, is the following: given integers $a_1, \ldots, a_n$ and a target $S$, find $\epsilon_1,\ldots,\epsilon_n \in \{0,1\}$ such that
\[S = \sum_{i=1}^{n} \epsilon_i \cdot a_i.\]
This problem is NP-hard, although some instances are easy, and it served as the basis for cryptosystems such as the Merkle-Hellman cryptosystem. The basic idea is to hide an easy knapsack inside a hard-looking one. However, at CRYPTO 1982, Shamir broke this cryptosystem using lattice-basis reduction. The attack exploits the fact that the hidden easy knapsack is super-increasing, i.e. $a_i > \sum_{j=1}^{i-1} a_j$. Other works, starting with that of Lagarias and Odlyzko in 1985, led to improved attacks on low-density knapsacks.
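
To see why the hidden knapsack is the "easy" case, here is a minimal greedy decoder for super-increasing knapsacks (a sketch for intuition only; Shamir's attack is about recovering such a structure from the public, hard-looking instance):

```python
def solve_superincreasing(a, S):
    """Solve the subset-sum instance (a, S) when a is super-increasing,
    i.e. a[i] > a[0] + ... + a[i-1] for every i. Returns the epsilon vector or None."""
    eps = [0] * len(a)
    for i in reversed(range(len(a))):   # largest element first
        if a[i] <= S:                   # it must be taken, otherwise S is unreachable
            eps[i] = 1
            S -= a[i]
    return eps if S == 0 else None

print(solve_superincreasing([2, 3, 7, 14, 30], 24))  # [0, 1, 1, 1, 0]
```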

The speaker also briefly spoke about the application of lattice-basis reduction to the problems of

  • reconstructing small linear dependencies: given a sequence $(x_1,\ldots,x_n)$ of real numbers, find small integer coefficients $v_i$, not all zero, such that $\sum v_i\cdot x_i = 0$ (a small numerical example follows this list);
  • constructing polynomials for use in the Number Field Sieve for solving the Discrete Logarithm Problem; and
  • finding small roots of univariate and bivariate integer polynomials modulo composite numbers: Coppersmith's attack.
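
As a quick numerical illustration of the first item, one can find an integer relation among $1$, $\varphi$ and $\varphi^2$ (where $\varphi$ is the golden ratio) with mpmath's PSLQ routine; PSLQ is an integer-relation algorithm rather than LLL itself, but it plays the same role here and keeps the sketch short:

```python
from mpmath import mp, sqrt, pslq

mp.dps = 50                       # work with 50 decimal digits of precision
phi = (1 + sqrt(5)) / 2           # golden ratio, a root of x^2 - x - 1

# Find small integers v with v[0]*1 + v[1]*phi + v[2]*phi**2 = 0
print(pslq([1, phi, phi**2]))     # [1, 1, -1] (up to sign): 1 + phi - phi^2 = 0
```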

Tuesday, July 5, 2016

WHEAT Workshop 2016, Paris

The second HEAT workshop on Fully Homomorphic Encryption (otherwise known as WHEAT) kicked off this morning in (disappointingly cloudy) Paris.

The first talk of the day was Martin Albrecht's invited talk on solving short-secret LWE instances. This is joint work with Rachel Player and Sam Scott. He provided a clear summary of the state of the art in terms of the algorithms available, and thus set the ground for the rest of the day. Attacks using parameter choices inspired by HElib and by the LP (Lindner-Peikert) model were discussed and, where possible, estimated running times were provided. A partial conclusion was that the latter model should not be taken for granted. For a more detailed read, see https://eprint.iacr.org/2015/046.pdf.

Also discussed was RLWE security in the FHE setting, in a talk given by Guillaume Bonnoron. This is joint work with Caroline Fontaine. They present an attack against specific FHE parameter choices, and use the FV scheme because they wish to avoid the modulus-switching operation (see for example https://eprint.iacr.org/2015/889.pdf for a description). The conclusion of this talk is that their attack does work, but FHE and RLWE are not broken, because the attack does not scale up: a bigger parameter $n$ leads to bigger errors. Whilst they took one day to complete the attack, versus 3.5 days for a previous attack (https://eprint.iacr.org/2015/176.pdf), the talk's final conclusion was that more cryptanalysis is needed.

The second invited talk, in the afternoon, was Leo Ducas', on joint work with Martin Albrecht and Shi Bai. It discussed the subfield lattice attack on "overstretched NTRU assumptions"; for the full technical read, see https://eprint.iacr.org/2016/127.pdf. The attack is based on using a subfield of the field we are working in: they assume the latter is a power-of-two cyclotomic, but claim the attack can be carried out in any field which is Galois. We map our NTRU instances down to the subfield using the (partial) norm map, in order to solve a (hopefully) easier problem there. Given the right conditions, we can then map the solution we find back up and recover a short enough solution to carry the attack through. The practicality of the attack grows with the modulus $q$, hence the "overstretched" condition. Although this does not (yet?) seem to affect the NTRU schemes used in practice, the talk's conclusion was to drop the NTRU assumption altogether.

Tuesday, June 7, 2016

From IND-CPA to Robust AEAD: Security Notions for Symmetric Encryption

Edit 09/06/16: When I originally wrote this post, I was under the impression that the Robust AE notion had been widely adopted in the symmetric community. Since this is not yet the case, I have edited the post to emphasise that this is a new notion of Rogaway et al. and not a standard one.

This week I'm at the Summer School on Real-World Crypto and Privacy in a beautiful beach resort near Šibenik, Croatia. Obviously spending time by the hotel pool / Mediterranean sea is terribly dull and exhausting, so I was delighted that today I could instead attend a two-hour lecture by Phil Rogaway on security notions for symmetric encryption.

In all seriousness, Phil gave an excellent tour of the (recent) history of symmetric encryption research and I'll try to sketch some of the key points here.

OK, let's first recall the classical notions for symmetric encryption. We assume that, for any key $k$, encryption is a probabilistic function $E_k$ from the message space $\mathcal{M}$ to the ciphertext space $\mathcal{C}$ and that decryption is a deterministic function $D_k: \mathcal{C} \to \mathcal{M}$. Then the advantage of an adversary is the adversary's ability to distinguish between the oracles $E_k(\cdot)$ and $E_k(\$(\cdot))$, where $\$(\cdot)$ returns a random string of the same length as the input. This is called IND-CPA security.
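
To make the "real-or-random" flavour of this definition concrete, here is a minimal sketch of the two oracles; `enc` stands for an arbitrary probabilistic encryption function and is a placeholder of mine, not a fixed API:

```python
import os

def real_oracle(enc, k, m):
    """Left world: encrypt the message the adversary actually chose."""
    return enc(k, m)

def ideal_oracle(enc, k, m):
    """Right world: encrypt a fresh random string of the same length as m."""
    return enc(k, os.urandom(len(m)))

# The IND-CPA advantage of an adversary A is
#   | Pr[A with real_oracle outputs 1] - Pr[A with ideal_oracle outputs 1] |.
```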

The above notion captures privacy, but not authenticity. But in practice, people want to use a single encryption scheme to achieve both of these goals. So the security definition evolved to probabilistic Authenticated Encryption (pAE): first we allow the decryption function to return $\bot$, meaning the ciphertext is inauthentic. Then we define the privacy advantage just as in the IND-CPA game, but additionally consider the authenticity advantage, which is the probability that an adversary with access to $E_k(\cdot)$ outputs a forgery: a ciphertext that did not come from the encryption oracle and that is decrypted by $D_k$ to something other than $\bot$.
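
The authenticity half of pAE can be sketched in the same style; again `enc`, `dec` and `adversary` are placeholders of mine (with `None` playing the role of $\bot$), not a standard interface:

```python
def authenticity_game(enc, dec, k, adversary):
    """The adversary, given an encryption oracle, wins by outputting a 'fresh'
    ciphertext (not returned by the oracle) that decrypts to something != None."""
    seen = []

    def enc_oracle(m):
        c = enc(k, m)
        seen.append(c)
        return c

    forgery = adversary(enc_oracle)
    return forgery not in seen and dec(k, forgery) is not None
```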

This definition still has practical problems. Firstly, it still requires probabilistic encryption and - as we all know - truly random coins are hard to come by. So we want to swap true randomness with nonces: numbers that should only be used once. Then encryption can be implemented using a deterministic function that takes a nonce as an extra input. Secondly, practitioners often need to transmit some header data along with the ciphertext, and this associated data needs to be authenticated but not encrypted. So here's yet another definition(!): a nonce-based AEAD scheme is a function that takes a key, a nonce, some associated data and a message and returns a ciphertext. The authenticity notion is defined essentially as in pAE, but the privacy notion is strengthened: the adversary is asked to distinguish between $E_k(\cdot,\cdot,\cdot)$ and $\$(E_k(\cdot,\cdot,\cdot)),$ i.e. the ciphertexts need to look like random strings, not just encryptions of random strings. These definitions assume that the adversary is nonce-respecting: the adversary never repeats the nonce in their encryption queries.
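
For a real-world example of a nonce-based AEAD scheme, here is a small usage sketch of AES-GCM via the Python `cryptography` package (my choice of illustration, not something from the lecture); note that the security argument collapses if a nonce is ever reused under the same key:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)

nonce = os.urandom(12)                        # must be unique per (key, message)
associated_data = b"header: authenticated but not encrypted"
plaintext = b"attack at dawn"

ciphertext = aead.encrypt(nonce, plaintext, associated_data)

# Decryption checks authenticity of both the ciphertext and the associated data;
# it raises cryptography.exceptions.InvalidTag if anything was tampered with.
assert aead.decrypt(nonce, ciphertext, associated_data) == plaintext
```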

Of course, nonces are not always used just once, as we would like. To move even closer to the messy world of practical security, we introduce Misuse-Resistant Authenticated Encryption (MRAE) where the adversary can repeat nonces, as long as they don't repeat the entire nonce-data-message triple (otherwise they could trivially distinguish the corresponding identical ciphertexts from independent random strings).

"Surely that's the last one!", I hear you cry. Not quite. It's easy to show that MRAE requires significant ciphertext expansion: ciphertexts must be significantly longer than the plaintexts. This is a problem in certain lightweight applications like in the Internet of Things. So, accepting that there's a tradeoff between the amount of ciphertext expansion and the level of security, Phil and others have recently introduced robust AE (RAE), where the encryption function now has an additional argument specifying the amount of ciphertext expansion. One then tries to obtain the best possible security with that amount of expansion. I'll omit the details in the interest of space.

What was most interesting about Phil's talk was to learn how the evolution of the theoretical notions of security was driven by practical considerations. On the other hand, since the security goalposts seem to be moving all the time, perhaps practitioners will just stop trying to reach them.

Workshop on the Theory and Practice of Secure Multi-Party Computation 2016: Efficient Constant-Round Multiparty Computation

Inspired by Yehuda's talk, I decided to render the historical 'flow' that led to today's efficient MPC against malicious parties with $O(1)$ rounds of communication, ending with descriptions of some current state-of-the-art techniques.

First, let's begin with Yao's protocol [1]: two parties, Alice and Bob, hold a function $f$ and they want to compute its output on their secret inputs. How is this possible? For this we introduce the notion of garbling the circuit $\mathcal{C}_f$ of $f$, written $G(\mathcal{C}_f)$. A nice picture of how this process goes can be found in these slides! After you have finished with the nice pictures, you can come back to see an example:

Suppose $f$ is a simple function which computes the addition of $2$ bits, with Alice's input 0 and Bob's input 1. Alice computes the garbled circuit $G(\mathcal{C}_f)$ and sends it to Bob along with the key corresponding to her garbled input, $k_{A_0}$. In order for Bob to compute the output of $f$ he needs the key $k_{B_1}$ corresponding to his input, but he doesn't want Alice to see that input. Luckily, there is a tool called oblivious transfer (OT), so the problem is solved.
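
To make the wire-label idea concrete, here is a toy garbling of a single boolean gate in Python. The hash-based "encryption" and the all-zero validity tag are simplifying assumptions of mine, and a real scheme would also permute the table rows; this is a sketch of the idea rather than Yao's actual construction:

```python
import hashlib
import os

def pad(k1, k2):
    # derive a 32-byte one-time pad from the two input-wire labels (toy KDF)
    return hashlib.sha256(k1 + k2).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_gate(gate):
    """Garble one boolean gate: two random 16-byte labels per wire, four ciphertexts."""
    labels = {w: (os.urandom(16), os.urandom(16)) for w in ('A', 'B', 'C')}
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            out = labels['C'][gate(a, b)] + bytes(16)   # zero tag marks a valid decryption
            rows.append(xor(pad(labels['A'][a], labels['B'][b]), out))
    return labels, rows                                  # rows should also be shuffled

def evaluate_gate(rows, ka, kb):
    """Holding exactly one label per input wire, recover the single output label."""
    for row in rows:
        candidate = xor(row, pad(ka, kb))
        if candidate[16:] == bytes(16):                   # the tag only survives on the right row
            return candidate[:16]
    raise ValueError('no row decrypted correctly')

# Alice garbles an XOR gate (addition of two bits mod 2); Bob holds the label for
# A = 0 (sent by Alice) and the label for B = 1 (obtained via OT), and learns only
# the label encoding the output C = 1.
labels, rows = garble_gate(lambda a, b: a ^ b)
output_label = evaluate_gate(rows, labels['A'][0], labels['B'][1])
assert output_label == labels['C'][1]
```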

Now, for the general case we can do all of this in $5$ rounds of communication: Alice sends Bob $G(\mathcal{C}_f)$ (1 round), Bob runs all the OTs in parallel (3 rounds) and at the end sends the output back to Alice (1 round). Great, let's go home now... well, not so fast: what if Charlie wants to join the party? Or what if Alice garbles a different circuit? Apparently we need something else for $3, 4, \dots$ party computation.

Another approach appeared in 1990: the BMR protocol [2]. Assume we have access to a protocol which generates random shared coins within $O(1)$ rounds of communication. BMR uses this coin generation to construct 'super-seeds' for the wires and bits to mask the parties' inputs. Then each output wire label is one-time padded using outputs of a PRG seeded with the input wire seeds. Equations then arise by treating individually each of the four possible outcomes of a gate, depending on its input wire labels, yielding the entries $A_g, B_g, C_g, D_g$; for more details we refer the reader to the original article [2].

In 2015, the SPDZ-BMR variant [3] was published at CRYPTO; in a nutshell, it replaces the random coins with calls to SPDZ's randomness functionality, turning BMR, for the first time, into a constant-round protocol secure against a dishonest majority. Of course, there are many other security subtleties and optimisations presented in [3], such as using PRFs over prime fields instead of PRGs over binary fields. The SPDZ-BMR offline phase is organised in two steps:
  1. Wire-key distribution and input-bit mask generation.
  2. Compute the wire labels $A_g, B_g, C_g, D_g$ of circuit $\mathcal{C}_f$.
The first step of the offline phase is very similar to the SPDZ offline phase for generating triples, bits and squares, which are later used in the online phase as well as in step 2 of the offline phase! The computation of the wire labels can be seen as evaluating a constant-depth circuit in the MPC domain.

In the online phase the labels are revealed to all parties and then each party locally computes the output of $G(\mathcal{C}_f)$. This all looks nice, but the scheme's major drawback is that it is cubic in the number of players, which means $30$ parties is already too much. In such cases, protocols with a GMW-style online phase, like SPDZ, outperform BMR in the low-circuit-depth setting. The latest experiments using [4] show that a Vickrey auction with $100$ parties is practical.

In [5], using a variant of the Gentry FHE-MPC protocol, the depth of the circuit shrinks to only 3, at the cost of slightly more computation! Moreover, the overall time complexity is now quadratic in the number of players.

As Yehuda said in his talk, these approaches to constant-round protocols are like 'low-hanging fruit'. As a researcher, are you ready to pick one by combining different things in a novel way? Can you find a scheme whose complexity is linear in the number of players?

[1] Yao, Andrew C. "Protocols for secure computations." 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982). IEEE, 1982.

[2] Beaver, Donald, Silvio Micali, and Phillip Rogaway. "The round complexity of secure protocols." Proceedings of the twenty-second annual ACM symposium on Theory of computing. ACM, 1990.

[3] Lindell, Yehuda, et al. "Efficient Constant Round Multi-Party Computation Combining BMR and SPDZ." Advances in Cryptology--CRYPTO 2015. Springer Berlin Heidelberg, 2015. 319-338.

[4] Keller, Marcel, Emmanuela Orsini, and Peter Scholl. MASCOT: Faster Malicious Arithmetic Secure Computation with Oblivious Transfer. Cryptology ePrint Archive, 2016. http://eprint.iacr.org/2016/505 .

[5] Lindell, Yehuda, Nigel P. Smart, and Eduardo Soria-Vazquez. "More Efficient Constant-Round Multi-Party Computation from BMR and SHE."

Tuesday, May 31, 2016

Workshop on the Theory and Practice of Secure Multi-Party Computation 2016: Stable Matching

Suppose we have two groups of people, $A$ and $B$, and each person in group $A$ lists each person in group $B$ in order of their preference for being partnered with them, and vice versa. Is there a way of ‘optimally’ pairing each person in $A$ with a person in $B$? This was the focus of abhi shelat’s talk at the Workshop on the Theory and Practice of Secure Multi-Party Computation 2016 at the University of Aarhus, Denmark.

The problem is known as the Stable Marriage Problem and has a wide field of application. Perhaps surprisingly, it can be shown that a stable matching always exists. In fact, David Gale and Lloyd Shapley came up with an algorithm which constructs one (the achievement contributing in part to Shapley’s joint win of the Nobel Prize in Economics in 2012).
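
For reference, here is a minimal (and entirely non-private) sketch of the Gale-Shapley deferred-acceptance algorithm in Python; the secure versions discussed below aim to compute this same outcome without anyone seeing the preference lists:

```python
def gale_shapley(a_prefs, b_prefs):
    """Stable matching by deferred acceptance. a_prefs[a] is a's preference list over
    the members of B, best first; similarly b_prefs[b]. Returns a dict b -> a."""
    rank = {b: {a: i for i, a in enumerate(prefs)} for b, prefs in b_prefs.items()}
    next_choice = {a: 0 for a in a_prefs}   # index of the next b each a will propose to
    free = list(a_prefs)                    # members of A currently unmatched
    match = {}                              # tentative matching, keyed by members of B

    while free:
        a = free.pop()
        b = a_prefs[a][next_choice[a]]
        next_choice[a] += 1
        if b not in match:
            match[b] = a                    # b was free: tentatively accept a
        elif rank[b][a] < rank[b][match[b]]:
            free.append(match[b])           # b prefers a: previous partner is freed
            match[b] = a
        else:
            free.append(a)                  # b rejects a; a proposes further down its list
    return match

a_prefs = {'a1': ['b1', 'b2'], 'a2': ['b1', 'b2']}
b_prefs = {'b1': ['a2', 'a1'], 'b2': ['a1', 'a2']}
print(gale_shapley(a_prefs, b_prefs))       # {'b1': 'a2', 'b2': 'a1'}
```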

There are certain cases where it would be useful for the preferences made by each party to be kept secret. The application to the world of secure MPC then becomes clear. We were provided with two motivating examples.
  • Matching prospective medical residents with hospitals.
  • Matching women to sorority houses.

In these two cases, the data should be kept private. The latter example is based on a real-world instance of the problem: to avoid the awkward social situation of sorority houses receiving women they had not listed as preferred, one university exported all of the data comprising the preference lists to an impartial third party in Texas, who could sort through it on their behalf and make the assignment without either side seeing the other's preferences.

To perform the Gale-Shapley algorithm, we must hold arrays containing all of the preferences, taking $O(n^2)$ space (where $n$ is the number of participants), as well as an array holding the current matching. Additionally, the loops in the algorithm need $O(n^2)$ time. As such, using garbled circuits turns out to be quite inefficient. The use of oblivious RAM (ORAM - you can think of this as something that interfaces between the CPU and physical RAM in a way that hides the precise data-access pattern) enables a straightforward implementation, but again not an efficient one. In an attempt to improve on this, abhi showed how to speed up the algorithm by a factor of between 40 and 100.

First, we remove the necessity for ORAM arrays, which are used for: (1) checking the preferences, and (2) finding the next unmatched element. Analysing the algorithm and searching for optimisations allows (2) to be done through the use of a queue, which saves a lot in the implementation. The main optimisation, however, comes from a cheap way of performing (1), which in the original algorithm requires $O(n^2)$ space.

The contribution of the presented work consists of three particular insights leading to significant reductions in complexity of the algorithms:
  1. Operations on the matrix of preferences are only ever ‘read-only’ instructions. The ORAM data structure contains extra space in case of rewriting - this is not needed! Noticing this means we can asymptotically reduce the complexity of searching through these preferences.
  2. The preferences are only ever read in a specific order. Noticing this allows us to avoid the recursive lookups into the smaller ORAMs containing the preferences, which is expensive. This observation was made by Craig Gentry (here) for more general data structures.
  3. Asymptotically better initialisation of the data can be achieved. This is through the use of what they define as oblivious multi-lists, which, roughly, are (obliviously) sorted lists of preferences of each pair in $A\times B$. This multi-list allows a much faster matching to be made, and the cost is: $\Theta(n^2 \log^2 n)$ for $n$ sorts on $n$ elements, and then $\Theta(n^2 \log n)$ for the random permutation on $n^2$ elements.

These optimisations mean it takes less than 2 minutes for 600 participants in the sorority-house example given above, which is at least 40 times faster than a straightforward ORAM application.


In practice, the sizes of the sets $A$ and $B$ will not often be the same. This generalised problem was studied by Roth and Peranson and somewhat complicates things, but can still be dealt with. It was suggested that this could be an interesting avenue for further research.

Sunday, May 29, 2016

37th IEEE Symposium on Security and Privacy

The 37th IEEE Symposium on Security and Privacy took place last week in San Jose, California. Although there wasn't an awful lot to see in San Jose itself (direct quote from the Uber driver who took me to the hotel), the conference was full of people from all different backgrounds and interests, which made networking very interesting indeed.

This was the first big conference I had attended, so I wasn't quite sure what to expect. I certainly didn't expect six hundred people to turn up, which was quite overwhelming at first. It also meant the biscuits at break time went very quickly. However, the atmosphere was very enjoyable and I had a fantastic time listening to all the presentations and speeches from various members of the IEEE community.

The papers covered many different aspects of Security and Privacy, from a paper on the Formal Treatment and Implications for TLS 1.3 [1] to a Survey of Techniques against Telephone Spam [2]. After the three days of the main conference, it split into six 'workshops'. I mostly attended MoST - the Mobile Security Technologies Workshop - but also went to a couple of talks in the BioStar stream - the Workshop on Bio-inspired Security, Trust, Assurance and Resilience.

One of the most enjoyable talks of the conference was on a paper titled "Users Really Do Plug In USB Drives They Find" [3], in which academics from the University of Illinois scattered USB drives around the university campus, with labels like 'Top Secret Information' or even 'All Exam Answers' (the experiment took place just before finals). Each drive contained a variety of 'inconspicuous' documents, all HTML files in disguise linking to a survey that logged the time and filename to a server when a user opened one of the files. They dropped 297 USB drives in total, of which 290 were picked up (or rather, were no longer in their original position), and on 135 of those someone opened a file. This is a big security risk, as such USB drives could carry malicious files that infect the host machine once plugged in and run.

The most interesting talk for me was "Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector" [4]. Memory deduplication is a technique (enabled by default in Windows 8.1 and 10) that 'maps multiple identical copies of a physical page onto a single shared copy with copy-on-write semantics'. However, as writing to a shared page takes longer than writing to a normal page, this is a target for timing side-channel attacks. The paper not only exploits this, but goes on to develop a JavaScript-based attack against the Microsoft Edge browser that allows arbitrary memory read and write access in the browser, using a reliable Rowhammer exploit.

Overall, the conference was a rewarding and worthwhile experience, and I recommend it to anyone interested in the fields of Security and Privacy, or anyone who wants some free t-shirts from the various sponsors.



[1] http://www.ieee-security.org/TC/SP2016/papers/0824a470.pdf
[2] http://www.ieee-security.org/TC/SP2016/papers/0824a320.pdf
[3] http://www.ieee-security.org/TC/SP2016/papers/0824a306.pdf
[4] http://www.ieee-security.org/TC/SP2016/papers/0824a987.pdf