Sunday, April 26, 2015

52 Things: Number 29: What is the UF-CMA security definition for digital signatures?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we look at the security definition for signatures.

So Number 16 gave the details of the DSA, Schnorr and RSA-FDH signature schemes, but what is a signature scheme and what security properties should it achieve?

A signature scheme $\mathsf{S}$ is a tuple of algorithms $(\mathsf{KG},\mathsf{Sign},\mathsf{VRFY})$ such that:

  • $\mathsf{KG}$ is a randomised algorithm which outputs a secret key $sk$ and a public key $pk$.
  • $\mathsf{Sign}$ is a (possibly) randomised algorithm which, on input $sk$ and a message $m$, outputs a signature $\sigma$.
  • $\mathsf{VRFY}$ is a deterministic (non-stateful) algorithm which takes the public key $pk$, a message $m$ and a signature $\sigma$, and returns 1 if $\sigma$ is a valid signature on $m$ and 0 otherwise.

Signature schemes are used to prove the origin of a message. If a message has a signature on it, signed by Alice's secret key then it must have come from Alice. The advantage of using a signature scheme over a MAC (assuming good public key infrastructure) is that it can be verified by anyone and does not need any shared secrets.

Now for the signature to prove the origin of a message, it needs to be the case that someone without the secret key cannot create a valid signature on a message he has not seen signed before. This is called UF-CMA (unforgeability under chosen-message attack) security.

The game works as follows:

  1. The game runs $\mathsf{KG}$ to get $(pk,sk)$
  2. The adversary $A$ is given $pk$ and can then send messages $m_i$ to the game and get back signatures $\sigma_i$ under the secret key $sk$
  3. $A$ must output a pair $(m^*,\sigma^*)$
$A$ is said to win the game if $\sigma^*$ is a valid signature on $m^*$ and $m^*$ is not equal to any of the $m_i$'s which $A$ asked the game to sign. The advantage of the adversary in the UF-CMA game is defined as the probability that $A$ wins the game. The signature scheme $\mathsf{S}$ is said to be UF-CMA secure if this advantage is suitably small.
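As a toy illustration, the game above can be sketched in Python. The "scheme" here is a textbook RSA-FDH-style signature with tiny hard-coded primes, chosen purely so the harness runs; it is hopelessly insecure, and all names (kg, sign, vrfy, uf_cma_game) are my own rather than anything from the post.

```python
import hashlib

def kg():
    # Tiny hard-coded RSA parameters: illustration only, NOT secure.
    p, q = 1000003, 1000033
    n = p * q
    e = 65537
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)          # (pk, sk)

def _h(m, n):
    # Hash folded into Z_n, a toy stand-in for a full-domain hash.
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def sign(sk, m):
    n, d = sk
    return pow(_h(m, n), d, n)

def vrfy(pk, m, sigma):
    n, e = pk
    return pow(sigma, e, n) == _h(m, n)

def uf_cma_game(adversary):
    pk, sk = kg()                  # step 1
    queried = []
    def sign_oracle(m):            # step 2: signing queries
        queried.append(m)
        return sign(sk, m)
    m_star, sigma_star = adversary(pk, sign_oracle)   # step 3
    # A wins iff the forgery verifies and m* was never queried.
    return vrfy(pk, m_star, sigma_star) and m_star not in queried

def replay_adversary(pk, sign_oracle):
    # A trivial adversary that replays a queried message: always loses,
    # since its output message is on the queried list.
    m = b"hello"
    return m, sign_oracle(m)
```

Running `uf_cma_game(replay_adversary)` returns False every time, which is exactly the point of the freshness condition on $m^*$.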

Tuesday, April 21, 2015

It's HUGE - RSA 2015

The state of the IT industry, and the security industry in particular, can be judged by just how big the RSA Conference is. This week RSA 2015 is being held in San Francisco and it is certainly the largest I have ever seen it. As usual we are in the Moscone Centre in downtown San Fran, but this year the conference is taking over the entire centre. The tracks are being held in Moscone West, whilst the main keynotes and the expo are held in Moscone South and North. It is certainly the biggest security expo I have ever seen, and the place is packed with delegates.

As usual the conference started with some "star" giving a joke song about security. This year it was apparently someone from a programme called "Glee" singing a parody of Bowie's "Changes". We then had the usual keynotes. For those not used to the RSA Conference these can range from terrible self-publicity for a company, through to really interesting and thought-provoking philosophical discussions on all things security. The first keynotes this year were of the latter type.

Amit Yoran of RSA kicked off with a discussion of how our mindset of walls and perimeters is still dominant in the industry, despite this being known to be wrong for many years. His thesis was that we have the technology to prevent all the high-profile attacks of the last year, but we lack the mindset. I thought this was a very thought-provoking talk, and it is well worth catching up on the video if you were not there.

We then had Scott Charney of Microsoft, who concentrated on the idea that we not only want security, privacy and reliability, we also require control and transparency. This is particularly true of cloud services, since we do not expect to hand over control of our (individual and company) data to cloud providers. We want the controls on this data, and the way those controls are exercised, to be handled in a transparent manner.

Taking up the idea that we simply have the wrong old fashioned mindset, the next talk by Christopher Young of Intel looked at how sporting teams have changed the way they work by examining scientific evidence. He concentrated on some team doing baseball (which I am led to believe is an inferior form of cricket, close to the "girlie game" of rounders). Apparently there is some film out about a baseball team which improved by using statistics. The key point being that the team improved by changing the standard way of working, and this was achieved by processing the large amount of data which was available.

This was followed by the usual highlight of the RSA Conference; namely the Cryptographers' Panel. This year Paul Kocher chaired, with the panel made up of Whit Diffie, Ron Rivest, Adi Shamir and Ed Giorgio (ex-NSA). Whit was in particularly good form, with a number of interesting viewpoints and remarks which drew chuckles from the audience. The panel considered a number of issues; from Snowden to Bitcoin to IoT.

The award for Mathematical Excellence this year was shared between Ivan Damgård and Hugo Krawczyk. Ron gave a nice little talk linking these two excellent cryptographers through their work on hash functions.

So what did I learn from the first day of RSA? Well mainly I am out of touch with modern culture. All the references to movies and TV programmes went over my head.

Friday, April 17, 2015

52 Things: Number 28: What is the IND-CCA security definition for public key encryption?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We discuss IND-CCA security for public key encryption.

IND-CCA security stands for indistinguishability under chosen-ciphertext attack. The idea behind it is that in a secure encryption scheme, given a ciphertext, an adversary should not be able to tell which message the given ciphertext encrypts. In this model, the adversary is allowed to call the encryption and decryption oracles both before and after receiving the challenge ciphertext (steps 3 and 6 below). The find-then-guess security game for public key IND-CCA is the following.

1. Generate the public and secret keys $(p_{k}, s_{k})$. The adversary A has access to the public key $p_{k}$

2. Sample $b \leftarrow \{ 0, 1\}$ uniformly at random, keeping it private

3. A is allowed to query the decryption oracle $Dec_{s_{k}}$ and the encryption oracle $Enc_{p_{k}}$

4. A then outputs a pair of messages $(m_{0}, m_{1})$

5. We output the encryption $c=Enc_{p_{k}}(m_{b})$

6. The adversary is allowed to ask for more encryptions or decryptions, as in step 3, but he is not allowed to ask for the decryption of $c$

7. A outputs $b' \in \{ 0, 1 \}$. A wins if $b=b'$

We say the advantage of A is $Adv(A) = 2\mid Pr[A$ wins$] - 1/2 \mid$. A scheme is said to be IND-CCA secure if the said advantage is negligible.
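The find-then-guess game can be sketched in Python. Everything below is my own illustration rather than part of the post: the "scheme" is textbook RSA with tiny hard-coded primes, which is deterministic and therefore trivially loses the game, nicely showing why probabilistic encryption is necessary.

```python
import secrets

def kg():
    # Tiny hard-coded textbook RSA: illustration only, NOT secure.
    p, q = 1000003, 1000033
    n = p * q
    e = 65537
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)

def enc(pk, m):
    n, e = pk
    return pow(m, e, n)            # deterministic, hence broken here

def dec(sk, c):
    n, d = sk
    return pow(c, d, n)

def ind_cca_game(adversary):
    pk, sk = kg()                       # step 1
    b = secrets.randbelow(2)            # step 2: private bit
    challenge = []
    def dec_oracle(c):
        # Step 6 restriction: decrypting the challenge is forbidden.
        if c in challenge:
            raise ValueError("cannot decrypt the challenge")
        return dec(sk, c)
    m0, m1 = adversary.choose(pk, dec_oracle)    # steps 3-4
    c = enc(pk, (m0, m1)[b])                     # step 5
    challenge.append(c)
    return adversary.guess(c, dec_oracle) == b   # steps 6-7

class ReencryptAdversary:
    # Wins against any deterministic scheme: re-encrypt m0 and compare.
    def choose(self, pk, dec_oracle):
        self.pk = pk
        return 42, 1337
    def guess(self, c, dec_oracle):
        return 0 if enc(self.pk, 42) == c else 1
```

With this adversary the game returns True on every run, so the advantage is the maximal 1 rather than negligible.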

There is a different version of IND-CCA, real-or-random, mentioned by Gareth in last week's post. The difference is at step 5 above: instead of outputting the encryption of the message $m$ that A asks for each time, we output an encryption of a random $m'$ of length $\mid m \mid$ if $b=0$, and an encryption of $m$ otherwise. A must then distinguish whether he is in the "real" or "random" world. Advantage and security are defined similarly.

The two definitions are equivalent in the following sense: from any real-or-random adversary A we can construct a find-then-guess adversary B with the same advantage, and from any find-then-guess adversary A we can construct a real-or-random adversary B such that $Adv_{find-then-guess}(A)=2 \cdot Adv_{real-or-random}(B)$.

Friday, April 10, 2015

52 Things: Number 27: What is the AEAD security definition for symmetric key encryption?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This post will kick off the 'Security Definitions and Proofs' section with a brief overview of Authenticated Encryption.

In a recent post Luke described a number of well-used modes of operation (ECB, CBC and CTR) for blockciphers, modes that provide privacy (confidentiality) only. We may also want integrity from our encryption mechanism, meaning that the recipient is assured that the message it receives is the one sent, without accidental changes or intentional tampering, and authenticity, meaning that the receiver is convinced of the origin of the message. To get these additional goals we often use a message authentication code (MAC), and the most widely used are those based on blockciphers and those based on hash functions (HMAC). Putting these two primitives together is non-trivial: to get an IND-CCA secure scheme we need to follow the 'Encrypt-then-MAC' paradigm with a secure encryption scheme and a strongly unforgeable MAC, meaning computing the MAC on the ciphertext (see here and here for more info on Encrypt-and-MAC and MAC-then-Encrypt, with a focus on why one should avoid them). The 'AD' refers to variable-length associated data such as packet headers, and we normally expect authenticity and integrity but not confidentiality from this optional component.
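The Encrypt-then-MAC composition with associated data can be sketched as follows. This is my own toy illustration, not code from the post: the "cipher" is a keystream derived from SHA-256, which is not a secure cipher, and the sketch glosses over details such as unambiguously encoding the MAC input. The structural points it shows are real, though: the tag is computed over the ciphertext (not the plaintext), it also covers the associated data and nonce, and decryption verifies the tag before touching the ciphertext.

```python
import hashlib
import hmac

def _keystream(key, nonce, length):
    # Toy keystream from SHA-256 in counter mode: illustration only,
    # NOT a secure cipher.
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def etm_encrypt(enc_key, mac_key, nonce, message, ad=b""):
    ct = bytes(m ^ k for m, k in
               zip(message, _keystream(enc_key, nonce, len(message))))
    # Encrypt-then-MAC: the tag covers the associated data, the nonce
    # and the ciphertext; the AD itself travels in the clear.
    tag = hmac.new(mac_key, ad + nonce + ct, hashlib.sha256).digest()
    return ct, tag

def etm_decrypt(enc_key, mac_key, nonce, ct, tag, ad=b""):
    expected = hmac.new(mac_key, ad + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None                 # authenticity/integrity failure
    return bytes(c ^ k for c, k in
                 zip(ct, _keystream(enc_key, nonce, len(ct))))
```

Tampering with the ciphertext or presenting different associated data makes verification fail, so decryption rejects before any plaintext is released.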

Next week's blog post will see an in-depth overview of IND-CCA2 security in the context of public-key encryption. The 'real-or-random' definition of IND-CCA2 (and IND-CCA1) gives the adversary access to an encryption oracle, which has an encryption key hardwired and on input message $m$ returns either a 'real' encryption $\mathsf{E}_k (m)$ or 'fake' $\mathsf{E}_k (\$^{|m|})$, and a decryption oracle that given a ciphertext $c$ will return $\mathsf{D}_k (c)$ - the adversary is then asked to distinguish which world he is in. In 2004 Shrimpton showed that a new notion dubbed IND-CCA3, where the decryption oracle in the 'fake' world is replaced by an oracle that always returns the invalid symbol $\perp$, is equivalent to the previously considered notion of AE, where the notions of privacy and authenticity/integrity are looked at separately. This observation was incorporated into Rogaway and Shrimpton's paper on the keywrap problem and Deterministic Authenticated Encryption. For more information on the impact of associated data, see here and here.

In practice, a large proportion of traffic uses CCM mode, which combines a blockcipher in counter mode with CBC-MAC via the MAC-then-Encrypt approach, and GCM, which uses Encrypt-then-MAC with a blockcipher in counter mode and a polynomial-based hash function called GHASH. CCM is comparatively inefficient as it requires two blockcipher calls per message block and is not online (the message length needs to be known before processing can occur), and as this paper by Saarinen shows, GCM has some weak keys.

The CAESAR competition is currently in progress, with the aim of selecting a portfolio of authenticated ciphers for recommendation based on thorough academic public scrutiny. One of the main aims is to get more researchers thinking about such a vital topic, and the large number (and varied nature) of first round submissions indicates this goal has already been achieved. The second round candidates are expected to be announced next week, and an overview of the submissions can be found at the AE Zoo which is run by a number of researchers from DTU.

ECDLP Over Fields of Small Characteristic

Last week Igor Semaev distributed a paper which seems to give a new, more efficient, algorithm to solve the Elliptic Curve Discrete Logarithm Problem (ECDLP) on curves defined over fields of small characteristic. This is important as such curves are used in a number of protocols and products. Whilst not devastating (yet), the algorithm does vindicate the community's preference for curves defined over large prime fields.

Recall the ECDLP is to solve the following equation for z,
$$ Q = [z] P $$
where $P$ and $Q$ are given points on the curve $E({\mathbb F}_q)$ over the field ${\mathbb F}_q$.

Like many modern algorithms for the ECDLP over curves of small characteristic the algorithm makes use of the "summation polynomial". In particular it makes use of the third summation polynomial $S_3(x_1,x_2,x_3)$. This polynomial has a zero $(x_1,x_2,x_3)$ if and only if there exist $y$-coordinates $y_1,y_2$ and $y_3$ such that $P_i=(x_i,y_i)$ is a point on the curve and
$$ P_1+P_2+P_3 = {\mathcal O}$$.

The algorithm is very simple, and is given by the following steps (see page 4 of Semaev's paper).
  1. Pick an integer value $m$ and a subset $V$ of ${\mathbb F}_q$ of size $q^{1/m}$.
  2. Generate a set of "relations" as follows:
    1. Pick random  integers $u$ and $v$ and compute the point $R= [u] P + [v] Q$.  If $R={\mathcal O}$ then we immediately solve the ECDLP, so assume $R \ne {\mathcal O}$.  Write $R$ as $(R_X,R_Y)$
    2. If $R_X \in V$ then we have a relation $[u] P + [v] Q - R = {\mathcal O}$.
    3. For $t=2,\ldots,m$ try to solve the following system \begin{align*} S_3(u_1,x_1,x_2) &= 0 \\ S_3(u_i,u_{i+1},x_{i+2}) &=0 \mbox{  for } 1 \le i \le t-3 \\ S_3(u_{t-2},x_t,R_X)&=0 \end{align*} for $x_1,\ldots,x_t \in V$ and $u_1,\ldots,u_{t-2} \in {\mathbb F}_q$.
    4. If successful we can find $y_1,\ldots,y_t \in {\mathbb F}_q$ such that $$(x_1,y_1) + (x_2,y_2) + \ldots + (x_t,y_t) + [u] P + [v] Q = {\mathcal O}$$.
  3. As soon as we have enough relations we can solve for $z$ using linear algebra. 
The last step is standard in so-called "index calculus" algorithms. The key innovation is in the second step. The idea is that we try to solve the system defined by the $S_3$ summation polynomials, which have low degree, rather than attack the $S_m$ polynomial in one go. Semaev provides strong evidence that this step will be amenable to so-called Gröbner basis algorithms, by analysing the "first fall degree" assumption for such a system.
The expectation is that this method will be able to "break" various curves in small characteristic currently defined in cryptographic standards. I expect experimental validation of this expectation, or a reason why such experiments fail, to come in the next few months.

UPDATE (23/04/2015):

More updates on this potential algorithm:

Friday, April 3, 2015

52 Things: Number 26: Describe the NAF scalar multiplication algorithm.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. In this week, we describe the NAF scalar multiplication algorithm.

The NAF scalar multiplication algorithm is an enhanced form of scalar multiplication, in which the use of Non-Adjacent Form (NAF) representation decreases the expected running time of the algorithm. In more detail:

Let k be a positive integer and P a point on an elliptic curve E over the field $\mathbb{F}_{q}$. The operation that computes the multiplication $Q = k\cdot P$ is known as the elliptic curve scalar multiplication (or point multiplication) and dominates the execution time of elliptic curve cryptographic schemes. One of the most basic methods to compute a scalar multiplication is based on a double-and-add variant of Horner's rule, known as the binary method. As the name suggests, the two most prominent building blocks of this method are the point doubling and point addition primitives. More particularly, to calculate $k\cdot P$, integer k is represented as $k=k_{n-1}2^{n-1}+k_{n-2}2^{n-2}+ \cdots +k_{1}2+k_{0}$, where $k_{i} \in \{0,1\}$, $i= 0,1,\ldots,n-1$ (binary representation form). The following algorithms form the additive versions of the basic repeated-square-and-multiply methods for exponentiation [1,2].

INPUT: k = $(k_{t-1},\ldots, k_1, k_0)_2$, P ∈ E($\mathbb{F}_{q}$).
OUTPUT: k $\cdot$ P.
       Q ← ${\mathcal O}$.
       For i from 0 to t−1 do
             If $k_i$ = 1 then Q ← Q + P.
             P ← 2P.
       Return(Q).

INPUT: k = $(k_{t-1},\ldots, k_1, k_0)_2$, P ∈ E($\mathbb{F}_{q}$).
OUTPUT: k $\cdot$ P.
       Q ← ${\mathcal O}$.
       For i from t−1 down to 0 do
             Q ← 2Q.
             If $k_i$ = 1 then Q ← Q + P.
       Return(Q).

The first algorithm processes the bits of k from right to left, while the second processes the bits from left to right. The expected number of ones in the binary representation of k is t/2 ≈ m/2, where m is the bit length of k; therefore the expected running time of both algorithms is approximately m/2 point additions and m point doublings [1]:

$\frac{m}{2} \cdot A + m \cdot D$.
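As a quick sanity check (my own sketch, not from the original post), the left-to-right method can be written in Python with plain integers standing in for curve points: doubling becomes ×2 and point addition becomes +, so the result must equal k·P and can be checked directly.

```python
def binary_multiply(k, P):
    # Left-to-right double-and-add. Plain integers stand in for curve
    # points here (doubling = *2, addition = +), purely to make the
    # control flow of the algorithm checkable.
    Q = 0                            # the "point at infinity"
    for bit in bin(k)[2:]:           # most-significant bit first
        Q = 2 * Q                    # point doubling
        if bit == "1":
            Q = Q + P                # point addition
    return Q
```

For example, `binary_multiply(k, P)` equals `k * P` for every positive k, while the number of additions performed is exactly the Hamming weight of k.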

In 1951, Booth [3] proposed a new signed binary representation of scalars, and later Reitwiesner [4] proved that every integer can be uniquely represented in this format [5]. More particularly, if P=(x,y)$\in$E($\mathbb{F}_{q}$) then −P=(x,x+y) if $\mathbb{F}_{q}$ is a binary field, and −P=(x,−y) if $\mathbb{F}_{q}$ has characteristic > 3. Thus subtraction of points on an elliptic curve is just as efficient as addition. This motivates us to use a signed digit representation $k=\sum_{i=0}^{l-1} k_i \cdot 2^i$, where $k_i \in \{0, \pm 1\}$. A particularly useful signed digit representation is the non-adjacent form (NAF). A NAF representation of a positive integer k is an expression $k=\sum_{i=0}^{l-1} k_i \cdot 2^i$, where $k_i \in \{0, \pm 1\}$, $k_{l-1} \ne 0$, and no two consecutive digits $k_i$ are non-zero. The length of the NAF is l [1].

Properties of NAF [1]:
  1. k has a unique NAF denoted NAF(k). 
  2. NAF(k) has the fewest non-zero digits of any signed digit representation of k. 
  3. The length of NAF(k) is at most one more than the length of the binary representation of k. 
  4. If the length of NAF(k) is l, then $2^{l}/3 < k < 2^{l+1}/3$.
  5. The average density of non-zero digits among all NAFs of length l is approximately 1/3. 
NAF(k) can be efficiently computed using the following algorithm [1]:

INPUT: A positive integer k.
OUTPUT: NAF(k).
       i ← 0.
       While k ≥ 1 do
             If k is odd then: $k_i$ ← 2 − (k mod 4), k ← k − $k_i$;
             Else: $k_i$ ← 0.
             k ← k/2, i ← i + 1.
Return($k_{i-1}, k_{i-2},\ldots, k_1, k_0$).

The last algorithm shows how we can modify the left-to-right binary method for scalar multiplication by using NAF(k) instead of the binary representation of k [1]:

INPUT: Positive integer k, P ∈ E($\mathbb{F}_{q}$).
OUTPUT: k $\cdot$ P.
       Based on the previous algorithm compute NAF(k) = $\sum_{i=0}^{l-1} k_i \cdot 2^i$.
       Q ← ${\mathcal O}$.
       For i from l−1 down to 0 do
             Q ← 2Q.
             If $k_i$ = 1 then Q ← Q + P.
             If $k_i$ = −1 then Q ← Q − P.
       Return(Q).

Based on the third and fifth properties of NAF, the expected running time of the NAF scalar multiplication algorithm is approximately [1]:

$\frac{m}{3} \cdot A + m \cdot D$.
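The two NAF algorithms above can be sketched together in Python (my own sketch, not from the original post). As with the binary method, plain integers stand in for curve points, which makes both the non-adjacency property and the correctness of the scalar multiplication directly checkable.

```python
def naf(k):
    # Digits of NAF(k), least-significant first, following the
    # recoding algorithm above.
    digits = []
    while k >= 1:
        if k % 2 == 1:
            d = 2 - (k % 4)       # d in {1, -1}
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

def naf_multiply(k, P):
    # Left-to-right NAF method. Plain integers stand in for curve points
    # (doubling = *2, addition/subtraction = +/-), so the result is k*P.
    Q = 0                          # the "point at infinity"
    for d in reversed(naf(k)):     # most-significant digit first
        Q = 2 * Q                  # point doubling
        if d == 1:
            Q = Q + P              # point addition
        elif d == -1:
            Q = Q - P              # point subtraction (negation is cheap)
    return Q
```

For instance, `naf(7)` gives the digits of $8 - 1$, i.e. $(1,0,0,-1)$ most-significant first, replacing three additions in the binary method with one addition and one subtraction.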

[1] Hankerson, Darrel, Alfred J. Menezes, and Scott Vanstone. "Guide to Elliptic Curve Cryptography". Springer Science & Business Media, 2004.
[2] Jonathan Taverne, Armando Faz-Hernández, Diego F. Aranha, Francisco Rodríguez-Henríquez, Darrel Hankerson, Julio López. "Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction", Journal of Cryptographic Engineering, Vol. 1, No 3, pp. 187-199, 2011.
[3] A.D. Booth, “A signed binary multiplication technique”, The Quarterly Journal of Mechanics and Applied Mathematics, Vol. 4, No. 2, pp. 236-240, 1951
[4] G.W.Reitwiesner, “Binary Arithmetic”, Advances in computers, Academic Press, Vol. 1, pp.231-308, 1960
[5] Karthikeyan, E. “Survey of elliptic curve scalar multiplication algorithms.” International Journal of Advanced Networking and Applications, Vol. 4, No 2, pp. 1581-1590, 2012

Wednesday, April 1, 2015

TCC 2015: Separations in Circular Security for Arbitrary Length Key Cycles

Alice and Bob are in a committed relationship, so as the kids are doing these days they decide to share their secret keys as a totally healthy and not at all creepy or misguided expression of trust. Since millennials understand the importance of cybersecurity, they don't want to send these keys in the clear. So Alice encrypts her secret key under Bob's public key, and he encrypts his secret key under her public key. This is called a key cycle, and it's a weird situation not covered by the standard IND-CPA security game.

Is this a problem though? Can you spot a key cycle in an IND-CPA secure scheme? The notion of circular security asks an adversary to distinguish a key cycle from encryptions of zero under each of the keys. Alice and Bob might think they're safe encrypting each other's keys. But at Eurocrypt in 2010 Acar et al. showed a scheme that is IND-CPA secure, but isn't circular secure for a key cycle of length 2. So IND-CPA security doesn't imply 2-circular security, and Alice and Bob are going to need an encryption scheme specifically designed to have circular security.

What about for longer key cycles? Luckily this is a TCC blog post, so we can leave Alice and Bob and their ridiculous motivating scenario behind, and simply ask theoretical questions because we're curious about the answers. Koppula, Ramchen and Waters settled this particular question once and for all by showing that for any n there exists an encryption scheme that's IND-CPA secure but not n-circular secure. Of course the solution involves indistinguishability obfuscation.

They take an IND-CPA secure scheme and modify encryption to include an obfuscation of a program that detects key cycles, giving a scheme that's clearly not circular secure. To prove IND-CPA security still holds they modify the scheme again so that secret keys additionally contain a random value and public keys contain the evaluation of a PRG at that point. This separates keys into real and fake public keys, which are indistinguishable without seeing the corresponding secret keys. The obfuscated cycle finding program can then test for a cycle ending in a real key. Hop from the real scheme to one with fake keys by the PRG security, and from the obfuscated cycle detection program to an obfuscation of a program that always outputs 0 by iO security, and reduce to the IND-CPA security of the underlying scheme. The paper explains this all in detail and would be a nice first proof if you want to get to grips with constructions involving iO.