Bristol Cryptography Blog: August 2015

Thursday, August 27, 2015

52 Things: Number 47: What is the Fiat-Shamir transform?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. Having introduced Sigma Protocols, we now look at an important method for ensuring they are zero knowledge. Thanks again to David Bernhard for his assistance.

What is the Fiat-Shamir transform?

Sigma protocols, which we saw last week, are fast and useful protocols for Alice to prove something to Bob - as long as they're both online at the same time. Alice sends Bob a commitment, Bob replies with a challenge and Alice sends a response. Unfortunately bar further modifications, Sigma protocols are not actually known to be zero-knowledge: they are only honest-verifier zero- knowledge.

The Fiat-Shamir transform is a technique to turn a Sigma protocol into a non- interactive proof. This not only lets Alice send a proof to Bob by e-mail, which he can read later without having to send her a challenge back, it also lets her turn any Sigma protocol into a digital signature scheme with which she can assert "someone who knows the secret for this Sigma protocol has signed that message". Alice can create a signature once and post it to a usenet bulletin board and everyone who sees the signed message can check the signature without having to contact Alice. And zero-knowledge comes for free, since Bob or any other reader no longer has to do anything.

Although this technique is explained in Fiat and Shamir's 1986 paper, several eminent cryptographers have pointed out in the past that it was actually given by Blum in an even earlier work, though we have not been able to trace this.

A Sigma protocol can be implemented as four algorithms *Commit*, *Challenge*, *Respond* and *Check*, to be executed as follows:

Alice                                Bob
-----                                -----
    
co,st = Commit(secret,public)
         ---------- co --------->
                                     c = Challenge()
         <--------- c  ----------
r = Respond(st,c)
         ---------- r  --------->
                                         Check(co,c,r)

For the Fiat-Shamir transformation, Alice picks a hash function $H$ and uses it to create the challenge herself:

Alice                                World
-----                                -----

co, st = Commit(secret,public)
c = H(public,co)
r = Respond(st,c)
         ------ co,r ----------->
                                     c = H(public,co)
                                     Check(co,c,r)

If Alice wants to sign a message m, she includes that in the hash: c = H(public, co,m) and posts (m,co,r) as her signed message.

Why does this work? If $H$ were a random function, then the challenge is clearly uniformly random and independent of Alice's public information and commitment. The security analysis considers an Alice who does not have access to the code of $H$ directly, only to an oracle for $H$. In this case, the probability of Alice making a correct response without following the protocol (especially if she does not know the necessary secret) is proportional to the inverse of the size of the range of $H$, that is if $H$ has domain X and range Y then someone without the the secret who makes up to q $H$ -calls has at most a q/|Y| probability of making a r-value that *Check* accepts. Typically |Y| = 2^n for a decently large value of n, so this probability is negligible.

Some people will tell you that this style of analysis, known as the Random Oracle Model (ROM), is deeply flawed because there's an artificial counter- example of a scheme that is secure in the ROM but is insecure for any actual hash function $H$. What this counter-example shows is that if you go to enough effort to make a stupid scheme, you can end up with a stupid scheme. In practice, Fiat-Shamir has been known since at least 1986, used in several practical applications and stands unbroken to this day (if done properly). No- one has to date proposed a workable attack on a Fiat-Shamir transformed Sigma protocol that was not deliberately designed to be stupid, which is more than can be said for quite a few other cryptographic schemes.

Thursday, August 20, 2015

52 Things: Number 46: What do correctness, soundness and zero-knowledge mean in the context of a Sigma protocol?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we enter the final phase, introducing Sigma Protocols as the first 'advanced protocol'. Thanks to David Bernhard for his assistance in writing this blog.

Sigma protocols

Sigma protocols are protocols for Alice to prove something to Bob (that she knows some secret). They have the following general form: Alice knows a secret; Alice and Bob both share some common information. Then:

Alice sends a value to Bob, known as a commitment.
Bob picks a challenge uniformly at random and sends it to Alice.
Alice computes a response and sends it to Bob.
Bob checks the response and accepts or rejects Alice's claim.

Legend has it that if you draw a diagram of this in some way, it looks like the Greek capital letter Sigma, hence the name Sigma protocol.

In cryptography, the properties we expect a Sigma protocol to support are:

Correctness If everyone does what they are supposed to, then Bob accepts.
Soundness If Alice lies, then Bob can tell (she can't trick him into accepting something false)
Zero-knoweldge If Alice tells the truth, Bob can't learn what her secret input was.

A more formal treatment

Having provided a rough overview, we now provide a more formal treatment, based on David's PhD thesis "Zero-Knowledge Proofs in Theory and Practice".

Defining a Sigma Protocol

Let $k$ be a field. We're interested in linear functions $f: W \to X$ from one $k$- vector space to another, where Alice and Bob both know some public $x \in X$ and Alice also knows a secret preimage $w \in W$ such that $f(w) = x$. Alice wants to prove to Bob that she knows a preimage of $x$.

Quite a lot of cryptography is done over elliptic curves nowadays. An elliptic curve is a group of points of the form $P = (x,y)$ and a special point "at infinity" that is an honorary member of the curve. These points satisfy some equations and most importantly, one can add two points on the curve to get another and this addition makes the curve into a group. One often starts with a large prime $p$, works over the base field $k = \mathbb F_p$ and considers points $P$ in $E_p = E \cap \mathbb F_p \times \mathbb F_p$.

Many elliptic curve protocols start out by generating keypairs using point multiplication: everyone agrees on a common, public base point $P$. Then one can pick a secret key $x \in \mathbb F_p$ and compute the related public key $Y = x \cdot P$. If Alice wants to register her public key somewhere, it is good for security if the registrar asks her to prove that she knows the associated secret key, or else she could register someone else's key as her own. But of course Alice should not reveal her secret key to the registrar. She could, for example, use a Sigma protocol to prove that she knows the secret key that she claims to know, without revealing it.

If one takes $W = \mathbb F_p$ as a $k$-vector space and $X = E_p$ then point multiplication for a fixed base point, specifically $f: W \to X, w \mapsto w \cdot P$ is a linear function. The same can be said for "matrix product" functions. Suppose that Alice has a secret key $x$ and a public key $Y = x \cdot P$, and someone sends her an ElGamal ciphertext $(C,D)$ for her key. She wants to decrypt this and prove that she has decrypted it correctly, one way to do this is to compute a decryption share $S = x \cdot C$. The decryption is then $D-S$ which anyone can compute from $(C,D)$ and $S$. So Alice want to show that she knows an $x$ meeting the two equations $Y = x \cdot P$ and $S = x \cdot C$, so we set $W = k, X = E_p \times E_p$ and $f(x) = (x \cdot P, x \cdot C)$ which is a linear function $f: W \to X$.

The Sigma protocol for $f: W \to X$ is the following construction. Alice knows $(x, w) \in X \times W$ with $f(w) = x$ and Bob knows $x$. 1. Alice picks $r \in W$ at random, sets $A = f(r)$ and sends $A$ to Bob. This is the commitment (Alice is committing to $r$). 2. Bob picks $c \in k$ at random and sends it to Alice. This is the challenge. 3. Alice computes the response $s = r + c \cdot w$ in $W$ (the product is scalar multiplication in $W$) and sends $s$ to Bob. 4. Bob accepts if $f(s) = A + c \cdot x$ in $X$.

Let's look at the three security properties for Sigma protocols in more detail:

Correctness

In just about any protocol (cryptographic or otherwise), correctness means that if everyone follows the protocol then it does what it should. In the context of Sigma protocols, this means that if Alice and Bob do as told then Bob accepts; this is true because $f$ is linear.

Soundness

Soundness means that Alice cannot prove a false statement. This trips up a lot of people because the first protocol they see is Schnorr's protocol for proving that $y = x \cdot P$, so Alice is proving that such an $x$ exists. But that is obvious! (Alice is also proving that she knows $x$, which is more interesting but that's another property.) But let's look at the other example, Alice proving that $S$ is a correct decryption share for $C$ under public key $Y$. Here, Alice is proving that an $x$ exists such that $Y = x \cdot P$ and $S = x \cdot C$ which is not true for all tuples $(P, C, Y, S)$. What's actually going on is that the image of $f$ is a one-dimensional subspace of the two-dimensional $k$- vector space $X$. In our formalism, soundness means that Bob does not accept (except with perhaps negligible probability) unless $x$ is in the image of $f$ (that is, a preimage $w$ exists such that $x = f(w)$).

Sigma protocols are sound. Actually, they are even more than that, they have a property called special soundness. Informally, consider the point in time when Alice has just sent her commitment $A$ to Bob. For which values of $c$ can Alice possibly find a $r$ that Bob will accept? If there is at most one such value then Alice gets Bob to accept overall with probability at most $1/|k|$. Usually, $|k|$ is exponentially large (it's the prime $p$ that we started with) so $1/|k|$ is negligibly small. Special soundness says that if Alice can conveince Bob even for only two out of $|k|$ possible challenges, then a preimage $w$ must exist. Suppose that on challenge $c$ Alice would reply $s$ and on challenge $c' \neq c$ Alice would reply $s'$, and Bob would accept both of these. Then set $d = (c-c')^{-1}$ which we can do as $k$ is a field and $c \neq c'$, and a bit of linear algebra shows that $w = d \cdot (s-s')$ where the product is again $W$-scalar multiplication satisfies the equation $x = f(w)$.

Zero-Knowledge

Bob is happy because the protocol is sound. Alice still needs to know that Bob can't learn $w$ from the Sigma protocol. Actually, Alice probably wants even more than that: after running the protocol with Bob, Bob should not be able to prove to Charlie that he knows Alice's secret $w$ either. Zero-knowledge says even more than that, Bob does not learn anything from the protocol except that Alice knows $w$ (and therefore, that $w$ exists and $x$ is in the image of $f$).

The proof that Sigma protocols are zero-knowledge ... does not exist! Contrary to what one might guess after learning about Sigma protocols in the textbook chapter on zero-knowledge, Sigma protocols in general are not zero-knowledge, which beginning students of cryptography would do well to remember for their exams. (They meet a weaker requirement called honest-verifier zero-knowledge.)

Discussing Sigma protocols in the context of zero-knowledge isn't completely arbitrary though: one can make them zero-knowlege in several ways, the most practical of which is making them non-interactive. But that's next week's topic...

Friday, August 14, 2015

52 Things: Number 45: Describe some basic (maybe ineffective) defences against side channel attacks proposed in the literature for RSA.

Let's start by recalling the vanilla RSA encryption scheme.
Key generation picks a secret pair of odd prime integers, $p$ and $q$, and computes the modulus $N = pq$. An integer $3 \leq e \leq \phi(N)$, coprime to $\phi(N)$, is chosen. Then the public key is the pair $\left(N, e\right)$ and the secret key is the unique integer $3 \leq d \leq \phi(N)$ such that $ed \equiv 1 \: \left( \mathrm{mod} \: \phi(N) \right)$.
To encrypt a message $m \in \mathbb{Z}_N$, one computes $m^e \: \left( \mathrm{mod} \: N \right)$.
To decrypt a ciphertext $c \in \mathbb{Z}_N$, one computes $c^d \: \left( \mathrm{mod} \: N \right)$.

An SPA-Style Attack to Determine the Secret Key

We first give an example of how information about the secret key $d$ could be leaked during the decryption operation. A typical implementation of exponentiation will be a branching program that behaves differently depending on the inputs. Consider, for example, the square and multiply algorithm which efficiently computes an exponentiation where the exponent is expressed in binary:

Say $d = \sum_{0\leq i \leq n} b_i 2^i$ where each $b_i \in \lbrace 0,1 \rbrace$. Then,
\[ c^d = \prod_{0 \leq i \leq n} c^{b_i 2^i }.\] Noting that, if we ignore the bits $b_i$, each factor in the product is the square of the previous factor, we can easily compute the product as follows:

$ANS \leftarrow 1$
$fac \leftarrow c$
For $0 \leq i \leq n$, do

If $b_i = 1$,

$ANS \leftarrow ANS \times fac$
$fac \leftarrow fac^2$

Else,

$fac \leftarrow fac^2$

Return $ANS$.

The algorithm behaves differently depending on whether each $b_i$ is $0$ or $1$. Therefore, if this algorithm is used to decrypt an RSA ciphertext, the time it takes or its power consumption could reveal the value of each bit $b_i$, thus revealing the secret key $d$. This would be an SPA-style attack since only one trace would be needed.

In order to prevent this kind of attack, one must make the two branches of the algorithm look the same to an attacker, i.e. have both branches of the square and multiply algorithm take the same amount of time to run or consume the same amount of power.

An SPA-Style Attack to Determine the Plaintext

The above shows how the exponentiation operation used in decryption could compromise the secret key. But an attacker may be interested in a particular plaintext, $m$ (after all, encryption is designed to keep such messages secret). Again, if the encryption operation is a branching program which depends on the value of $m$, the runtime or power consumption of a single encryption could leak information about $m$ in an SPA-style attack. In particular, note that one has to perform a modular reduction in encryption. In most implementations, instead of a single reduction of a very large integer at the end of the exponentiation, there will be many modular reductions throughout the exponentiation algorithm in order to keep the numbers involved (relatively) small. As a slightly contrived example, suppose we perform modular reduction via the following loop:

While $ANS > N$

$ANS \leftarrow ANS - N$

which, since the exponent is known in the case of encryption, leaks information about the base $m$ depending on how long it takes to run and how much power it consumes (cf. David's post about attacks on Montgomery Arithmetic).

Again, to prevent this kind of attack we must ensure that our programme takes the same amount of time and consumes the same amount of power to reduce intermediate values modulo $N$, no matter what size they are (up to some upper bound which can easily be found since we know the exact exponent and range of values for the base).

Preventing DPA-Style Attacks on the Secret Key

Even if we obscure any branching in decryption that depends on $d$, the precise details of the operations performed in carrying out the exponentiations for decryption will still depend (in a less obvious way) on the exponent. So, over multiple decryptions, a statistical relationship between the decryption key and the duration or power consumption of the operations may emerge. Therefore we also need to prevent more subtle DPA-style attacks where the attacker uses statistical techniques on a large number of traces to test hypotheses on the secrect key.

To do this, we have to remove the direct dependency between the secrect key and the calculation performed each time. This involves blinding, where some random noise is injected into the exponentiation operation without affecting the result. In the case of decryption, we introduce randomness in the exponent: while $d$ is the unique multiplicative inverse of $e$ in $\mathbb{Z}_{\phi(N)}$ such that $3 \leq d \leq \phi(N)$, we can add or subtract integer multiples of $\phi(N)$ to $d$ and obtain a new inverse. So to decrypt $c \in \mathbb{Z}_N$, select a random $r \in \mathbb{Z}$ and compute $d' = d + r\phi(N)$. Then compute the message $c^{d'} \: \left( \mathrm{mod} \: N \right)$ which will be the same as directly computing $c^d \: \left( \mathrm{mod} \: N \right)$. The point is that addition is not usually a branching operation, so adding $r\phi(N)$ to $d$ will not leak information about $d$ on a single trace, and using a new random $r$ for each decryption prevents DPA-style attacks.

Coppersmith's SPA-Style Attack to Determine the Plaintext

We conclude this post with a special and interesting attack to determine $m$ which is only possible when $e$ is small ($e=3$ is a popular choice of exponent for efficiency of encryption). There is a theorem due to Coppersmith (see this article) that an attacker can efficiently find all 'small' integer roots modulo $N$ of an integer polynomial $f$ of degree $e$, where small essentially means having absolute value less than $N^{1/e}$. Obviously if $m$ happens to be small then one can use this to directly solve the equation $m^e \equiv c \: \left (\mathrm{mod} \: N \right)$ as required to recover the plaintext. If not, but some of the most significant bits of $m$ are leaked, then we may write $m = m_k + m_u$ where $m_k$ is known and $m_u$ is small, obtaining an integer polynomial $f(X) = \left( m_k + X \right)^e - c$ whose small roots modulo $N$ can be found via the Coppersmith method and correspond to the remaining bits of $m$. So we need to make sure that bits of $m$ are not leaked by the encryption operation.

To counter this kind of attack, one again uses blinding: we introduce some randomness to $m$ before exponentiating and remove it again afterwards. More precisely, to encrypt $m \in \mathbb{Z}_N$, we select a random $r \in \mathbb{Z}_N$, compute $rm \: \left( \mathrm{mod} \: N \right) $, then $\left(rm\right)^e \: \left( \mathrm{mod} \: N \right)$ and finally multiply the result by $r^{\phi(N)-e}$ (and reduce modulo $N$ again). Obviously the ciphertext is the same as it would have been without blinding, but the leakage of the exponentiation operation is now independent of $m$.

This should give you a flavour of the kind of side-channel attacks that can be mounted on an encryption scheme and the way implementations can avoid them.

Friday, August 7, 2015

52 Things: Number 44: Describe some basic (maybe ineffective) defences against side channel attacks proposed in the literature for ECC.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we consider what can be done to mitigate the threat of side-channels against ECC implementations...

In this blog post we will discuss "some basic (maybe ineffective) defences against side channel attacks proposed in the literature for ECC". This can be seen as a complement to last weeks blog which asked the same question for AES.

Before we start the discussion, I want to clarify what kind of countermeasures we will be considering. From this point forward we will only be considering implementation level countermeasures, I will not consider hardware countermeasures such as Dual Rail Logic, or location security such as putting it in a concrete box. While the title says "maybe ineffective" I will stick to designs that at least have some hope of working, for example wearing a tinfoil hat will not secure my credit card and will clearly not work and so will not be discussed.

Elliptic curve cryptography as a rule is reasonably good when it comes to resisting side channel attacks but there are still some points that are worth considering.

Scalar Multiplication
As with most cryptography scalar multiplication (normally exponentiation in other schemes) is a very leaky operation, this is well studied in RSA. This is no different in elliptic curve cryptography because the addition operator and the double operator behave differently. Various techniques that can be applied to RSA can also be applied here, such as exponent blinding where for each scalar multiplication you choose a value $r$ such that $[a]P=[a+r]P$ where $a$ is the value you require to keep secret and $P$ is a generator of the curve. Since scalar multiplication only leaks information about the scalar this technique only needs to be applied when you want to keep the scalar secret. Recently there has been work to create elliptic curves which have the same operation for double and add which would resolve this issue.

Is a point on the curve?
Sometimes an $x$ value is chosen and to learn if it is on the curve you use the Jacobi symbol to learn if $x^3+a\cdot x+b$ is square. If it is $(x,y)$ is an elliptic curve point. As can be seen by the algorithm in the link, the process of calculating the Jacobi symbol is variable length and thus may leak information about the secret value $x^3+a\cdot x+b$. Since we are only interesed if $x^3+a\cdot x+b$ is square, we note that $x^3+a\cdot x+b$ is square if and only if $r^2\cdot(x^3+a\cdot x+b)$ is, for random $r$. Using this technique we can check if $x$ is a valid point on the curve but since it has been blinded by a random $r$ this will not leak anything about the underlying point.

Theoretically secure
While against known side channel attacks elliptic curves are reasonably secure without much help, it is possible to secret share certain schemes to enhance the security. Providing that each share leaks independently it is possible to create schemes which are provably secure against arbitrary leakage functions (including ones which can only happen in theory and not in practice). This area of cryptography has become known as Leakage Resilient Cryptography.