Friday, February 27, 2015

52 Things: Number 21: How does the CRT method improve performance of RSA?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.

The Chinese Remainder Theorem (CRT) states that if we have two equations:
$x \equiv a \pmod{N}$ and $x \equiv b \pmod{M}$, then there is a unique solution modulo $MN$ iff $\gcd(N, M) = 1$.

In RSA we must perform a modular exponentiation with respect to a modulus of a thousand or more bits [1]. As a general remark, public key algorithms are slow compared to symmetric schemes. This can result in slow web servers and networks, so improving the efficiency of the implementation (the software algorithms) plays a crucial role in avoiding performance problems.

The main operation in the RSA scheme is the modular exponentiation $M = C^d \pmod N$. A modular exponentiation can be performed using $t - 1$ squarings and $h - 1$ multiplications, where $t$ is the bit length of the exponent and $h$ is its Hamming weight. Since on average $h \approx t/2$, the expected number of multiplications and squarings is about $t + t/2 - 1$.

Further performance improvements can be achieved using binary exponentiation algorithms (e.g. right-to-left exponentiation) or window methods. In the latter we process $w$ bits of the exponent at a time. We still need $t$ squarings, but the number of multiplications drops to about $t/w$. Further improvements can be obtained by adopting sliding window methods, which need about $t/(w + 1)$ multiplications on average.
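For concreteness, here is a minimal Python sketch of the basic left-to-right square-and-multiply method: the bit length of the exponent determines the number of squarings and its Hamming weight the number of multiplications.

def modexp(c, d, n):
    # Left-to-right binary (square-and-multiply) exponentiation: c^d mod n.
    result = 1
    for bit in bin(d)[2:]:               # scan exponent bits, most significant first
        result = (result * result) % n   # one squaring per bit
        if bit == '1':
            result = (result * c) % n    # one multiplication per set bit
    return result

assert modexp(7, 560, 561) == pow(7, 560, 561)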

To make the RSA exponentiation even faster we can perform some additional tricks depending on whether we are performing an encryption operation with the public key or a decryption operation with the private key. The CRT is used in the case of RSA decryption. Thus we are considering a private key operation, which means that we have access to the private key and hence the factorization of the modulus, $N = pq$. Suppose we want to decrypt a ciphertext $C$; our goal is then to calculate $M = C^d \pmod N$.

First we will compute M modulo p and M modulo q as follows:

$M_p = C^d \pmod p = C^{d \bmod (p - 1)} \pmod p$
$M_q = C^d \pmod q = C^{d \bmod (q - 1)} \pmod q$

For a 1024-bit modulus $N$, this calculation requires two exponentiations with 512-bit moduli and 512-bit exponents, since $p$ and $q$ are each 512-bit numbers. This is faster than a single exponentiation modulo a 1024-bit number with a 1024-bit exponent.

Using the CRT we recover $M$ from $M_p$ and $M_q$ as follows: 

Compute $T = p^{-1} \pmod q$, and store it with the private key.

M (the message) can be recovered from $M_p$ and $M_q$ as follows:

$U = (M_q - M_p) \cdot T \pmod q$,
$M = M_p + U \cdot p$.
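Putting the pieces together, here is a small Python sketch of CRT-based RSA decryption with toy parameters (purely illustrative, nowhere near a secure key size; pow(e, -1, m) needs Python 3.8+):

p, q = 61, 53                        # toy primes
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

# Values precomputed and stored with the private key
dp, dq = d % (p - 1), d % (q - 1)
T = pow(p, -1, q)                    # p^{-1} mod q

def decrypt_crt(C):
    Mp = pow(C, dp, p)               # C^d mod p, with the reduced exponent
    Mq = pow(C, dq, q)               # C^d mod q, with the reduced exponent
    U = (Mq - Mp) * T % q            # the recombination step
    return Mp + U * p                # M = M_p + U*p, the unique value mod N

M = 65
C = pow(M, e, N)
assert decrypt_crt(C) == pow(C, d, N) == M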


[1] http://www.cs.bris.ac.uk/~nigel/Crypto_Book/book.ps (Chapter 15)


Friday, February 20, 2015

52 Things: Number 20: How are Merkle-Damgaard style hash functions constructed?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. To finish the basic schemes section, we look at one of the most popular hash function designs...
A Merkle-Damgaard (MD) hash function is a hash function built by extending the domain of a collision-resistant compression function in a way that preserves the collision resistance. This means we can take a small (and fixed) width compression function, prove it is secure and then use it to make a variable input length hash function. Whilst other methods for building hash functions exist, MD is by far the most "popular" (well, most frequently used at least!), with examples including MD5, SHA1 and SHA2. So, time to break those terms down:

Secure Hash Function?

Traditionally, a secure hash function $h$ should be:
  • Pre-image resistant: given $h(x)$, it is hard to find $x$ (or indeed any $x'$ with $h(x')=h(x)$).
  • Second pre-image resistant: given $x$, it is hard to find $y \neq x$ such that $h(x)=h(y)$.
  • Collision resistant: it is hard to find any pair $x \neq y$ such that $h(x)=h(y)$.
If a hash function is collision resistant then clearly it must be second pre-image resistant, so it is this [collision resistance] that we will focus on.

Compression Function

A compression function $f:\{0,1\}^n\times\{0,1\}^r\to\{0,1\}^n$ is a function that, as the name suggests, compresses $(n+r)$ bits worth of input into an $n$-bit output. As you might expect, a collision resistant compression function is a compression function that is collision resistant. So, it can be thought of as a fixed input length hash function, but what happens if we want our hash function to be secure for any input length?

The MD hash function construction

The MD construction provides a method for extending the domain of a fixed length compression function into a variable input length hash function. Using a compression function $f$ as above, we are going to use the $n$-bit value as our internal state, and feed in $r$ bits each iteration (it's quite common to set $r=n$). To do this, we begin with an initial value (IV), split the (suitably padded) message $M$ up into blocks of $r$ bits $M=M_0M_1\cdots M_{m-1}$, and then simply iterate the construction by setting:
$$S_0:=IV; \qquad i=0,\dots,m-1: S_{i+1}=f(S_i,M_i); \qquad h(M):=S_m$$
Confused? Perhaps the following diagram will help:
Diagram of the MD construction (from Wikipedia)
The most important thing about the MD-construction is that if the compression function is collision resistant, then so is the overall construction (as proven by Merkle and, independently, Damgård). This gives us a secure method for building hash functions out of smaller, more easily studied primitives.
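To make the iteration concrete, here is a minimal Python sketch of the (unfinalised) MD construction. The compression function is just a stand-in built from SHA-256, chosen only so the example runs, and the naive zero-padding ignores the length strengthening a real design would use:

import hashlib

R = 32                                        # block width r in bytes (here r = n)
IV = b'\x00' * 32

def f(state: bytes, block: bytes) -> bytes:
    # Stand-in compression function mapping 2n bits to n bits; only the structure matters here.
    return hashlib.sha256(state + block).digest()

def md_hash(message: bytes) -> bytes:
    message += b'\x00' * (-len(message) % R)  # naive zero-padding (no length strengthening!)
    state = IV
    for i in range(0, len(message), R):
        state = f(state, message[i:i + R])    # S_{i+1} = f(S_i, M_i)
    return state                              # h(M) = S_m, with no finalisation

print(md_hash(b"an input of arbitrary length").hex())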

Length Extension

You might notice that the diagram has an extra stage that my description didn't: the "finalisation" stage. This is to prevent length extension attacks. For example, if $N$ is a single block (i.e. $N\in\{0,1\}^r$) and the attacker knows $h(M)=x$, then he can very easily calculate $h(M||N)$, because $h(M||N)=f(x,N)=f(h(M),N)$. So, some form of finalisation function has to be used to break this relationship.
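Using the sketch above (repeated here so it runs standalone), the length extension is a one-liner:

import hashlib

def f(state, block):                           # the same stand-in compression function
    return hashlib.sha256(state + block).digest()

def md_hash(msg, iv=b'\x00' * 32, r=32):       # the unfinalised MD sketch from above
    msg += b'\x00' * (-len(msg) % r)
    state = iv
    for i in range(0, len(msg), r):
        state = f(state, msg[i:i + r])
    return state

M = b'A' * 32                                  # one full block, so padding plays no role
N = b'B' * 32                                  # the attacker's single-block extension
x = md_hash(M)                                 # all the attacker is given
assert f(x, N) == md_hash(M + N)               # h(M || N) computed without knowing M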

Friday, February 13, 2015

52 Things: Number 19: The Shamir secret sharing scheme.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.

The Shamir secret sharing scheme [3] is an algorithm invented by Adi Shamir that allows a secret, such as a key, to be divided into parts so that when enough of those parts are combined the whole secret can be recovered.

To put it slightly more formally, if we have a secret $S$ and $n$ parties, we can divide $S$ into $n$ parts and distribute those to the various parties. The secret can be divided in such a way that a threshold $k$ can be set so that when $k$ parts of the secret $S$ are known then the whole secret can be calculated. If $k-1$ or less parts of $S$ are known then $S$ can't be calculated. This scheme is called a $(k,n)$ threshold scheme.

The best way to explain how this scheme is constructed is by means of an example. Assume we want to divide the secret $S=1425$ between $5$ parties ($n=5$) and require $3$ parts to be known ($k=3$) to allow the secret to be computed. First we construct a polynomial[1] $f(x)$ of degree $k-1 = 2$ with random coefficients (say $a_1 = 64$, $a_2 = 112$) and constant term $S$.
$$ f(x)= S + a_1 x + a_2 x^2 = 1425 + 64 x + 112 x^2$$
From this polynomial we can construct $n=5$ points, these points can then be distributed between our parties, one point per party.

$P_0=(1,1601),P_1=(2,2001),P_2=(3,2625),P_3=(4,3473),P_4=(5,4545)$

If we assume that we know $3$ of the $5$ points, we can compute the $3$ Lagrange basis polynomials[2]; their sum, with each weighted by the associated $y_j$ value, gives $f(x)$ and therefore gives us the secret $S$. For example, if we know
$$ (x_0,y_0)=(2,2001), (x_1,y_1)=(3,2625), (x_2,y_2)=(5,4545) $$
Computing 3 Lagrange Polynomials gives
$$ l_0 = \frac{x-x_1}{x_0-x_1}\frac{x-x_2}{x_0-x_2} = \frac{1}{3}x^2 - \frac{8}{3}x + 5 \\ l_1 = \frac{x-x_0}{x_1-x_0}\frac{x-x_2}{x_1-x_2} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5 \\ l_2 = \frac{x-x_0}{x_2-x_0}\frac{x-x_1}{x_2-x_1} = \frac{1}{6}x^2 - \frac{5}{6}x + 1 \\ \sum_{j=0}^2 y_j l_j(x) = 112 x^2 + 64 x + 1425. $$
As demonstrated, this method works fine. However, an eavesdropper is able to glean quite a lot of information about the secret $S$, since in the above we have worked with rational arithmetic. If we instead work in a finite field (so the secret and the polynomial are defined over a field of size $q$), then any two or fewer parties coming together learn nothing about the secret.

This is because for two such parties, say party one and party two, and any candidate secret value $S'$ from the field, there is always a degree-two polynomial defined over the finite field which interpolates the three values $(0,S')$, $(2, 2001 \bmod q)$ and $(3, 2625 \bmod q)$.
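Here is a minimal Python sketch of a $(k,n)$ scheme over a prime field (a toy prime, and pow(x, -1, q) for the modular inverse needs Python 3.8+):

from random import randrange

q = 2**31 - 1                       # a prime; the field must be larger than the secret

def share(S, k, n):
    coeffs = [S] + [randrange(q) for _ in range(k - 1)]        # f(0) = S
    f = lambda x: sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
    return [(x, f(x)) for x in range(1, n + 1)]                # one point per party

def reconstruct(points):
    # Lagrange interpolation evaluated at x = 0 recovers f(0) = S.
    S = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % q
                den = den * (xj - xm) % q
        S = (S + yj * num * pow(den, -1, q)) % q
    return S

shares = share(1425, k=3, n=5)
assert reconstruct(shares[:3]) == 1425          # any 3 of the 5 shares suffice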



Thursday, February 5, 2015

52 Things - Number 18: Draw a diagram (or describe) the ECB, CBC and CTR modes of operation

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.

Week 18's task is to draw a diagram (or describe) the ECB, CBC and CTR modes of operation. Someone on Wikipedia has already done a good job of drawing some nice diagrams, so for the purposes of this blog post I'll stick to explaining the rationale behind them, providing links where appropriate.

Modes of operation: the security provided by a block cipher applies to the encryption or decryption of a single fixed-length plaintext "block". When the encryption or decryption of messages longer than a block is required (as is typical), we use a mode of operation to link together the encryption of multiple blocks of plaintext. As we will see, the approach used to chain multiple blocks together is important.

Electronic Codebook (ECB) mode. (Encryption) (Decryption). ECB mode is the most straightforward approach---the plaintext is divided into m blocks, where each block is encrypted separately using the same key. The inherent problem with ECB mode is that repeated plaintext blocks produce the same ciphertext blocks. The best illustration of this problem is with the encryption of images, where repeated patterns in the original image reappear in the encrypted image. See, for example, the source image and the corresponding ECB encrypted image.

Cipher-block chaining (CBC) mode. (Encryption) (Decryption). CBC mode removes the limitation of ECB mode. Each plaintext block is XORed with the previous ciphertext block before being encrypted, and the first plaintext block is XORed with a random initialisation vector (IV). The repeated patterns in the ciphertext blocks produced by ECB mode encryption are removed by the randomness propagated from the IV through the chained XOR operations. CBC is the most commonly used mode in practice.

Counter (CTR) mode. (Encryption) (Decryption). Counter mode differs from ECB and CBC in that, in some sense, it acts like a stream cipher. CTR mode generates a keystream by repeatedly encrypting successive values of a "counter", which is initially set using an initialisation vector. Incrementing the counter for successive encryptions can be as simple as incrementing the initial counter by 1. Each encrypted counter block is, as in a stream cipher, XORed with the next plaintext block to produce the next ciphertext block.
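A short Python sketch of the CTR idea follows. The "block cipher" E is a stand-in PRF built from SHA-256 so that the example is self-contained; in practice E would be AES or another real block cipher:

import hashlib

BLOCK = 16  # 128-bit blocks

def E(key: bytes, block: bytes) -> bytes:
    return hashlib.sha256(key + block).digest()[:BLOCK]   # toy stand-in, NOT a real cipher

def ctr_crypt(key: bytes, iv: int, data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (iv + i // BLOCK).to_bytes(BLOCK, 'big')     # IV, IV+1, IV+2, ...
        keystream = E(key, counter)                            # encrypt the counter
        chunk = data[i:i + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))  # XOR with the plaintext
    return bytes(out)

key, iv = b'sixteen byte key', 42
ct = ctr_crypt(key, iv, b'the same function encrypts and decrypts')
assert ctr_crypt(key, iv, ct) == b'the same function encrypts and decrypts'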

Further reading: some modes of operation guarantee the authenticity of a plaintext in addition to its confidentiality. See AEAD modes for more.

Friday, January 30, 2015

Discrete Gaussian Sampling and the Quest for the Shortest Vector

The tl;dr:
When it comes to short vectors in lattices
The state-of-the-art apparatus is
Discrete Gaussian sampling
For use (for example) in
Reduced time and space cryptanalysis...
This week’s study group, presented by Joop, was on Solving the Shortest Vector Problem in $2^n$ Time via Discrete Gaussian Sampling, a 2014 pre-print release by Aggarwal, Dadush, Regev and Stephens-Davidowitz. The lattice, to my regret, is not a creature I frequently encounter in the course of my own research, so I report here from a considerably less-than-expert perspective. But hopefully, if I stick to the high-level details then all shall be well.

The motivating theme of the paper is the Shortest Vector Problem (SVP) — an important computational problem on lattices: given a basis for a lattice, find a non-zero vector in the lattice of minimum Euclidean norm. Many cryptographic primitives (notably including breakthrough developments in fully homomorphic encryption) derive their security from the worst-case hardness of SVP and related problems.

Previous algorithms to find exact or approximate solutions to SVP fall into three main classes: enumeration, whereby basis reduction is combined with exhaustive search inside Euclidean balls (time ranging from $2^{O(n \log n)}$ to $2^{O(n^2)}$, polynomial space); sieving, whereby randomly sampled vectors are iteratively combined to generate increasingly short vectors ($2^{cn}$ time with $c>2$, exponential space); and recent proposals using the Voronoi cell of a lattice -- the region of space in which all points lie closer to the zero vector than to any other lattice point ($2^{2n}$ time, exponential space).

The intuition behind Discrete Gaussian Sampling (DGS) is to draw from a (narrow) distribution of lattice points centred around zero. As the width of the distribution (characterised by the parameter s) decreases, the samples become more and more concentrated on short vectors; sufficiently many samples for an arbitrary s will eventually land on a solution to SVP. A simple idea but, of course, far from simple in practice. In particular, for any given lattice there is a smoothing parameter s* -- informally, the 'tipping point' at which the distribution begins to 'look like' a continuous Gaussian rather than a stack around the central point. Efficient algorithms are known for s much larger than s*, but these are inadequate to solve exact lattice problems. A key contribution of Aggarwal et al. is an algorithm that samples for any parameter s > 0; it returns $2^{n/2}$ independent discrete Gaussian distributed vectors using $2^{n+o(n)}$ time and space. They add to this a second algorithm for "above smoothing" sampling with a substantially improved time and space complexity ($2^{n/2}$ -- the current record for fastest provable running time of a hard lattice problem), and show that it can be used to approximate decision SVP to within a factor of 1.93.

Here's where I muster all my expertise-less enthusiasm and try to explain the (high level) details of the algorithms without saying anything too stupid…

Both algorithms operate by iteratively combining vectors drawn from carefully chosen high parameter distributions (from which it is well-known how to sample efficiently) in such a way that the dispersion parameter of the distribution of the combined vectors is progressively lowered. As an aid to intuition, consider, for a moment, the counterpart continuous case: the distribution of a (continuous) Gaussian-distributed vector divided by two is similarly Gaussian distributed with half the width. In the discrete case, though, dividing by two generally does not produce another vector in the lattice. A 'fix' would be to sample from L with parameter s', keep those that were also in 2L, and divide them by 2 -- but the "loss factor" of doing so can be high (the probability that a sample from L is also in 2L can be as small as $2^{-n}$), so one needs to be cleverer.

The $2^n$-time below-smoothing sampler looks for pairs of vectors sampled from lattice L with parameter s whose sum is in 2L (equivalently: vectors in the same coset mod 2L). The 'greedy' combiner which pairs as many as possible in each coset-related bucket would have a loss factor of just 2 in reducing from the sampled vectors to the shorter combined vectors. However, below the smoothing parameter the required distributional properties are lost in the combining step. The workaround to produce combined vectors with the correct distribution is very involved; I caught something about coin-flips and Poisson distributions, but the sterling efforts of our presenter to render comprehensible the technical details did not take sufficient root in my brain for me to attempt the same here -- I refer you to the paper at this point! From an input sample of order $2^n$, combining multiple times according to the method they devise reduces s as required with a total loss factor of $2^{n/2}$, thus outputting on the order of $2^{n/2}$ sampled vectors.

The $2^{n/2}$-time above-smoothing sampler is not just a tweak on the first algorithm but a surprisingly different approach supplied with its own set of clever tricks and insights. The intuitive idea is to construct a ‘tower’ of increasingly sparse lattices $[L_0,\dots,L_m]$ where $L_m = L$, the lattice one wishes to sample from. Each $L_i$ is chosen to ‘lie between’ $L_{i+1}$ and $2L_{i+1}$ — that is, it is a (strict) sublattice of $L_{i+1}$ which contains $2L_{i+1}$, i.e. $2L_{i+1} \subseteq L_i \subsetneq L_{i+1}$. Because $L_0$ is dense, one can sample ‘easily’ from it with a small parameter ($2^{-m/2}s$, in fact), and combine (by summing) over cosets of $L_1$ to get samples over the sparser lattice $2L_1$ with a slightly increased parameter $2^{-(m-1)/2}s$. Repeating this process eventually produces samples from $L_m = L$ with distribution statistically close to the discrete Gaussian with parameter $s$, provided $s$ is above the smoothing parameter. They show that this method is able to approximate decision SVP to within a factor of 1.93.

That’s all we had time for in our session — and already, I admit, somewhat more than I have the capacity to fully absorb without some serious preliminary study! (Although, the bits I was able to follow, I found very interesting). The rest of the paper, as well as containing all the ‘proper maths’ (including the formal reduction from SVP to DGS), covers applications, adaptations to the closest vector and other related problems, and relevant points for discussion, all in some depth. Despite the clear theoretical value of their contribution, the authors are circumspect about the practical usages of their algorithms relative to enumeration and sieving methods. These latter perform well heuristically in relevant scenarios, whilst the run-time bounds of DGS are tight in practice, with no trade-off available even if one wants to sample fewer than $2^{n/2}$ vectors.

52 Things: Number 17: Describe and compare the round structure of DES and AES.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we describe and compare the round structure of DES and AES.

Both DES and AES are examples of iterated block ciphers. Such block ciphers obtain their security by repeated use of a simple round function, which takes an n-bit block and returns an n-bit block, where n is the block size of the overall cipher. The number of rounds r can either be variable or fixed; as a general rule, increasing the number of rounds increases the level of security of the block cipher. Each use of the round function employs a round key $k_i$ (where $1 \leq i \leq r$) derived from the main secret key $k$ using an algorithm called a key schedule. To allow decryption, the round must be invertible for every round key, and for decryption the round keys are used in the opposite order to that in which they were used for encryption. In DES the functions needed to implement the round function are not themselves invertible, but the whole round is invertible. For AES (Rijndael) not only is the whole round function invertible but every function used to create the round function is also invertible.
More particularly, the DES cipher is a variant of the basic Feistel cipher. The interesting property of a Feistel cipher is that the round function is invertible regardless of the choice of the function in the box marked F. To see this notice that each encryption round is given by:
$L_i = R_{i-1}$
$R_i = L_{i-1} \oplus F(K_i, R_{i-1})$.

Hence, the decryption can be performed via:
$R_{i-1} = L_i$
$L_{i-1} = R_i \oplus F(K_i, L_i)$.
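To make the invertibility concrete, here is a minimal Python sketch of a generic Feistel network. The round function F is deliberately a non-invertible stand-in (a truncated hash), yet decryption recovers the plaintext simply by running the same structure with the round keys reversed:

import hashlib

def F(key: bytes, half: bytes) -> bytes:
    # Any function works here; it need not be invertible. A hash is used as a stand-in.
    return hashlib.sha256(key + half).digest()[:len(half)]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def feistel_encrypt(L, R, round_keys):
    for k in round_keys:
        L, R = R, xor(L, F(k, R))      # L_i = R_{i-1},  R_i = L_{i-1} xor F(K_i, R_{i-1})
    return L, R

def feistel_decrypt(L, R, round_keys):
    for k in reversed(round_keys):     # same structure, round keys in reverse order
        L, R = xor(R, F(k, L)), L      # L_{i-1} = R_i xor F(K_i, L_i),  R_{i-1} = L_i
    return L, R

keys = [bytes([i]) * 8 for i in range(16)]
L, R = b'left....', b'right...'
assert feistel_decrypt(*feistel_encrypt(L, R, keys), keys) == (L, R)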

This way we can choose any function for the function F, and we will still obtain an encryption function which can be inverted using the secret key. The same code/circuitry can be used for the encryption and decryption functions. We only need to use the round keys in the reverse order for decryption. As a variant of the Feistel cipher design, DES includes the following distinct characteristics:
  • the number of rounds r is 16,
  • the block length n is 64 bits,
  • the key length is 56 bits,
  • the round keys K1,...,K16 are each 48 bits
  • before and after the main Feistel iteration a permutation is performed.
In summary the DES cipher operates on 64 bits of plaintext in the following manner:
  • Perform an initial permutation.
  • Split the blocks into left and right half.
  • Perform 16 rounds of identical operations (the Feistel cipher). In each round, the F function consists of the following stages:
    • Expansion Permutation: The right half of 32 bits is expanded and permuted to 48 bits.
    • Round Key Addition: The 48-bit output from the expansion permutation is XORed with the round key, which is also 48 bits in length.
    • Splitting: The resulting 48-bit value is split into eight lots of six-bit values.
    • S-Box: Each six-bit value is passed into one of eight different S-Boxes (Substitution Box) to produce a four-bit result. Each S-Box is a look-up table of four rows and sixteen columns. The six input bits specify which row and column to use. Bits 1 and 6 generate the row number, whilst bits 2, 3, 4 and 5 specify the column number. The output of each S-Box is the value held in that element in the table.
    • P-Box: We now have eight lots of four-bit outputs which are then combined into a 32-bit value and permuted to form the output of the function F.
  • Join the half blocks back together.
  • Perform a final permutation.
The DES key schedule takes the 56-bit key, which is actually input as a bitstring of 64 bits comprising the key and eight parity bits for error detection. It first permutes the bits of the key (taking a 64-bit input and producing a 56-bit output, hence discarding the parity bits). The output of this permutation, called PC-1 in the literature, is divided into a 28-bit left half $C_0$ and a 28-bit right half $D_0$. Now for each round we compute:
$C_i = C_{i-1} \lll p_i$
$D_i = D_{i-1} \lll p_i$

where $x \lll p_i$ means perform a cyclic shift on $x$ to the left by $p_i$ positions. Finally the two portions $C_i$ and $D_i$ are joined back together and are subject to another permutation, called PC-2, to produce the final 48-bit round key.
Note that a key length of 56 bits is insufficient for many modern applications, hence one often uses DES with three keys and three iterations of the main cipher. Such a version is called Triple DES or 3DES. In 3DES the key length is equal to 168 bits. There is another way of using DES three times, using two keys instead of three, giving rise to a key length of 112 bits. In this two-key version of 3DES one uses the 3DES basic structure but with the first and third keys being equal. However, two-key 3DES is not as secure as one might initially think.

More details on actual values (S-Boxes, P-Boxes and all Permutation tables) can be found in [1].

The AES (Rijndael) algorithm, unlike DES, is a block cipher that does not rely on the basic design of the Feistel cipher. However, AES does have a number of similarities with DES. It uses a repeated number of rounds to obtain security and each round consists of substitutions and permutations, plus a key addition phase. AES in addition has a strong mathematical structure, as most of its operations are based on arithmetic in the field $\mathbb{F}_{2^8}$. However, unlike DES, the encryption and decryption operations are distinct.
AES identifies 32-bit words with polynomials in $\mathbb{F}_{2^8}[X]$ of degree less than four. Rijndael is a parametrized algorithm that can operate on block sizes of 128, 192 or 256 bits and can accept keys of size 128, 192 or 256 bits (the AES standard fixes the block size at 128 bits). For each combination of block and key size a different number of rounds is specified.
To make our discussion simpler we shall consider the simpler, and probably more used, variant which uses a block size of 128 bits and a key size of 128 bits, in which case 10 rounds are specified. AES operates on an internal four-by-four matrix (S(4,4)) of bytes, called the state matrix, which is usually held as a vector of four 32-bit words, each word representing a column. Each round key is also held as a four-by-four matrix [1]. The AES round function operates using a set of four operations:
  • SubBytes: There are two types of S-Boxes used in Rijndael: one for the encryption rounds and one for the decryption rounds, each being the inverse of the other. For the encryption S-Box each byte $s = [s_7,\dots,s_0]$ of the state matrix is taken in turn and considered as an element of $\mathbb{F}_{2^8}$. The S-Box can be mathematically described in two steps (a sketch computing it this way appears after this list):
    1. The multiplicative inverse in $\mathbb{F}_{2^8}$ of $s$ is computed (with $0$ mapped to $0$) to produce a new byte $x = [x_7,\dots,x_0]$.
    2. The bit-vector $x$ is then mapped, via an affine $\mathbb{F}_2$ transformation [1], to a new bit-vector $y$. The new byte is given by $y$. The decryption S-Box is obtained by first inverting the affine transformation and then taking the multiplicative inverse.
  • ShiftRows: The ShiftRows operation in AES performs a cyclic shift on the state matrix. Each row is shifted by different offsets [1]. The inverse of the ShiftRows operation is simply a similar shift but in the opposite direction. The ShiftRows operation ensures that the columns of the state matrix ‘interact’ with each other over a number of rounds.
  • MixColumns: The MixColumns operation ensures that the rows in the state matrix ‘interact’ with each other over a number of rounds; combined with the ShiftRows operation it ensures each byte of the output state depends on each byte of the input state [1].
  • AddRoundKey: The round key addition is particularly simple. One takes the state matrix and XORs it, byte by byte, with the round key matrix. The inverse of this operation is clearly the same operation.
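As mentioned in the SubBytes item above, the encryption S-Box can be computed directly from its two-step mathematical description. A small Python sketch (using the Rijndael reduction polynomial x^8 + x^4 + x^3 + x + 1 and the affine constant 0x63):

def gf_mul(a, b):
    # Multiplication in F_{2^8} modulo the Rijndael polynomial x^8 + x^4 + x^3 + x + 1.
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B                  # reduce by x^8 + x^4 + x^3 + x + 1
        b >>= 1
    return r

def gf_inv(a):
    # Multiplicative inverse as a^254 (the multiplicative group has order 255); 0 maps to 0.
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def sub_byte(s):
    x = gf_inv(s)
    # Affine step: y_i = x_i + x_{i+4} + x_{i+5} + x_{i+6} + x_{i+7} + c_i over F_2, c = 0x63.
    y = 0
    for i in range(8):
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8)) ^
               (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        y |= bit << i
    return y

SBOX = [sub_byte(s) for s in range(256)]
assert SBOX[0x00] == 0x63 and SBOX[0x53] == 0xED   # matches the published AES S-Box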
The AES algorithm can be described using the pseudo-code:

AddRoundKey(S, K0) 
for i = 1 to 9 do 
      SubBytes(S) 
      ShiftRows(S) 
      MixColumns(S) 
      AddRoundKey(S, Ki) 
end 
SubBytes(S) 
ShiftRows(S) 
AddRoundKey(S, K10)

The message block to encrypt is assumed to be entered into the state matrix S. The output encrypted block is also given by the state matrix S.
The AES key schedule makes use of a round constant which we shall denote by: 
$RC_i = x^i \pmod{x^8 + x^4 + x^3 + x + 1}$
We label the round keys as $(W_{4i}, W_{4i+1}, W_{4i+2}, W_{4i+3})$ where $i$ is the round. The initial main key is first divided into four 32-bit words $(k_0, k_1, k_2, k_3)$. The round keys are then computed as in the algorithm below, where RotBytes is the function which rotates a word to the left by a single byte, and SubBytes applies the Rijndael encryption S-Box to every byte in a word [1].

W_0 = k_0, W_1 = k_1, W_2 = k_2, W_3 = k_3
for i = 1 to 10 do
      T = RotBytes(W_{4i-1})
      T = SubBytes(T)
      T = T ⊕ RC_i
      W_{4i}   = W_{4i-4} ⊕ T
      W_{4i+1} = W_{4i-3} ⊕ W_{4i}
      W_{4i+2} = W_{4i-2} ⊕ W_{4i+1}
      W_{4i+3} = W_{4i-1} ⊕ W_{4i+2}
end
References: [1] http://www.cs.bris.ac.uk/~nigel/Crypto_Book/

Friday, January 23, 2015

52 Things: Number 16: Describe the key generation, signature and verification algorithms for DSA, Schnorr and RSA-FDH.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we describe the key generation, signing and verification algorithms of DSA, Schnorr and RSA-FDH.


1. DSA
The Digital Signature Algorithm (DSA), specified in the Digital Signature Standard (DSS), was proposed by the National Institute of Standards and Technology (NIST) in 1991 [1]. The security of DSA is based on the difficulty of computing discrete logarithms, but there is no known proof of its security under a standard assumption (such as DL), even in the random oracle model.

Domain Parameter Generation
  1. Select a prime number $p$, where $2^{L-1}<p<2^L$ and $L$ is a multiple of 64 and $512 \leq L \leq 1024$.
  2. Select a prime divisor $q$ of $p-1$, where $2^{159}<q<2^{160}$.
  3. Compute a generator $g$ of the subgroup of order $q$: choose a random integer $r$, where $1<r<p-1$ such that $g=r^{(p-1)/q} \ mod \ p$ and $g \neq 1$.
Key Generation
  1. Select a random integer $x$, where $0<x<q$.
  2. Compute $y = g^x \ mod \ p$.
Then the public key is $y$ and the private key is $x$. 

Signing
  1. Select a random integer $k$, where $0<k<q$.
  2. Compute $r = (g^k \ mod \ p)\  mod \ q$.
  3. Compute $s = (h(m)+x\cdot r)\cdot k^{-1}\ mod \ q$, where $h(m)$ is the hash of $m$ using SHA-1.
The signature on $m$ is the pair $(r, s)$.

Verification
  1. Compute $u_1 = h(m) \cdot s^{-1}\ mod \ q$.
  2. Compute $u_2 = r \cdot s^{-1}\ mod \ q$.
  3. Compute $v = (g^{u_1} \cdot y^{u_2}\ mod \ p)\ mod \ q$.
  4. Output $1$ if $v = r$, otherwise output $0$.
Correctness
If $(r,s)$ is a valid signature on $m$, then we have
$g^{u_1}\cdot y^{u_2}=g^{h(m)\cdot (h(m)+x\cdot r)^{-1}\cdot k}\cdot g^{x \cdot r \cdot (h(m)+x\cdot r)^{-1}\cdot k} \ mod \ p$
$= g^{(h(m)+x\cdot r)\cdot (h(m)+x \cdot r)^{-1}\cdot k}\ mod \ p$
$= g^k \ mod \ p,$
and hence $v = (g^k \ mod \ p)\ mod \ q = r$. Thus the verification succeeds.
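As a sanity check on the formulae above, here is a minimal Python sketch of DSA with toy parameters (far too small to be secure; pow(x, -1, q) needs Python 3.8+, and a real implementation would take the leftmost bits of the hash rather than reducing it mod q):

import hashlib
from random import randrange

p, q = 607, 101                      # toy group: q divides p - 1 = 606
g = pow(2, (p - 1) // q, p)          # generator of the subgroup of order q

def H(m: bytes) -> int:
    return int.from_bytes(hashlib.sha1(m).digest(), 'big') % q

def keygen():
    x = randrange(1, q)              # private key
    return x, pow(g, x, p)           # (private, public)

def sign(x, m: bytes):
    while True:
        k = randrange(1, q)
        r = pow(g, k, p) % q
        s = (H(m) + x * r) * pow(k, -1, q) % q
        if r != 0 and s != 0:
            return r, s

def verify(y, m: bytes, r, s):
    w = pow(s, -1, q)
    u1, u2 = H(m) * w % q, r * w % q
    v = pow(g, u1, p) * pow(y, u2, p) % p % q
    return v == r

x, y = keygen()
assert verify(y, b"hello", *sign(x, b"hello"))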

2. Schnorr
The Schnorr signature is an important DLP-based signature scheme. It works in any prime order group and its security is proven in the random oracle model under the DL assumption [2].

Domain Parameter Generation
  1. Select a prime number $p$.
  2. Select a prime divisor $q$ of $p-1$.
  3. Select a generator $g$ of the subgroup of order $q$.
Key Generation
  1. Select a random integer $x$, where $0<x<q$.
  2. Compute $y=g^x \ mod \ p$.
The public key is $y$ and the private key is $x$.

Signing
  1. Select a random integer $k$, where $0<k<q$.
  2. Compute $a = g^k \ mod \ p$.
  3. Compute $r = h(m \| a)$, where $m$ is the message to be signed and $h:\{0,1\}^* \rightarrow \mathcal{Z}_{q}$ is a hash function. 
  4. Compute $s = (k + r\cdot x)\ mod \ q$
The signature on $m$ is the pair $(r,s)$.

Verification
  1. Compute $v = g^s \cdot y^{-r}\ mod \ p$.
  2. Output $1$ if $h(m \| v) = r$, otherwise output $0$.
Correctness
If $(r,s)$ is a valid signature on $m$, then we have
$v=g^s\cdot y^{-r}=g^{k+r\cdot x}\cdot g^{-r\cdot x}=g^k=a\ mod \ p$, and hence $h(m \| v)=h(m \| a)=r$.
Thus the verification succeeds.
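A matching Schnorr sketch, using the same toy group as the DSA example (redefined here so the snippet is standalone):

import hashlib
from random import randrange

p, q = 607, 101
g = pow(2, (p - 1) // q, p)

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % q

def keygen():
    x = randrange(1, q)
    return x, pow(g, x, p)

def sign(x, m: bytes):
    k = randrange(1, q)
    a = pow(g, k, p)                       # commitment
    r = H(m + a.to_bytes(2, 'big'))        # r = h(m || a)
    s = (k + r * x) % q
    return r, s

def verify(y, m: bytes, r, s):
    v = pow(g, s, p) * pow(y, q - r, p) % p    # g^s * y^{-r} mod p (y has order q)
    return H(m + v.to_bytes(2, 'big')) == r

x, y = keygen()
assert verify(y, b"hello", *sign(x, b"hello"))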

3. RSA-FDH
The RSA-FDH (full domain hash) scheme was introduced by Bellare and Rogaway in [3]. It is an RSA-based signature scheme and follows the hash-then-sign paradigm. It makes use of a hash function whose image is the whole of $\mathcal{Z}_N$ (i.e. the image size of the hash function equals the RSA modulus) to generate a random-looking value to which the plain RSA signing operation is applied. This prevents the algebraic attacks on the plain RSA signature scheme and allows messages of arbitrary length to be signed. However, such a hash function is hard to construct in practice. RSA-FDH can be proven EU-CMA secure in the random oracle model.

Key Generation
  1. Select two large primes $p$ and $q$. 
  2. Compute $N=p\cdot q$.
  3. Select a random integer $e$, where $1<e<\phi(N)$, such that $gcd(e,\phi(N))=1$.
  4. Compute the integer $d$, where $1<d<\phi(N)$, such that $e\cdot d = 1\ mod \ \phi(N)$.
The public key is $(N,e)$ and the private key is $(d,p,q)$.

Signing
  1. Compute $s=h(m)^d\ mod \ N$, where $m$ is the message to be signed and $h:\{0,1\}^* \rightarrow \mathcal{Z}_N$ is a hash function.
The signature on $m$ is $s$.

Verification
  1. Output $1$ if $s^e = h(m)\ mod \ N$, otherwise output $0$.
Correctness
If $s$ is a valid signature on $m$, then we have
$s^e=h(m)^{d\cdot e}\ mod \ N=h(m)\ mod \ N$
Thus the verification succeeds.
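Finally, a toy RSA-FDH sketch; reducing SHA-256 modulo N is only a crude stand-in for a genuine full-domain hash onto $\mathcal{Z}_N$, which (as noted above) is the hard part in practice:

import hashlib

p, q = 61, 53                         # toy primes
N, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def fdh(m: bytes) -> int:
    # Crude stand-in for a full-domain hash onto Z_N.
    return int.from_bytes(hashlib.sha256(m).digest(), 'big') % N

def sign(m: bytes) -> int:
    return pow(fdh(m), d, N)          # s = h(m)^d mod N

def verify(m: bytes, s: int) -> bool:
    return pow(s, e, N) == fdh(m)     # check s^e = h(m) mod N

s = sign(b"message")
assert verify(b"message", s)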

[1]http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
[2]http://eprint.iacr.org/2012/029.pdf
[3]http://web.cs.ucdavis.edu/~rogaway/papers/exact.pdf