## Thursday, March 12, 2015

### 52 Things: Number 23: Write a C program to implement Montgomery arithmetic.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This next blog continues the topic of 'Cryptographic Implementation Details'.

In this post I will aim to compliment what we saw last week regarding the more theoretical aspects of Montgomery arithmetic with a practical implementation of it. The implementation is written in C and written for a computer with a word size of 64 bits. The moduls m can therefore be as large as 264 – 1 and a and b can be as large as m – 1. We will take r = 264. As in the previous post, most of what is given here is derived from [1] so please refer to this for more information.
You will remember from the last blog post, that four steps were given to the algorithm (please see post for a full description of the algorithm if this is hazy). For the purposes of our implementation, we will eamine each of these stages separately.

1.       The GCD Operation
This function uses the binary extended gcd operation to find the integers r-1 and m’ such that rr-1 = 1 + mm’. These integers are required for the subsequent steps of the Montgomery reduction. The algorithm takes r and m and pointers to values r-1 and m’ which the algorithm derives. This is done using the extended gcd algorithm which could be implemented in a number of ways (the purpose of which this blog is not about) and I refer the reader to [1] or [2] for detailed descriptions of it.

2.       Transform the Multipliers
The second stage aims to compute abar = ar mod m and bbar = br mod m. As r = 264, no multiplication is required but only a shifting by 64 bits (due to our selection of r = 264), giving a 128 bit output with 64 0s tailing the value of a or b. The remainder of the division by m is then given as abar or bbar.  A function which takes the high 64 bits (x) and the low 64 bits (y) and the value for m (z) and returns the 64 bit result could be written to do this. Such a function could be defined as follows:

uint64 modul64(uint64 x, uint64 y, uint64 z);
Where uint64 is a type defined as follows:
typedef unsigned long long uint64;
3.       Montgomery Multiplication
The function which carries out the Montgomery multiplication can be defined as a function which takes the 64 bit values abar, bbar, m and mprine and returns a 64 bit value for the output of the Montgomery multiplication step.

The first sub-stage to of the function is to calculate t = abar*bbar. This is done by multiplying abar and bbar together to give a 128 bit product. An additional function could be written to do this which takes abar and bbar and returns two 64bit values, one with the high 64 bits (thi) and one with the low 64 bits (tlow).

The next stage is the computation of u = (t + (tm’ mod r)m)/r. Here t is a 128 bit integer and m’ is a 64 bit integer. As we mod by r, only the lower 64 bits of tm’ are required, meaning that we can disregard the top 64 bits of t.

tm = tlo*mprime; // Disregard thi
mulul64(tm, m, &tmmhi, &tmmlo); // Function which performs 128 bit multiplication

The subsequent multiplication by m gives an output of 128 bits and the addition of t an output of 129 bits, which can be carried out as 128bit + 128bit = 128bit output and compute the carry separately. In C:

ulo = tlo + tmmlo;
uhi = thi + tmmhi;
if (ulo < tlo) uhi = uhi +1; // test for overflow from ulo and add if necessary to uhi
ov = (uhi < thi) | ((uhi == thi) & (ulo < tlo)); // check for carry

The last step is the calculation of if(u >=m) then return u – m; else return u. This is shown below:
ulo = uhi;
uhi = 0;
if(ov > 0 || ulo >= m) // test if there was overflow or ulo is higher that
ulo = ulo – m;
return ulo;

4.       The Inverse Transformation
In the final stage we compute ur-1 mod m which is the product of a and b modulo m. This could be done by calling the following functions:

mulul64(p, rinv, &phi, &plo); // performs multiplication and returns two 64 bit values phi and plo
p = modul64(phi, plo, m); // returns value of 128bit input mod m

This gives the output p which is the 64 bit output equal to ab mod m and the Montgomery reduction is complete.

[1] http://www.hackersdelight.org/MontgomeryMultiplication.pdf
[2] http://www.ucl.ac.uk/~ucahcjm/combopt/ext_gcd_python_programs.pdf