There have been quite a few posts lately discussing the failure of sign() in curve25519.java. I still have the feeling that most people don't fully understand what causes this failure. So I will explain in detail below. Note that if the majority of the community thinks that curve25519.java should not be changed, that's ok for me. I don't forge with my small account so it doesn't matter to me.
Let's start by looking at an easy example to see what can go wrong:
Our coder Mr. Confusius wants to write some code that is doing modulus calculus in the field
F7 (i.e. all numbers can be reduced mod 7). Since
F7 contains only the numbers 0,....,6 he decides to represent each number by 4 bits and calls his class fourBitNum. So we have
0 = 0000, 1 = 0001, ..., 6 = 0110
Happy with the design he writes methods for adding, subtracting, multiplication and mod 7 reduction of those numbers:
FourBitNum add(fourBitNum a, fourBitNum b)
FourBitNum sub(fourBitNum a, fourBitNum b)
FourBitNum mul(fourBitNum a, fourBitNum b)
FourBitNum mod7(fourBitNum a)
He then tests his code by calculating 5+4 in
F7. Since 5+4=9≡2 mod 7 the output should be 2:
fourBitNum a = add(fourBitNum(5), fourBitNum(4));
fourBitNum b = mod7(a);
output(b)
and the output is indeed 2 as expected. What a great coder I am, he thinks, and tests the multiplication. 5*4=20≡6 mod 7 so the output should be 6:
fourBitNum a = mul(fourBitNum(5), fourBitNum(4));
fourBitNum b = mod7(a);
output(b)
He wraps his eyes when he sees that the output is 4. ok, I'll fix that later he says to himself, let's first try the sub method. He tests 5 and 4 as input for calculating 5-4=1 mod 7 and the result is correct. However 4 and 5 as input for calculating 4-5=-1≡6 mod 7 outputs the unexpected result 1.
What the hell is wrong with his code?
In the first example of failure the result 5*4=20=10100 is to big to be held in four bits so his add method simply omits the highest bit, effectively doing a mod 16 reduction of 5*4 followed by the wanted mod 7 reduction: output = ((5*4) mod 16) mod7 = 4.
Not handling overflows always ruins the calculation.
In the second example 4-5=-1=1111 in his four bit representation. But his mod7 method expects positive input and interprets it as 15 which again means there was an unwanted mod 16 reduction, -1≡15 mod 16, and his calculation is ruined again.
The conclusion is that when doing modulus calculation it is very important to realize when numbers get too big (i.e. overflow) or too small (i.e. < 0, underflow). The coder of curve25519.java (respectively the coder of the original in c) certainly is not Mr. Confusius. But he did make some mistakes. Let's analyze the sign() method line by line:
public static final boolean sign(byte[] v, byte[] h, byte[] x, byte[] s) {
/* v = (x - h) s mod q */
The comment simply states what he wants to calculate. At this point h is a hash representing an arbitrary 256-bit, x is a hash that got clamped (i.e. some bits were changed) and s is the inverse of the private key k. s already has been reduced mod group order. After defining some variables that he needs for the calculation and setting all bytes in the v array to 0 the calculation begins:
i = mula_small(v, x, 0, h, 32, -1);
The name "mula_small" is deceptive, it really is a multiplication
plus an addition. He calculates v = x + h * (-1). The returned value i is not used in the subsequent calculation. The author probably only used it for debugging purposes. More on that a little later.
We will leave out the next line (I will refer to this line as "strange code line")
mula_small(v, v, 0, ORDER, 32, (15-v[31])/16);
for a moment and take a look at the rest of the code first before returning to this important line.
The author continues to get to the wanted result with
mula32(tmp1, v, s, 32, 1);
Again the name "mula32" is deceptive as it really is 2 multiplications plus one addition. The line means: tmp1 = tmp1 + v * s * 1. Since v=x-h and this point we have tmp1 = (x-h)*s.
The last line that calculates something, is a mod q reduction (where q is the group order represented by the ORDER argument) which is in principle unnecessary:
divmod(tmp2, tmp1, 64, ORDER, 32);
The result is tmp2 = tmp1/ORDER as integer division and tmp1 = tmp1 mod ORDER so we finally have tmp1 = (x-h)*s mod q
The following loop copies tmp1 into v and at the same time ORs the bits of tmp1 into w. Thus w==0 <==> v==0:
for (w = 0, i = 0; i < 32; i++)
w |= v[i] = tmp1[i];
It finally returns true if and only if v!=0 (via w!=0).
Note that sign() returns false <==> v==0 <==> (x-h)s mod q == 0 <==> x=h. Since x and h are sha256 hashes (with x only a little bit changed) we either are
very unlucky) or message + s == message + Y <==> s==Y where Y is the public key of x. Again, that is
very unlikely. To me, failure to produce a valid signature is just a theoretical possibility, it should not happen in real life (correct me if I am wrong).
Now will that code work? If we comment out the "strange code line" and remove the variable "i" we probably get the first version of the sign() method that the author tested. It produces in more than 60% of the cases wrong signatures. Why that? If x-h<0 then v=(x-h)s<0 and the mod q reduction will not give the expected result. The author probably was astonished and inserted the variable i in his code to check if mula_small(v, x, 0, h, 32, -1) indicates an underflow (mula_small returns -1 in this case). Once he realized the problem he tried to fix it in a
super smart way by inserting the "strange code line":
mula_small(v, v, 0, ORDER, 32, (15-v[31])/16);
With this line, he wants to kill two birds with one stone:
1) Compensate the error if there was an underflow (i.e. x<h)
2) reduce v mod q
But alas, sometimes if you think you are doing something super smart you end up doing something super stupid! It usually is better to do it in 2 steps each of which is easy to understand.
Let's see what really happens:
If 0<=v[31]<31 then 0*group order is added to v leaving v unchanged.
If 31<=v[31]<=127 then a (positive) multiple of the group order is
subtracted from v to get 0<=v<q.
If -128<v[31]<0 then a (positive) multiple of the group order is
added to v. v is hereby crossing the 2^256 border causing an overflow.
So his idea is:
If x>=h then v=x-h is positive and thus the highest bit of v is not set, i.e. 0<=v[31]<127. In that case we reduce v by subtracting multiple of the group order ending up with 0<=v<q.
In the other case x<h making v=x-h negativ (we had an underflow) and thus its highest bit is set, i.e. -128<v[31]<0. In that case we are causing an overflow by adding a suitable multiple of the group order which compensates the underflow and will again have 0<=v<q.
Sounds good but actually is a bad idea!Consider x=2^255+1 and h=1 giving v=x-h=2^255. There was no underflow in the calculation and still the highest bit of v is set causing the above algorithm to add a multiple of the group order which in turn causes an overflow und thus ruining the whole calculation.
On the other hand if x=1 and h=2^255+1 then the calculation of v is indeed causing an underflow but this time the highest bit of v is not set so the algorithm will not compensate for it and thus again ruining the calculation.
Even though the first case cannot happen (the highest bit of x is always cleared) the second case
can happen and leads to failure. The probability is roughly (1/4*1/2*2^254*(2^254 + 1))/2^508 ≈ 1/8 which is close to the value gimre found with his tests.
The way I suggested to correct the error was very simply and thus easy to understand:
Since the whole calculation is within
Fq, reduction mod q of any positive variable at any point of the calculation doesn't alter the result (that's a mathematical fact). After we have reduced x and h mod q and calculated v=x-h it's easy to check if v is negative by looking at the highest bit (this always works!). If it is negative, adding the group order will
always result in 0<=v<q. The rest of sign() is the same es before.
Nothing is leaked.
For those who are complaining that parts of my code like
if ((v[31] & 0x80) != 0)
{
mula_small(v, v , 0, ORDER, 32, 1);
}
is time dependent and therefore bad, you can easily modify it to include a fake addition which adds 0:
if ((v[31] & 0x80) != 0)
{
mula_small(v, v , 0, ORDER, 32, 1);
}
else
{
mula_small(v, v , 0, ORDER, 32, 0);
}
Even more, if you are complaining about code in curve25519.java to be time dependent, take a look how the inverse s=k^-1 of the private key k is calculated. It uses the extended euclidean algorithm which is time dependend too. If you have problems with that, you have to replace that part too (It can be done by using Fermat's little theorem and a Montgommery ladder).
That's it, I have nothing more to say about sign() (the post was long enough
).
I hope that those who are interested in the sign() method now have a better understanding what really happens inside of it and where is goes wrong.
Thank you for that thorough analysis. Looks acceptable to me. If all operations are executed within the modulo field, they can be reduced at any time over and over again.