Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, so there is a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?
Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2 folding reduction steps). However, according to page 15 of this research paper:
https://eprint.iacr.org/2021/1151.pdf
under the analysis of "Sloppy reduction", it is proven that unless a correctness check is performed, you need a 3rd reduction step.
Otherwise, the modular multiplication reduction has a chance (around one in 2**193) of being incorrect, because:

Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 - 1], the probability of an incorrect result is about r(r - 1) / 2N.
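Here R = 2^256, N is the secp256k1 prime p = 2^256 - 2^32 - 977, and r = R - N = 2^32 + 977. To make the failure mode concrete, here is a minimal Python sketch of the folding pattern (big integers standing in for the 64-bit limb code; this is my reconstruction of the pattern, not JLP's actual CUDA), with a crafted input where the second fold still reaches 2^256, so the final carry gets dropped:

R = 2**256
r = 2**32 + 977              # R - p
p = R - r                    # secp256k1 prime: 2^256 - 2^32 - 977

def fold(x):
    # one reduction step, using 2^256 == r (mod p):
    # replace the bits above 2^256 (call them h) with h*r
    return (x >> 256) * r + (x & (R - 1))

def sloppy_reduce(x):
    # two folds, then keep only the low 256 bits -- the pattern
    # in question, where a final carry is silently assumed away
    return fold(fold(x)) & (R - 1)

# Craft an input in [0, R^2) whose second fold still reaches 2^256
# (the paper's bound is over all of [0, R^2 - 1], not only over
# products of two reduced field elements):
a = -(-R // r)               # ceil(R / r), so a*r lands just above R
b = 2*R - 1 - a*r            # fits in 256 bits by construction
x = a*R + b

assert fold(x) == 2*R - 1            # first fold: a high bit is still set
assert fold(fold(x)) == R + r - 1    # second fold: still >= 2^256!
print(hex(sloppy_reduce(x)))         # wrong:  r - 1
print(hex(x % p))                    # right:  2*r - 1
assert fold(fold(fold(x))) == x % p  # a third fold would fix it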
There is a potential carry after the 2nd reduction that is assumed to never happen. And no one can explain it.

I do recall JLP himself writing that the EC math uses projective coordinates, so I guess the same goes for his modular multiplication implementation (this is the same for VanitySearch).
In such a coordinate system, a separate variable accumulates the divisor during the multiplication steps of the other two variables, so the division can be deferred until much later (it is still expensive, even with heavily optimized binary GCD algorithms and 2x52-bit limbs).
Since the multiplication can be done by repeated addition, this page might come in useful:
https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

I was referring to modular multiplication, not point multiplication. For example, adding point A to point B is done with affine (not projective) coordinates, but this is not relevant here.
So there are 3 modular multiplications for a single point addition:

m = (y1 - y2) / (x1 - x2)   // multiply by the inverse
x3 = m ** 2 - x1 - x2       // multiply by itself (squaring)
y3 = m * (x2 - x3) - y2     // multiplication #3
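For reference, here's that addition as a small runnable Python sketch (big-int arithmetic standing in for the fixed-width limb code), with a sanity check against the well-known secp256k1 generator and its double:

p = 2**256 - 2**32 - 977  # secp256k1 field prime

def affine_add(x1, y1, x2, y2):
    # affine addition of two distinct points; in the kangaroo code the
    # inverse would come from the batched-inversion pass (see below)
    inv = pow(x1 - x2, -1, p)        # (x1 - x2)^-1 mod p
    m  = ((y1 - y2) * inv) % p       # mult #1: multiply by the inverse
    x3 = (m * m - x1 - x2) % p       # mult #2: squaring
    y3 = (m * (x2 - x3) - y2) % p    # mult #3
    return x3, y3

# sanity check: G + 2G must land back on y^2 = x^3 + 7
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
Hx = 0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5
Hy = 0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A
x3, y3 = affine_add(Gx, Gy, Hx, Hy)
assert (y3 * y3 - x3**3 - 7) % p == 0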
There are also 3 more multiplications for each addition when computing a batched inverse.
So a total of 6 multiplications for each jump of each kangaroo.
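The batched inverse is Montgomery's trick: one real inversion for the whole batch, at the cost of roughly 3 extra multiplications per element. A sketch of where those 3 come from:

def batch_inverse(vals, p):
    # Montgomery's trick: invert n values with a single modular
    # inversion plus ~3 multiplications per value
    n = len(vals)
    prefix = [1] * (n + 1)
    for i, v in enumerate(vals):
        prefix[i + 1] = (prefix[i] * v) % p   # mult #1 per element
    inv = pow(prefix[n], -1, p)               # the one real inversion
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = (inv * prefix[i]) % p        # mult #2 per element
        inv = (inv * vals[i]) % p             # mult #3 per element
    return out

p = 2**256 - 2**32 - 977
xs = [3, 5, 7]
assert all((v * iv) % p == 1 for v, iv in zip(xs, batch_inverse(xs, p)))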
The problematic lines are the ones reducing the 256x256-bit product back mod N:
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L856
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L905
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L1017

2-step reduction is only guaranteed to be correct for Mersenne primes, where r = 1 (with r = 1, the first fold provably lands below 2 * 2^k and the second below 2^k, so no carry can survive).
For secp256k1 r is 2**32 + 977, not 1. So 2 steps are not enough.
So, since there's an error ratio of r(r - 1) / 2N from using only 2-step folding,
and you have 6 multiplications per jump,
and you do a lot of jumps deterministically,
at some point that error will exhibit itself and all future computations get screwed up.
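Putting numbers on it (a quick back-of-the-envelope check, with N = p as in the paper):

import math

R = 2**256
r = 2**32 + 977
p = R - r

# per-multiplication error probability from the paper: r(r - 1) / 2N
log2_err = math.log2(r) + math.log2(r - 1) - math.log2(2 * p)
print(f"log2(per-mult error prob) ~= {log2_err:.2f}")   # about -193

# with 6 modular multiplications per jump:
print(f"log2(per-jump error prob) ~= {log2_err + math.log2(6):.2f}")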