Title: Need help understanding this modular inverse implementation Post by: NotATether on June 04, 2024, 01:28:24 PM Taken straight from VanitySearch
https://github.com/XopMC/CudaBrainSecp/blob/main/GPU/GPUMath.h#L548-L659 (different project but identical source file) _ModInv is the modular inverse funciton that implements the extended euclidean algorithm, which is described at Wikipedia here: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm It's using signed 320-bit integers, probably to catch overflow and underflow, which it deals with later. That's not the issue here. There is the Euclidean (proper) division, which I already understand. And then there's this talk about the Bezout coefficients which I also understand. My issue here is understanding how this particular implementation implements this this using 63-bit bit shifts. U and V (the two inputs) are loaded with the values of P and R respectively. (because R * (the return value) === 1 mod P. And then my understanding goes downhill from there :) Can someone help me figure this out, or at least point me to a website where I can find people who might understand this sort of function? Title: Re: Need help understanding this modular inverse implementation Post by: j2002ba2 on June 11, 2024, 08:11:32 AM This is Extended Binary GCD
Binary GCD: https://xlinux.nist.gov/dads/HTML/binaryGCD.html http://www.cut-the-knot.org/blue/binary.shtml Extended Binary GCD explanation with lots of words: https://github.com/DavidNorman/gcd Title: Re: Need help understanding this modular inverse implementation Post by: NotATether on June 11, 2024, 08:43:03 AM This is Extended Binary GCD Binary GCD: https://xlinux.nist.gov/dads/HTML/binaryGCD.html http://www.cut-the-knot.org/blue/binary.shtml Extended Binary GCD explanation with lots of words: https://github.com/DavidNorman/gcd Perfect. Now I'm going to try to figure out how the binary extended GCD operates with many words - this has been a pain point for me. I'll get back to you later. Title: Re: Need help understanding this modular inverse implementation Post by: j2002ba2 on June 11, 2024, 09:12:09 AM I mean lots of words explaining it.
Here is the code I made back in 2019, this is slower than the 5x52 version (it is 8x32), but was faster than xp-2. Code: __device__ static void invModP_(unsigned int value[8]) { Title: Re: Need help understanding this modular inverse implementation Post by: NotATether on June 20, 2024, 05:45:37 AM OK, so now, finally I got the hang of this!
The key to understanding this implementation which is not a conventional binary egcd, is that this implementation is DRS62 (delayed right shift 62 bits) - actually it says so in the comments even. Basically, the reason why it's called such is because it implements the optimizations in this algorithm: https://github.com/pornin/bingcd/blob/main/doc/bingcd.pdf This paper describes how to optimize EGCD when you are working in a modular field with fixed-size registers, like 64-bit registers and the like. Although in the conventional binary gcd, we are left shifting (dividing) stuff by 2, in this implementation we are actually multiplying some of the words by 2 and then we "fix" it by dividing (right shift) by a bunch of twos at once - 62 times, to be exact. Those lines that have MM64 and MSK62 on them followed by multiplication, addition and a right shift - that is Montgomery Reduction (https://en.wikipedia.org/wiki/Montgomery_modular_multiplication#The_REDC_algorithm) and that basically calculates the modulus by P. It uses some fancy modulo 2^62 stuff, but I will have more to say when I write a separate post about this. Much thanks @j2002ba2! |