I've been playing around with generating Bitcoin addresses from scratch in Python (will post about that soon) and ran into something interesting to do with calculating modular multiplicative inverses. Until recently, I'd only bumped into two ways of doing that, either using the extended Euclidean algorithm, or using Euler's theorem:
Method A (extended Euclidean algorithm)def invert_a(x: int, m: int) -> int:
# reference: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Pseudocode
r0, r1 = x, m
s0, s1 = 1, 0
while r1 != 0:
quotient = r0 // r1
r0, r1 = r1, r0 - quotient * r1
s0, s1 = s1, s0 - quotient * s1
if r0 != 1:
raise ValueError('x is not invertible for the given modulus')
return s0 if s0 >= 0 else s0 + m
Method B (exponentiation with m-2, Euler's theorem)def invert_b(x: int, m: int) -> int:
# reference: https://en.wikipedia.org/wiki/Modular_multiplicative_inverse#Using_Euler's_theorem
return pow(x, m-2, m)
Since Python 3.8, you can pass negative exponents into the
pow function, leading to two more ways:
Method C (exponentiation with -1)def invert_c(x: int, m: int) -> int:
return pow(x, -1, m)
Method D (exponentiation with -m)def invert_d(x: int, m: int) -> int:
return pow(x, -m, m)
Python is not really meant for performance, but I figured I'd benchmark all four methods out of curiosity. After testing that they all produce identical answers (for
x >= 1 and
x < m, with the moduli that I care about:
p and
n from secp256k1), I inverted 100,000 private keys 10 separate times with each method and then took the mean:
+---------------+-------+
| CPython 3.9.2 | Speed |
+----+----------+-------+
| Method A | 3.85x | |||||||||||||||||||||||
+----------+-------+
| Method B | 1.15x | |||||||
+----------+-------+
| Method C | 7.80x | |||||||||||||||||||||||||||||||||||||||||||||||
+----------+-------+
| Method D | 1.00x | ||||||
+----------+-------+I was surprised that method A held up so well, considering that the other methods consist only of a single call to
pow and therefore do almost all of their work in native code.
Why is method C the fastest? Peeking at the relevant CPython code (inside the implementation of
pow:
here), it looks like it's handling negative exponents with an internal implementation of the extended Euclidean algorithm (this one:
here). I was puzzled at first by method D being so much slower than method C (I expected it to be slower, just not 7.8x) but then realized that although it does make use of the same internal special-casing that makes method C so fast, it then follows that with an expensive (especially for big values of
m) modular exponentiation that isn't special-cased. To be clear, method C (internally) goes the same way (the
pow function inverts the base and negates the exponent before proceeding), but with
1 as the exponent the rest of the calculation terminates quickly.
If you're on Python 3.8+, use method C, it's both the fastest and the cleanest (IMHO). If you're stuck on an older version of Python and need more performance, prefer method A over method B. I haven't been able to think of a good use case for method D.
Note that
none of the above methods are safe to use with respect to side-channel attacks; they all leak information via small argument-dependent differences in execution time and/or power consumption. That's a very academic threat for most users though, so don't let that stop you from learning/experimenting. If you're writing a professional wallet, then you'll have to step up to more advanced techniques.