....
2. Outside the Box
IMO, for any substantial optimizations, it is required to think outside the box. The box here being the libsecp256k1 library. This library provides us with an API - a set of functions - which is functionally complete, but may sometimes be obstructive for certain tasks. If you look at the use case from above, it would certainly be nice if we had a function that could efficiently sum up affine points into jacobian.
That's why I started to hack my own libsecp256k1, extending it with functionality for the LBC use case (public key generation).
I made a very simple python script, 2 files -->
https://www.dropbox.com/s/xr2ypa5zplry/ecc_for_collider.zip?dl=0ecc_for_collider.py (a very small library)
gen_couple_points.py (a test program, it computes kG+mG, kG-mG, given kG and mG)
The script works
*********************************************************************
ecc_for_collider.pyfunction
add_a_a_j(x1, y1, x2, y2)
--> input kG=(x1,y1) , mG=(x2,y2)
--> output kG + mG = (x3,y3,z3) (0<m<2049)
-->
4M + 2S h=(x2-x1) % p
r=(y2-y1) % p
a=r**2 % p # 1S
b=h**2 % p # 1S
c=b*h % p # 1M c=h**3
d=x1*b % p # 1M d=x1*h**2
e=y1*c % p # 1M e=y1*h**3
x3 = (a-c-2*d) % p # 0M r**2 - h**3 - 2*x1*h**2
y3 = (r*(d-x3)-e) % p # 1M r*(x1*h**2 - x3) - y1*h**3
z3 = h # 0M x2-x1
This is the classic "mixed" jacobian-affine addition (with Z1 and Z2 = 1) ("A"+"A" --> J)
I use only affine coordinates (for the operands), because I know already all coordinates of any points kG, G, 2G, mG, 2048G, this is the first advantage of this method.
function
symmetric(x1,y1,x2,y2,x3,z3inv,z3inv2)
This function exploits the symmetry of kG+mG / kG-mG respect of kG
--> input: kG, mG, kG+mG (all in affine coordinates!), (z3)^-1 and (z3)^-2 of kG+mG
--> output: kG-mG (in affine coordinates)
-->
4M (including jacobian to affine)
def symmetric(x1,y1,x2,y2,x3,z3inv,z3inv2):
x4 = (x3+4*(y1*y2)*z3inv2) % p #2M
y4 = (z3inv*(x4-x1)*(y1+y2) -y1) % p #2M
return x4,y4
If kG = (x1,y1), mG = (x2,y2) , kG+mG = (x3,y3) then you have:
kG-mG = (x4,y4) = ( x3+4*y1*y2)/(x2-x1)^2 , (x4-x1)*(y1+y2)/(x2-x1) -y1 )
*******************************************************************
First compute kG (in affine coordinates, I changed my mind) (remember: my k stands for your k+2048)
You could compute then:
1) first k+1, k+2, k+3, ..., k+2048 ("A"+"A"->"J") 4M + 2S for each point with the function add_a_a_j
2) then jacobian to affine change 6M + 1S for 2048 points (you have already this function, don't look at mine for that)
3) then you compute k-1, k-2, k-3, ...., k-4, k-2048 ( at 1) you have to memorize the inverse of k+1,k+2,k+3,... to do that): 4M using the function symmetric.
Total: 14M + 3S for 2 points,
7M + 1,5S for each point (including jacobian to affine)
I didn't took care of the generation of the first point, k*G (more precisely: (k+2048)G)
With my proposal you have to perform 2 inverse for batch, 1 to get (k+2048)G in affine coordinates, 1 to the points from k+2049 to k+4096. You save then 3M x 2048 points (at least)
-----------------------------------------------------------------------------------------------------------------------------
EDIT: maybe an another improvement is possible: if we use the symmetric function for generate all points?
If we have kG and kG+mG (and mG) --> we get kG-mG and viceversa (with symmetric function)
if we have kG and kG-mG (and mG) --> we get kG+mG .
Now,
if we have kG+mG and kG (and mG), we can get kG+2mG!
Infact kG and kG+2mG are symmetric respect to the point kG+mG kG=(kG+mG)-mG, kG+2mG=(kG+mG)+mG
The only problem is: we don't have yet kG+mG in affine coordinates, I have to do some computations... maybe tomorrow