citb0in (OP)
|
Hello everybody, in the last days I am trying various approaches in Python to generate random private keys (hex) and the corresponding bitcoin address (optionally uncompressed, compressed, bech32 or mixed) as efficiently as possible and with good performance. There are several different approaches. I am actually very satisfied with the "bit" library from Ofek, because it is the fastest I experienced so far. However, if you also want to generate bech32 addresses, you have to do some manual extra work and the speed advantage of bit is then lost. So, as a very good and satisfying average I find the use of fastecdsa and iceland2k14/secp256k1 very fast. EDIT:Note1:I didn't know that the Python library iceland2k14/secp256k1 was written natively in C++ and is just included in Python. Also, this library is closed-source, which means the source code is not known. So far, no security vulnerabilities have become known in this regard, but this point still deserves attention. There are some tools in the Bitcoin community that use this library for fast calculations. Note2:As I progressed, I found that the Python library fastecdsa is too oversized for generating random private keys and is not optimal in terms of performance. The secure Python library "secrets" on the other hand is fast when choosing randomness but also fast. It results in over +32 % better performance.
In several of my python programs I am testing, I have integrated a performance counter, which shows me the total number of addresses and the calculated rate (addresses/sec) while the program is running. However, I have taken this out in this examples to keep the code as simple and clear as possible. For relative ease of comparison, I do create 1 million randomly generated keys from which the Bech32 Bitcoin address (prefix 'bc1q' is then calculated. I write the output into a file, so I can check it afterwards to ensure everything went correct and the data is correct. I "time" the Python program to see how long it took to create the 1 million addresses. Let's get into it... Here is an example which generated bech32 addresses: #!/usr/bin/env python3 # 2022/Dec/26, citb0in from fastecdsa import keys, curve import secp256k1 as ice
# Set the number of addresses to generate num_addresses = 1000000
# Open a file for writing with open('addresses_1M.out', 'w') as f: # Generate and write each address to the file for i in range(num_addresses): prvkey_dec = keys.gen_private_key(curve.P256) addr = ice.privatekey_to_address(2, True, prvkey_dec) f.write(f'{addr}\n')
It took 82 seconds to generate those 1 million addresses. This is a rate about 12,195 addresses/sec which is way too slow. real 1m22,192s user 1m21,461s sys 0m0,640s
I often read about numpy and from what I could find on the internet about it, it sounded interesting. So I tried to rewrite the code to use numpy and compare the results. #!/usr/bin/env python3 # 2022/Dec/26, citb0in import numpy as np import fastecdsa.keys as fkeys import fastecdsa.curve as fcurve import secp256k1 as ice
# how many addresses to generate num_addresses = 1000000
# Generate a NumPy array of random private keys using fastecdsa private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(num_addresses)])
# Use secp256k1 to convert the private keys to addresses addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])
# Write the addresses to a file np.savetxt('addresses_numpy.out', addresses, fmt='%s')
real 1m19,636s user 1m18,826s sys 0m1,027s
As you can easily see, this did not bring any speed advantage, if you disregard the 1-2sec difference. However, what caused significant speed boost was the attempt to distribute the program code into several threads and thus enable parallel processing. As I didn't see any disadvantage by using numpy I kept the numpy part in the code. Here is the extended python code: #!/usr/bin/env python3 # 2022/Dec/26, citb0in import threading import numpy as np import fastecdsa.keys as fkeys import fastecdsa.curve as fcurve import secp256k1 as ice
# Set the number of addresses to generate and the number of threads to use num_addresses = 1000000 num_threads = 16
# Divide the addresses evenly among the threads addresses_per_thread = num_addresses // num_threads
# Create a list to store the generated addresses addresses = []
# Define a worker function that generates a batch of addresses and stores them in the shared list def worker(start, end): # Generate a NumPy array of random private keys using fastecdsa private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])
# Use secp256k1 to convert the private keys to addresses thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])
# Add the addresses to the shared list addresses.extend(thread_addresses)
# Create a list of threads threads = []
# Create and start the threads for i in range(num_threads): start = i * addresses_per_thread end = (i+1) * addresses_per_thread t = threading.Thread(target=worker, args=(start, end)) threads.append(t) t.start()
# Wait for the threads to finish for t in threads: t.join()
# Write the addresses to a file np.savetxt('addresses_1M_multithreaded.txt', addresses, fmt='%s')
My system has a total of 16 threads available, so I distributed the load across all threads for that test. Now look at this: $ time python3 create_1M_addresses_using_fastecdsa_and_ice_numpy_multithreaded.py
real 0m19,764s user 0m38,147s sys 0m6,367s
It took only 20 seconds to generate 1 million addresses. That is equal to the rate of 50.000 addresses/sec. Much better but I think it's still too slow. I did realize through Linux tool "htop" that the 16 available threads on my machine were not utilized all and not fully with 100%. So the multithreading part seemed to work ok, but not as I expected. So I looked for more ways to properly distribute the load and found what I was looking for: #!/usr/bin/env python3 # 2022/Dec/26, citb0in import concurrent.futures import os import numpy as np import fastecdsa.keys as fkeys import fastecdsa.curve as fcurve import secp256k1 as ice
# Set the number of addresses to generate num_addresses = 1000000
# Define a worker function that generates a batch of addresses and returns them def worker(start, end): # Generate a NumPy array of random private keys using fastecdsa private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])
# Use secp256k1 to convert the private keys to addresses thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])
return thread_addresses
# Use a ProcessPoolExecutor to generate the addresses in parallel with concurrent.futures.ProcessPoolExecutor() as executor: # Divide the addresses evenly among the available CPU cores addresses_per_core = num_addresses // os.cpu_count()
# Submit a task for each batch of addresses to the executor tasks = [] for i in range(os.cpu_count()): start = i * addresses_per_core end = (i+1) * addresses_per_core tasks.append(executor.submit(worker, start, end))
# Wait for the tasks to complete and retrieve the results addresses = [] for task in concurrent.futures.as_completed(tasks): addresses.extend(task.result())
# Write the addresses to a file np.savetxt('addresses_1M_with_ProcessPool.txt', addresses, fmt='%s')
real 0m4,768s user 0m54,444s sys 0m1,894s
Perfect! Now I'm under 5 seconds for 1 million addresses. That's a rate of roughly 208,000 addresses/secHow can we further improve the performance without using GPU? Is there anything else you can suggest, that will speed up the process? Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU. The next step would be ==> how can we shuffle this code to the GPU for ultra-fast performance ?
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
ymgve2
|
|
December 28, 2022, 03:37:47 AM |
|
I see that you're using gen_private_key. This generates arbitrary secure random private keys, using urandom. urandom might be slow depending on your system - do you really need it? I assume whatever you're doing (brute force search?) will generate the private keys in some other fashion, so you can just set prvkey_dec to an integer representing the private key directly.
I also see that iceland2k14/secp256k1 does not provide the source for its libraries which is a bit concerning.
|
|
|
|
pooya87
Legendary
Offline
Activity: 3626
Merit: 11027
Crypto Swap Exchange
|
|
December 28, 2022, 03:47:49 AM |
|
Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.
It has to have a purpose otherwise it is a complete waste of time even if someone could optimize the code to generate 1 billion addresses in 5 seconds.
|
|
|
|
citb0in (OP)
|
|
December 28, 2022, 08:12:45 AM |
|
Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.
It has to have a purpose otherwise it is a complete waste of time even if someone could optimize the code to generate 1 billion addresses in 5 seconds. I think I understand what you mean. There is no prize money, so I better rewrite my previous sentence: Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.I don't see it as a waste, if someone makes helpful and further posts in the forum because for this not only the current forum community benefits but also those who will read here in the future. I see it as an enrichment for the forum. We could change the rules to say that the use of GPU via CUDA should be allowed. I myself have tried various approaches so far using pycu and numba, but I was always unsuccessful. So if someone can do this simple task also within CUDA and can present the result, please feel free to post here. I think everyone will benefit from it. I see that you're using gen_private_key. This generates arbitrary secure random private keys, using urandom. urandom might be slow depending on your system - do you really need it?
Honestly said, I don't care about if urandom is used or random, I just want to quickly generate random numbers for serving as a privkey. Can you suggest a faster approach which can replace this task? I also see that iceland2k14/secp256k1 does not provide the source for its libraries which is a bit concerning.
what concerns in detail you have, can you explain, please?
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
citb0in (OP)
|
|
December 28, 2022, 11:11:10 AM |
|
From what I know iceland library has no malicious code in it.
One of my teachers used to say: "Thinking is not knowing" Since it is possible to put payload to functions. ... Or use firewall to see all network activity.
why would the author iceland2k14 then hide the source code by not disclosing it? Especially if he still uses source code from AlbertoBSD and Jean-Luc-Ponds? Apart from the fact that this would be illegal as far as I know, since the tools mentioned use appropriate GNU licenses, why would iceland2k14 hide the code if there was nothing to hide? I think this is an interesting point that ymgve2 has made here. I hadn't given this library any serious thought before, but he's right in a way. To the extent that we as users of this closed-source library don't know if malicious functions are being run in the background or data is being exfiltrated, I see that as a danger as well. I will therefore consider not using this library until I am convinced otherwise. Thanks to both of you for this important advice. Python can never be a good choice for fast code. It is good to try out something fast. Real-deal programs use C/C++ and CUDA.
I understand that while Python code executed on the GPU using these libraries can be very fast, it may not be as fast as native C++ code running on the GPU. This is because the Python interpreter introduces some overhead, and the libraries themselves may have some overhead as well. However, for many applications, the speedup provided by running on the GPU can more than make up for any overhead introduced by Python, and can still result in significantly faster calculations than would be possible using only the CPU. That is why I think it is reasonable to get such calculations running on the GPU, it should be possible with Python. Python can be used for fast calculations with CUDA by using libraries that provide Python bindings for CUDA. These libraries allow you to write code in Python that can be executed on the GPU, which can provide significant speedups for certain types of calculations. Examples of such libraries that provide Python bindings for CUDA are -as already mentioned before- PyCUDA or Numba. PyCUDA allows writing code in Python that is executed on the GPU using CUDA. It provides a Pythonic interface to the CUDA programming model, making it easy to use for Python developers. Numba, which is a just-in-time (JIT) compiler for Python that can automatically optimize Python code for execution on the GPU. Numba can be used to write GPU-accelerated code in Python using familiar Python syntax, and it can provide significant performance improvements for certain types of calculations. Unfortunately I wasn't able to successful use those two libraries until now, hopefully anyone else can give some insight and assistance here. Sample code is provided on my first post.
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
pooya87
Legendary
Offline
Activity: 3626
Merit: 11027
Crypto Swap Exchange
|
|
December 28, 2022, 12:32:39 PM |
|
I think I understand what you mean. There is no prize money, so I better rewrite my previous sentence: Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.
Prize is incentive not purpose. A purpose could be something like a tool that would generate vanity addresses so you'd need the fastest possible algorithm and implementation that is capable of checking very large number of addresses per second. Or the purpose could be a recovery tool where you have 8-9 words out of a 12 word seed phrase so you need to check millions of addresses that is generated from it at different derivation paths to find the right combination. The importance of "purpose" is that it will also clarify HOW to implement it. Right now if I understood the code correctly, you are just aimlessly writing random addresses to disk so there is not much to work with. But if the example I used is the purpose (ie vanity address), there is no need for IO and its bottleneck hence the algorithm changes, there is also no need for Base58 or Bech32 encoding, there is also no need for ECMult each permutation. These very simple changes lead to significant increase in speed.
|
|
|
|
citb0in (OP)
|
|
December 28, 2022, 01:01:59 PM |
|
I think you misunderstood. The whole should not be a project that I commission here and want to have programmed free of charge by someone. On the contrary, it is about learning and understanding so that you can use it later in various projects. It's not about me, but in general about a kind of knowledge database, from which the whole community benefits. I read here in various posts the most diverse technical questions, often even one and the same the topic. Or it is tried to reinvent the wheel. There are a few who are very good and experiences in developing good code and the matter, but knowledge is shared sparsely, so at least my impression. I am honestly said a newbie in this matter who tries to get in and understand above all. I don't have a real intention, so it's not really about finding vanity addresses or cracking puzzles. There are already solutions for that and I can't develop on those if solid basic knowledge is missing. I want to learn and maybe some day I can find ways to optimize or help others to fulfill this task. The road is the goal.
The example I showed is the basic building block of all conceivable tools that are used in this topic. If Ofek Lev had used an existing library and made do with it, there would have been no "bit" library. But he has thought about how to optimize and improve, or shortcut ways to make the whole thing more efficient. That's what it's all about. I see no disadvantage in sharing such knowledge here, I see clear advantages. If you would work together to optimize and extend such basic calculations like creating an address, a key or derivations of it, then everybody would benefit from it because it is shared publicly. This base can be used on every bitcoin tool out there. So far, unfortunately, I have not found any explanations or examples here that illustrate how such calculations would be implemented using CuPy, Numba or JIT, ergo also the origin of this thread. I would like to optimize Python code that make use of GPU and make the calculcations as fast as possible.
If you or anyone else can contribute factually, I would greatly appreciate it on behalf of the forum community and all future readers here. But describing the whole thing as pointless from the ground up helps neither me, nor you, nor anyone else.
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
ABCbits
Legendary
Offline
Activity: 3052
Merit: 8073
Crypto Swap Exchange
|
|
December 29, 2022, 12:15:45 PM |
|
Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.
Since fastecdsa already utilize C for heavy operation, i doubt we can see major improvement without either 1. Modify some part of fastecdsa to use assembly. 2. Rewrite everything using C/C++/Rust and assembly.
|
|
|
|
n0nce
|
|
December 29, 2022, 05:28:52 PM |
|
Here is an example which generated bech32 addresses: ~snip~ It took 82 seconds to generate those 1 million addresses. This is a rate about 12,195 addresses/sec which is way too slow. real 1m22,192s user 1m21,461s sys 0m0,640s
I was curious to try out a Rust implementation and took the first thing I could find on the web when looking for 'Rust Bitcoin address generator'. So far, very disappointing. I added a loop of 1000 iterations to this, and some disk I/O instead of the print statements: https://github.com/mrgian/bitcoin-address-generatorIt takes around 5s for 1000 addresses, 538.78s for 100k addresses. Sure, single-threaded, but your initial Python implementation was single-threaded too and just took 82s for 1 million addresses. As ETFbitcoin mentioned, those Python libraries you use are probably heavily optimized already.
|
|
|
|
citb0in (OP)
|
|
December 29, 2022, 06:05:41 PM |
|
next step is to shift the load to GPU with pycuda, numba, jit or similar. To use the wrapper PyCUDA we need to modify the code to offload the address generation and private key conversion tasks to the GPU. This can be done by defining GPU kernels that perform these tasks and then using PyCUDA's DeviceAllocation and Kernel functions to run these kernels on the GPU. Unfortunately I was unsucessful with that task.
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 11:06:04 AM |
|
real 0m4,768s user 0m54,444s sys 0m1,894s
Perfect! Now I'm under 5 seconds for 1 million addresses. That's a rate of roughly 208,000 addresses/secHow can we further improve the performance without using GPU? Is there anything else you can suggest, that will speed up the process? Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU. The next step would be ==> how can we shuffle this code to the GPU for ultra-fast performance ? Long time ago I wrote a python script that computes 16.4 million of consecutive (non random) public keys (not addresses) in 12.3 s. No numpy, pure python, only 1 thread. ecc.py #see http://www.secg.org/sec2-v2.pdf at page 9
#Gx=55066263022277343669578718895168534326250603453777594175500187360389116729240 #Gy=32670510020758816978083085130507043184471273380659243275938904335757337482424 #p=115792089237316195423570985008687907853269984665640564039457584007908834671663 #n=115792089237316195423570985008687907852837564279074904382605163141518161494337
Gx=0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798 Gy=0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8 p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F # Fp n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 # E(Fp)
#endomorphism lambd=0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72 # (lambda^3 = 1 mod n) lambd2=0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce #(lambda^2)
beta=0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee # (beta^3 = 1 mod p) beta2=0x851695d49a83f8ef919bb86153cbcb16630fb68aed0a766a3ec693d68e6afa40 # (beta^2) ############################################################################################ ###Field operations
#a must be a number between 1 and p-1; input: a output: 1/a #see http://www.springer.com/?SGWID=4-102-45-110359-0 page 40 alg. 2.20 def inv(a,p): u, v = a%p, p x1, x2 = 1, 0 while u != 1: #q = v//u #r = v-q*u q, r = divmod(v,u) x = x2-q*x1 v = u u = r x2 = x1 x1 = x return x1%p
#inverse of a batch of x #input: batch of x : [x1,x2,x3,x4,x5,x6,x7,x8,...,x2048] #x is known (x of kG) #output: batch of inverse: 1/(x1-x), 1/(x2-x), 1/(x3-x), .... #3M for each inverse def inv_batch(x,batchx,p):
a = (batchx[1]-x) % p partial= [0]*2049 partial[1]=a for i in range(2,2049): a = (a*(batchx[i]-x)) % p #2047M partial[i]=a
inverse=inv(partial[2048],p) # 1I batch_inverse=[0]*2049 for i in range(2048,1,-1): batch_inverse[i-1]=(partial[i-1]*inverse) % p #2047M inverse=(inverse*(batchx[i]-x)) %p #2047M
batch_inverse[0]=inverse
return batch_inverse ############################################################################################ ##Group operations
# https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_addition #'double' addition #add 2 points P and Q --> P+Q #add 2 points P and Q --> P-Q # #(X1,Y1) + (X2,Y2) --> (X3,Y3) (the inverse of (x2-x1) must be known) #(X1,Y1) + (X2,-Y2) --> (X4,Y4) (the inverse of (x2-x1) must be known) # #input: (X1,Y1), (X2,Y2), the inverse of (X2 - X1): 5 scalars #output: (X3,Y3), (X4,Y4) : 4 scalars #4M + 2S
def double_add_P_Q_inv(x1, y1, x2, y2, invx2x1):
dy = (y2-y1) #% p a = dy*invx2x1 % p #1M a2 = a**2 #1S x3 = (a2 - x1 -x2) % p y3 = (a*(x1-x3) -y1) % p #1M
dy = p-(y2+y1) #% p # only the sign of y2 changes, "y of -Q = -y of +Q" a = dy*invx2x1 % p #1M x2 and then the inverse invx2x1 are the same a2 = a**2 #1S x4 = (a2 - x1 -x2) % p y4 = (a*(x1-x4) -y1) % p #1M return x3, y3, x4, y4
#https://en.wikibooks.org/wiki/Cryptography/Prime_Curve/Jacobian_Coordinates #jacobian coordinates def add_j_j(jax, jay, jaz, jbx, jby, jbz): u1 = (jax*jbz**2) % p u2 = (jbx*jaz**2) % p s1 = (jay*jbz**3) % p s2 = (jby*jaz**3) % p h = (u2-u1) % p r = (s2-s1) % p jcx = (r**2 -h**3-2*u1*h**2) % p jcy = (r*(u1*h**2-jcx)-s1*h**3) % p jcz = (h*jaz*jbz) % p return jcx, jcy, jcz
def double_j_j(jx, jy, jz): s = (4*jx*jy**2) % p m = (3*jx**2) % p jx2 = (m**2 - 2*s) % p jy2 = (m*(s-jx2) - 8*jy**4) % p jz2 = (2*jy*jz) % p return jx2, jy2, jz2
def mul(k,jx,jy,jz):
jkx, jky, jkz = 0, 0, 1 if (k%2 == 1): jkx, jky, jkz = jx, jy, jz #only if d is odd q=k//2 while q>0: jx, jy, jz = double_j_j(jx,jy,jz) a=q%2 if (a > jkx): jkx, jky, jkz = jx, jy, jz #only if d is even elif (a): jkx, jky, jkz = add_j_j(jx, jy, jz, jkx, jky, jkz) q=q//2 return jkx, jky, jkz
############################################################################################ #Coordinates
#from jacobian to affine def jac_to_aff(jax, jay, jaz, invjaz):
ax, ay = (jax*invjaz**2) % p, (jay*invjaz**3) % p return ax, ay
gen_batches.py #!/usr/bin/env python
#this script computes 3x1334 batchs of 4097 points (1 single point + 2048 couples of points -> (kG+mG, kG-mg)) #16,4 millions of points #3 : 1 batch + 2 endomorphism
from ecc import *
######################################################################## #in this section the script pre-computes 1G, 2G, 3G, ..., 2048G mG = [(0,0)]*2049 mGx = [0]*2049 mGy = [0]*2049
mGx[1]=Gx mGy[1]=Gy mG[1]=(Gx,Gy) #list of multiples of G, #you have to precompute and store these multiples somewhere
mGx[2]=0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5 mGy[2]=0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A mG[2]=(mGx[2],mGy[2])
for i in range(3,2049): jx,jy,jz=add_j_j(Gx, Gy, 1, mGx[i-1], mGy[i-1], 1) #this is not a efficient way, but it is not important jzinv = inv(jz,p) (x,y) = jac_to_aff(jx, jy, jz, jzinv) mG[i] = (x,y) mGx[i] = x mGy[i] = y
########################################################################### lowest_key=2**55+789079076 #put here the lowest key you want to generate number_of_batches=1334 #put here how many consecutive batches you want to generate number_of_keys=4097*3*number_of_batches #this is the number of consecutive keys you are going to generate #this number is always a multiple of 3*4097=12291, because 12291 is the minimum size #remember: you generate always 3 batches at once, 1 in (1,2^160) and 2 out of this range #their name are: batch (the main), batch2, batch3 (that are derived from the main with endomorphism)
id_batch=lowest_key+2048 #id_batch is now the private key of the middle point of the first batch that you want compute distance_between_successive_batches=4097 #this is the distance from the middle point of the first batch and the middle point of the successive batch in (1,2^160) #4097 means that I'm generating only consecutive batches in (1,2^160), this is the default
start=id_batch #the first key to generate of the the first batch is the lowest key + 2048 --> the middle point of the first batch stop=id_batch + number_of_batches*distance_between_successive_batches #the key of the middle point of the last batch
#k is always the id of the current main batch and the key of the middle point of the current main batch for k in range(start,stop,distance_between_successive_batches):
jkx,jky,jkz = mul(k,Gx,Gy,1) #here we use the slow function mul to generate the middle point invjkz = inv(jkz,p) (kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz) kminverse = [0]*2048 kminverse = inv_batch(kx,mGx,p) #you need to compute 2048 inverse, 1 for each couple #to perform this task you need only the x coordinates of the multiples of G and the x coordinate of kG: kx kminverse = [invjkz]+kminverse #the element number 0 of the list is the inverse invjkz that you have computed to get kG=(kx,ky) #the element number 1 of the list is the inverse necessary to perform kG + 1G (and kG-1G) ---> 1/(x_of_1G - kx) #the element number 2 of the list is the inverse necessary to perform kG + 2G (and kG-2G) ---> 1/(x_of_2G - kx) #... #the element number m of the list is the inverse necessary to perform kG + mG (and kG-mG) ---> 1/(x_of_mG - kx) #then the name: "kminverse", the inverse to get the generic kG+mG #... #the element number 2048 of the list is the inverse necessary to perform kG+2048G and kG-2048G ---> 1/(x_of_2048G - kx) kxl = [kx] * 2049 #this is a list of kx, all elements are the same, only to use the function map of python kyl = [ky] * 2049 #this is a list of ky, all elements are the same, only to use the function map of python batch=[(0,0,0,0)]*2048 #the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G) #the element number 1 will be the couple (k+1)G, (k-1)G #the element number 2 will be the couple (k+2)G, (k-2)G #... #the element number 2048 will be the couple (k+2048)G, (k-2048)G #each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G, #but the first element is only a fake couple batch = list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:])) #the function double_add_P_Q_inv computes kG +/-mG, it uses the list of the inverse and return a list of couple of points #each element of this list is a couple of points, 4 scalars, x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G #this list has 2049 elements batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G batch2=[0] * 2049 #this is the batch2 with lambda*kG, ( lambda*(k+1)G, lambda*(k-1)G ), ( lambda*(k+2)G, lambda*(k-2)G ), ...., # (lambda*(k+2048)G, lambda*(k-2048)G) #this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch #each element of this list is a couple of x -> 2 scalars: x_lambda*(k+m)G, x_lambda*(k-m)G #this list has 2049 elements
batch3=[0] * 2049 #this is the batch3 with lambda^2*kG, ( lambda^2*(k+1)G, lambda^2*(k-1)G ), ( lambda^2*(k+2)G, lambda^2*(k-2)G ), ...., # (lambda^2*(k+2048)G, lambda^2*(k-2048)G) #this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch #each element of this list is a couple of x -> 2 scalars: x_lambda^2*(k+m)G, x_lambda^2*(k-m)G #this list has 2049 elements for i in range(0,2049): #endomorphism batch2[i] = [(batch[i][0]*beta % p), (batch[i][2]*beta % p)] #only x coordinates, the first and the third scalar of each element of the main batch #private key: lambda*k mod n, coordinate x: beta*x mod p batch3[i] = [(batch[i][0]*beta2 % p), (batch[i][2]*beta2 % p)] #only x coordinates, the first and the third scalar of each element of the main batch #private key: lambda^2*k mod n, coordinate x: beta^2*x mod p
#k in this case is the id of the last batch (or if you prefer, k is the private key of the middle point of the last batch) m=465 #a random number between 0 and 2048, you can pick up a couple of points in the last main batch (or in batch2 / batch3) (kmx1,kmy1,kmx2,kmy2)=batch[m] (kmx1,kmx2)=batch3[m] #in this case the points chosen are in batch3 #the y are from main batch, because batch3 (and batch2) have only x coordinates
print ('kG+mG') print (hex(lambd2*(k+m)%n)) #private key print (hex(kmx1), hex(kmy1)) #public key (first and second coordinate) print ('*******')
print ('kG-mG') print (hex(lambd2*(k-m)%n)) #private key print (hex(kmx2), hex(kmy2)) #public key (first and second coordinate) print ('*******')
print (' ') a=number_of_keys//1000000 print ('You have just generated ' + str(a) + ' Millions of keys!!!')
time python3 gen_batches.py You have just generated 16.4 Millions of keys!!!
real 0m12,303s user 0m12,302s sys 0m0,000s
Generating consecutive public keys is way faster than generate random public keys.
|
|
|
|
citb0in (OP)
|
|
December 30, 2022, 01:01:18 PM Last edit: December 30, 2022, 01:43:30 PM by citb0in |
|
I made quick comparison by editing lowest_key=1000000 and number_of_batches=82 ...
what was the reason therefore? lowest_key is just the number to start the consecutive run IMHO. And I doubt that you generated 1,007,862 MILLION (!) in 1.6 seconds You have just generated 1007862 Millions of keys!!!
real 0m1.626s
With number_of_batches = 82 you should have been generated 1,007,862 keys (which is about 1 million) which the result for of 1.6 is realistic. You could replace the very last line of arulsbero's code in gen_batches.py to get a more clear output: #print ('You have just generated ' + str(a) + ' Millions of keys!!!') print(f'You have just generated {f"{number_of_keys:,.0f}"} keys ({str(a)} MKeys)', end='\n')
We should take into consideration that arulbero's way does not generate the corresponding addresses and not saving them into a file, we need to rewrite the code and see how it behaves. @arulbero: thanks for sharing, sounds interesting. I need to dig into it, unfortunately I have to go right now but I certainly will check your code later when back at home. I need to think about the consecutive vs. random part and make some comparisons each other. Currently when I run your code it says it generated 1 million keys in average of 1.22 seconds. I just need to test if this speed is realistisc. Can you modify your code so it will generate an address and output all the generated addresses to a file called address.out ?
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 03:08:58 PM |
|
@arulbero: thanks for sharing, sounds interesting. I need to dig into it, unfortunately I have to go right now but I certainly will check your code later when back at home. I need to think about the consecutive vs. random part and make some comparisons each other. Currently when I run your code it says it generated 1 million keys in average of 1.22 seconds. I just need to test if this speed is realistisc. Can you modify your code so it will generate an address and output all the generated addresses to a file called address.out ?
I don't have a fast 'pub_to_addr' function from 12.3 s to 3m and 9s to generate 16.4 M of addresses, 16.4 M / 190 s = about 86k addresses/s. 12 seconds to generate 16.4 M public keys, almost 3m to compute 16.4 M addresses. time python3 gen_batches.py > address.out
real 3m8,827s user 3m8,252s sys 0m0,520s
wc address.out
8200098 16400196 630754794
less address.out
('1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw', '1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw') ('13hXNkmedXJ5UXt4RY5rLGAZGBgeSHUNT7', '17G6zU4yVBM8WJ7TvJvkVkZr8qtJheAYVy') ('1LZtUG1S8V7yfwthDfJBy1Yg7LWxpZKmQX', '1PDBxaiy3bp8woFbrErNAPoMF8W442B1Nb') ('1CAapBviMeMp91UjqgTaUfr1tGc5Lsvhgx', '1Lmrc69RFRbwGUc3UTRGxumAApEDsJKUrf') ('1JNGqSWZdJFptRrLaCtmUJ6Hq1necKuTCd', '18aQxp32qDDHEFWgVjexS1BG9MdFbQAc6N') ('1CMNQMqqZdyABzU5bms6sE1J4hz2z7w86J', '1Awa5kf9npJ9sJKK65yA1vXFXaqC75tWvt') ('1GXDr1MF5ee5HfZHK98QQrDb8Mk785Hwtj', '14R3fvJcA8SSFgiCAPSQCvrquEG1NM7JcQ') ('1G578XchMTjHTuWduB37XNzNxEyvKxNNGC', '1E2aoTr22SDH18NJdamF3zqjztjtA5i5Cz') ('13BdD8vK9xaN1aw2WXn73rtLfjFvLAs3ES', '1Hgi86UjskxQpCmiKWPBK4Qjr97awHgZ7q') ('1Bnh87jjfNFy9QmRmnsW9vEDWbMDbBbjrF', '1P5wAX3U4v4oNde8yGUcg4YHwQdGRw33wo') ('1DXzgTcLD86LJDxrZerLEBXgQTLpCCRHRC', '1Ce5FYPKPSd9nf5Fv3KtKnzbpdCzJ7bPjg') ...
new ecc.py import hashlib from hashlib import sha256 from binascii import a2b_hex, b2a_hex from ripemd import ripemd160 #https://pypi.org/project/ripemd-hash/ --> pip install ripemd-hash
#see http://www.secg.org/sec2-v2.pdf at page 9
#Gx=55066263022277343669578718895168534326250603453777594175500187360389116729240 #Gy=32670510020758816978083085130507043184471273380659243275938904335757337482424 #p=115792089237316195423570985008687907853269984665640564039457584007908834671663 #n=115792089237316195423570985008687907852837564279074904382605163141518161494337
Gx=0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798 Gy=0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8 p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F # Fp n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 # E(Fp)
#endomorphism lambd=0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72 # (lambda^3 = 1 mod n) lambd2=0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce #(lambda^2)
beta=0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee # (beta^3 = 1 mod p) beta2=0x851695d49a83f8ef919bb86153cbcb16630fb68aed0a766a3ec693d68e6afa40 # (beta^2) ############################################################################################ ###Field operations
#a must be a number between 1 and p-1; input: a output: 1/a #see http://www.springer.com/?SGWID=4-102-45-110359-0 page 40 alg. 2.20 def inv(a,p): u, v = a%p, p x1, x2 = 1, 0 while u != 1: #q = v//u #r = v-q*u q, r = divmod(v,u) x = x2-q*x1 v = u u = r x2 = x1 x1 = x return x1%p
#inverse of a batch of x #input: batch of x : [x1,x2,x3,x4,x5,x6,x7,x8,...,x2048] #x is known (x of kG) #output: batch of inverse: 1/(x1-x), 1/(x2-x), 1/(x3-x), .... #3M for each inverse def inv_batch(x,batchx,p):
a = (batchx[1]-x) % p partial= [0]*2049 partial[1]=a for i in range(2,2049): a = (a*(batchx[i]-x)) % p #2047M partial[i]=a
inverse=inv(partial[2048],p) # 1I batch_inverse=[0]*2049 for i in range(2048,1,-1): batch_inverse[i-1]=(partial[i-1]*inverse) % p #2047M inverse=(inverse*(batchx[i]-x)) %p #2047M
batch_inverse[0]=inverse
return batch_inverse ############################################################################################ ##Group operations
# https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_addition #'double' addition #add 2 points P and Q --> P+Q #add 2 points P and Q --> P-Q # #(X1,Y1) + (X2,Y2) --> (X3,Y3) (the inverse of (x2-x1) must be known) #(X1,Y1) + (X2,-Y2) --> (X4,Y4) (the inverse of (x2-x1) must be known) # #input: (X1,Y1), (X2,Y2), the inverse of (X2 - X1): 5 scalars #output: (X3,Y3), (X4,Y4) : 4 scalars #4M + 2S
def double_add_P_Q_inv(x1, y1, x2, y2, invx2x1):
dy = (y2-y1) #% p a = dy*invx2x1 % p #1M a2 = a**2 #1S x3 = (a2 - x1 -x2) % p y3 = (a*(x1-x3) -y1) % p #1M
dy = p-(y2+y1) #% p # only the sign of y2 changes, "y of -Q = -y of +Q" a = dy*invx2x1 % p #1M x2 and then the inverse invx2x1 are the same a2 = a**2 #1S x4 = (a2 - x1 -x2) % p y4 = (a*(x1-x4) -y1) % p #1M return x3, y3, x4, y4
#https://en.wikibooks.org/wiki/Cryptography/Prime_Curve/Jacobian_Coordinates #jacobian coordinates def add_j_j(jax, jay, jaz, jbx, jby, jbz): u1 = (jax*jbz**2) % p u2 = (jbx*jaz**2) % p s1 = (jay*jbz**3) % p s2 = (jby*jaz**3) % p h = (u2-u1) % p r = (s2-s1) % p jcx = (r**2 -h**3-2*u1*h**2) % p jcy = (r*(u1*h**2-jcx)-s1*h**3) % p jcz = (h*jaz*jbz) % p return jcx, jcy, jcz
def double_j_j(jx, jy, jz): s = (4*jx*jy**2) % p m = (3*jx**2) % p jx2 = (m**2 - 2*s) % p jy2 = (m*(s-jx2) - 8*jy**4) % p jz2 = (2*jy*jz) % p return jx2, jy2, jz2
def mul(k,jx,jy,jz):
jkx, jky, jkz = 0, 0, 1 if (k%2 == 1): jkx, jky, jkz = jx, jy, jz #only if d is odd q=k//2 while q>0: jx, jy, jz = double_j_j(jx,jy,jz) a=q%2 if (a > jkx): jkx, jky, jkz = jx, jy, jz #only if d is even elif (a): jkx, jky, jkz = add_j_j(jx, jy, jz, jkx, jky, jkz) q=q//2 return jkx, jky, jkz
############################################################################################ #Coordinates
#from jacobian to affine def jac_to_aff(jax, jay, jaz, invjaz):
ax, ay = (jax*invjaz**2) % p, (jay*invjaz**3) % p return ax, ay ############################################################################################ # 58 character alphabet used __b58chars = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz' __b58base = len(__b58chars)
def b58encode(v): #encode v, which is a string of bytes, to base58. long_value = 0 if bytes == str: # python2 for (i, c) in enumerate(v[::-1]): long_value += ord(c) << (8*i) # 2x speedup vs. exponentiation else: # python3 for (i, c) in enumerate(v[::-1]): long_value += ord(chr(c)) << (8*i) # 2x speedup vs. exponentiation result = '' #long_value = v while long_value >= __b58base: div, mod = divmod(long_value, __b58base) result = __b58chars[mod] + result long_value = div result = __b58chars[long_value] + result #return result
# Bitcoin does a little leading-zero-compression: # leading 0-bytes in the input become leading-1s nPad = 0 #if bytes == str: # python2 # for c in v: # if c == '\0': nPad += 1 # else: break
#else: # python3 for c in v: if chr(c) == '\0': nPad += 1 else: break return (__b58chars[0]*nPad) + result
def b58decode(v, length): """ decode v into a string of len bytes.""" print(type(v)) long_value = 0 for (i, c) in enumerate(v[::-1]): long_value += __b58chars.find(c) * (__b58base**i)
div, mod = divmod(long_value, 256) result = struct.pack("B", mod) #converte numero intero tra 0 e 255 in 1 byte https://docs.python.org/2/library/struct.html long_value = div
while long_value >= 256: div, mod = divmod(long_value, 256) result = struct.pack("B", mod) + result #stringa di byte long_value = div result = struct.pack("B", long_value) + result nPad = 0 for c in v: if c == __b58chars[0]: nPad += 1 else: break result = struct.pack("B", 0)*nPad + result
if length is not None and len(result) != length: return None print(type(result)) return result def pub_to_addr(couple): #d = int(d1,16) #Px, Py = mul2G(d) # P = d*G Px=couple[0] Py=couple[1] Px2=couple[2] Py2=couple[3]
#if (compressed == 1): if (Py % 2) == 0: P_string = '02' + hex(Px)[2:].rstrip('L').rjust(64,'0') #formato compresso - caso y pari else: P_string = '03' + hex(Px)[2:].rstrip('L').rjust(64,'0') #formato compresso - caso y pari #else: #uncompressed # P_string = '04' + hex(Px)[2:].rstrip('L').rjust(64,'0') + hex(Py)[2:].rstrip('L').rjust(64,'0')
h = sha256(a2b_hex(P_string)) #sha256 h1=ripemd160.new() #ripmed160 h1.update(h.digest())
a = b'\x00' + h1.digest() #aggiunge byte 00 all'inizio h2 = sha256(sha256(a).digest()).digest() #doppio sha256 per controllo
addr = a + h2[:4]
address1 = b58encode(addr) #ultimo passaggio: codifica in base 58 if (Py2 % 2) == 0: P_string = '02' + hex(Px2)[2:].rstrip('L').rjust(64,'0') #formato compresso - caso y pari else: P_string = '03' + hex(Px2)[2:].rstrip('L').rjust(64,'0') #formato compresso - caso y pari #else: #uncompressed # P_string = '04' + hex(Px)[2:].rstrip('L').rjust(64,'0') + hex(Py)[2:].rstrip('L').rjust(64,'0')
h = sha256(a2b_hex(P_string)) #sha256 h1=ripemd160.new() #ripmed160 h1.update(h.digest())
a = b'\x00' + h1.digest() #aggiunge byte 00 all'inizio h2 = sha256(sha256(a).digest()).digest() #doppio sha256 per controllo
addr = a + h2[:4]
address2 = b58encode(addr) #ultimo passaggio: codifica in base 58
return address1, address2 #print (address)
new gen_batches.py #!/usr/bin/env python
#this script computes 3x1334 batchs of 4097 points (1 single point + 2048 couples of points -> (kG+mG, kG-mg)) #16,4 millions of points #3 : 1 batch + 2 endomorphism
from ecc import *
######################################################################## #in this section the script pre-computes 1G, 2G, 3G, ..., 2048G mG = [(0,0)]*2049 mGx = [0]*2049 mGy = [0]*2049
mGx[1]=Gx mGy[1]=Gy mG[1]=(Gx,Gy) #list of multiples of G, #you have to precompute and store these multiples somewhere
mGx[2]=0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5 mGy[2]=0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A mG[2]=(mGx[2],mGy[2])
for i in range(3,2049): jx,jy,jz=add_j_j(Gx, Gy, 1, mGx[i-1], mGy[i-1], 1) #this is not a efficient way, but it is not important jzinv = inv(jz,p) (x,y) = jac_to_aff(jx, jy, jz, jzinv) mG[i] = (x,y) mGx[i] = x mGy[i] = y
########################################################################### lowest_key=2**55+789079076 #put here the lowest key you want to generate number_of_batches=1334 #put here how many consecutive batches you want to generate number_of_keys=4097*3*number_of_batches #this is the number of consecutive keys you are going to generate #this number is always a multiple of 3*4097=12291, because 12291 is the minimum size #remember: you generate always 3 batches at once, 1 in (1,2^160) and 2 out of this range #their name are: batch (the main), batch2, batch3 (that are derived from the main with endomorphism)
id_batch=lowest_key+2048 #id_batch is now the private key of the middle point of the first batch that you want compute distance_between_successive_batches=4097 #this is the distance from the middle point of the first batch and the middle point of the successive batch in (1,2^160) #4097 means that I'm generating only consecutive batches in (1,2^160), this is the default
start=id_batch #the first key to generate of the the first batch is the lowest key + 2048 --> the middle point of the first batch stop=id_batch + number_of_batches*distance_between_successive_batches #the key of the middle point of the last batch
#k is always the id of the current main batch and the key of the middle point of the current main batch for k in range(start,stop,distance_between_successive_batches):
jkx,jky,jkz = mul(k,Gx,Gy,1) #here we use the slow function mul to generate the middle point invjkz = inv(jkz,p) (kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz) kminverse = [0]*2048 kminverse = inv_batch(kx,mGx,p) #you need to compute 2048 inverse, 1 for each couple #to perform this task you need only the x coordinates of the multiples of G and the x coordinate of kG: kx kminverse = [invjkz]+kminverse #the element number 0 of the list is the inverse invjkz that you have computed to get kG=(kx,ky) #the element number 1 of the list is the inverse necessary to perform kG + 1G (and kG-1G) ---> 1/(x_of_1G - kx) #the element number 2 of the list is the inverse necessary to perform kG + 2G (and kG-2G) ---> 1/(x_of_2G - kx) #... #the element number m of the list is the inverse necessary to perform kG + mG (and kG-mG) ---> 1/(x_of_mG - kx) #then the name: "kminverse", the inverse to get the generic kG+mG #... #the element number 2048 of the list is the inverse necessary to perform kG+2048G and kG-2048G ---> 1/(x_of_2048G - kx) kxl = [kx] * 2049 #this is a list of kx, all elements are the same, only to use the function map of python kyl = [ky] * 2049 #this is a list of ky, all elements are the same, only to use the function map of python batch=[(0,0,0,0)]*2048 #the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G) #the element number 1 will be the couple (k+1)G, (k-1)G #the element number 2 will be the couple (k+2)G, (k-2)G #... #the element number 2048 will be the couple (k+2048)G, (k-2048)G #each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G, #but the first element is only a fake couple batch = list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:])) #the function double_add_P_Q_inv computes kG +/-mG, it uses the list of the inverse and return a list of couple of points #each element of this list is a couple of points, 4 scalars, x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G #this list has 2049 elements batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G #batch_tot = batch_tot + batch addresses = list(map(pub_to_addr, batch)) print (*addresses, sep="\n") batch2=[(0,0,0,0)] * 2049 #this is the batch2 with lambda*kG, ( lambda*(k+1)G, lambda*(k-1)G ), ( lambda*(k+2)G, lambda*(k-2)G ), ...., # (lambda*(k+2048)G, lambda*(k-2048)G) #this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch #each element of this list is a couple of x -> 2 scalars: x_lambda*(k+m)G, x_lambda*(k-m)G #this list has 2049 elements
batch3=[(0,0,0,0)] * 2049 #this is the batch3 with lambda^2*kG, ( lambda^2*(k+1)G, lambda^2*(k-1)G ), ( lambda^2*(k+2)G, lambda^2*(k-2)G ), ...., # (lambda^2*(k+2048)G, lambda^2*(k-2048)G) #this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch #each element of this list is a couple of x -> 2 scalars: x_lambda^2*(k+m)G, x_lambda^2*(k-m)G #this list has 2049 elements for i in range(0,2049): #endomorphism batch2[i] = ((batch[i][0]*beta % p), batch[i][1], (batch[i][2]*beta % p), batch[i][3]) #only x coordinates, the first and the third scalar of each element of the main batch #private key: lambda*k mod n, coordinate x: beta*x mod p batch3[i] = ((batch[i][0]*beta2 % p), batch[i][1], (batch[i][2]*beta2 % p), batch[i][3]) #only x coordinates, the first and the third scalar of each element of the main batch #private key: lambda^2*k mod n, coordinate x: beta^2*x mod p
addresses2 = list(map(pub_to_addr, batch2)) print (*addresses2, sep="\n") addresses3 = list(map(pub_to_addr, batch3)) print (*addresses3, sep="\n")
#k in this case is the id of the last batch (or if you prefer, k is the private key of the middle point of the last batch) """ m=465 #a random number between 0 and 2048, you can pick up a couple of points in the last main batch (or in batch2 / batch3) (kmx1,kmy1,kmx2,kmy2)=batch[m] (kmx1,kmx2)=batch3[m] #in this case the points chosen are in batch3 #the y are from main batch, because batch3 (and batch2) have only x coordinates
print ('kG+mG') print (hex(lambd2*(k+m)%n)) #private key print (hex(kmx1), hex(kmy1)) #public key (first and second coordinate) print ('*******')
print ('kG-mG') print (hex(lambd2*(k-m)%n)) #private key print (hex(kmx2), hex(kmy2)) #public key (first and second coordinate) print ('*******') """
""" print (' ') a=number_of_keys//1000000 print ('You have just generated ' + str(a) + ' Millions of keys!!!') """
|
|
|
|
citb0in (OP)
|
|
December 30, 2022, 03:40:50 PM |
|
I don't have a fast 'pub_to_addr' function
from 12.3 s to 3m and 9s to generate 16.4 M of addresses, 16.4 M / 190 s = about 86k addresses/s.
12 seconds to generate 16.4 M public keys, almost 3m to compute 16.4 M addresses.
I had already thought that halfway. Now with your updated version which also generates addresses it's much slower, the result is clearly different than just calculating the pubkey as you showed in your initial version. I get about 70,000 addresses/sec with your updated version: $ time python3 new_gen_batches.py > addresses.out
real 0m13,378s user 0m13,347s sys 0m0,012s
I realized that you output tuples of addresses using list(map). I suggest to adjust your program so it outputs a single address on each line so we compares apples by apples. I am not sure if the list output affects performance in some way, that's why I am pointing to it. The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples. #!/usr/bin/env python3 # 2022/Dec/30, citb0in import concurrent.futures import os import numpy as np import fastecdsa.keys as fkeys import fastecdsa.curve as fcurve import secp256k1 as ice
# how many cores to use num_cores = 1 #num_cores = os.cpu_count()
# Set the number of addresses to generate num_addresses = 1000000
# Define a worker function that generates a batch of addresses and returns them def worker(start, end): # Generate a NumPy array of random private keys using fastecdsa private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])
# Use secp256k1 to convert the private keys to addresses thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])
return thread_addresses
# Use a ProcessPoolExecutor to generate the addresses in parallel with concurrent.futures.ProcessPoolExecutor() as executor: # Divide the addresses evenly among the available CPU cores addresses_per_core = num_addresses // num_cores
# Submit a task for each batch of addresses to the executor tasks = [] for i in range(num_cores): start = i * addresses_per_core end = (i+1) * addresses_per_core tasks.append(executor.submit(worker, start, end))
# Wait for the tasks to complete and retrieve the results addresses = [] for task in concurrent.futures.as_completed(tasks): addresses.extend(task.result())
# Write the addresses to a file np.savetxt('addresses_1M_with_singleCore.txt', addresses, fmt='%s')
Running this under one single core, now I get 26.5 sec for 1 million generated addresses. This is a rate of 37,735 keys/second. I am curious to see your program using all available cores.
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 04:40:39 PM |
|
I realized that you output tuples of addresses using list(map). I suggest to adjust your program so it outputs a single address on each line so we compares apples by apples. I am not sure if the list output affects performance in some way, that's why I am pointing to it.
The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples.
To generate 1 address for line you need to replace print (*addresses,sep="\n") print (*addresses2,sep="\n") print (*addresses3,sep="\n") with for i in addresses: print (*i, sep="\n") for i in addresses2: print (*i, sep="\n") for i in addresses3: print (*i, sep="\n") The results are the same (slightly faster): time python3 gen_batches.py > addresses.out
real 3m5,736s user 3m5,268s sys 0m0,404s
wc addresses.out 16400196 16400196 573355137
less addresses.out
1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw 1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw 13hXNkmedXJ5UXt4RY5rLGAZGBgeSHUNT7 17G6zU4yVBM8WJ7TvJvkVkZr8qtJheAYVy 1LZtUG1S8V7yfwthDfJBy1Yg7LWxpZKmQX 1PDBxaiy3bp8woFbrErNAPoMF8W442B1Nb 1CAapBviMeMp91UjqgTaUfr1tGc5Lsvhgx 1Lmrc69RFRbwGUc3UTRGxumAApEDsJKUrf 1JNGqSWZdJFptRrLaCtmUJ6Hq1necKuTCd 18aQxp32qDDHEFWgVjexS1BG9MdFbQAc6N ....
|
|
|
|
citb0in (OP)
|
|
December 30, 2022, 04:50:52 PM Last edit: December 30, 2022, 05:49:03 PM by citb0in |
|
I would like to try another approach and using iceland's secp256k1 library for generating the address from the pubkey. Can you tell me in which variable I can find the public key in your program? For example. When I print the second element of the list "batch" I get (26088438602568126703237513271074049975900194846956880971674554085576597277452, 66899334846368891744016007453456293662065123677302105461153252024609684696055, 54338240394213138931889225770364196918256374885639854771044914499200701239376, 91288621757068410199936994202114003372515557459679014336095768812037691308008) The first element is the xkey (pubkey) of that key: private key (dec) : 36028797808045093 private key (hex) : 8000002f086c25 public pair x : 26088438602568126703237513271074049975900194846956880971674554085576597277452 the second element 66899334846368891744016007453456293662065123677302105461153252024609684696055 represents the public pair y what is third and fourth element representing? 54338240394213138931889225770364196918256374885639854771044914499200701239376 91288621757068410199936994202114003372515557459679014336095768812037691308008 As I recognize, they seem not to be the public keys of the consequent keys 3602879780804509 4, 3602879780804509 5, 3602879780804509 6
|
_ _ _ __ _ _ _ __ |_) | / \ / |/ (_ / \ | \ / |_ |_) (_ |_) |_ \_/ \_ |\ __) \_/ |_ \/ |_ | \ __) --> citb0in Solo-Mining Group <--- low stake of only 0.001 BTC. We regularly rent about 5 PH/s hash power and direct it to SoloCK pool. Wanna know more? Read through the link and JOIN NOW
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 06:05:25 PM Last edit: December 30, 2022, 10:08:28 PM by arulbero |
|
what is third and fourth element representing?
54338240394213138931889225770364196918256374885639854771044914499200701239376 91288621757068410199936994202114003372515557459679014336095768812037691308008
As I recognize, they seem not to be the public keys of the consequent keys 36028797808045094, 36028797808045095, 36028797808045096
The keys are generated in this way: P (P + 1*G , P - 1*G) -> 4 coordinates, x and y * 2 points (P + 2*G , P - 2*G) (P + 3*G , P - 3*G) (P + 4*G , P - 4*G) (P + 5*G , P - 5*G) (P + 6*G , P - 6*G) ..... ..... (P + 2048*G, P - 2048*G) each batch has all the consecutive keys from P - 2048*G to P+2048*G (4097 public keys), but they are not in order batch=[(0,0,0,0)]*2048 #the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G) #the element number 1 will be the couple (k+1)G, (k-1)G #the element number 2 will be the couple (k+2)G, (k-2)G #... #the element number 2048 will be the couple (k+2048)G, (k-2048)G #each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G, #but the first element is only a fake couple batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G
|
|
|
|
NotATether
Legendary
Offline
Activity: 1778
Merit: 7372
Top Crypto Casino
|
|
December 30, 2022, 06:21:01 PM |
|
@arulbero
Impressive how you're getting those speeds on Python, considering that multithreading is complicated to do efficiently in that language, and GPU/FPGA support is lackluster to nonexistent.
|
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 06:24:12 PM |
|
@arulbero
Impressive how you're getting those speeds on Python, considering that multithreading is complicated to do efficiently in that language, and GPU/FPGA support is lackluster to nonexistent.
It is not programming related, computing P+G is much more easier than compute P=3512878756844563653653674365654352352*G
|
|
|
|
arulbero
Legendary
Offline
Activity: 1938
Merit: 2080
|
|
December 30, 2022, 06:34:55 PM Last edit: December 30, 2022, 07:48:52 PM by arulbero |
|
The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (in my case 16 cores) and write to the file in 4.8 seconds, this is a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the cores under which the program is executed. So one can select specifically "1" as value, so that we also compare apples with apples. #!/usr/bin/env python3 # 2022/Dec/30, citb0in import concurrent.futures import os import numpy as np import fastecdsa.keys as fkeys import fastecdsa.curve as fcurve import secp256k1 as ice
# how many cores to use num_cores = 1 #num_cores = os.cpu_count()
# Set the number of addresses to generate num_addresses = 1000000
# Define a worker function that generates a batch of addresses and returns them def worker(start, end): # Generate a NumPy array of random private keys using fastecdsa private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])
# Use secp256k1 to convert the private keys to addresses thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])
return thread_addresses
# Use a ProcessPoolExecutor to generate the addresses in parallel with concurrent.futures.ProcessPoolExecutor() as executor: # Divide the addresses evenly among the available CPU cores addresses_per_core = num_addresses // num_cores
# Submit a task for each batch of addresses to the executor tasks = [] for i in range(num_cores): start = i * addresses_per_core end = (i+1) * addresses_per_core tasks.append(executor.submit(worker, start, end))
# Wait for the tasks to complete and retrieve the results addresses = [] for task in concurrent.futures.as_completed(tasks): addresses.extend(task.result())
# Write the addresses to a file np.savetxt('addresses_1M_with_singleCore.txt', addresses, fmt='%s')
Running this under one single core, now I get 26.5 sec for 1 million generated addresses. This is a rate of 37,735 keys/second. I am curious to see your program using all available cores. Ok, finally I managed to use your multithread program with my script. Results: 30s to generate 16.4 M of addresses, about 540k addresses / s, and store them in a file. time python3 gen_batches.py > addresses.out
real 0m29,810s user 8m22,402s sys 0m6,273s
wc addresses.out
16474040 16474040 575934502
less addresses.out
18GW1MFwpYZgmRSy8R4wgjUtBnscJtXeUZ 1FXUP9HViWxrtgAvqbpgVrd69yarx6Njdp 1MuxV99fRFnhoNopdxV7kkQB4u14chCtqA 17AzXPYExdE8nxkiV6zJGLuJq32WkAZ134 12DUcGmFMkmXV8tYihM2jLQ36pc4zLBDP6 1AqwPisfi9hDHoF95SEoSEVN89fX8NSGjo 17qRbvdShkoCDw3XJGhEqxaFkQHaK36Yk1 187x8hPiVpRLGDRsvy4nreeMNW7aW9zuFv 1EGSDnc2j7p3qsPJsPakKgavCTz4szYU3w 1F4KjKLNqQSbmw9xsJurRVQ3RciL7Kox7g 1AKaHBfhMEedrtnjGgmFFsdG3SzyN3hpcZ 1GaWuuPPM747Na3CTwaR3DcewoBHc9VVSv 12VeWT4jZCep17HJswMtfCEqh14oDcRr3L 16MBC6WLWMvNeexYYX853Ucf2ZsPoYLRoS 1JVrhzEKTcgzGtSc9u9LpFjksFZiQveFT6 17kz2Wpry3vnLM6RMjB3KXoG6kRxjeDQcj 1TBh3q7vQoTmgX6pY75ADHDSkcaWkmCvd 13QVDFeeWkErhTAFqX3SP2YzrzWt3Bh7iu 1GPyzJC8Ey8PasHn2Qa8aFanjrXuE1aNK5 17QE8iNrtLnju5pinG1vdoCVZpn8zbrM7S 17nKRy1XdKdroWXPsnsd4RvxWKVySZxath 1KWaY8K824oEFk5q4tZioCZ86viLJuKNgS 141m3BUZNjMxbs4a9Cb6SbCmEzx4nPJKCL 12ueqsLkT1XCVheGM34uQWW2fpnhVvtwmE 14eH2SFvCq3fxmrK9NgKTxfMCeKixNwQ6c 1G9CaAqoKJSfTRyGcvPhVFewLRXKWzkwh7 15RthH7DBuDSmYcFCYanUhBsrdEYzBahNw 13BTmQ2gCaZ8x4T7Sba9k2MRBSDRrZLy6u 1NpbKVze4pPVCghjaa2CV7LfRkotCNu76Y 1HfQ42v7DEDHEuQMhH5caPzX2RgtSmoYWw ...
Only 1.3 s to generate 16.4 M public keys (without writing) 30s to generate 16.4 M addresses and store them in a file (24s if each thread writes on the file without waiting for the others)
|
|
|
|
|