vladkens (OP)
Newbie
Offline
Activity: 3
Merit: 8
|
Hi everyone,

I'd like to share a tool I've been working on: ecloop, a CPU-optimized tool for searching Bitcoin public keys (hash160) on the secp256k1 curve. It combines approaches from projects like keyhunt (range/random range scanning with optional endomorphism) and brainflayer (dictionary-based search), with a focus on clean C code and SIMD acceleration. It supports both compressed and uncompressed public key formats, uses a Bloom filter for large-scale hash160 scanning, and works on macOS and Linux (Windows via WSL). This is my first time posting it here.

Main features:
- Fixed 256-bit modular arithmetic and ECC implementation in a single C file, `lib/ecc.c`
- Group inversion for point addition / precomputed table for point multiplication
- SIMD acceleration for RIPEMD-160 (AVX2 / NEON) (https://vladkens.cc/rmd160-simd/)
- Accelerated SHA-256 with SHA extensions (both ARM and x86)
- Search by range, random range, private key list or word list
- Bloom filter support for efficient filtering of large hash160 sets

Benchmarks show a 3.5x+ speedup over keyhunt on an x86 CPU.

Repo: https://github.com/vladkens/ecloop
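To illustrate the Bloom-filter item in the feature list, here is a minimal Python sketch. It is not ecloop's implementation: the class name, the sizing, and the SHA-256-based double hashing are all assumptions chosen for readability. The point is only that a membership test touches a few bits and never misses a stored hash160, at the cost of occasional false positives that must be re-checked against the real list.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit values
        # taken from a single SHA-256 digest of the item.
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1  # keep the stride odd
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


# Stand-ins for 20-byte hash160 values (not real addresses).
targets = [hashlib.sha256(str(i).encode()).digest()[:20] for i in range(1000)]

bf = BloomFilter(size_bits=1 << 16, num_hashes=4)
for t in targets:
    bf.add(t)

print(all(t in bf for t in targets))  # stored items are always reported
```

A candidate that passes the filter still has to be compared against the exact target list, since a Bloom filter can report items that were never added.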
|
|
|
|
AlexanderCurl
Jr. Member
Offline
Activity: 42
Merit: 193
|
 |
May 28, 2025, 03:46:33 PM |
|
Nice piece of code. Added to my GitHub collection. But asking people to "buy me a coffee" for techniques (like batch addition and batch inversion, i.e. the Montgomery trick) that come from the research of renowned cryptographers and mathematicians is not quite appropriate. BitCrack and JLP have had them implemented for a long time now.
|
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 364
Merit: 8
|
 |
May 29, 2025, 02:34:01 PM Last edit: May 29, 2025, 04:05:12 PM by Akito S. M. Hosana |
|
I am using Makefile flags from @nomachine

CC = cc
CC_FLAGS ?= -m64 -Ofast -Wall -Wextra -mtune=native \
	-funroll-loops -ftree-vectorize -fstrict-aliasing \
	-fno-semantic-interposition -fvect-cost-model=unlimited \
	-fno-trapping-math -fipa-ra -flto -fassociative-math \
	-mavx2 -mbmi2 -madx -fwrapv \
	-fomit-frame-pointer -fpredictive-commoning -fgcse-sm -fgcse-las \
	-fmodulo-sched -fmodulo-sched-allow-regmoves -funsafe-math-optimizations

# Source files
ifeq ($(shell uname -m),x86_64)
CC_FLAGS += -march=native -pthread -lpthread
endif

default: build

clean:
	@rm -rf ecloop bench main a.out *.profraw *.profdata

build: clean
	$(CC) $(CC_FLAGS) main.c -o ecloop

# ./ecloop rnd -f 71.txt -t 12 -o ./BINGO.txt -r 400000000000000000:7fffffffffffffffff -endo
threads: 12 ~ addr33: 1 ~ addr65: 0 ~ endo: 1 | filter: list (1)
----------------------------------------
[RANDOM MODE] offs: 2 ~ bits: 32
0000000000000000 0000000000000000 0000000000000042 8ddff88400000000
0000000000000000 0000000000000000 0000000000000042 8ddff887fffffffc
27.91s ~ 64.92 Mkeys/s ~ 0 / 1,811,939,328 ('p' – pause)

I have about 65 Mkeys/s - This is madness.
|
|
|
|
nomachine
|
 |
May 29, 2025, 02:44:48 PM |
|
I am using Makefile flags from @nomachine
You're welcome 
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
AlexanderCurl
Jr. Member
Offline
Activity: 42
Merit: 193
|
 |
May 30, 2025, 02:31:23 PM Last edit: May 30, 2025, 02:58:32 PM by AlexanderCurl |
|
# ./ecloop rnd -f 71.txt -t 12 -o ./BINGO.txt -r 400000000000000000:7fffffffffffffffff -endo
threads: 12 ~ addr33: 1 ~ addr65: 0 ~ endo: 1 | filter: list (1)
----------------------------------------
[RANDOM MODE] offs: 2 ~ bits: 32
0000000000000000 0000000000000000 0000000000000042 8ddff88400000000
0000000000000000 0000000000000000 0000000000000042 8ddff887fffffffc
27.91s ~ 64.92 Mkeys/s ~ 0 / 1,811,939,328 ('p' – pause)
I have about 65 Mkeys/s - This is madness.

No wonder. The basic principle was implemented and fully described by JeanLucPons about five years ago in his BSGS.
https://github.com/JeanLucPons/BSGS/blob/master/BSGS.cpp
void BSGS::FillBabySteps(TH_PARAM *ph);
Very simple. If you move out from the center of the group using batch addition and batch inversion, along with batch subtraction (negation map, symmetry), you need only one batch inverse for both the addition and the subtraction batch in each iteration. That way you scan the range the fastest way possible.
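The "one batch inverse for both directions" idea can be sketched in Python as follows. This is an illustration under stated assumptions, not code from BSGS or ecloop: the function names and the tiny group size are made up, and plain affine arithmetic is used for clarity rather than speed. The key observation is that `C + iG` and `C - iG` share the same denominator `x(iG) - x(C)`, so one inverted value serves both, and Montgomery's trick inverts all denominators with a single modular inversion.

```python
# Standard secp256k1 field prime and generator point.
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(a, b):
    """Reference affine addition (one inversion per call); None = infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0]:
        if (a[1] + b[1]) % P == 0:
            return None
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P  # doubling
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication (slow, used as reference)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


def batch_inverse(values):
    """Montgomery's trick: invert all values with one modular inversion."""
    prefix = [1] * (len(values) + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % P
    inv = pow(prefix[-1], -1, P)  # the only inversion (Python 3.8+)
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % P
        inv = inv * values[i] % P
    return out


def add_with_inv(a, b, inv_dx):
    """a + b given inv_dx = 1 / (b.x - a.x); no inversion inside."""
    lam = (b[1] - a[1]) * inv_dx % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def scan_around_center(center_key, half_width):
    """Points for keys center_key - half_width .. center_key + half_width.

    Assumes center_key > half_width so no denominator is zero.
    """
    center = ec_mul(center_key, G)
    steps = [ec_mul(i, G) for i in range(1, half_width + 1)]  # a fixed table in practice
    inverses = batch_inverse([(s[0] - center[0]) % P for s in steps])
    found = {center_key: center}
    for i, (s, inv_dx) in enumerate(zip(steps, inverses), start=1):
        found[center_key + i] = add_with_inv(center, s, inv_dx)
        # -s = (s.x, -s.y) has the same x, so the same inverse is reused.
        found[center_key - i] = add_with_inv(center, (s[0], P - s[1]), inv_dx)
    return found


points = scan_around_center(1000, 8)
print(len(points))  # 17 keys covered with a single modular inversion in the batch
```

In a real scanner the step table is computed once and the center jumps forward by the group size after each batch, so the per-key cost is dominated by a few field multiplications.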
|
|
|
|
vladkens (OP)
Newbie
Offline
Activity: 3
Merit: 8
|
 |
May 31, 2025, 01:08:11 AM |
|
Nice piece of code. Added to my GitHub collection. But asking people to "buy me a coffee" for techniques (like batch addition and batch inversion, i.e. the Montgomery trick) that come from the research of renowned cryptographers and mathematicians is not quite appropriate. BitCrack and JLP have had them implemented for a long time now.

Thanks for checking out my code and adding it to your collection!

I appreciate the feedback, but the comment about the "buy me a coffee" link seems a bit out of place. I include that link in most of my public repos, not because I believe these techniques are mine or somehow original. I know this work builds on well-known ideas from papers, wikis, Bitcoin Core, and other open-source projects.

Honestly, JLP's code is quite hard for me to read, so I'm not exactly sure what's implemented there. I wrote ecloop from scratch, originally as a brain wallet checker, and figured out the necessary algorithms as I went. Some concepts, like group inversion, come from Wikipedia and similar sources. I recently noticed JLP's trick with negative points and updated my code to use it. I also borrowed the multiplication (mod N) idea from JLP, since I didn't have enough time at the moment to search for relevant papers.

So no, I'm not claiming the ideas are entirely new, just that this is a clean, fast, and (hopefully) simpler implementation that runs well on both x86 and ARM. Maybe it will help others push things forward.

---

Also, I forgot to mention in the original post: if anyone knows of other mathematical ideas to improve CPU performance, I'd love to hear about them.
|
|
|
|
vladkens (OP)
Newbie
Offline
Activity: 3
Merit: 8
|
 |
May 31, 2025, 01:21:22 AM |
|
# ./ecloop rnd -f 71.txt -t 12 -o ./BINGO.txt -r 400000000000000000:7fffffffffffffffff -endo
threads: 12 ~ addr33: 1 ~ addr65: 0 ~ endo: 1 | filter: list (1)
----------------------------------------
[RANDOM MODE] offs: 2 ~ bits: 32
0000000000000000 0000000000000000 0000000000000042 8ddff88400000000
0000000000000000 0000000000000000 0000000000000042 8ddff887fffffffc
27.91s ~ 64.92 Mkeys/s ~ 0 / 1,811,939,328 ('p' – pause)
I have about 65 Mkeys/s - This is madness.

That's cool! What CPU are you using? I don't have a good x86 processor at the moment to run proper benchmarks.

Also, which compiler did you use? On my side, Clang on Linux gives about 10% better performance compared to GCC, but I haven't figured out the reason for the difference yet.
|
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 364
Merit: 8
|
 |
May 31, 2025, 08:58:59 AM |
|
That's cool! What CPU are you using? I don't have a good x86 processor at the moment to run proper benchmarks.
Also, which compiler did you use? On my side, Clang on Linux gives about 10% better performance compared to GCC, but I haven't figured out the reason for the difference yet.
I have an AMD Ryzen 5 3600 + GCC C++11 - Debian 12.

What about the AOCC compiler that @nomachine mentioned earlier?
https://www.amd.com/en/developer/aocc.html
This is a specialized Clang for AMD processors. AOCC can automatically vectorize scalar operations into SIMD instructions.
|
|
|
|
nomachine
|
 |
May 31, 2025, 09:04:02 AM |
|
What about the AOCC compiler that @nomachine mentioned earlier?
It has only one flaw. You can burn the processor if you don't know what you are doing and you have inadequate cooling. 
|
|
|
|
analyticnomad
Newbie
Offline
Activity: 48
Merit: 0
|
 |
June 06, 2025, 02:16:39 PM |
|
Very cool! Do you know how to do this using GPU/CUDA? If you can, and want a $$pecial project, DM me.
|
|
|
|
satscollector
Newbie
Offline
Activity: 3
Merit: 0
|
 |
July 13, 2025, 11:33:04 AM |
|
Why does `-endo` show higher speed but take longer?
For the test below, using `-endo` takes 3x longer for the same range.
```console
$ time ./ecloop add -f data/btc-puzzles-hash -r 400000000000000000:40000000000fffffff -o ./found_71.txt -t 4
threads: 4 ~ addr33: 1 ~ addr65: 0 ~ endo: 0 | filter: list (160)
range_s: 0000000000000000 0000000000000000 0000000000000040 0000000000000000
range_e: 0000000000000000 0000000000000000 0000000000000040 000000000fffffff
----------------------------------------
11.25s ~ 23.86 Mkeys/s ~ 0 / 268,435,456
./ecloop add -f data/btc-puzzles-hash -r 400000000000000000:40000000000ffffff 44.70s user 0.03s system 397% cpu 11.256 total

$ time ./ecloop add -f data/btc-puzzles-hash -r 400000000000000000:40000000000fffffff -o ./found_71.txt -t 4 -endo
threads: 4 ~ addr33: 1 ~ addr65: 0 ~ endo: 1 | filter: list (160)
range_s: 0000000000000000 0000000000000000 0000000000000040 0000000000000000
range_e: 0000000000000000 0000000000000000 0000000000000040 000000000fffffff
----------------------------------------
33.01s ~ 48.79 Mkeys/s ~ 0 / 1,610,612,736
./ecloop add -f data/btc-puzzles-hash -r 400000000000000000:40000000000ffffff 131.16s user 0.05s system 397% cpu 33.018 total
```
|
|
|
|
kTimesG
|
 |
July 13, 2025, 01:32:26 PM |
|
Why does `-endo` show higher speed but take longer?
For the test below, using `-endo` takes 3x longer for the same range.
11.25s ~ 23.86 Mkeys/s ~ 0 / 268,435,456
33.01s ~ 48.79 Mkeys/s ~ 0 / 1,610,612,736

Maybe because one sixth (non-endo range keys) of 48.79 is roughly equal to one third of 23.86? So the same key is reached, but at a rate three times slower?
Endomorphism is like slowing down a fast car, but having more cars instead. However, only one of the cars is the one you actually want to get to the first Starbucks, while the other 5 cars are wandering around the Milky Way.
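The arithmetic behind that answer can be checked directly from the two runs quoted above. This is just a worked calculation in Python using the reported key counts and wall-clock times; the variable names are mine.

```python
# Figures copied from the two benchmark runs in this thread.
keys_plain, secs_plain = 268_435_456, 11.25    # without -endo
keys_endo, secs_endo = 1_610_612_736, 33.01    # with -endo


def mkeys_per_sec(keys, secs):
    return keys / secs / 1e6


reported_plain = mkeys_per_sec(keys_plain, secs_plain)
reported_endo = mkeys_per_sec(keys_endo, secs_endo)

# With -endo, six keys are checked per in-range key, so only one sixth
# of the reported rate is progress through the requested range.
in_range_endo = reported_endo / 6

print(f"reported without -endo: {reported_plain:.2f} Mkeys/s")
print(f"reported with -endo:    {reported_endo:.2f} Mkeys/s")
print(f"in-range with -endo:    {in_range_endo:.2f} Mkeys/s")
print(f"slowdown over the same range: {secs_endo / secs_plain:.2f}x")
```

The `-endo` run checks exactly six times as many keys in total, which is why the counter is larger even though the requested range is identical.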
|
Off the grid, training pigeons to broadcast signed messages.
|
|
|
satscollector
Newbie
Offline
Activity: 3
Merit: 0
|
 |
July 13, 2025, 02:17:15 PM |
|
Maybe because one sixth (non-endo range keys) of 48.79 is roughly equal to one third of 23.86? So the same key is reached, but at a rate three times slower?
Endomorphism is like slowing down a fast car, but having more cars instead. However, only one of the cars is the one you actually want to get to the first Starbucks, while the other 5 cars are wandering around the Milky Way.
Ok, makes sense. But this means using `-endo` causes it to search outside the range `-r 400000000000000000:40000000000fffffff`, which is explicitly provided. Right?
|
|
|
|
kTimesG
|
 |
July 13, 2025, 03:01:38 PM |
|
Maybe because one sixth (non-endo range keys) of 48.79 is roughly equal to one third of 23.86? So the same key is reached, but at a rate three times slower?
Endomorphism is like slowing down a fast car, but having more cars instead. However, only one of the cars is the one you actually want to get to the first Starbucks, while the other 5 cars are wandering around the Milky Way.
Ok, makes sense. But this means using `-endo` causes it to search outside the range `-r 400000000000000000:40000000000fffffff`, which is explicitly provided. Right?

Right, since that's pretty much the definition of the endomorphism: a fast mapping of a point to one of its scalar multiples, which is some other point. The multiplier here is a 256-bit constant.
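Why the mapping is "fast" can be shown with a short Python sketch. It assumes the standard secp256k1 parameters and the well-known 256-bit multiplier usually called lambda; the helper names are hypothetical and the arithmetic is plain affine code, not ecloop's. Multiplying a point by lambda leaves `y` unchanged and only scales `x` by a field constant, so the new point costs one field multiplication instead of a full scalar multiplication. The constant is derived here from `lambda * G` rather than hard-coded.

```python
# Standard secp256k1 parameters.
P = 2**256 - 2**32 - 977
N = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
# The 256-bit endomorphism multiplier (a cube root of unity mod N).
LAMBDA = 0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72


def ec_add(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0]:
        if (a[1] + b[1]) % P == 0:
            return None
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P  # doubling
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication (the slow path)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


# Slow path once, to recover the field constant that scales x.
Q = ec_mul(LAMBDA, G)
beta = Q[0] * pow(G[0], -1, P) % P

print(Q[1] == G[1])          # y is unchanged by the mapping
print(pow(beta, 3, P) == 1)  # beta is a cube root of unity mod P


def endo(pt):
    """Fast path: LAMBDA * pt with a single field multiplication."""
    return (beta * pt[0] % P, pt[1])


k = 0x400000000000000000
print(endo(ec_mul(k, G)) == ec_mul(k * LAMBDA % N, G))
```

The private key behind `endo(pt)` is `k * LAMBDA mod N`, which is a full 256-bit value and therefore lands far outside a narrow `-r` range.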
|
|
|
|
satscollector
Newbie
Offline
Activity: 3
Merit: 0
|
 |
July 14, 2025, 05:32:27 AM Last edit: July 15, 2025, 08:35:36 AM by satscollector |
|
I did some tinkering with Cursor IDE. So, in summary, when `-endo` is used, for each key defined in the range given using `-r`, an additional 5 private keys are generated and checked. These additional 5 keys can be well outside the range.

Here is a small python script to figure out what other 5 keys are generated for each private key in the range `-r`:

#!/usr/bin/env python3
# Calculate additional private keys checked with -endo option for a single key
# Usage: python3 endomorphism_calculator.py <private_key_hex>

import sys

# Secp256k1 order N
N = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141

# Endomorphism constants (alpha and alpha^2)
A1 = 0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72  # alpha
A2 = 0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce  # alpha^2


def mod_neg(k):
    """Calculate -k (mod N)"""
    return (N - k) % N


def mod_mul(a, b):
    """Calculate a * b (mod N)"""
    return (a * b) % N


def calculate_endo_keys(base_key):
    """Calculate all 6 endomorphism variants for a given base key"""
    keys = []
    keys.append(("Original", base_key))
    keys.append(("Negated", mod_neg(base_key)))
    alpha_key = mod_mul(base_key, A1)
    keys.append(("Alpha ×", alpha_key))
    keys.append(("Neg Alpha ×", mod_neg(alpha_key)))
    alpha2_key = mod_mul(base_key, A2)
    keys.append(("Alpha² ×", alpha2_key))
    keys.append(("Neg Alpha² ×", mod_neg(alpha2_key)))
    return keys


def main():
    if len(sys.argv) != 2:
        print("Usage: python3 endomorphism_calculator.py <private_key_hex>")
        print("Example: python3 endomorphism_calculator.py 0x10000")
        sys.exit(1)
    try:
        base_key = int(sys.argv[1], 16)
    except ValueError:
        print("Error: Invalid hex value")
        sys.exit(1)

    if base_key <= 0 or base_key >= N:
        print(f"Error: Private key must be in range [1, 0x{N-1:x}]")
        sys.exit(1)

    keys = calculate_endo_keys(base_key)

    print(f"All 6 endomorphism variants for private key 0x{base_key:x}:")
    print("=" * 80)
    for i, (desc, key) in enumerate(keys):
        print(f"{i}: 0x{key:064x}  # {desc}")


if __name__ == "__main__":
    main()
So for the private key 0x400000000000000000, the additional keys that are generated using -endo are:

python3 endomorphism_calculator.py 0x400000000000000000
All 6 endomorphism variants for private key 0x400000000000000000:
================================================================================
0: 0x0000000000000000000000000000000000000000000000400000000000000000  # Original
1: 0xfffffffffffffffffffffffffffffffebaaedce6af489ffbbfd25e8cd0364141  # Negated
2: 0x498700a20499169f0986f9516de685ec9f0ea6a6e6f058c3cd521c4ee2fd5497  # Alpha ×
3: 0xb678ff5dfb66e960f67906ae92197a121ba0363fc8584777f280423ded38ecaa  # Neg Alpha ×
4: 0xb678ff5dfb66e960f67906ae92197a121ba0363fc8584737f280423ded38ecaa  # Alpha² ×
5: 0x498700a20499169f0986f9516de685ec9f0ea6a6e6f05903cd521c4ee2fd5497  # Neg Alpha² ×
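A few relations between these six keys can be checked with a short Python sketch that reuses the constants from the script above. The range bounds are the ones passed to `-r` in the earlier test; the variable names are mine. Since alpha is a cube root of unity mod N, the three keys `k`, `alpha*k` and `alpha^2*k` sum to zero mod N. That is also why rows 3 and 4 of the listing differ only by `k` itself (0x40 at the 2^70 position): `alpha^2*k = -(alpha*k) - k`.

```python
# Constants copied from the script above.
N = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
A1 = 0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72  # alpha
A2 = 0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce  # alpha^2

k = 0x400000000000000000
range_lo, range_hi = 0x400000000000000000, 0x40000000000fffffff  # the -r bounds

# alpha is a primitive cube root of unity modulo the group order.
print(pow(A1, 3, N) == 1)
print(A1 * A1 % N == A2)
print((1 + A1 + A2) % N == 0)

# Consequence: k + alpha*k + alpha^2*k = 0 (mod N).
print((k + k * A1 + k * A2) % N == 0)

# The six keys checked per in-range key when -endo is enabled.
variants = [k, N - k,
            k * A1 % N, N - k * A1 % N,
            k * A2 % N, N - k * A2 % N]

in_range = [v for v in variants if range_lo <= v <= range_hi]
print(len(variants), len(in_range))  # only the original key is inside the range
```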
|
|
|
|
|