CUDAHUNT - Three Tools, One Binary

Hello Bitcointalk.

https://i.postimg.cc/437zRq29/CUDAHUNT.png
https://i.postimg.cc/sgtM7XKx/CUDA2.png

I created a single-GPU secp256k1 hunting toolkit.

Three tools in one binary: brute-force key search, Pollard's kangaroo for keys whose public half is already exposed, and BIP39 seed-phrase recovery.

One math core sits underneath all of them, which I trust because I checked it half to death (tests included).

I should be honest about how this began. I wanted to learn CUDA, and I have never once learned anything by reading about it. So I picked the most unforgiving teacher I could find, which is cryptography on a GPU, where a single mishandled carry bit does not crash; it just quietly hands you the wrong answer forever. What grew out of that is this toolkit, and what follows is less a manual than a field notebook from someone who got a little obsessed.

Underneath all three modes sits one stubborn fact. secp256k1, the curve every Bitcoin key lives on, makes "multiply the generator by k" trivial and "given k times G, recover k" believed to be impossible. Every mode in here is just a different, slightly cheeky refusal to accept that second word.

It all bottoms out in 256-bit arithmetic modulo the prime p = 2^256 - 2^32 - 977, and there is a small piece of magic worth knowing even from a distance. That prime sits a hair under 2^256 (short by only about 4.3 billion), so anything that overflows can be folded back down by multiplying with one tiny constant instead of doing real division. That single property is why secp256k1 is a pleasure to implement, and why the Curve25519 crowd do not get to feel quite as smug as they would like.
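
To make the folding trick concrete, here is a minimal Python sketch (my own illustration, not the toolkit's PTX code). It relies only on the fact that 2^256 is congruent to 2^32 + 977 modulo p, so the high half of a product can be multiplied by that small constant and added back onto the low half. The names `reduce_fold` and `mulmod` are made up for this example.

```python
P = 2**256 - 2**32 - 977
C = 2**32 + 977            # 2^256 is congruent to C modulo P
MASK = 2**256 - 1

def reduce_fold(x):
    """Reduce 0 <= x < 2^512 modulo P using folds instead of division."""
    while x >> 256:
        # Split into low and high 256-bit halves; the high half counts
        # multiples of 2^256, each of which is worth C modulo P.
        x = (x & MASK) + (x >> 256) * C
    if x >= P:             # x < 2^256 < 2P here, so one subtraction suffices
        x -= P
    return x

def mulmod(a, b):
    """Multiply two field elements and fold the 512-bit product back down."""
    return reduce_fold(a * b)

if __name__ == "__main__":
    a = 0x1234567890ABCDEF << 180
    b = P - 12345
    print(mulmod(a, b) == (a * b) % P)
```

For a full 512-bit input the loop runs at most three times, which is why a GPU version can unroll it into a fixed, branch-light sequence.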

What my version of cudahunt actually does.

There are exactly three situations where searching this keyspace is not a fool's errand, so there are three modes.

The blunt one is brute force. It picks a range of private keys and tries every single one: derive the public key, hash it down to the 20-byte fingerprint that a Bitcoin address really is, ask the target list "anyone home?", move on. This is the right tool for the Bitcoin "puzzle" addresses, where someone deliberately planted keys in known bit-ranges as a public benchmark, and it is gloriously useless above the mid-60-bit ranges, exactly as a sound curve should make it. The trick that keeps it honest is that I never pay for a full multiply per key: each GPU thread pays once, then walks to the next key by cheap addition, and a whole batch shares a single expensive inverse.
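
The "pay once, then walk, and share one inverse" idea can be sketched in Python as below. This is an illustration under my own assumptions, not the layout of the real kernel: one scalar multiply produces a base point, a small precomputed table holds G, 2G, 3G and so on, and Montgomery's batch-inversion trick turns all the per-addition inverses into a single one. The function names (`batch_inverse`, `batch_walk`, and friends) are hypothetical, and the point at infinity is deliberately not handled.

```python
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Plain affine addition or doubling: one modular inverse per call."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def mul(k, pt):
    """Double-and-add scalar multiply: the expensive step paid once."""
    result = None
    while k:
        if k & 1:
            result = pt if result is None else add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def batch_inverse(values):
    """Montgomery's trick: invert every value with one modular inverse."""
    prefix, acc = [], 1
    for v in values:
        prefix.append(acc)            # product of everything before v
        acc = acc * v % P
    inv = pow(acc, -1, P)             # the single expensive inversion
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % P  # strips all factors except 1/values[i]
        inv = inv * values[i] % P
    return out

def add_with_inverse(p1, p2, inv_dx):
    """Affine addition when 1/(x2 - x1) has already been computed."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * inv_dx % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def batch_walk(k0, n):
    """Return (k0+1)G ... (k0+n)G using one inversion for the whole batch.
    Assumes k0 > n so the base never collides with a table entry."""
    base = mul(k0, G)
    table = [G]
    for _ in range(n - 1):
        table.append(add(table[-1], G))   # built once, reusable by every batch
    inverses = batch_inverse([(t[0] - base[0]) % P for t in table])
    return [add_with_inverse(base, t, i) for t, i in zip(table, inverses)]

if __name__ == "__main__":
    print(batch_walk(1000, 8) == [mul(1000 + i, G) for i in range(1, 9)])
```

The batch costs roughly three extra multiplications per point in exchange for dropping all but one inversion, which is the trade that makes per-key cost close to a single point addition.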

The clever one is kangaroo, and it is my favourite partly for the name. When a target's public key is already sitting on the blockchain, brute force is embarrassing, because Pollard's lambda method can find the key in roughly the square root of the work (on the order of 2^(n/2) hops for an n-bit range). The mental picture is exactly as silly as it sounds. I release a herd of kangaroos that go hopping across the number line in pseudo-random jumps. Half are tame, and I know where they started; half are wild, and they started at the unknown key. Wherever a tame and a wild kangaroo land on the same spot, the two stories of how they got there form an equation, and that equation is the key.
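
For readers who want to see the equation fall out, here is a toy version in Python. It is a sketch under loud assumptions: it uses one tame and one wild kangaroo rather than a herd, it works in the multiplicative group modulo a small prime instead of on secp256k1, and the jump rule and retry loop are my own simplifications. The function name `kangaroo` and its parameters are made up for this example.

```python
from math import isqrt

def kangaroo(g, h, p, a, b, max_tries=8):
    """Look for x in [a, b] with pow(g, x, p) == h. Returns x, or None."""
    width = b - a
    k = width.bit_length() // 2 + 2            # number of distinct jump sizes
    jumps = [1 << i for i in range(k)]         # mean jump is about sqrt(width)/3
    jump_pts = [pow(g, j, p) for j in jumps]
    steps = 4 * isqrt(width) + 1
    for salt in range(max_tries):              # a new salt gives a new jump rule
        # Tame kangaroo: starts at the known exponent b and sets a trap
        # at the place where it stops.
        pos, tame_dist = pow(g, b, p), 0
        for _ in range(steps):
            i = (pos + salt) % k               # the jump depends only on position
            pos = pos * jump_pts[i] % p
            tame_dist += jumps[i]
        trap = pos
        # Wild kangaroo: starts at h = g^x for the unknown x. Once it lands
        # on any spot the tame one visited, it follows the same path to the trap.
        pos, wild_dist = h, 0
        while wild_dist <= width + tame_dist:
            if pos == trap:
                # Same position means b + tame_dist == x + wild_dist.
                x = b + tame_dist - wild_dist
                if pow(g, x, p) == h:
                    return x
                break
            i = (pos + salt) % k
            pos = pos * jump_pts[i] % p
            wild_dist += jumps[i]
    return None

if __name__ == "__main__":
    p, g = 2**61 - 1, 3
    a = 10**9
    b = a + 2**20
    secret = a + 777777
    found = kangaroo(g, pow(g, secret, p), p, a, b)
    print(found is not None and pow(g, found, p) == pow(g, secret, p))
```

A real implementation spreads many kangaroos across GPU threads and detects meetings through "distinguished points" stored in a table, but the closing equation is the same one shown in the comment.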

The human one is seed recovery. Someone wrote twelve words on a scrap of paper, the paper got wet, and now a couple of words are smudged or out of order, and there is real money on the other side of remembering them. Given the addresses and a partial phrase, this mode rebuilds the rest. The words become a seed through a deliberately slow 2048-round stretch, which BIP39 uses precisely to make people like me suffer. The seed then grows a whole tree of keys, and the standard branches get checked against your targets. There is also a tiny RFC1751 converter for the obscure 1994 word encoding that tried to do all of this thirty years early, kept around purely because every project deserves one room reserved for curiosity.
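
The two BIP39 steps that matter for recovery, the cheap checksum and the slow stretch, fit in a few lines of Python using only `hashlib`. This is a sketch of the standard as I understand it, not the toolkit's code: words are represented by their 11-bit indices so no wordlist file is needed, Unicode NFKD normalisation is skipped (fine for ASCII phrases only), and the function names are my own.

```python
import hashlib

def entropy_to_indices(entropy: bytes):
    """Append the checksum bits and split into 11-bit word indices."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32                     # 4 bits for a 12-word phrase
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]

def checksum_ok(indices):
    """The cheap filter: recompute the checksum from the words' own bits."""
    n_bits = len(indices) * 11
    cs_bits = n_bits // 33
    value = 0
    for idx in indices:
        value = (value << 11) | idx
    entropy = (value >> cs_bits).to_bytes((n_bits - cs_bits) // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (value & ((1 << cs_bits) - 1)) == expected

def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """The slow stretch: PBKDF2-HMAC-SHA512 with 2048 rounds, 64-byte output."""
    salt = ("mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", mnemonic.encode("utf-8"), salt, 2048, 64)

if __name__ == "__main__":
    indices = entropy_to_indices(bytes(16))      # all-zero entropy
    print(indices, checksum_ok(indices))
    # Try every possible last word for an otherwise all-zero phrase.
    survivors = sum(checksum_ok([0] * 11 + [w]) for w in range(2048))
    print(survivors, "of 2048 last words pass the checksum")
    print(len(mnemonic_to_seed("abandon " * 11 + "about")), "byte seed")
```

For a twelve-word phrase the checksum is only four bits, so one guess in sixteen survives it, and only those survivors need the expensive stretch and key derivation.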

To build from source you need an NVIDIA GPU and the CUDA toolkit, nothing more exotic. I wrote it on a tired old GTX 1080 (Pascal, sm_61); on a newer card just tell it the arch, like make ARCH=sm_86, and it goes faster for free. The BIP39 wordlists live in `BIP Lists/`, and the RFC1751 dictionary is baked straight into the source, so there are no stray data files to lose.

Code:

./hunter --list                       # every puzzle and whether it is still open
./hunter --puzzle 71                  # brute it (auto-switches to kangaroo if the pubkey is known)
./hunter kangaroo --bits 48           # watch it recover a random key from its pubkey alone
./hunter seed --targets address_list.txt --phrase myseed.txt
./hunter rfc1751 --decode "TIDE ITCH SLOW REIN RULE MOT"

For seed recovery the gentlest path is a file with one word per line and a `?` on any line you cannot remember; you can literally count the lines to be sure you have twelve. You can also write `~aban` for a word you half-remember (it expands to the close BIP39 spellings), `aban*` for "it started like this", or `[river|rival|ribbon]` for a short list of guesses. Every subcommand answers to --help.

An example would be:

Code:

./hunter seed --targets address_list.txt --words hold safe dust only time will release wealth pass '?' '?' '?'   # quotes stop the shell from globbing ?
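
The pattern syntax described above (`?`, `~aban`, `aban*`, `[a|b|c]`) can be pictured as each slot expanding to a list of candidate words, with the search space being the product of those lists. The Python sketch below is only my reading of that idea: the eight-word `WORDLIST` is a stand-in for the real 2048-word list, and using `difflib` for the `~` fuzzy match is an assumption, since the post does not say how "close spellings" are defined in the tool.

```python
import difflib
from itertools import product

# Tiny stand-in; the real BIP39 English list has 2048 words.
WORDLIST = ["abandon", "ability", "able", "about",
            "ribbon", "rival", "river", "road"]

def expand(token, wordlist=WORDLIST):
    """Turn one slot of the phrase into its list of candidate words."""
    if token == "?":                                   # completely unknown
        return list(wordlist)
    if token.startswith("[") and token.endswith("]"):  # explicit short list
        return [w for w in token[1:-1].split("|") if w in wordlist]
    if token.endswith("*"):                            # known prefix
        return [w for w in wordlist if w.startswith(token[:-1])]
    if token.startswith("~"):                          # half-remembered spelling
        return difflib.get_close_matches(token[1:], wordlist,
                                         n=len(wordlist), cutoff=0.6)
    return [token] if token in wordlist else []        # fully known word

def candidates(tokens):
    """Yield every phrase consistent with the pattern, slot by slot."""
    return product(*(expand(t) for t in tokens))

if __name__ == "__main__":
    pattern = ["river", "aban*", "[river|rival|ribbon]", "?"]
    phrases = list(candidates(pattern))
    print(len(phrases), "candidate phrases, first:", " ".join(phrases[0]))
```

The point of the fuzzy forms is visible in the count: a slot narrowed to three guesses multiplies the search by 3 instead of by 2048.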

This is the part I am proudest of, and the part most worth stealing. The entire math core is written once and compiled for both the CPU and the GPU, which let me run the exact same code on my laptop and check it, one layer at a time, against an outside referee before I allowed myself to optimise anything at all. The field math, the curve, both hash families, the seed stretch, the key derivation: each one is cross-checked against Python or against the official published test vectors.

The key-derivation layer in particular is checked against the specification's own answer key, not merely against a second implementation of my own, because two of my own implementations can cheerfully agree on the same mistake and lie to me in unison.

My one rule throughout: correctness first, speed second, because a fast wrong answer is just a confident lie.

What I learned making it fast

Mostly that the GPU was bored, not busy. Almost every real speedup came from feeding the machine better rather than from cleverer mathematics. The brute kernel was choking on its own scratch memory rather than on the hashing, so shrinking that woke it up. The seed kernel taught me two lessons the hard way. First, that a cheap filter rejecting fifteen of every sixteen guesses can paradoxically waste the whole machine if the survivors and the rejects share a warp, which I fixed by herding the survivors into their own dense crowd before the expensive work begins. Second, that the slow seed stretch is waiting-bound rather than compute-bound, so the cure is more threads in flight, not fewer instructions.

Matching against a ten-thousand-address list went from a slog to a couple of bit-pokes with a Bloom filter, with an exact check sitting behind it so that a lucky false alarm can never become a false treasure. The numbers, on that same old 1080: brute around 255 million keys a second, kangaroo around 185 million hops a second, and seed around 46 thousand full guesses a second once the free checksum has thrown most of them away.
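
The "Bloom filter in front, exact check behind" arrangement can be sketched in Python as follows. The sizes, the number of probes, and the way probe positions are sliced straight out of the 20-byte fingerprint are my own assumptions for illustration; the class name `TargetSet` is hypothetical, and truncated SHA-256 stands in for real address fingerprints in the demo.

```python
import hashlib

class TargetSet:
    """Bloom filter for fast rejection, backed by an exact set."""

    def __init__(self, targets, bits=1 << 20, probes=4):
        self.bits, self.probes = bits, probes      # probes <= 5 for 20-byte keys
        self.bitmap = bytearray(bits // 8)
        self.exact = set(targets)
        for t in targets:
            for pos in self._positions(t):
                self.bitmap[pos >> 3] |= 1 << (pos & 7)

    def _positions(self, h160):
        # A hash fingerprint already looks random, so 4-byte slices of it
        # can serve directly as independent probe positions.
        return [int.from_bytes(h160[4 * i:4 * i + 4], "little") % self.bits
                for i in range(self.probes)]

    def maybe(self, h160):
        """A few bit tests. False means definitely absent."""
        return all(self.bitmap[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(h160))

    def contains(self, h160):
        """Bloom hit confirmed by an exact lookup, so no false positives."""
        return self.maybe(h160) and h160 in self.exact

if __name__ == "__main__":
    fake = lambda i: hashlib.sha256(str(i).encode()).digest()[:20]
    targets = TargetSet([fake(i) for i in range(10_000)])
    print(targets.contains(fake(1234)), targets.contains(fake(99_999)))
    false_alarms = sum(targets.maybe(fake(i)) for i in range(10_000, 20_000))
    print("Bloom false alarms out of 10000 misses:", false_alarms)
```

Almost every candidate is a miss, so the common path is a handful of bit tests, and the exact set is only consulted on the rare Bloom hit.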

In the real world, mathematics does not care how fast the graphics card is, and I would rather tell you that than sell you a fantasy. Brute force doubles in cost with every extra bit and simply dies somewhere between the mid-60s and the 80s of bits; nothing in here changes that, and nothing should. Seed recovery only works when you are missing a few words: one is instant, two is seconds, three is hours, and four or more is, for all practical purposes, never. The lever for real recovery is not raw speed but partial memory, which is exactly why the fuzzy matching above exists. There is no shortcut for a genuinely blank word. That is not a limitation of this tool, it is the whole point of the entropy, and on most days I am glad it holds.
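
The scaling claims above can be checked with back-of-envelope arithmetic. The Python sketch below plugs in the throughput figures quoted in this post and makes several simplifying assumptions of my own: a twelve-word phrase, missing words at known positions, the checksum discarding fifteen of every sixteen guesses for free, and the quoted seed rate applying to the survivors. Real runs will differ, for instance when word order is also uncertain.

```python
SEED_RATE = 46_000          # full seed guesses per second (figure quoted above)
BRUTE_RATE = 255_000_000    # brute-force keys per second (figure quoted above)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seed_seconds(missing_words):
    """Estimated time to try every fill-in for the missing words."""
    candidates = 2048 ** missing_words   # each blank can be any BIP39 word
    survivors = candidates // 16         # 4-bit checksum keeps about 1 in 16
    return survivors / SEED_RATE

def brute_years(bits):
    """Estimated time to sweep a full 2^bits key range."""
    return 2 ** bits / BRUTE_RATE / SECONDS_PER_YEAR

if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} missing word(s): {seed_seconds(n):,.3f} s")
    for bits in (40, 50, 60, 66):
        print(f"{bits}-bit range: {brute_years(bits):,.6f} years")
```

Under these assumptions two missing words take about six seconds and three take a little over three hours, while four already runs to most of a year of continuous work on one card, and every further word multiplies the time by 2048. A full 66-bit brute-force sweep comes out at roughly nine thousand years.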

Where everything lives for the curious minds.

Code:

src/field.cuh      256-bit field arithmetic mod p (hand-written PTX + a portable twin)
src/ec.cuh         secp256k1 point math, plus a fast comb for repeated k*G
src/hash.cuh       SHA-256 + RIPEMD-160 + the address fingerprint
src/sha512.cuh     SHA-512 / HMAC / PBKDF2 (the BIP39 slow stretch)
src/bip39.cuh      mnemonic checksum + words-to-seed
src/bip32.cuh      the key tree (BIP44/49/84) + address hashing
src/bloom.cuh      fast-reject filter for big target lists
src/seed.cuh       seed recovery + the candidate generators
src/kangaroo.cuh   the kangaroo herd (interval ECDLP)
src/hunter.cu      brute kernel + CLI + live dashboard + dispatch
src/rfc1751.h      the RFC1751 word/key curiosity (wordlist embedded)
tests/             GPU known-answer tests with Python referees
tools/             benchmark harness, end-to-end find test, puzzle status
BIP Lists/         BIP39 wordlists

Feedback welcome.
I would be interested to see results from newer cards, or from multiple cards, if anyone is running such hunts.

Source Code : https://gitlab.com/0x1000003d1-group/cudahunt
Release : https://gitlab.com/0x1000003d1-group/cudahunt/-/releases