My Bitcoin Recovery Story (and Why I Started Building OpenCL + FPGA Prototypes)

ipsbruno3 (OP)

January 04, 2026, 06:50:47 AM

I stored most of my Bitcoin in a wallet with four backups. Two were paper backups containing only 7 of the 12 seed words, and the other backups were stored on my computer and on a USB drive. After a power surge, my computer’s HDD failed. When I tried the USB drive, the data was corrupted.

The good news is that the missing words are the last 5 (out of 12). The amount involved is significant (I’ve been in the market since 2013), and recovering five missing words is near the upper practical limit for brute-force recovery today: still possible, but it requires serious compute and careful optimization.

Why “Last 5 Words Missing” Is a Big Advantage

A 12-word BIP39 seed has 2048 possible values per word. Naively, 5 unknown words would be:

Quote

2048⁵ = 36,028,797,018,963,968 possibilities (~36 quadrillion)

But in a 12-word phrase, the final word contains checksum bits, so it’s not fully free. With the first 7 words known and the last 5 missing, the effective space becomes:

Quote

2048⁴ × 128 = 2,251,799,813,685,248 possibilities (~2.25 quadrillion)

That’s a 16× reduction (because 2048 / 128 = 16). In practical terms: if the checksum-constrained version takes ~5 years, the fully unconstrained 5-word search could take ~80 years at the same throughput. So being “missing the last 5” is still hard, but it’s a much better position than “missing 5 random words.”
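The counts above can be reproduced with a short Python sketch. It works on 11-bit word indices instead of words, so no wordlist is needed. The name `valid_last_word_indices` is an illustrative choice and does not come from any of the tools mentioned in this thread. The only rule it relies on is the BIP39 one: a 12-word phrase encodes 128 bits of entropy followed by a 4-bit checksum taken from the SHA-256 of that entropy.

```python
import hashlib

WORDLIST_SIZE = 2048                          # each BIP39 word carries 11 bits
NAIVE_SPACE = WORDLIST_SIZE ** 5              # five fully free words
CHECKSUM_SPACE = WORDLIST_SIZE ** 4 * 128     # last word has only 7 free bits


def valid_last_word_indices(first11):
    """Return the indices of every 12th word that passes the BIP39 checksum.

    `first11` is a list of eleven word indices (0..2047). Hypothetical helper,
    written only to illustrate why the last word has 128 valid values.
    """
    if len(first11) != 11 or any(not 0 <= i < WORDLIST_SIZE for i in first11):
        raise ValueError("expected eleven indices in 0..2047")
    prefix = 0
    for idx in first11:
        prefix = (prefix << 11) | idx          # 121 known entropy bits
    valid = []
    for tail in range(128):                    # the 7 remaining entropy bits
        entropy = (prefix << 7) | tail         # 128 bits of entropy in total
        digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()
        checksum = digest[0] >> 4              # top 4 bits of SHA-256
        valid.append((tail << 4) | checksum)   # 11-bit index of the last word
    return valid


if __name__ == "__main__":
    print("naive 5-word space:   ", NAIVE_SPACE)
    print("checksum-aware space: ", CHECKSUM_SPACE)
    print("reduction factor:     ", NAIVE_SPACE // CHECKSUM_SPACE)
    print("valid last words:     ", len(valid_last_word_indices([0] * 11)))
```

In a real search loop this means the last word never has to be enumerated over all 2048 values: for each choice of the other unknown words, only the 128 checksum-consistent candidates need to be derived.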

GPU Prototypes (OpenCL): What I Learned

Because of the scale, I started building recovery prototypes in OpenCL (first), and more recently in Verilog on FPGAs, focusing on PBKDF2/HMAC/SHA-512 pipelines and partial-seed recovery strategies.

While experimenting:

* For Electrum (RTX 5090), I found ways to leverage checksum behavior to discard candidates more aggressively (in practice: strong early screening that can significantly reduce full validations).
* In my benchmarks, Electrum-oriented checks reached up to ~65 million candidate seeds/sec (with only about 1 in 4096 candidates passing the checksum filter).
* For BIP39 / PBKDF2-HMAC-SHA512 (Trust Wallet-style wallets and similar), I reached around ~1.5 million seeds/sec in OpenCL.
* I also tried renting machines on Vast.ai, but the total cost ended up close to simply buying GPUs (especially once you factor in time, sustained utilization, and overhead). (Also: don’t trust third-party hosted environments for sensitive recovery work.)
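To make the difference between the two workloads concrete, here is a minimal CPU-side sketch in Python. It is not the OpenCL code from the repositories; it only shows what each candidate costs. The function names are illustrative. The BIP39 path uses the standard 2048 rounds of PBKDF2-HMAC-SHA512 with the salt `"mnemonic" + passphrase`. The Electrum path assumes the usual seed-version check (a single HMAC-SHA512 keyed with `Seed version`, accepted when the hex digest starts with `01` for standard or `100` for segwit seeds), and it assumes the phrase is already normalized lowercase ASCII.

```python
import hashlib
import hmac


def bip39_seed(mnemonic, passphrase=""):
    """Expensive path: 2048 rounds of PBKDF2-HMAC-SHA512 per candidate."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        mnemonic.encode("utf-8"),
        ("mnemonic" + passphrase).encode("utf-8"),
        2048,
        dklen=64,
    )


def electrum_prefix_ok(phrase, prefix="01"):
    """Cheap path: one HMAC-SHA512, then compare the hex prefix.

    A 2-hex-digit prefix keeps about 1 in 256 random phrases,
    a 3-hex-digit prefix about 1 in 4096.
    """
    digest = hmac.new(b"Seed version", phrase.encode("utf-8"), hashlib.sha512)
    return digest.hexdigest().startswith(prefix)


if __name__ == "__main__":
    candidate = "abandon " * 11 + "about"
    print("BIP39 seed:", bip39_seed(candidate).hex()[:32], "...")
    print("passes Electrum segwit prefix:", electrum_prefix_ok(candidate, "100"))
```

The gap in throughput between the two bullet points follows from this: the Electrum filter is one HMAC per candidate, while every BIP39 candidate that survives the word checksum still needs the full key stretching before an address can be compared.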

At the scale required for my case, GPU recovery could mean something like ~100 high-end GPUs to target a ~6-month window. That’s “viable” in theory given the value involved—but the real pain point quickly becomes electricity cost.
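As a rough sanity check of that estimate, the sketch below plugs in the figures quoted in this post: the 2048⁴ × 128 search space, ~1.5 million seeds/sec per GPU, and 100 GPUs. It assumes perfect scaling across cards and sustained throughput, which real setups will not fully reach, and the function name is just illustrative.

```python
SECONDS_PER_DAY = 86_400
CANDIDATES = 2048 ** 4 * 128      # last 5 words unknown, checksum applied


def full_sweep_days(candidates, seeds_per_sec_per_gpu, gpu_count):
    """Days needed to test every candidate once, assuming ideal scaling."""
    seconds = candidates / (seeds_per_sec_per_gpu * gpu_count)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    worst_case = full_sweep_days(CANDIDATES, 1.5e6, 100)
    print(f"full sweep:    {worst_case:.0f} days")
    # If the correct phrase is equally likely to be anywhere in the space,
    # the expected time to hit it is half of the full sweep.
    print(f"expected case: {worst_case / 2:.0f} days")
```

The full sweep comes out a little under six months, which is consistent with the ~6-month window mentioned above.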

Why I Pivoted to FPGAs (Lower Energy Cost / ASIC Path)

FPGAs are not cheap—often comparable to high-end GPUs in hardware cost—but the energy efficiency can be radically different.

In my tests, FPGAs consumed a small fraction of the power for certain workloads. That means the long-term cost shifts from “hardware + massive electricity” to mostly “hardware,” which is a huge difference when you’re running sustained compute for months.

A Side Project I Ended Up Loving: secp256k1 Public Key Generation

While my recovery problem is PBKDF2/HMAC/SHA-512 (not ECDSA), I started experimenting with secp256k1 acceleration because I genuinely enjoyed the engineering challenge.

Scalar multiplication (point_mul) is expensive, but there are techniques where you do one heavier operation and then rely on many cheaper additions (point_add). Point addition is much lighter, and that can drastically increase throughput in certain scenarios.
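The idea can be sketched in plain Python with affine coordinates. This is a readable reference, not the GPU or FPGA implementation, and the function names are illustrative. It performs one `point_mul` for the starting key and then derives each following public key with a single `point_add` of the generator. The curve constants are the standard secp256k1 parameters.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G = (GX, GY)               # generator point; None stands for infinity


def is_on_curve(pt):
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def point_add(p1, p2):
    """Affine addition on y^2 = x^3 + 7 (handles doubling and infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        if (y1 + y2) % P == 0:
            return None
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def point_mul(k, pt):
    """Double-and-add scalar multiplication (the expensive operation)."""
    result = None
    addend = pt
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def sequential_pubkeys(start, count):
    """One point_mul for `start`, then one point_add per following key."""
    pt = point_mul(start, G)
    keys = []
    for _ in range(count):
        keys.append(pt)
        pt = point_add(pt, G)
    return keys
```

Each step in `sequential_pubkeys` still pays for one modular inverse here. Fast implementations usually avoid that with projective coordinates or batched inversion, which this sketch leaves out for clarity.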

As a result of these experiments, I ended up measuring performance like:

Quote

~328 million keys/sec on an AMD Artix-7 XC7A200T, at < 28 W

(measured at the board/system level in my setup)

Versus a high-end GPU that can draw hundreds of watts up to ~1 kW depending on model and configuration.

Again: this is not required for my seed recovery flow—but it became a serious learning playground and produced some interesting open-source prototypes.

Longer-Term Idea (Still Experimental)

There are still many ways to improve these approaches, including designing specialized hardware (FPGA → ASIC) for narrowly-defined recovery tasks that help people who are missing 1–4 words, or have partially corrupted wallet files.

I’m also interested in exploring other mathematical/algorithmic areas on FPGAs from a research/engineering perspective, but my main focus remains on recovery pipelines and validation acceleration. (Perhaps exploring something like Pollard’s Rho in specialized hardware—purely as an experiment.)

Important note: I will never ask for anyone’s seed words, and you should never give your seed phrase to anyone. If you need help, ask for help installing tools, verifying builds, or understanding how to run open-source software — never share the actual secret.

============================================================

Energy Efficiency (Field Additions per Watt)
Higher is better

Quote

AMD XC7A200T FPGA (42 cores) | ████████████████████████████████████████ | 318M/W
GPU (RTX 3090 / RTX-class) | ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | 1.4M/W
CPU (i9-12900K) | █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | 0.4M/W

============================================================

Open-Source Repositories

Everything below is open source — no subscription, no paid service, just code and experiments. If it’s useful, feel free to clone/fork (and yes, even sell). I’m genuinely happy to see people using the code and contributing.

GitHub repositories:

@ipsbruno3/fpga_bitcoin_seed_rescue
* Pipeline / PBKDF2-HMAC-SHA512-oriented workspace for accelerated hardware (FPGA-focused)
@ipsbruno/bitcoin_cracking_final
* Performance-oriented OpenCL implementation for PBKDF2 workloads on NVIDIA GPUs
@ipsbruno3/bitcoin_electrum_cracking
* OpenCL tools for Electrum-related checks; includes checksum-based shortcuts and benchmarks
@ipsbruno3/secp256k1-gpu-accelerator
* GPU code for secp256k1 public key generation (point_add / point_mul), with incremental techniques and benchmarks
@ipsbrunoreserva/seedmistake
* Typing aid for seed words using Levenshtein distance (suggests nearest BIP39 word)
@ipsbruno3/fpga_secp256k1_verilog
* Sequential public-key search on FPGAs at near-zero energy cost
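For the typing-aid idea in the list above (suggesting the nearest BIP39 word by Levenshtein distance), a minimal Python sketch could look like this. It is not the code from the seedmistake repository. `SAMPLE_WORDS` is a small hand-picked subset of the English BIP39 list used only for illustration; a real tool would load all 2048 words.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Illustrative subset only; replace with the full BIP39 wordlist in practice.
SAMPLE_WORDS = ["abandon", "ability", "able", "about", "above", "absent"]


def nearest_words(typed, words, limit=3):
    """Return up to `limit` words ordered by edit distance, then alphabetically."""
    return sorted(words, key=lambda w: (levenshtein(typed, w), w))[:limit]


if __name__ == "__main__":
    print(nearest_words("abandn", SAMPLE_WORDS))
```

Running the suggestion step locally and offline keeps the words on the user's own machine, in line with the note above about never sharing a seed phrase.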

Quote

Final Note
Before anyone criticizes: this is not a practical way to steal other people’s keys. Randomly “finding” someone else’s private key remains astronomically unlikely—winning lotteries would be easier. But if you lost part of your own data (e.g., a few missing words, or partially corrupted wallet files), don’t give up. In many real-world cases, recovery is possible with the right tooling, patience, and careful validation.

If you have any questions about the implementations, benchmarks, or the FPGA approach, I’ll be happy to answer.
And if you want to help me, buy me a coffee (~~or a GPU~~): bc1qc6yypnwtvfd09ashe73dlg5u3msr5c6xxnxxcv

Thanks guys

HODL BITCOIN 🌕🌕!!!

ABCbits

Re: My Bitcoin Recovery Story (and Why I Started Building OpenCL + FPGA Prototypes)

January 04, 2026, 08:39:49 AM

Quote from: ipsbruno3 on January 04, 2026, 06:50:47 AM

For BIP39 / PBKDF2-HMAC-SHA512 (Trust Wallet-style wallets and similar), I reached around ~1.5 million seeds/sec in OpenCL.

The number looks impressive. Is that speed also from the RTX 5090? Looking at other work at https://bitcoinwords.github.io/how-i-checked-over-1-trillion-mnemonics, the author only managed about 142 thousand/sec using an RTX 2080 Ti.

Quote from: ipsbruno3 on January 04, 2026, 06:50:47 AM

@ipsbruno/bitcoin_cracking_final
* Performance-oriented OpenCL implementation for PBKDF2 workloads on NVIDIA GPUs

This link leads to "404 This is not web page you are looking for".

Quote from: ipsbruno3 on January 04, 2026, 06:50:47 AM

Open-Source Repositories

Everything below is open source — no subscription, no paid service, just code and experiments. If it’s useful, feel free to clone/fork (and yes, even sell). I’m genuinely happy to see people using the code and contributing.

Quote from: ipsbruno3 on January 04, 2026, 06:50:47 AM

@ipsbruno3/bitcoin_electrum_cracking
* OpenCL tools for Electrum-related checks; includes checksum-based shortcuts and benchmarks
@ipsbruno3/secp256k1-gpu-accelerator
* GPU code for secp256k1 public key generation (point_add / point_mul), with incremental techniques and benchmarks

These two repositories have no mention of the word "license".

NotATether

Re: My Bitcoin Recovery Story (and Why I Started Building OpenCL + FPGA Prototypes)

January 04, 2026, 09:19:13 AM

Quote from: ipsbruno3 on January 4, 2026, 06:50:47 AM

As a result of these experiments, I ended up measuring performance like:

Quote

~328 million keys/sec on an AMD Artix-7 XC7A200T, at < 28 W (measured at the board/system level in my setup)

Versus a high-end GPU that can draw hundreds of watts up to ~1 kW depending on model and configuration.

I know you said that secp256k1 is not part of your recovery workflow, but I've done the math, and at that rate a group of 7 seed words for a 12-word seed can basically be processed every second (and then some).

Of course, that assumes pure point_mul or point_add operations and no other algorithms needing to be done to brute-force seed words, which is totally unrealistic.

The 1.5 million seeds/sec figure you posted for OpenCL works out to about a group of 5 seed words per second, meaning you have a good shot at recovering your coins. Good luck!

ipsbruno3 (OP)

Re: My Bitcoin Recovery Story (and Why I Started Building OpenCL + FPGA Prototypes)

January 04, 2026, 09:22:30 AM
Last edit: January 09, 2026, 09:36:59 AM by Mitchell

Quote from: NotATether on January 04, 2026, 09:19:13 AM

Quote from: ipsbruno3 on January 4, 2026, 06:50:47 AM

As a result of these experiments, I ended up measuring performance like:

Quote

~328 million keys/sec on an AMD Artix-7 XC7A200T, at < 28 W (measured at the board/system level in my setup)

Versus a high-end GPU that can draw hundreds of watts up to ~1 kW depending on model and configuration.

Hello friend,

Thanks for getting back to me.

---

Yes — an RTX 5090 should be able to reach roughly ~2 million PBKDF iterations/sec (PBKDF2 core only), depending on kernel design and memory pressure.

The main optimization trick is the **rotate/rename approach** to the round state: instead of explicitly shuffling `a, b, c, d, e, f, g, h` every round, you pass the registers “rotated” as arguments, so the round function only needs to write the updated `d` and `h` (which then play the roles of `e` and `a` in the next round). That avoids a lot of move/assignment instructions.

Example:

Quote

RoR(A0, A1, A2, A3, A4, A5, A6, A7, message[0], 0x428a2f98d728ae22);
RoR(A7, A0, A1, A2, A3, A4, A5, A6, message[1], 0x7137449123ef65cd);
RoR(A6, A7, A0, A1, A2, A3, A4, A5, message[2], 0xb5c0fbcfec4d3b2f);
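To check that the renaming really is equivalent to the textbook shuffle, here is a small Python model of the first three SHA-512 rounds. It is a sketch for verification, not the OpenCL kernel: `round_renamed` stands in for the `RoR` macro and takes register *indices* instead of registers, and the state and message words are meant to be arbitrary test values. The three round constants are the ones shown above.

```python
MASK = (1 << 64) - 1
K = [0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f]


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK


def big_sigma0(a):
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)


def big_sigma1(e):
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def rounds_with_shuffle(state, words):
    """Textbook form: every round moves all eight working variables."""
    a, b, c, d, e, f, g, h = state
    for w, k in zip(words, K):
        t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
        t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]


def round_renamed(s, a, b, c, d, e, f, g, h, w, k):
    """Renamed form: a..h are indices into s; only two slots are written."""
    t1 = (s[h] + big_sigma1(s[e]) + ch(s[e], s[f], s[g]) + k + w) & MASK
    t2 = (big_sigma0(s[a]) + maj(s[a], s[b], s[c])) & MASK
    s[d] = (s[d] + t1) & MASK
    s[h] = (t1 + t2) & MASK


def rounds_with_renaming(state, words):
    s = list(state)
    rounds = min(len(words), len(K))
    for i in range(rounds):
        # Round 0 uses (0,1,...,7), round 1 uses (7,0,...,6), and so on.
        idx = [(j - i) % 8 for j in range(8)]
        round_renamed(s, *idx, words[i], K[i])
    # Undo the renaming so the result is in logical a..h order again.
    return [s[(j - rounds) % 8] for j in range(8)]
```

In Python the index arithmetic costs more than it saves, of course. The point is only that both functions produce the same state; in a fully unrolled kernel the renaming is resolved at compile time and the extra moves disappear.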

It’s also possible to reuse parts of the Maj computation: because `(A, B)` in the current round become `(B, C)` in the next round, some intermediates (like pairwise ANDs) can be carried forward. The goal is simply to eliminate as many redundant instructions as possible.

Right now I’m working with 8-bit wNAF, and I plan to add a windowed comb method, plus a Montgomery trick to save one modular reduction. Most public implementations (e.g., hashcat-style approaches) stick to wNAF-4 to limit memory usage, but you can move precomputed tables from constant memory to a global buffer and take advantage of the GPU’s 12 GB of VRAM, rather than being constrained by ~64 KB of constant space.
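For readers unfamiliar with wNAF, here is a minimal Python sketch of the recoding step only (no curve arithmetic). It assumes the common convention where every nonzero digit is odd and lies strictly between -2^(w-1) and 2^(w-1); other sources number the window width differently, so `table_entries` should be read under that assumption. Both function names are illustrative.

```python
def wnaf(k, width):
    """Return the width-w NAF digits of k, least significant digit first."""
    if k < 0 or width < 2:
        raise ValueError("k must be non-negative and width at least 2")
    modulus = 1 << width
    half = modulus >> 1
    digits = []
    while k:
        if k & 1:
            d = k % modulus
            if d >= half:
                d -= modulus      # pick the signed residue closest to zero
            k -= d                # k is now divisible by 2^width
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def table_entries(width):
    """Odd multiples 1P, 3P, ..., (2^(w-1) - 1)P that must be precomputed."""
    return 1 << (width - 2)


if __name__ == "__main__":
    scalar = 0xC0FFEE
    for w in (4, 8):
        digits = wnaf(scalar, w)
        nonzero = sum(1 for d in digits if d)
        print(f"w={w}: {nonzero} additions, {table_entries(w)} table entries")
```

A wider window means fewer point additions per scalar multiplication at the price of a larger precomputed table, which is the trade-off behind moving the tables out of constant memory.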

John’s code is excellent, but it was clearly rushed to meet the competition deadline, so there wasn’t time for these smaller low-level optimizations.

---

I’ll fix the incorrect link — thanks for pointing that out.

And regarding licensing: no worries. The code is open-source and you’re free to use it however you like. I’m happy to see people learning from it, discussing the topic with me, and using it to solve real problems. That said, I’ll add a license file for clarity — do you recommend MIT, or would you prefer something else (e.g., Apache-2.0)?

Quote from: NotATether on January 04, 2026, 09:19:13 AM

Quote from: ipsbruno3 on January 4, 2026, 06:50:47 AM

As a result of these experiments, I ended up measuring performance like:

Quote

~328 million keys/sec on an AMD Artix-7 XC7A200T, at < 28 W (measured at the board/system level in my setup)

Versus a high-end GPU that can draw hundreds of watts up to ~1 kW depending on model and configuration.

Thank you!

I hope to recover in a few months or years; losing all my savings was a blow that almost ended my life... So I'm posting to help people avoid going through the same thing I did. I hope it's helpful in some way.

[mod note: Merged consecutive posts]

BitcoinSoloMiner

Re: My Bitcoin Recovery Story (and Why I Started Building OpenCL + FPGA Prototypes)

January 08, 2026, 07:14:00 PM

Very good work, it's interesting. I hope to see more.
