Bitcoin Forum
May 20, 2019, 07:39:05 PM *
Author Topic: Generating addresses for millions of private keys from seeds [SOLVED]  (Read 349 times)
Desmond1543
Newbie
Activity: 22
Merit: 7
February 01, 2019, 12:12:50 AM
Last edit: February 12, 2019, 03:10:53 PM by Desmond1543
Merited by LoyceV (3), ETFbitcoin (1)
 #1

This problem has been solved. With a modified BTCRecover I am running through my seeds at about 24k seeds per second with a derivation depth of 5. If someone makes the same mistake in the future, here are ETAs. I'm running on an 8-thread i7.

15 words all scrambled - Max 105 days
14 words all scrambled - Max 21 days
13 words all scrambled - Max 3 days
12 words all scrambled - Max 6 hours
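
For reference, these ETAs follow from counting the ordered selections of 12 words out of n candidates, n!/(n-12)!, and dividing by the throughput. A minimal Python sketch, assuming the 24k seeds per second quoted above and that every ordering is tested without a checksum pre-filter (the function name is only illustrative):

```python
from math import perm  # available from Python 3.8

def worst_case_seconds(n_words, seed_len=12, seeds_per_second=24_000):
    """Seconds needed to try every ordering of seed_len words drawn from n_words."""
    return perm(n_words, seed_len) / seeds_per_second

for n in (12, 13, 14, 15):
    seconds = worst_case_seconds(n)
    print(f"{n} words: {perm(n, 12):,} orderings, "
          f"{seconds / 3600:.1f} hours ({seconds / 86400:.1f} days)")
```

Changing `seeds_per_second` shows how strongly the ETA depends on the checking speed.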


Hi,

Long story short: some years ago I wrote down the 12 seed words from my Mycelium wallet. To make them less suspicious if found, I added some words to form sentences. However, I accidentally added words that are themselves part of the English BIP39 wordlist. Fast forward to today: memory is much more fragile than one would think. My easy-to-memorize reordering of the words turned out to be wrong. So I have too many words, in the wrong order.

I have worked on this problem for a couple of days now, slowly going through larger and larger scopes of possible combinations. Generating 12-word seeds is fast, a few million per second. Turning those seeds into private keys is expensive, at around 36 per second. Knowing how fast vanitygen works (though it might use a different method?), I feel this is a tad slow, especially if I have to run through thousands of addresses. Generating addresses is even worse: I can only derive about one address per second from the xprv key.

I have accepted that this might take forever, but I would love to get some pointers and help from the community, and I'll make sure to reward anyone who contributes to solving this problem.

Having 14 possible words for a 12-word seed gives 43 589 145 600 possible arrangements. Let's say 5% of those have a correct checksum. That would be 2 179 457 280 combinations.
If I somehow managed to check 1000 addresses each second, it would take at most about a month to find the correct one. I reckon it should be possible to push that number higher. I am also fairly sure about some of the words, which should bring down the number of candidates.

I am using btctools for Python at the moment. I have no idea whether it should take this long; on the other hand, a site such as https://iancoleman.io/bip39/ generates 20 public keys in a second. I am sure there must be a faster way than the one I am using.

My method right now:
1. Generate huge lists of possible seed combinations, e.g. oven rifle phrase planet dirt true cinnamon kick first echo thing excuse
2. Run through the list line by line and generate the BIP32 root key, e.g. xprv9s21ZrQH143K3HKXZ8ZPebpXnQbWRsQeKnoUbu7BzMpgtym7ya8hPaF2dmFS621C2BMnvCb3qYj4cL7GiVK1VNmnA7wxFtPmBT8U1xUW8D6
3. Derive the BIP44 address from this root key, e.g. 1NF7rutG9zTiZ7HbYuqmik2Sbb8HwqJcqG
4. Check the derived addresses against the blockchain.
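
One reason step 2 is expensive: BIP39 stretches the phrase with 2048 rounds of PBKDF2-HMAC-SHA512 before BIP32 turns the result into a root key, which is far more work per candidate than a single hash. A rough sketch of those two steps using only the Python standard library (the function names are illustrative, not from btctools; deriving the BIP44 child keys and addresses additionally needs secp256k1 arithmetic and is not shown):

```python
import hashlib
import hmac
import unicodedata

def mnemonic_to_seed(mnemonic, passphrase=""):
    """BIP39: phrase -> 64-byte seed via PBKDF2-HMAC-SHA512, 2048 iterations."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)

def seed_to_master(seed):
    """BIP32: seed -> (master private key, chain code), 32 bytes each."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]

phrase = "oven rifle phrase planet dirt true cinnamon kick first echo thing excuse"
master_key, chain_code = seed_to_master(mnemonic_to_seed(phrase))
print(master_key.hex(), chain_code.hex())
```

Note that this sketch does not verify the BIP39 checksum; discarding invalid phrases before the PBKDF2 step avoids most of the cost.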

ETFbitcoin
Legendary
Activity: 1638
Merit: 1764
February 01, 2019, 04:26:11 AM
 #2

IMO you need to look for a library written in a language with good performance (such as C++) or one that supports GPGPU; otherwise I doubt you will find your seed within your lifetime, especially because checking whether an address holds a balance is a big overhead.
For a start, you can check the list of libraries at https://en.bitcoin.it/wiki/Software#Libraries or try the https://github.com/gurnec/btcrecover software.

Additionally, you can check this thread: Math problem regarding recovery seed. OP had a similar problem, the difference being that OP had only swapped a few of the words.

HCP
Legendary
Activity: 966
Merit: 1489
February 01, 2019, 05:17:01 AM
 #3

I think you might be approaching this in the wrong way... but I'm a bit confused as to what you do and don't have with regards to words and addresses.

You say you have 14 possible words? Do you mean you added 2 extra words... and jumbled up the word order?

As for the number of "legitimate arrangements", I think you'll find the correct checksum percentage is quite a bit lower than 5%. Given a set of 11 words, in my testing, only 4 or 5 words out of the 2048 word list will then generate the correct checksum when used as the 12th word.

I have a script that I have used for fixing 1 or 2 word errors in a seed... and it can generate and test seed combinations to see if they are valid, then generate 200 privatekeys and addresses from each "valid" seed that it finds at a fairly quick rate.

Also, I assume you don't remember or have a record of ANY address that you previously used from this seed?
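
On the checksum rate: as far as the BIP39 specification goes, a 12-word phrase carries 128 bits of entropy followed by 4 checksum bits (the first 4 bits of SHA-256 of the entropy). That suggests about 1 in 16 random orderings (6.25%) passes, and that 128 of the 2048 words are valid in last position for any given first 11 words, which is in line with the roughly 30 million valid seeds out of 479 million reported later in the thread. A small sketch that checks this on wordlist indices rather than words, so no wordlist file is needed (the helper name is hypothetical):

```python
import hashlib

def checksum_ok(indices):
    """indices: the 12 wordlist positions (0-2047) of a candidate phrase."""
    bits = 0
    for index in indices:
        bits = (bits << 11) | index          # 12 words x 11 bits = 132 bits
    entropy = (bits >> 4).to_bytes(16, "big")  # first 128 bits
    checksum = bits & 0xF                      # last 4 bits
    return hashlib.sha256(entropy).digest()[0] >> 4 == checksum

# Count how many last words are valid after eleven copies of word 0.
valid_last_words = sum(checksum_ok([0] * 11 + [w]) for w in range(2048))
print(valid_last_words)
```

Filtering on indices like this is cheap compared with key derivation, so it is worth doing before anything else.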

Desmond1543
Newbie
Activity: 22
Merit: 7
February 01, 2019, 09:11:26 AM
 #4

Sadly I have only ever used C#, Python and Java. However, I will look into that and see if I can get it running. Thanks!
I also realise that if I could get hold of an address used with this account, it would be way faster, since I would not have to check every address.

Br

Desmond1543
Newbie
Activity: 22
Merit: 7
February 01, 2019, 09:19:54 AM
 #5

Here is a clarification. I generated the 12-word seed. Then I thought I had a great method to rearrange the words. Here is an example with 6 words:
word 0, word 1, word 2, word 3, word 4, word 5
I then thought of a number that was important to me, let's say 142201.
So I would first write down word 1, then word 4, etc., giving me
word 1, word 4, word 2, word 3, word 0, word 5.

At least, this is how I thought I did it. But after having checked most of these possibilities I am starting to wonder where I fudged up.

Then I wrote a story to conceal the words.
Bla bla bla word 0, bla bla bla word 4, bla bla bla word 2, bla bla word 3 etc.
And by accident, I put words into my story that were also part of the BIP39 wordlist.

I have narrowed it down to 14-15 words now. I could take different approaches to solving this: either simply brute-forcing all of them, or trying to work out the arrangement in which they were ordered.

However, whichever I do, I need to be able to check candidates as fast as possible, hence my asking for your help. I hope that the ones passing the checksum will be fewer than 5%, but that's the number I got after generating some millions of scrambled 12-word seeds. Another thing I might look into is, as you said, trying to find a corresponding address so I have some clue which address I am looking for.

TL;DR: Yes, I am an idiot. I not only screwed up the order, I also put in additional words.

Desmond1543
Newbie
Activity: 22
Merit: 7
February 01, 2019, 02:29:16 PM
 #6

Hi again.

Just throwing an idea out there, maybe someone could tell me if it is doable.
Let's say I know there are around 1-2 BTC in the account. There should only be around 300000 addresses within this range.
If I know certain dates of transactions, or the last time it was touched, I should be able to narrow it down and find my address.

I've been searching for a way to look through all addresses on the chain. I should be able to get them from a fully synced node, right?

KingZee
Sr. Member
Activity: 574
Merit: 413
February 02, 2019, 03:11:27 AM
 #7

Did you generate all the addresses? You need to add them in bulk to bitcoin core and then check the wallet's balance. I tried for a long time to find a way to get an address's balance from bitcore, but there isn't any command that implements it, the address has to be part of the wallet.

I still have code that can run through all your addresses, it might take a while, but it's still very reasonably fast. For 300.000 it should take less than a day.

nololol
Newbie
Activity: 1
Merit: 0
February 02, 2019, 06:59:38 AM
 #8

Presumably you'll scan the blockchain in advance and compile a list of addresses you'd like to usurp. You put these in a set data structure (maybe a probabilistic one like Bloom filter) and check against it any address you generate. Then the only question is whether this can be done efficiently on a GPU.
source https://www.technologyaside.com/
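
To make the set-membership idea concrete, here is a toy Bloom filter in Python. It is only a sketch of the data structure, not how btcrecover or brainflayer implement theirs; the bit count, the number of hash functions and the class name are arbitrary choices for illustration. A Bloom filter can report false positives but never false negatives, so any hit still has to be confirmed against the real address list.

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 20, n_hashes=7):
        self.n_bits = n_bits                  # must be a multiple of 8
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a counter byte.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

# Fill the filter once with known hash160 values (dummy values here),
# then test each generated candidate against it.
funded = BloomFilter()
for n in range(1000):
    funded.add(n.to_bytes(20, "big"))
print((5).to_bytes(20, "big") in funded)
```

The filter above uses 128 KiB no matter how long the items are, which is why it scales to tens of millions of addresses in a few hundred megabytes.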

Desmond1543
Newbie
Activity: 22
Merit: 7
February 02, 2019, 11:31:14 AM
 #9

Hi, thanks for your answers.

I used the btcrecover repo to get all public keys from the blockchain (about 300 million lmao) into a database of some sort.
However, I haven't yet figured out how to get btcrecover to read from my own lists of seeds instead of generating the candidates itself.
Btcrecover's seed recovery is fairly simple, as it just assumes 4 mistakes in the seed words. But that is not my issue here. I want to use btcrecover for its speed, but not for the seed generation part.

I have now generated all possible orderings of 12 words, which is around 479 million combinations.
After checking the checksum of all of these combinations I am down to 30 million valid seeds. I think btcrecover can check around 10k addresses per second, so it would only take an hour to go through them if I manage to pull that off.

So I can't simply query the blockchain with simple commands to see which addresses have been used, and when? I reckon the top-100 address site is able to check balances at least.

What I would love to do is scan through the entire blockchain, something like this:

Code:
from datetime import date

def candidates(addresses):
    # addresses: iterable of (address, balance_btc, last_tx_date) tuples
    for address, balance, last_tx in addresses:
        if balance > 2 or balance < 1:
            continue  # disregard: balance outside 1-2 BTC
        if last_tx > date(2017, 4, 30) or last_tx < date(2017, 1, 1):
            continue  # disregard: not active January-April 2017
        yield address  # whatever is left

Then I would probably be able to pinpoint the wallet. There shouldn't be too many wallets that were only used for a few months, haven't been touched since and still contain 1+ BTC.

arulbero
Legendary
Activity: 1267
Merit: 1292
February 02, 2019, 12:52:33 PM
 #10

You don't know
1) the private key
2) the address?

If you have to check each address you generate against the blockchain, you need a bloom filter like in this program https://github.com/ryancdotorg/brainflayer

2 180 000 000 addresses are not too many.  Besides you can save the time of the encode58.

Desmond1543
Newbie
Activity: 22
Merit: 7
February 02, 2019, 09:15:30 PM
 #11

Damn, I wish I was more fluent in programming. Right now I am using the blockchain.info API to check balances, which is a terrible way to do it. As you said, I need to check against all used addresses on the blockchain instead.
I was reading about crawling and saving all addresses from the blockchain. But there are over 300 million of those, so yes, I would need a Bloom filter (as I understand it, it basically reduces the size?).

So there are two approaches: either crawl the entire blockchain and search through it every time,
or crawl it to find the public key so I don't have to search through a database.

Please explain what you mean by save the time of encode58?

Br

pooya87
Legendary
Activity: 1638
Merit: 1644
February 03, 2019, 04:23:33 AM
 #12

Please explain what you mean by save the time of encode58?

this is the route you take from a private key to an address. <-> means it is reversible, and -> means it only works in one direction:
Private key -> public key -> SHA256 hash -> RIPEMD160 hash <-> Base58 encoding with a checksum

when you are brute forcing, you pick a private key in your loop, derive from it whichever form you have on record, and compare. comparing public keys is the fastest, but since in your case you only have the "address" you need to take it almost all the way to the end. and since Base58 encoding is reversible, it is best to compare the Hash160 result and not do the encoding itself.
besides, when you search the "blockchain" itself there are no base58 encodings either. there are only scripts in the transaction outputs which contain that Hash160 result.
so the result is that your loop becomes slightly faster if you skip the last step
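
To illustrate the reversible last step, here is a standard-library sketch that decodes a Base58Check address back to its hash160 and encodes it again. The helper names are made up for this example, and the genesis-block address is used only because its hash160 is well known. A list of known addresses can be decoded once like this, so the brute-force loop only ever compares raw 20-byte hashes.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def address_to_hash160(address):
    """Decode a Base58Check address and return its 20-byte hash160."""
    number = 0
    for char in address:
        number = number * 58 + ALPHABET.index(char)
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_zeros + number.to_bytes((number.bit_length() + 7) // 8, "big")
    payload, checksum = raw[:-4], raw[-4:]
    if _sha256d(payload)[:4] != checksum:
        raise ValueError("bad Base58Check checksum")
    return payload[1:]  # drop the version byte

def hash160_to_address(hash160, version=0):
    """Encode a 20-byte hash160 as a Base58Check address (version 0 = P2PKH)."""
    raw = bytes([version]) + hash160
    raw += _sha256d(raw)[:4]
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

genesis = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
h160 = address_to_hash160(genesis)
print(h160.hex(), hash160_to_address(h160) == genesis)
```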

arulbero
Legendary
Activity: 1267
Merit: 1292
February 03, 2019, 06:59:42 AM
Last edit: February 03, 2019, 03:48:56 PM by arulbero
Merited by ETFbitcoin (6), suchmoon (4), JayJuanGee (1), bones261 (1), Alex_Sr (1)
 #13

First, you only have to look at the UTXO set; you don't care about addresses that were used in the past and are now empty.

Updated at block #547944 (30/10/2018)

Output type      # addresses    Total bitcoin
P2PKH             18,453,794       10,541,332
P2SH               3,865,985        4,906,667
P2PK                  38,678        1,759,927
P2WPKH                62,643          126,738
P2WSH                 18,662           11,812
MULTISIG 1-1             357            0.056
MULTISIG 1-2         142,354            23.24
MULTISIG 1-3         205,226            17.85

TOT               22,787,699       17,346,536


So we are talking about roughly 20 million addresses, not 300 million.



With a Bloom filter the search time becomes negligible. Suppose you have a list of all addresses in the UTXO set (as hash160 hex, not Base58-encoded):

addresses.hex
Code:
0000000000000000000000000000000000000000
0000000000000000000000000000000000000001
0000000000000000000000000000000000000002
0000000000000000000000000000000000000003
0000000000000000000000000000000000000004
0000000000000000000000000000000000000005
0000000000000000000000000000000000000006
0000000000000000000000000000000000000007
0000000000000000000000000000000000000008
000000000000000000000000000000000000000a
0000000000000000000000000000000000000011
000000000000000000000000000000000000001a
0000000000000000000000000000000000000023
0000000000000000000000000000000000000064
0000000000000000000000000000000000000092
0000000000000000000000000000000000000100
0000000000000000000000000000000000000246
0000000000000000000000000000000000000258
000000000000000000000000000000000000028f
00000000000000000000000000000000000002fe
.......................................

you can use the program I linked to get from that list a 512 MB bloom filter called funds_h160.blf (addresses with funds)

First you have to download the program (I assume you use Linux) and compile it:

you can download it from https://github.com/ryancdotorg/brainflayer/archive/master.zip and unzip or use the 'git' command:

Code:
git clone https://github.com/ryancdotorg/brainflayer.git
cd brainflayer/
make

then run it

Code:
./hex2blf addresses.hex funds_h160.blf

[*] Initializing bloom filter...
[*] Loading hash160s from 'addresses.hex'  100.0%
[*] Loaded 18503292 hashes, false positive rate: ~2.162e-22 (1 in ~4.625e+21)
[*] Writing bloom filter to 'funds_h160.blf'...
[+] Success!


To perform a search in the bloom filter, suppose you have a list of 1000 addresses generated from 1000 private keys to check against the bloom filter:

Code:
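/* assumes `bloom` points to the loaded filter and `addresses` holds the
   1000 candidate hash160 values; needs <stdio.h>, <stdlib.h>, <stdint.h>
   and brainflayer's bloom.h */
for (uint16_t i = 0; i < 1000; i++) {
    /* true if addresses[i] matches an entry in the bloom filter */
    if (bloom_chk_hash160(bloom, addresses[i])) {
        printf("Found! Key number %08x\n", i);
        exit(0);
    }
}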

where the bloom_chk_hash160 function is defined here -> https://github.com/ryancdotorg/brainflayer/blob/master/bloom.h

Desmond1543
Newbie
Activity: 22
Merit: 7
February 03, 2019, 11:43:22 PM
Last edit: February 08, 2019, 02:30:46 PM by Desmond1543
 #14

Thanks, both of you. I really wish I had gone into Linux; however, I am still on Windows, which means I have no clue how to do any of what you just said, even if it looks simple when you put it that way.
I just have no idea how to fetch the addresses from the blockchain. If I had that, I could probably figure something out.

btcrecover in Python is really solid: I already have all the addresses and the generating program, and it checks 18k addresses per second against the blockchain data with a wallet depth of 10.
All I need is to figure out how to feed it my own set of private keys/seed list.

Which would mean at MOST
7 hours with 12 words,
4 days with 13 words,
28 days with 14 words,
140 days with 15 words and 17 terabytes of seeds.

I could live with that.

EDIT: That would be including all seeds with a wrong checksum; those with a correct checksum would only be 5% of these.

EDIT 2:
Day 14 of problem solving.

I have now started to generate all combinations. I settled on the 14 words I am most confident about.
This gives me 43 589 145 600 combinations. I have three words I am certain about, which brings it down to about 20 000 000 000 combinations.
This totals some 1.6 terabytes of data. The only problem left is to feed it to btcrecover.
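
One way to avoid storing terabytes of candidate phrases is to generate the orderings lazily and hand each one straight to the checker. A minimal sketch with itertools, assuming the candidate words are distinct and that the words known for certain sit at known positions (the thread does not say how the certain words were used, so `fixed` is a hypothetical parameter):

```python
from itertools import permutations

def candidate_orders(words, length=12, fixed=None):
    """Yield every ordering of `length` words; `fixed` maps position -> word."""
    fixed = fixed or {}
    free_words = [w for w in words if w not in fixed.values()]
    free_slots = [i for i in range(length) if i not in fixed]
    for combo in permutations(free_words, len(free_slots)):
        phrase = [None] * length
        for position, word in fixed.items():
            phrase[position] = word
        for position, word in zip(free_slots, combo):
            phrase[position] = word
        yield phrase

# Tiny demonstration: 4 candidate words, 3-word phrase, first word known.
for phrase in candidate_orders(["oven", "rifle", "phrase", "planet"], 3, {0: "oven"}):
    print(" ".join(phrase))
```

Because this is a generator, memory use stays constant regardless of how many orderings there are; only the running time grows.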

Desmond1543
Newbie
Activity: 22
Merit: 7
February 11, 2019, 07:16:07 PM
 #15

I have now generated every possible combination from all 14 words. It took lots of time and lots of programming.

Now I really need your help to turn these into addresses. I want an efficient way to input seeds and get addresses from them.

Any takers? I will make sure there is a reward for contributions. I am really tired of this right now.

arulbero
Legendary
Activity: 1267
Merit: 1292
February 12, 2019, 04:53:06 PM
Last edit: February 12, 2019, 05:07:26 PM by arulbero
 #16

If you want, I can give you a program that works like this:

generate_address file_input file_output

where file_input contains private keys in hex format and file_output contains addresses in hex format.

I would have to modify my program, which currently generates only consecutive addresses.


What kind of speed do you need? Is Python enough, or do you need a C program?

Desmond1543
Newbie
Activity: 22
Merit: 7
February 12, 2019, 06:16:49 PM
Merited by bones261 (2)
 #17

Thanks for your reply!

I have now solved this problem with a modification to the BTCrecover code. It now takes my custom-generated seeds as input and reports whether any of the addresses derived from a seed is found on the blockchain.
It does about 25k per second, which is not a lot, but it is sufficient.