Print Page - Bruteforce partial electrum seed words

Quote from: eranglr on February 08, 2021, 08:24:18 PM

I know it's possible bruteforce 4 missing words

Are you sure that this is possible? I know that for two electrum missing words, it can take around 20 seconds on an average pc. For three words it'll take 20*2048 = 40960 seconds which is equal with ~11.3 hours. But for 4 words... Oh boy. It'll take around 23,142 hours which is 964 days.

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

You can surely reduce it, by a lot. But still, brute forcing by not knowing 4 out of 12 words isn't meant to be found.

Quote from: eranglr on February 08, 2021, 08:49:42 PM

So let's say that now I have 13 lists of possible words, is there a available tool that I can use with those words as input?

Are we talking 13 lists of mnemonics that don't have 4 words out of 12? It seems impossible to me the way you describe it.

Anyway, there is no program that can do your job, but if you have 13 lists of possible words then you can try FinderOuter (https://bitcointalk.org/index.php?topic=5214021.0). Instead of opening one window, open as many as you want and use a mouse scheduler for each window.

Of course there is a program that can do it.
It's just all custom made to suit particular implementations.
That is why you have to tweak it by coding the word lengths and letters into the code one by one.
If you are talking about the black lives matters puzzle those runes are already decoded that way.

Here you have an example:

Code:

<script type="text/javascript">

for (w1=0;w1<WORDLISTS['english'].length;w1++) {
       var word1 = WORDLISTS["english"][w1]
              if(word1.length == 5&&word1[2]==word1[3] )
              {
                for (w2=0;w2<WORDLISTS['english'].length;w2++) {
                       var word2 = WORDLISTS["english"][w2]
                              if(word2.length == 4&&word2[2]==word1[1] )
                              {
                                for (w3=0;w3<WORDLISTS['english'].length;w3++) {
                                       var word3 = WORDLISTS["english"][w3]
                                              if(word3.length == 5&&word3[2]==word1[1] )
                                              {
                                                for (w4=0;w4<WORDLISTS['english'].length;w4++) {
                                                       var word4 = WORDLISTS["english"][w4]
                                                              if(word4.length == 5&&word4[1]==word4[3] && word4[2]==word2[0])
                                                              {
                                                                for (w5=0;w5<WORDLISTS['english'].length;w5++) {
                                                                       var word5 = WORDLISTS["english"][w5]
                                                                              if(word5.length == 5&&word5[2]==word5[4])
                                                                              {
                                                                                for (w6=0;w6<WORDLISTS['english'].length;w6++) {
                                                                                       var word6 = WORDLISTS["english"][w6]
                                                                                              if(word6.length == 3&&word6[0]==word3[0]&&word6[1]==word4[4]&&word6[2]==word5[2])
                                                                                              {
                                                                                                for (w7=0;w7<WORDLISTS['english'].length;w7++) {
                                                                                                       var word7= WORDLISTS["english"][w7]
                                                                                                              if(word7.length == 4&&word7[0]==word1[0]&&word7[2]==word4[1]&&word7[3]==word1[4])
                                                                                                              {
                                                                                                               
// and etc...
// yes i know... Horrible code... just horrible.
// but it get's the job done boiii

</script>

Quote from: BlackHatCoiner on February 08, 2021, 09:04:34 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

I know it's possible bruteforce 4 missing words

Quote from: eranglr on February 08, 2021, 08:24:18 PM

You can surely reduce it, by a lot. But still, brute forcing by not knowing 4 out of 12 words isn't meant to be found.

Quote from: eranglr on February 08, 2021, 08:49:42 PM

So let's say that now I have 13 lists of possible words, is there a available tool that I can use with those words as input?

Thanks, I have 13 text files with possible words.
I don't see an option to use them in FinderOuter, any idea?

Quote from: BASE16 on February 08, 2021, 10:05:54 PM

Here is how i do.
Put all words in a text file and load it into the filereader.
Split() strip() and trim() into array.
Then you rotate the words randomly and validate the resulting mnemonic.
If true verify it to the target address.

https://i.ibb.co/16k8tBb/mnemonics.png (https://imgbb.com/)

I thought about doing that, but I don't know how to do it (newbie in Linux..).
1. How to make a combined text to be checked? note that I have 13 text files, each text file has a different word list.
2. How to check the text with all the possible seeds?

Well you could do it in Linux but i'm using JavaScript mostly.
You can also use NodeJS or Python.
Node is build on Chrome's V8 JavaScript engine.

You can also just type the words into a hardcoded array that would be the easiest way so you can omit the filereader.
Then you can use a library like bitcoinjs or jsbtc either one will work.
All the functions are readily available.

Here are some examples for bitcoinjs-lib: https://github.com/bitcoinjs/bip39 (https://github.com/bitcoinjs/bip39) <--- i prefer this one
You can see here some examples for jsbtc: https://github.com/bitaps-com/jsbtc/blob/master/test/jsbtc.test.js (https://github.com/bitaps-com/jsbtc/blob/master/test/jsbtc.test.js)

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Quote from: PawGo on February 08, 2021, 10:22:58 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Yes, I know the public address.

Quote from: eranglr on February 08, 2021, 10:31:12 PM

Quote from: PawGo on February 08, 2021, 10:22:58 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Yes, I know the public address.

OK, I will try to do this tomorrow.

Quote from: eranglr on February 08, 2021, 09:24:51 PM

I don't see an option to use them in FinderOuter, any idea?

The option is found under "Missing mnemonic" and you have to select the "Electrum" type from the second drop down on the right, then select the wallet type that Electrum was using from the drop down that appears after that. Make sure to enter the correct BIP32 path. Example 6 and 7 are Electrum mnemonics missing 1 and 2 words respectively.
https://i.imgur.com/FN2HmLZ.jpg

A couple of points:
* Electrum mnemonics are 12 words by default
* You have to know the position of the missing word. But if it is only 1 word you can manually change the position from start to end and click Find 12 times, it should all take 12 seconds.
* You have to know a derived child from that wallet (it can be a child private key, public key or an address)
* Setting the path correctly is very important. When setting the wallet type (Electrum mnemonic type) FinderOuter automatically sets the correct path for that type but you have to enter the index of the child key you entered (eg. if you entered the 10th address you have to add 9 to the end of the path)

If I'm understanding what you're writing... you have something like:

- possible_word1(list of ~4-10 words)
- possible_word2(list of ~4-10 words)
- possible_word3(list of ~4-10 words)
...
- possible_word12(list of ~4-10 words)

and from the sounds of it, you actually 13 words? so, this is the "old" electrum seed format? ???

Anyway, assuming just 12 words for now, your possible seed combinations are: #word1_words * #word2_words * #word3_words * ... * #word12_words. Doesn't sound like much, until you do the math... even with "best case" of just 4 possible words in each position... that's 4^12 or 16,777,216 permutations.

Rough experience tells me that around ~6% of those will be "valid" (ie. the checksum matches)... so you're looking at something like ~1,006,632 "valid" seeds that you'd need to check for a match against your public address.

Anyway, I wrote a quick python script that takes in 12 files named 1.txt, 2.txt, 3.txt, ..., 12.txt (for my demo, each file has only 4 words) and then just iterates through the words in each position... it generates over 10,000 seeds/minute... and is finding (as expected) slightly more than 6% of those to be "valid".

Quote

Start: 2021-02-10 13:19:45.293000
10000 Seeds 2021-02-10 13:20:29.445000
621 Valid Seeds 2021-02-10 13:20:29.445000
20000 Seeds 2021-02-10 13:21:11.167000
1204 Valid Seeds 2021-02-10 13:21:11.167000
30000 Seeds 2021-02-10 13:21:55.523000
1832 Valid Seeds 2021-02-10 13:21:55.523000
40000 Seeds 2021-02-10 13:22:38.955000
2442 Valid Seeds 2021-02-10 13:22:38.955000
50000 Seeds 2021-02-10 13:23:23.073000
3041 Valid Seeds 2021-02-10 13:23:23.073000
60000 Seeds 2021-02-10 13:24:08.397000
3670 Valid Seeds 2021-02-10 13:24:08.397000
70000 Seeds 2021-02-10 13:24:53.846000
4308 Valid Seeds 2021-02-10 13:24:53.846000
80000 Seeds 2021-02-10 13:25:37.352000
4911 Valid Seeds 2021-02-10 13:25:37.352000
90000 Seeds 2021-02-10 13:26:23.622000
5543 Valid Seeds 2021-02-10 13:26:23.622000
100000 Seeds 2021-02-10 13:27:09.700000
6180 Valid Seeds 2021-02-10 13:27:09.700000
...

So, at ~10k seeds/minute... maybe ~1677 minutes to go through the full search space (only 4 words per slot, 12 slots)... which is around 28hours. :-\

The main slow downs will be:
- More words in a given position... that's going to ramp up the total search space
- Actually generating private keys/addresses to check, as these are computationally a lot slower than just checking if a seed is valid.

With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits... I'm sure it's in the realms of reality to be able to do it within a matter of days? ??? It really depends on how big your search space is... what are the exact number of words you have in each position? ???

Quote from: HCP on February 10, 2021, 12:35:48 AM

If you are ONLY validating the checksum of each permutation then your speed must be in the millions per second rate. 10k is way too low for that because essentially you are just computing SHA256 (for BIP39) or SHA512 (for Electrum) and your CPU is capable of computing millions per second of these hashes. Translating your code to C shouldn't change much about its speed because the slowness is most probably in the algorithm you are using not the language.

Quote from: HCP on February 10, 2021, 12:35:48 AM

It could also be a BIP39 seed with a one-word BIP38 password. OP did not clarify if each file corresponds to a word in the seed so it's possible that the order is also unknown and this is no different from having one file full of words.

Actually, we don't even know which wordlist is used which makes a big difference if the seed phrase is for a custom wordlist (otherwise I would not see the point of having a file full of BIP39 or Electrum words ???)

Quote from: HCP on February 10, 2021, 12:35:48 AM

With some optimisation, and maybe porting to C or something faster than Python, you'd probably gain some performance benefits... I'm sure it's in the realms of reality to be able to do it within a matter of days? ??? It really depends on how big your search space is... what are the exact number of words you have in each position? ???

Ultra modern processors (AMD Ryzen/ anything using Zen microarchitecture and Intel ice lake 10xxx and later) have hardware accelerated SHA256 instructions which you can call from C using the __asm__ keyword: SHA256RNDS2, SHA256MSG1 and SHA256MSG2 (https://en.wikipedia.org/wiki/Intel_SHA_extensions).

Quote from: NotATether on February 10, 2021, 05:25:23 AM

Quote from: HCP on February 10, 2021, 12:35:48 AM

The seed is 13 words from the English dictionary of electrum 2.0.3 version.

Quote from: PawGo on February 08, 2021, 10:31:59 PM

Quote from: eranglr on February 08, 2021, 10:31:12 PM

Quote from: PawGo on February 08, 2021, 10:22:58 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Yes, I know the public address.

OK, I will try to do this tomorrow.

Did you have time to do that?
Thanks!

Quote from: pooya87 on February 10, 2021, 04:38:32 AM

f you are ONLY validating the checksum of each permutation then your speed must be in the millions per second rate. 10k is way too low for that because essentially you are just computing SHA256 (for BIP39) or SHA512 (for Electrum) and your CPU is capable of computing millions per second of these hashes.

Which would probably be true if you had a pre-existing list of all the seed word permutations and/or seed value...

But I ran into memory issues attempting to pre-compute everything, so it has to generate each permutation, then attempt to convert it to a seed while checking the checksum etc... I'm sure the way the algorithm is generating the seed permuations is probably pretty poor. It's basically just iterating some lists.

The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125

But I'm sure there are probably ways to make that more efficient.

For what it's worth... the script finished overnight:

Quote

...
67100000 Seeds 2021-02-11 10:22:16.525000
1049432 Valid Seeds 2021-02-11 10:22:16.525000
End: 2021-02-11 10:22:17.004000
Total: 21:02:31.711000

21hrs and "found" 1,049,432 valid seeds :P

interestingly... the last 12.5 million or so seed permutations didn't create any valid seeds:

Quote

...
54520000 Seeds 2021-02-11 10:10:36.431000
1049059 Valid Seeds 2021-02-11 10:10:36.431000
54530000 Seeds 2021-02-11 10:11:02.463000
1049432 Valid Seeds 2021-02-11 10:11:02.463000
54540000 Seeds 2021-02-11 10:11:02.990000
1049432 Valid Seeds 2021-02-11 10:11:02.990000
54550000 Seeds 2021-02-11 10:11:03.514000
1049432 Valid Seeds 2021-02-11 10:11:03.514000
54560000 Seeds 2021-02-11 10:11:04.035000
...

I guess the limited number of words available in position 12 just didn't make any valid checksums with the other available words!

Quote from: HCP on February 11, 2021, 12:02:21 AM

The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125

That's the problem with using a library to brute force, they are not designed to be the most efficient. They are just optimized to perform one operation in reasonable time. From a quick look this part is being repeated pointlessly and it has obvious expensive processes such as the memory lookup even if it is a binary search, the conversion between the words and bits, working with strings, unnecessary checks,...
https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L126-L156

What you want to do is to first convert all your words to their binary representation without having any memory issues.
13 file * 10 word each * 4 byte to hold bits = 520 byte total
Now you create the permutation of these bits and compute their hashes. Working with integers is significantly faster than working with strings.

OOPS... I had accidentally left in the "to_seed()" call... which takes the mnemonic and generates the actual seed... including the computationally expensive code:

Code:

stretched = hashlib.pbkdf2_hmac(
    "sha512", mnemonic_bytes, passphrase_bytes, PBKDF2_ROUNDS
)

If I comment that part out... it runs "a little bit" faster ;) ::)

Quote

Start: 2021-02-11 21:13:07.520000
10000 Seeds 2021-02-11 21:13:10.346000
621 Valid Seeds 2021-02-11 21:13:10.346000
20000 Seeds 2021-02-11 21:13:13.092000
1204 Valid Seeds 2021-02-11 21:13:13.092000
30000 Seeds 2021-02-11 21:13:15.903000
1832 Valid Seeds 2021-02-11 21:13:15.903000
40000 Seeds 2021-02-11 21:13:18.769000
2442 Valid Seeds 2021-02-11 21:13:18.769000
50000 Seeds 2021-02-11 21:13:21.671000
3041 Valid Seeds 2021-02-11 21:13:21.671000
...

~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! :o ;)

Quote from: eranglr on February 08, 2021, 08:24:18 PM

I couldn't understand the problem. Say you have 7.txt with 10 seed words. You just need two more words to make 12 word seed. After lotta combination and permutations and filtering bad seeds and getting final address and looking up on blockchain for utxo. Is that what you are doing?

There are couple of questions
1. Is each file corresponds to one wallet?
2. Do you know the positions of missing seed words in each file?
3. Are these 12 word mnemonic seeds or 15 or 24? (If they are 15 words or more and your answer to 2 is no then you can lose all hopes to crack them, most probably.)

If you can answer that, they it's kinda easy to comeup with bruteforce solutions...

Quote from: HCP on February 11, 2021, 08:27:06 AM

~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! :o ;)

FWIW if I comment out the mnemonic to BIP32 seed part and just validate the checksums of each permutation in FinderOuter, it can perform ~7.2 million checks per second on my corei3 CPU.

I should point out that the reason why the overall mnemonic recovery option is slow is because the algorithm that comes after checksum validation is expensive (to derive the child keys), and also there is an issue with my implementation of Elliptic Curve Cryptography that puts a lot of pressure on GC slowing down the process.

Quote from: eranglr on February 10, 2021, 07:17:22 PM

Quote from: PawGo on February 08, 2021, 10:31:59 PM

Quote from: eranglr on February 08, 2021, 10:31:12 PM

Quote from: PawGo on February 08, 2021, 10:22:58 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

Yes, I know the public address.

OK, I will try to do this tomorrow.

Did you have time to do that?
Thanks!

Yes, I have prepared a new release: https://github.com/PawelGorny/lostword/releases/tag/v0.8.0
Now it works with a candidates on the given positions, for example:

Code:

POOL
bc1q0v5q36eaculyrykjnjsyuey6ctd3802ft4jdcc
6
brother
window master canal cat
?
black master remove cat
pitch
hill

Quote from: BlackHatCoiner on February 08, 2021, 09:04:34 PM

Quote from: eranglr on February 08, 2021, 08:24:18 PM

I know it's possible bruteforce 4 missing words

Whenever I am in such a situation where I have to perform the loops faster I do the below thing. Let me know if it can work well in this scenario as well.

So let's consider that 4 words will take 964 days according to you. So when we write a program to bruteforce the seed one program will take 964 days.
Now let's restrict that program to limited amount of combination and create another program with another set of combinations such that both programs have different set of combinations and are executed in parallel.
In this way both will be executed simultaneously and we will get the results in half of it's time meaning 964/2 = 482 days.

If we repeat this and create 10 such programs then the speed of results will be 10 times faster. Even if we don't run it in the same machine, we can run them in different machines.
So 964/10 = 96.4 days.

How do you find the above approach ?

Quote

Quote from: eranglr on February 08, 2021, 08:24:18 PM

You can surely reduce it, by a lot. But still, brute forcing by not knowing 4 out of 12 words isn't meant to be found.

Well that's very true. Bruteforcing 4 words is completely waste of time unless we know a possible combination of words and the sequence in which those words might have been placed.
Otherwise it's completely a waster of time trying out so many combinations.

Quote from: pawanjain on February 11, 2021, 04:23:36 PM

what you are describing is the definition of parallelism and is used by any modern CPU that has more than one core. you don't need to write or run different programs, you just write once but split the workload between multiple CPU cores and each core performs the same task on a different data but separately and in "parallel".
CPUs can have between 2 to 64 cores.

GPUs have hundreds of cores and can run a lot more operations in parallel which is why using GPU in brute forcing is very common and the speed gain is huge.

with that in mind i don't think it is hard to recover a seed with missing 4 words.

Quote from: BrewMaster on February 11, 2021, 05:26:44 PM

Thats great, I'll read more kn how to execute a set of instructions on a particular core and how to divide the load in % to different cores. If you have any leads let me know.

Quote

with that in mind i don't think it is hard to recover a seed with missing 4 words.

It actually depends on various factors whether it's hard to recover the missing words or not. For instance

if you know the sequence then the possible combinations for entropy of 128 bit is
2048*12 (for one word)
2048*2048*2048*2048*12 (for 4 words)

If you dont know the sequence
2048*12! (for one word)
2048*2048*2048*2048*12! (for 4 words)
where ! is factorial

and that is too many combinations. Also, if the entropy is higher the number of combinations drastically increases.
Someone correct me if I am wrong.

Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: eranglr on February 08, 2021, 08:24:18 PM