Bitcoin Forum
June 16, 2024, 02:44:36 PM *
News: Voting for pizza day contest
 
   Home   Help Search Login Register More  
Pages: « 1 [2]  All
  Print  
Author Topic: Bruteforce partial electrum seed words  (Read 354 times)
HCP
Legendary
*
Offline Offline

Activity: 2086
Merit: 4316

<insert witty quote here>


View Profile
February 11, 2021, 12:02:21 AM
Last edit: February 11, 2021, 12:27:49 AM by HCP
 #21

f you are ONLY validating the checksum of each permutation then your speed must be in the millions per second rate. 10k is way too low for that because essentially you are just computing SHA256 (for BIP39) or SHA512 (for Electrum) and your CPU is capable of computing millions per second of these hashes.
Which would probably be true if you had a pre-existing list of all the seed word permutations and/or seed value...

But I ran into memory issues attempting to pre-compute everything, so it has to generate each permutation, then attempt to convert it to a seed while checking the checksum etc... I'm sure the way the algorithm is generating the seed permuations is probably pretty poor. It's basically just iterating some lists.

The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125

But I'm sure there are probably ways to make that more efficient.


For what it's worth... the script finished overnight:
Quote
...
67100000 Seeds 2021-02-11 10:22:16.525000
1049432 Valid Seeds 2021-02-11 10:22:16.525000
End: 2021-02-11 10:22:17.004000
Total: 21:02:31.711000
21hrs and "found" 1,049,432 valid seeds Tongue


interestingly... the last 12.5 million or so seed permutations didn't create any valid seeds:
Quote
...
54520000 Seeds 2021-02-11 10:10:36.431000
1049059 Valid Seeds 2021-02-11 10:10:36.431000
54530000 Seeds 2021-02-11 10:11:02.463000
1049432 Valid Seeds 2021-02-11 10:11:02.463000
54540000 Seeds 2021-02-11 10:11:02.990000
1049432 Valid Seeds 2021-02-11 10:11:02.990000
54550000 Seeds 2021-02-11 10:11:03.514000
1049432 Valid Seeds 2021-02-11 10:11:03.514000
54560000 Seeds 2021-02-11 10:11:04.035000
...
I guess the limited number of words available in position 12 just didn't make any valid checksums with the other available words!

█████████████████████████
████▐██▄█████████████████
████▐██████▄▄▄███████████
████▐████▄█████▄▄████████
████▐█████▀▀▀▀▀███▄██████
████▐███▀████████████████
████▐█████████▄█████▌████
████▐██▌█████▀██████▌████
████▐██████████▀████▌████
█████▀███▄█████▄███▀█████
███████▀█████████▀███████
██████████▀███▀██████████
█████████████████████████
.
BC.GAME
▄▄░░░▄▀▀▄████████
▄▄▄
██████████████
█████░░▄▄▄▄████████
▄▄▄▄▄▄▄▄▄██▄██████▄▄▄▄████
▄███▄█▄▄██████████▄████▄████
███████████████████████████▀███
▀████▄██▄██▄░░░░▄████████████
▀▀▀█████▄▄▄███████████▀██
███████████████████▀██
███████████████████▄██
▄███████████████████▄██
█████████████████████▀██
██████████████████████▄
.
..CASINO....SPORTS....RACING..
█░░░░░░█░░░░░░█
▀███▀░░▀███▀░░▀███▀
▀░▀░░░░▀░▀░░░░▀░▀
░░░░░░░░░░░░
▀██████████
░░░░░███░░░░
░░█░░░███▄█░░░
░░██▌░░███░▀░░██▌
░█░██░░███░░░█░██
░█▀▀▀█▌░███░░█▀▀▀█▌
▄█▄░░░██▄███▄█▄░░▄██▄
▄███▄
░░░░▀██▄▀


▄▄████▄▄
▄███▀▀███▄
██████████
▀███▄░▄██▀
▄▄████▄▄░▀█▀▄██▀▄▄████▄▄
▄███▀▀▀████▄▄██▀▄███▀▀███▄
███████▄▄▀▀████▄▄▀▀███████
▀███▄▄███▀░░░▀▀████▄▄▄███▀
▀▀████▀▀████████▀▀████▀▀
pooya87
Legendary
*
Offline Offline

Activity: 3486
Merit: 10653



View Profile
February 11, 2021, 04:04:47 AM
 #22

The "library" I'm using for testing the validity of the mnemonic itself should not be too bad, it's the "to_entropy()" function from the Trezor "python-mnemonic" code: https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L125
That's the problem with using a library to brute force, they are not designed to be the most efficient. They are just optimized to perform one operation in reasonable time. From a quick look this part is being repeated pointlessly and it has obvious expensive processes such as the memory lookup even if it is a binary search, the conversion between the words and bits, working with strings, unnecessary checks,...
https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py#L126-L156

What you want to do is to first convert all your words to their binary representation without having any memory issues.
13 file * 10 word each * 4 byte to hold bits = 520 byte total
Now you create the permutation of these bits and compute their hashes. Working with integers is significantly faster than working with strings.

.
.BLACKJACK ♠ FUN.
█████████
██████████████
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
████████████████
░██████████████
████████████
███████████████░██
██████████
CRYPTO CASINO &
SPORTS BETTING
▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
█████████
.
HCP
Legendary
*
Offline Offline

Activity: 2086
Merit: 4316

<insert witty quote here>


View Profile
February 11, 2021, 08:27:06 AM
 #23

OOPS... I had accidentally left in the "to_seed()" call... which takes the mnemonic and generates the actual seed... including the computationally expensive code:
Code:
stretched = hashlib.pbkdf2_hmac(
    "sha512", mnemonic_bytes, passphrase_bytes, PBKDF2_ROUNDS
)

If I comment that part out... it runs "a little bit" faster Wink Roll Eyes

Quote
Start: 2021-02-11 21:13:07.520000
10000 Seeds 2021-02-11 21:13:10.346000
621 Valid Seeds 2021-02-11 21:13:10.346000
20000 Seeds 2021-02-11 21:13:13.092000
1204 Valid Seeds 2021-02-11 21:13:13.092000
30000 Seeds 2021-02-11 21:13:15.903000
1832 Valid Seeds 2021-02-11 21:13:15.903000
40000 Seeds 2021-02-11 21:13:18.769000
2442 Valid Seeds 2021-02-11 21:13:18.769000
50000 Seeds 2021-02-11 21:13:21.671000
3041 Valid Seeds 2021-02-11 21:13:21.671000
...

~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! Shocked Wink

█████████████████████████
████▐██▄█████████████████
████▐██████▄▄▄███████████
████▐████▄█████▄▄████████
████▐█████▀▀▀▀▀███▄██████
████▐███▀████████████████
████▐█████████▄█████▌████
████▐██▌█████▀██████▌████
████▐██████████▀████▌████
█████▀███▄█████▄███▀█████
███████▀█████████▀███████
██████████▀███▀██████████
█████████████████████████
.
BC.GAME
▄▄░░░▄▀▀▄████████
▄▄▄
██████████████
█████░░▄▄▄▄████████
▄▄▄▄▄▄▄▄▄██▄██████▄▄▄▄████
▄███▄█▄▄██████████▄████▄████
███████████████████████████▀███
▀████▄██▄██▄░░░░▄████████████
▀▀▀█████▄▄▄███████████▀██
███████████████████▀██
███████████████████▄██
▄███████████████████▄██
█████████████████████▀██
██████████████████████▄
.
..CASINO....SPORTS....RACING..
█░░░░░░█░░░░░░█
▀███▀░░▀███▀░░▀███▀
▀░▀░░░░▀░▀░░░░▀░▀
░░░░░░░░░░░░
▀██████████
░░░░░███░░░░
░░█░░░███▄█░░░
░░██▌░░███░▀░░██▌
░█░██░░███░░░█░██
░█▀▀▀█▌░███░░█▀▀▀█▌
▄█▄░░░██▄███▄█▄░░▄██▄
▄███▄
░░░░▀██▄▀


▄▄████▄▄
▄███▀▀███▄
██████████
▀███▄░▄██▀
▄▄████▄▄░▀█▀▄██▀▄▄████▄▄
▄███▀▀▀████▄▄██▀▄███▀▀███▄
███████▄▄▀▀████▄▄▀▀███████
▀███▄▄███▀░░░▀▀████▄▄▄███▀
▀▀████▀▀████████▀▀████▀▀
rama_rao
Newbie
*
Offline Offline

Activity: 18
Merit: 1


View Profile
February 11, 2021, 08:30:58 AM
 #24

Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.



I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

I couldn't understand the problem. Say you have 7.txt with 10 seed words. You just need two more words to make 12 word seed. After lotta combination and permutations and filtering bad seeds and getting final address and looking up on blockchain for utxo. Is that what you are doing?

There are couple of questions
1. Is each file corresponds to one wallet?
2. Do you know the positions of missing seed words in each file?
3. Are these 12 word mnemonic seeds or 15 or 24? (If they are 15 words or more and your answer to 2 is no then you can lose all hopes to crack them, most probably.)

If you can answer that, they it's kinda easy to comeup with bruteforce solutions...
Coding Enthusiast
Legendary
*
Offline Offline

Activity: 1039
Merit: 2783


Bitcoin and C♯ Enthusiast


View Profile WWW
February 11, 2021, 09:14:08 AM
Merited by HCP (1)
 #25

~3secs for 10,000... or ~200,000/minute... just goes to show how "expensive" the "ENT to seed" process can be! Shocked Wink
FWIW if I comment out the mnemonic to BIP32 seed part and just validate the checksums of each permutation in FinderOuter, it can perform ~7.2 million checks per second on my corei3 CPU.

I should point out that the reason why the overall mnemonic recovery option is slow is because the algorithm that comes after checksum validation is expensive (to derive the child keys), and also there is an issue with my implementation of Elliptic Curve Cryptography that puts a lot of pressure on GC slowing down the process.

Projects List+Suggestion box
Donate: 1Q9s or bc1q
|
|
|
FinderOuter(0.19.1)Ann-git
Denovo(0.7.0)Ann-git
Bitcoin.Net(0.26.0)Ann-git
|
|
|
BitcoinTransactionTool(0.11.0)Ann-git
WatchOnlyBitcoinWallet(3.2.1)Ann-git
SharpPusher(0.12.0)Ann-git
PawGo
Legendary
*
Offline Offline

Activity: 952
Merit: 1367


View Profile
February 11, 2021, 04:04:35 PM
 #26

Edit:
I have 13 text files with possible seed words (1.txt, 2.txt...13.txt). most of the files contain 4-10 words.
I'm looking for a way to brute-force the seed based on those text files.

I know it's possible bruteforce 4 missing words, but how about the following scenario:

Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.

Is there a known tool that I can play with to solve such puzzle?

Do you know the addresses you are looking for?
I think I could change my https://github.com/PawelGorny/lostword to implement solver of your problem, but it all makes no sense if you do not know what you are looking for.

Yes, I know the public address.

OK, I will try to do this tomorrow.

Did you have time to do that?
Thanks!


Yes, I have prepared a new release: https://github.com/PawelGorny/lostword/releases/tag/v0.8.0
Now it works with a candidates on the given positions, for example:
Code:
POOL
bc1q0v5q36eaculyrykjnjsyuey6ctd3802ft4jdcc
6
brother
window master canal cat
?
black master remove cat
pitch
hill
pawanjain
Hero Member
*****
Offline Offline

Activity: 2716
Merit: 717


Nothing lasts forever


View Profile
February 11, 2021, 04:23:36 PM
 #27

I know it's possible bruteforce 4 missing words
Are you sure that this is possible? I know that for two electrum missing words, it can take around 20 seconds on an average pc. For three words it'll take 20*2048 = 40960 seconds which is equal with ~11.3 hours. But for 4 words... Oh boy. It'll take around 23,142 hours which is 964 days.

Whenever I am in such a situation where I have to perform the loops faster I do the below thing. Let me know if it can work well in this scenario as well.

So let's consider that 4 words will take 964 days according to you. So when we write a program to bruteforce the seed one program will take 964 days.
Now let's restrict that program to limited amount of combination and create another program with another set of combinations such that both programs have different set of combinations and are executed in parallel.
In this way both will be executed simultaneously and we will get the results in half of it's time meaning 964/2 = 482 days.

If we repeat this and create 10 such programs then the speed of results will be 10 times faster. Even if we don't run it in the same machine, we can run them in different machines.
So 964/10 = 96.4 days.

How do you find the above approach ?

Quote
Let's say that I don't know any of the words, but for 12 words (out of 13) I know the last one or two letters, and for two words I know the first letter.
I also know their order and even some of their length.
You can surely reduce it, by a lot. But still, brute forcing by not knowing 4 out of 12 words isn't meant to be found.

Well that's very true. Bruteforcing 4 words is completely waste of time unless we know a possible combination of words and the sequence in which those words might have been placed.
Otherwise it's completely a waster of time trying out so many combinations.

███████████████████████████
███████▄████████████▄██████
████████▄████████▄████████
███▀█████▀▄███▄▀█████▀███
█████▀█▀▄██▀▀▀██▄▀█▀█████
███████▄███████████▄███████
███████████████████████████
███████▀███████████▀███████
████▄██▄▀██▄▄▄██▀▄██▄████
████▄████▄▀███▀▄████▄████
██▄███▀▀█▀██████▀█▀███▄███
██▀█▀████████████████▀█▀███
███████████████████████████
.
.Duelbits.
..........UNLEASH..........
THE ULTIMATE
GAMING EXPERIENCE
DUELBITS
FANTASY
SPORTS
████▄▄█████▄▄
░▄████
███████████▄
▐███
███████████████▄
███
████████████████
███
████████████████▌
███
██████████████████
████████████████▀▀▀
███████████████▌
███████████████▌
████████████████
████████████████
████████████████
████▀▀███████▀▀
.
▬▬
VS
▬▬
████▄▄▄█████▄▄▄
░▄████████████████▄
▐██████████████████▄
████████████████████
████████████████████▌
█████████████████████
███████████████████
███████████████▌
███████████████▌
████████████████
████████████████
████████████████
████▀▀███████▀▀
/// PLAY FOR  FREE  ///
WIN FOR REAL
..PLAY NOW..
BrewMaster
Legendary
*
Offline Offline

Activity: 2114
Merit: 1292


There is trouble abrewing


View Profile
February 11, 2021, 05:26:44 PM
 #28

Whenever I am in such a situation where I have to perform the loops faster I do the below thing. Let me know if it can work well in this scenario as well.

So let's consider that 4 words will take 964 days according to you. So when we write a program to bruteforce the seed one program will take 964 days.
Now let's restrict that program to limited amount of combination and create another program with another set of combinations such that both programs have different set of combinations and are executed in parallel.
In this way both will be executed simultaneously and we will get the results in half of it's time meaning 964/2 = 482 days.

If we repeat this and create 10 such programs then the speed of results will be 10 times faster. Even if we don't run it in the same machine, we can run them in different machines.
So 964/10 = 96.4 days.

How do you find the above approach ?

what you are describing is the definition of parallelism and is used by any modern CPU that has more than one core. you don't need to write or run different programs, you just write once but split the workload between multiple CPU cores and each core performs the same task on a different data but separately and in "parallel".
CPUs can have between 2 to 64 cores.

GPUs have hundreds of cores and can run a lot more operations in parallel which is why using GPU in brute forcing is very common and the speed gain is huge.

with that in mind i don't think it is hard to recover a seed with missing 4 words.

There is a FOMO brewing...
pawanjain
Hero Member
*****
Offline Offline

Activity: 2716
Merit: 717


Nothing lasts forever


View Profile
February 11, 2021, 05:38:29 PM
 #29

what you are describing is the definition of parallelism and is used by any modern CPU that has more than one core. you don't need to write or run different programs, you just write once but split the workload between multiple CPU cores and each core performs the same task on a different data but separately and in "parallel".
CPUs can have between 2 to 64 cores.

Thats great, I'll read more kn how to execute a set of instructions on a particular core and how to divide the load in % to different cores. If you have any leads let me know.

Quote
with that in mind i don't think it is hard to recover a seed with missing 4 words.

It actually depends on various factors whether it's hard to recover the missing words or not. For instance

if you know the sequence then the possible combinations for entropy of 128 bit is
2048*12 (for one word)
2048*2048*2048*2048*12 (for 4 words)

If you dont know the sequence
2048*12! (for one word)
2048*2048*2048*2048*12! (for 4 words)
where ! is factorial

and that is too many combinations. Also, if the entropy is higher the number of combinations drastically increases.
Someone correct me if I am wrong.

███████████████████████████
███████▄████████████▄██████
████████▄████████▄████████
███▀█████▀▄███▄▀█████▀███
█████▀█▀▄██▀▀▀██▄▀█▀█████
███████▄███████████▄███████
███████████████████████████
███████▀███████████▀███████
████▄██▄▀██▄▄▄██▀▄██▄████
████▄████▄▀███▀▄████▄████
██▄███▀▀█▀██████▀█▀███▄███
██▀█▀████████████████▀█▀███
███████████████████████████
.
.Duelbits.
..........UNLEASH..........
THE ULTIMATE
GAMING EXPERIENCE
DUELBITS
FANTASY
SPORTS
████▄▄█████▄▄
░▄████
███████████▄
▐███
███████████████▄
███
████████████████
███
████████████████▌
███
██████████████████
████████████████▀▀▀
███████████████▌
███████████████▌
████████████████
████████████████
████████████████
████▀▀███████▀▀
.
▬▬
VS
▬▬
████▄▄▄█████▄▄▄
░▄████████████████▄
▐██████████████████▄
████████████████████
████████████████████▌
█████████████████████
███████████████████
███████████████▌
███████████████▌
████████████████
████████████████
████████████████
████▀▀███████▀▀
/// PLAY FOR  FREE  ///
WIN FOR REAL
..PLAY NOW..
Pages: « 1 [2]  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!