Bitcoin Forum
Author Topic: A probabilistic prefix search - puzzle btc 32  (Read 623 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (1 post by 1+ user deleted.)
WanderingPhilospher
March 01, 2025, 05:41:51 PM
 #21

So since you did not answer on the other thread, I thought I would look to see if you had mentioned your method, since you said it was better than the other method, and I found it lol.

Here is my point: your method may work better, with a lot of luck, than the systematic approach of the other method. I feel it would have to be luck, but without running thousands of tests, I won't know for sure.

Here is what I can tell you:

I've run almost 2,000 44-bit ranges, searching for a 44-bit mask, i.e. the 44 leading bits of a h160. So in that aspect, our methods are similar. But what I can tell you is that the probabilities say there should be 0 hits in a single range, yet I have found 4 matches in about 30 of those ranges.

So if I were to use your method, it would find one of those matches, store it, and never search in that range again, correct? That is what I gather from reading your script. And that is the part that many can't see. Yes, you can run this method, and if you are lucky, maybe you find the exact match you are looking for. But you also run the chance of missing the match you are looking for. That would always be my concern in a serious search, not just tinkering, to find the key to any of the puzzles/challenges.

Whereas the other method chooses random ranges and runs them 100%, so no keys or ranges are skipped, and on average it finds the exact match/key at 50% of the keyspace.

So if one has invested, or is going to invest, some decent $ into the challenge, which method would you suggest they use?

I understand that you have doubts. My current approach is to search for prefixes using a random + sequential method (the scanned ranges go to the database so they can be omitted later). When I find a prefix, then based on how improbable that prefix length is, I omit a percentage of the range around it (sized according to the prefix length) and continue.

Statistically, it is more likely to find the target than to omit it, but if that is the case, the ranges are recalculated to a smaller percentage and continue until it is found.

In the end, you prioritize the mathematical probabilities, but you could also cover the entire range without leaving a gap. It's all a matter of perspective. You might ask: will the database be overloaded with data? It depends, since the hashtable prioritizes long matches and avoids overlapping ranges.

For example, if a range is 1:100 and another comes as 90:150, the hashtable would unite them in 1:150, so eventually the database will reduce its weight.
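The range-merging behaviour described above can be sketched as a plain interval union (a generic illustration, not the actual hashtable code from the script):

```python
def merge_ranges(ranges):
    """Merge overlapping or touching [start, end] ranges into a minimal set."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

print(merge_ranges([[1, 100], [90, 150]]))  # [[1, 150]]
```

Since the stored set only ever shrinks as ranges coalesce, the database weight goes down over time, as described.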

In conclusion, we are searching in the entire range while prioritizing the mathematical probabilities first.

I will give you this: this is a good way to search if you only have a CPU. I like the concept; my concern is how small to make the initial ranges that will be omitted, so as not to miss the key in the first run lol. It can be done, I guess, with large matches and smaller omits, i.e. something like 40-bit matches with 24-bit omits, just as an example; it probably needs more analysis over a decent-sized range, and then the same run in several other same-size ranges, to come up with a better average.

So in the current script, after a 100% range search, does the database/script auto-recalculate, or would one have to do that manually with the create.py script?
kTimesG
March 01, 2025, 05:55:11 PM
 #22

I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
from scipy.stats import binom

# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges having 5 or more matches? :) That would be abnormal.

Off the grid, training pigeons to broadcast signed messages.
mcdouglasx (OP)
March 01, 2025, 06:08:35 PM
Last edit: May 12, 2025, 08:25:19 AM by mprep
 #23

I will give you this, this is a good way to search if you only have a CPU. I like the concept, my concern is how small to make the initial ranges that will be omitted, as to not miss the key in the first run lol. It can be done I guess with large match and smaller omits, i.e. something like 40 bit matches with 24 bit omits, just as an example; probably needs more anaylis over a decent sized range, and then run the same in several other same size ranges, to come up with a better average.

So in the current script, after 100% range search, the database/script auto recalculates, or one would have to manually do that with the create.py script?

Yes, you're right, it would be better to search for long matches to avoid omitting the target on the first try. Although it is unlikely, it is possible. This script is basically a prototype; I am now editing it, based on Keyhunt, to present the complete idea, covering all the previously mentioned points. I haven't yet included automatic percentage reduction; I would do it manually. However, automating it is a good idea. It hadn't occurred to me because the case hasn't come up in the tests so far.

example:

./keyhunt -m vanity -t 4 -l compress -R -r 100000000000000000:200000000000000000 -v 61eb8a50c86b0584bb727dd65bed8d2400d6d5aa -n 0x10000000 -pct 30 -pm 8

pct= Percentage to omit according to the length of the prefix.

pm= Minimum number of h160 prefixes to consider.

n= Number of sequential keys starting from the random. (0x1000000....0x1000000000000).



I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
from scipy.stats import binom

# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges having 5 or more matches? :) That would be abnormal.

One thing is the probability of longer prefixes sneaking in; another is the probability of one of those being the target. They are not the same, so we cannot generalize, as that messes up the calculations.

[moderator's note: consecutive posts merged]

WanderingPhilospher
March 01, 2025, 06:56:33 PM
 #24

I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
from scipy.stats import binom

# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges having 5 or more matches? :) That would be abnormal.

Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say is: of the 2,000 ranges searched, 1,500 of them found nothing. So that could, or would, possibly put this type of script in an endless loop.
kTimesG
March 01, 2025, 07:15:31 PM
 #25

Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say, of 2,000 ranges searched, 1,500 of them found nothing. So that could or would put this type of script in an endless loop, possibly.

It would be way, way off if you indeed had 1,500 of them with 0 matches; to the point of calling the uniformity of the distribution biased with a very high degree of confidence. For 2,000 runs, we should get between 1880 and 2120 matches in total (99.9% confidence). You only have a few hundred? That chance is too small to even be worth computing lol.
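One way to sanity-check a band like that (scipy assumed): the total match count over 2,000 ranges of 2**44 keys is Binomial(2000 * 2**44, 2**-44), which is essentially Poisson with mean 2000. The exact bounds depend on how the tail probability is split:

```python
from scipy.stats import poisson

mu = 2000  # expected total matches over 2000 ranges of 2**44 keys each

# Two-sided 99.9% band: 0.05% in each tail.
lo = poisson.ppf(0.0005, mu)
hi = poisson.ppf(0.9995, mu)
print(lo, hi)  # roughly 1850..2150
```

A few hundred total matches sits far outside any such band.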

The binomial distribution according to your parameters should look something like this:

0 matches:  736 ranges
1 match:    736 ranges
2 matches:  368 ranges
3 matches:  123 ranges
4 matches:   31 ranges
5 matches:    6 ranges
>5 matches:   1 range
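These expected counts can be reproduced with the same binomial parameters as the earlier snippet (scipy assumed); each row is essentially Poisson(1), i.e. 2000 * e^-1 / k!:

```python
from scipy.stats import binom

n_ranges = 2000
trials = 2**44        # keys scanned per range
p = 1 / 2**44         # chance a single h160 matches 44 leading zero bits

for k in range(6):
    expected = binom.pmf(k, trials, p) * n_ranges
    print(f"{k} matches: {expected:6.0f} ranges")

tail = (1 - binom.cdf(5, trials, p)) * n_ranges
print(f">5 matches: {tail:6.0f} ranges")
```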

Are you sure the mask is correct?

WanderingPhilospher
March 01, 2025, 11:29:41 PM
Last edit: March 02, 2025, 03:49:33 AM by WanderingPhilospher
 #26

Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say, of 2,000 ranges searched, 1,500 of them found nothing. So that could or would put this type of script in an endless loop, possibly.

It would be way, way off if you indeed had 1500 of them with 0 matches. To the point of calling the uniformness of the distribution biased with a very high degree of confidence. For 2000 runs, we should get between 1880 and 2120 matches in total (99.9% confidence). You only have a few hundred? That's infinitely small to even compute the chances lol.

The binomial distribution according to your parameters should look something like this:

0 matches: 736 ranges.
1 matches: 736 ranges
2 matches: 368 ranges
3 matches: 123 ranges
4 matches:  31 ranges
5 matches:  6 ranges
> 5 matches: 1 ranges

Are you sure the mask is correct?

44 and 44: h160 mask and range size. 2,061 ranges run, and matches found in only 620 of the ranges.

Edit: A total of 738 found matches.
kTimesG
March 02, 2025, 09:00:35 AM
Last edit: March 02, 2025, 09:15:35 AM by kTimesG
 #27

44 and 44, h160 mask and range size. 2,061 ranges ran and only matches found in 620 of the ranges.

Edit: A total of 738 found matches.

That is absolutely unbelievable. I'm not saying it can't happen, but I'd sooner think something went wrong with your experiment.

If you ran 2000 different ranges, each with a size of 2**44, and counted the h160 that start with 44 bits of 0 (or whatever other mask), we would expect, on average, a total of 2000 matches.

The chance of getting 738 matches (summed with all the chances of getting anything below 738) is 6.629921781279535e-231. That's 231 fractional digits of 0.

So if your data (and procedure) is correct, it would mean the probability of getting 44 bits of 0 is not 1 in 2**44, with a confidence of 100% minus the 10**-231 number above.

LE: This 738 number looks awfully close to the expected number of matches for "0 found" and "1 found" above. Are you counting stuff correctly?
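A tail probability of that order can be reproduced with a Poisson approximation of the same binomial (scipy assumed; the exact figure depends slightly on the mean used):

```python
from scipy.stats import poisson

# Total matches over ~2000 ranges of 2**44 keys each:
# Binomial(2000 * 2**44, 2**-44), essentially Poisson with mean 2000.
mu = 2000
p_at_most_738 = poisson.cdf(738, mu)   # P(total matches <= 738)
print(p_at_most_738)                   # on the order of 1e-231
```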

WanderingPhilospher
March 02, 2025, 09:32:27 PM
 #28

44 and 44, h160 mask and range size. 2,061 ranges ran and only matches found in 620 of the ranges.

Edit: A total of 738 found matches.

That is absolutely unbelievable. I'm not saying it can't happen, though I'd rather think something went wrong with your experiment.

If you ran 2000 different ranges, each with a size of 2**44, and counted the h160 that start with 44 bits of 0 (or whatever other mask), we would expect, on average, a total of 2000 matches.

The chances to get 738 matches (plus all the chances to get anything below 738) is 6.629921781279535e-231. That's 231 fractional digits of 0.

So if your data (and procedure) is correct it would mean the probability to get 44 bits of 0 is not 1 in 2**44. With a confidence of 100% minus the 10**-231 number above.

LE: This 738 number looks awfully close to the expected number of matches for "0 found" and "1 found" above. Are you counting stuff correctly?

So across 2,000 different ranges, the average should be as you stated above. If these were consecutive ranges, then I would be surprised as well. But since they are not consecutive, and are spread out all over a larger range, I am not. I imagine if I completed the larger range, the numbers would come close to the average.

That is why I try telling people, like Bibli, McD, and others, that averages are just that: averages. Unless you do tests to see if it holds true over X amount of ranges, spread all over a larger range, how can one know how much to "skip"? I probably could run 2,000 different ranges again, and the numbers would be different, maybe even closer to the expected averages.

I guess my PC's ability to pick random ranges where fewer matches will be found is pretty dang good lol.
mcdouglasx (OP)
March 03, 2025, 05:44:01 PM
 #29

That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.

WanderingPhilospher
March 03, 2025, 08:37:42 PM
 #30

That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.
I do not understand what you are saying, to be honest.

The pool's approach will find the target 100% of the time as well.

I haven't run the numbers, but I am not so sure your current method would find it 100% of the time. I've seen some flaws in the script and have made some adjustments so that it will indeed find the key 100% of the time. I know you will say yours is a prototype, which is fine; this is me just giving some feedback. As currently written, your thread/script is wasting considerable time, and I don't think it will find the key 100% of the time. There are holes, like an abyss, that it could fall into.

The other thing I noticed: with your default settings, you are finding a small prefix (bit-wise) and adding and subtracting a much larger range (bit-wise) on both sides.

I think you have created a cool script!  It's had me thinking about different ways to approach it/make it better. As I stated, if I was in a CPU only search mode, I would use this script.

My whole point in talking about skipping any range, based on stats or probabilities, is that I would not do it if I had a lot of $ invested. Especially investors' money. I would do the old boring brute-force method, because averages are just that: averages.
mcdouglasx (OP)
March 04, 2025, 08:28:26 PM
 #31

That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.
I do not understand what you are saying, to be honest.

The pool's approach will find the target 100% of the time as well.

I haven't ran the numbers, but I am not so sure your current method would find it 100% of time. I've seen some flaws in the script and have made some adjustments so that it will indeed find the key 100% of the time. I know you will say yours is a prototype, which is fine, this is me just giving some feedback, your thread/script is wasting considerable time as it is currently written and I don't think it will find the key 100% of the time. There are holes, like an abyss, that it could fall into.

The other thing I noticed, with your default settings, you are finding a small prefix (bit wise) and adding and subtracting a much larger range (bit wise) on both sides.

I think you have created a cool script!  It's had me thinking about different ways to approach it/make it better. As I stated, if I was in a CPU only search mode, I would use this script.

My whole point in talking about skipping any range, based on stats or probabilities, is that I would not do it, if I had a lot of $ invested. Especially, investors money. I would do the old boring brute force method, because averages are just that, averages.

I always do this: I propose an idea and create a script as a kind of sketch or auxiliary material, so that the idea has a base or starting point that makes it easier to understand. The published one does not save the scanned ranges, which should be a fundamental requirement, and it does not recalculate the percentages automatically. My main idea was that, in a dataset of length X, the probability of a prefix being close to another increases according to its length. It seems somewhat unreasonable, but it's a fact. It is unlikely that the prefixes you find relatively close together are precisely the target. Imagine taking 1000 needles, painting 1 gold, and then distributing them uniformly in a haystack. Although we know that some needle is closer to another than the average, that does not mean it happens to the golden needle, because the golden needle generally has a higher chance of being within the expected average.
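The needle analogy can be checked with a quick Monte Carlo sketch (a hypothetical 1-D setup: 1000 uniform points on a line, with index 0 playing the "golden" needle). The chance that the golden needle is one of the two needles forming the tightest gap is only about 2/1000:

```python
import numpy as np

rng = np.random.default_rng(1)
n_needles, trials, hits = 1000, 5000, 0

for _ in range(trials):
    needles = rng.random(n_needles)      # uniform positions in [0, 1)
    order = np.argsort(needles)          # indices sorted by position
    gaps = np.diff(needles[order])       # distances between neighbours
    t = np.argmin(gaps)                  # tightest pair: order[t], order[t+1]
    if 0 in (order[t], order[t + 1]):
        hits += 1                        # golden needle is in the tightest pair

print(hits / trials)  # close to 2/1000 = 0.002
```

So a tight cluster of found prefixes is almost never evidence that the target is inside it, which is the point being made above.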
