Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: mcdouglasx on February 14, 2025, 08:32:40 PM



Title: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 14, 2025, 08:32:40 PM

This technique is based on the compound probability that a specific prefix appears in a set or range of hashes, discarding the less probable percentages.

requirements:

secp256k1

https://github.com/iceland2k14/secp256k1

using ProbPrefix.py

https://github.com/Mcdouglas-X/ProbPrefix

This Script make infinite loop continuously generates random private keys within the specified range and converts them to hashed addresses.

If the random number falls within a previously checked range, it skips to the next iteration.

The address generated from the random private key is compared to the target.

If an exact match is found, the private key and address are written to TargetFound.txt and the loop breaks.

If a partial match is found (determined by the chk_p function), the range around the random number is calculated and skipped in the future by adding it to the HashTable.

This script essentially performs a brute-force search for a hash that matches a given target hash. It uses ranges to optimize the search by skipping areas that have already been checked.

using CreateRanges.py

In case of having a series of pre-stored prefixes, they are added to the script by copying their private keys to "range.txt".This iterates over private keys obtained from the "range.txt" file, creating a new "Ht.bin" if it does not exist. The found prefixes are added to ranges in case you want to generate a new database using other percentages.

Disclaimer: This script is only a demonstration, based on my own statistics which may seem counterintuitive due to the depth of probabilities. It is not intended to be fast either. Create your own version in C and Cuda if you desire an optimal environment. I do not have the resources to explore with Cuda.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: the crack on February 15, 2025, 09:47:09 AM
how are u sure if the partial match exist the target is in that range?


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: jovica888 on February 15, 2025, 02:07:54 PM
I just can not understand what this code represent... And how to use it


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: clonwu on February 15, 2025, 02:16:17 PM
Give a demonstration


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 15, 2025, 06:36:08 PM
how are u sure if the partial match exist the target is in that range?


You can't be sure of anything in probabilistic software. The less probable ranges are discarded, but saying 'it's less probable' doesn't guarantee your success 100%.

I just can not understand what this code represent... And how to use it

It's just a probabilistic script that omits certain sub-ranges where a prefix has already appeared.

Give a demonstration

Given that the probability of obtaining the first 4 digits of a hex "abcd" is 1/65536, the probability of finding 5 repeated prefixes "abcde" within the same range occurs with at least a 3% frequency. Therefore, if we discard 65536 keys around a 5-digit prefix, we have a 97% probability that the prefix is not within the omitted range.

This takes into account all prefixes and combinations within a given range. However, when choosing a specific prefix, the probabilities of approximation are much lower, as you add a known factor or element to the statistics and probabilities.


As you increase the length of the prefix n, the probability of two prefixes of n characters colliding nearby decreases.

Of all the possible combinations of 5 prefixes (16**5=1,048,576) in 65,537 keys, you get approximately 2000 collisions.


Test script:

Code:
import secp256k1 as ice
import random

pref_dict = {}
rand = random.randint(1, 2**256)

for i in range(1, 65537):
    x = ice.privatekey_to_h160(0, 1, rand + i).hex()
    prefix = x[:5]
    sav = str(x) + " = " + str(i) + "\n"
    if prefix not in pref_dict:
        pref_dict[prefix] = []
    pref_dict[prefix].append(sav)

entries = []
for prefix in sorted(pref_dict):
    if len(pref_dict[prefix]) > 1:
        entries.extend(pref_dict[prefix])

entries.sort()

with open('output.txt', 'w') as f:
    for entry in entries:
        f.write(entry)


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: the crack on February 15, 2025, 06:53:51 PM
so u trying to find prefix hex of the address?


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on February 15, 2025, 07:39:48 PM
Given that the probability of obtaining the first 4 digits of a hex "abcd" is 1/65536, the probability of finding 5 repeated prefixes "abcde" within the same range occurs with at least a 3% frequency.

False. It's basically 0% likely to encounter at least 5 "abcde" prefixes in a 65536 range.
What is true: it is 93.94% likely to not encounter any "abcde" prefix at all in a 65536 range.
Hence, 6.06% likely to encounter it at least once.
But, only 0.19% likely to find it more than once.

Where did you end up with the 3%?

Therefore, if we discard 65536 keys around a 5-digit prefix, we have a 97% probability that the prefix is not within the omitted range.

You have a 100% probability that the prefix is in the range, because it's sitting in the middle of it. It was just found, or is the range not "around a 5-digit prefix"?

One would say that you are mixing formulas that relate to birthday paradox (collide any two persons) with the formulas that relate to finding a specific person, and calling this pruning as valid. It is not, neither from a probabilistic or logical perspective. But good luck with your experiments!


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 15, 2025, 08:11:06 PM
so u trying to find prefix hex of the address?

yes, prefix h160.

Given that the probability of obtaining the first 4 digits of a hex "abcd" is 1/65536, the probability of finding 5 repeated prefixes "abcde" within the same range occurs with at least a 3% frequency.

False. It's basically 0% likely to encounter at least 5 "abcde" prefixes in a 65536 range.
What is true: it is 93.94% likely to not encounter any "abcde" prefix at all in a 65536 range.
Hence, 6.06% likely to encounter it at least once.
But, only 0.19% likely to find it more than once.

Where did you end up with the 3%?

Therefore, if we discard 65536 keys around a 5-digit prefix, we have a 97% probability that the prefix is not within the omitted range.

You have a 100% probability that the prefix is in the range, because it's sitting in the middle of it. It was just found, or is the range not "around a 5-digit prefix"?

One would say that you are mixing formulas that relate to birthday paradox (collide any two persons) with the formulas that relate to finding a specific person, and calling this pruning as valid. It is not, neither from a probabilistic or logical perspective. But good luck with your experiments!

You fall into the same negligence as the mathematicians against Marilyn vos Savant. 3% of 65536 is almost 2000, which is approximately the number of times 5 identical prefixes collide in 65536 keys. The script above demonstrates it, did you bother to try it? The rest of your questioning is a misinterpretation of the statistics that AI is not yet capable of understanding. That is to say, if the Monty Hall paradox were eliminated from AI and the internet, AI would never have given the solution because it is counterintuitive.

Similarly, you made it clear in another post that we will never agree on this, so I don't know why you want to continue an endless debate. Where I believe I'm right, and you do too, it would be a waste of our time.

edit:

I don't have to prove anything. You have the right to question which part of my script is wrong, and that's why I leave it open for everyone. @ktimesg, I'm sorry, but your message was deleted for saying 'I don't need to run your script.' It is bad conduct to refute by assuming this or that without testing it. Science and rigorous analysis require not only presenting solid arguments but also supporting them with evidence and proof. From my side, I only ask readers to refute with solid arguments, evidence, and respect.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: jovica888 on February 16, 2025, 01:20:59 AM
I was making a similar thing for puzzle 135... I put in babystep file only public keys that starts with "145d" and sometimes I had 2000 keys between 2 "145d" points and sometimes I had 120.000 keys between 2 "145d"...

So in average it will be 65.000 but I can not skip 65.000 keys each time when I find "145d" point


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 16, 2025, 02:03:26 AM
I was making a similar thing for puzzle 135... I put in babystep file only public keys that starts with "145d" and sometimes I had 2000 keys between 2 "145d" points and sometimes I had 120.000 keys between 2 "145d"...

So in average it will be 65.000 but I can not skip 65.000 keys each time when I find "145d" point

I haven't tested with public keys because, although they have similarities with a hash, they are not exactly a hash and can behave differently. You need to do exhaustive tests based on the distribution over public keys, but you cannot rely on low prefix matches to omit; you need to know the rest of the chain. My script examines in reverse 21 prefixes from most to least to make the calculation with the highest match and does not omit 65,000, but rather a smaller percentage depending on the prefix length.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on February 16, 2025, 01:37:25 PM
No worries, I'm not disturbed by having my post deleted.

Let's try this a different way. And, for the love of God, I hope you won't insist it's AI-powered nonsense.

You ended up at some ~2000 number of collisions of 5-char prefixes, on the basis that 2**20 (total possible combinations) prefixes are spread around 65537 keys. Is this where the 3% comes up from?

The problem with this is that this is not how a uniform distribution works. Check this out:

- p = probability of success: 1/2**20
- k = number of keys: 2**20

Since p = 1/k (basically, you randomly select k values), the probability of not encountering a specific key AT ALL in all of the 2**20 trials is 36.6%. You can check this value using CDF, Poisson, real tests, simulations, and whatever you wish. It's also the value you get as p and k approach 0+, respectively infinity (e.g., infinite amount of tries, with an infinitesimally small, but positive chance of success):

And yes,  there is an exact value for this, derived from computing the limit of the num_successes / num_possibilities formula. It equals 1/e after some math transformations.

Logically, since the sum of all probabilities must be 1, there is a 63.4% probability to have more than zero encounters (single + repeats).

An uniform distribution does not imply that you get an average amount of each item, when running over some whatever length number of trials. It just means that ANY element has an EQUAL probability of occurring. I think this is where you insist on having it interpreted the other way around. A uniform distribution will definitely never spread evenly and totally over a range that has the same size as the distribution. It would be in conflict with all the probabilities discussed so far, and that cannot happen. Think about it: there exists just ONE case (in a very, very large amount of cases) where you would get an exact even amount of each item.

I hope that this makes sense to you, why you cannot simply "average out" all the elements of some set, and continue flawed logic forwardly, based on this.

And while you may continue to believe that just because you have some collisions of any keys  can ever help you find a specific key, this is not the case, because the key you are targeting may either be missing, appear once, twice, or a lot of times. You can't apply the answer to one question, to have a response for a totally different question.

It's OK if you do not agree with any of these. Wish you well.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 16, 2025, 02:53:30 PM
snip~

Your problem lies in using a basic theoretical framework for your explanations. The probability of a prefix of a hash, a hex pubkey secp2556k1 is different because it is dependent, or a set of hashes, so in probability, you cannot be simplistic. It requires empirical evidence, not just theorizing. The mathematics of probabilities challenges our initial assumptions. It's as if there is an invisible connection between unrelated individual events that come together like pieces of a complete puzzle. As I mentioned previously, refute the code and obtain evidence, not just theories. Like what happens with RetiredCoder, you never support your arguments with tangible things, only theories. Even conspiracy theorists and flat-earthers have theories that convince people, just like religions. Let's not turn mathematics into a preaching religion; it is a ridiculously anti-scientific mistake.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on February 16, 2025, 04:52:25 PM
Code:
import os
from collections import defaultdict

from scipy.stats import binom

total_values = 2**20
total_tries = total_values

obs = defaultdict(int)

for _ in range(total_tries):
    prefix = int.from_bytes(os.urandom(3)) >> 4
    obs[prefix] += 1

num_distinct_obs = len(obs)
print(f'Unique observed values: {num_distinct_obs} ({num_distinct_obs / total_values:%})')

num_zero_encounters = total_values - num_distinct_obs

# compute final histogram of frequencies
hf = [0] * (1 + max(obs.values()))
hf[0] = num_zero_encounters
for f in obs.values():
    hf[f] += 1

# compute THEORETICAL histogram
tf = [0] * len(hf)
for f in range(len(tf)):
    # compute prob to get EXACTLY the specified amount of encounters
    tf[f] = binom.pmf(f, total_tries, 1 / total_values)

# display EMPIRICAL vs THEORETICAL
for f, v in enumerate(hf):
    print(f'Values observed {f} times: {v} ({v / total_values:%})')
    print(f'      predicted {f} times: {tf[f] * total_values:.0f} ({tf[f]:%})')

Results:

Code:
Unique observed values: 662854 (63.214684%)
Values observed 0 times: 385722 (36.785316%)
      predicted 0 times: 385749 (36.787927%)
Values observed 1 times: 385906 (36.802864%)
      predicted 1 times: 385750 (36.787962%)
Values observed 2 times: 192713 (18.378544%)
      predicted 2 times: 192875 (18.393981%)
Values observed 3 times: 64303 (6.132412%)
      predicted 3 times: 64292 (6.131321%)
Values observed 4 times: 16077 (1.533222%)
      predicted 4 times: 16073 (1.532827%)
Values observed 5 times: 3203 (0.305462%)
      predicted 5 times: 3215 (0.306565%)
Values observed 6 times: 566 (0.053978%)
      predicted 6 times: 536 (0.051094%)
Values observed 7 times: 73 (0.006962%)
      predicted 7 times: 77 (0.007299%)
Values observed 8 times: 12 (0.001144%)
      predicted 8 times: 10 (0.000912%)
Values observed 9 times: 1 (0.000095%)
      predicted 9 times: 1 (0.000101%)

Process finished with exit code 0

What gives? Theory checks out OK, AFAICS. Your turn?...

---

HIDDEN CONNECTION IN DOUBLE HASHES

Code:
for _ in range(total_tries):
    # OK, fuck random integers, let's use a hash160 of 32 bytes
    # prefix = int.from_bytes(os.urandom(3)) >> 4
    sha = hashlib.sha256(os.urandom(32)).digest()
    h160 = hashlib.new('ripemd160', sha).digest()
    prefix = int.from_bytes(h160[:3]) >> 4
    obs[prefix] += 1

Results:

Code:
Unique observed values: 663267 (63.254070%)
Values observed 0 times: 385309 (36.745930%)
      predicted 0 times: 385749 (36.787927%)
Values observed 1 times: 386338 (36.844063%)
      predicted 1 times: 385750 (36.787962%)
Values observed 2 times: 193101 (18.415546%)
      predicted 2 times: 192875 (18.393981%)
Values observed 3 times: 63827 (6.087017%)
      predicted 3 times: 64292 (6.131321%)
Values observed 4 times: 16182 (1.543236%)
      predicted 4 times: 16073 (1.532827%)
Values observed 5 times: 3188 (0.304031%)
      predicted 5 times: 3215 (0.306565%)
Values observed 6 times: 546 (0.052071%)
      predicted 6 times: 536 (0.051094%)
Values observed 7 times: 71 (0.006771%)
      predicted 7 times: 77 (0.007299%)
Values observed 8 times: 12 (0.001144%)
      predicted 8 times: 10 (0.000912%)
Values observed 9 times: 2 (0.000191%)
      predicted 9 times: 1 (0.000101%)

Disclaimer: no, I did not try to hash actual public keys.  ;D


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 16, 2025, 05:27:46 PM
snip~

I'm sorry, this does not represent the reality of my script, only the reality of independent hashes (without external factors). Do you know how to refute an idea? It seems you just want to force your reasoning to be accepted. Do you also realize that you unintentionally endorse that it is less likely for two prefixes to be "close" as the number of characters (prefixes) to search increases? So, what is it that you disagree with? Or are you just trying to annoy? Nothing more. I feel a bit embarrassed for you, bro.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on February 16, 2025, 06:37:24 PM
It seems you just want to force your reasoning to be accepted. Do you also realize that you unintentionally endorse that it is less likely for two prefixes to be "close" as the number of characters (prefixes) to search increases? So, what is it that you disagree with?

Please do not put words or ideas in my mouth, I never endorsed or stated anything even remotely similar to what you mention above. That is YOUR interpretation of things (I don't really care how you ended up at this conclusion), but please keep it to yourself, if that is what you are left with from everything that I stated. Thank you!


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 16, 2025, 07:24:39 PM
It seems you just want to force your reasoning to be accepted. Do you also realize that you unintentionally endorse that it is less likely for two prefixes to be "close" as the number of characters (prefixes) to search increases? So, what is it that you disagree with?

Please do not put words or ideas in my mouth, I never endorsed or stated anything even remotely similar to what you mention above. That is YOUR interpretation of things (I don't really care how you ended up at this conclusion), but please keep it to yourself, if that is what you are left with from everything that I stated. Thank you!

I base it on the results of your own code, I am not putting words in your mouth; it is what you are measuring in your example: collisions of prefixes in hashes and, the longer the length, the less likely it is to find two hashes with identical prefixes, or am I wrong? What I don't understand is why you dislike the idea of probabilistic software that omits less likely ranges. It is just that: 'statistical software'.

Code:
import os
from collections import defaultdict

from scipy.stats import binom

total_values = 2**20
total_tries = total_values

obs = defaultdict(int)

for _ in range(total_tries):
    prefix = int.from_bytes(os.urandom(3)) >> 4
    obs[prefix] += 1

num_distinct_obs = len(obs)
print(f'Unique observed values: {num_distinct_obs} ({num_distinct_obs / total_values:%})')

num_zero_encounters = total_values - num_distinct_obs

# compute final histogram of frequencies
hf = [0] * (1 + max(obs.values()))
hf[0] = num_zero_encounters
for f in obs.values():
    hf[f] += 1

# compute THEORETICAL histogram
tf = [0] * len(hf)
for f in range(len(tf)):
    # compute prob to get EXACTLY the specified amount of encounters
    tf[f] = binom.pmf(f, total_tries, 1 / total_values)

# display EMPIRICAL vs THEORETICAL
for f, v in enumerate(hf):
    print(f'Values observed {f} times: {v} ({v / total_values:%})')
    print(f'      predicted {f} times: {tf[f] * total_values:.0f} ({tf[f]:%})')


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on February 16, 2025, 08:01:38 PM
I base it on the results of your own code, I am not putting words in your mouth; it is what you are measuring in your example: collisions of prefixes in hashes and, the longer the length, the less likely it is to find two hashes with identical prefixes, or am I wrong?

That's not what my script does. It is counting the frequency of apparition of each possible prefix. All prefixes have the same length (20 bits, 5 hex chars).

After that, it groups all possible prefixes by the observed number of apparitions and compares that to the theoretical value.

You know, the only purpose of the script was to validate that I wasn't bullshitting one post earlier, and you requested empirical proof, which I provided. So, my question still holds: even with this proof, do you still believe that you can average out 2**20 5-char prefixes over 65537-sized ranges to come up with some 3% "frequency" (not sure what you meant here), and then based on this continue with the 97% range-pruning assumptions? Because something doesn't hold.

As for "closeness", here's a mind experiment: every time you run such an experiment, and the theoretical (and hence, empirical result) states that the frequency 9 encounters appears once or twice (e.g. there is one or two prefixes that were encountered nine times), what are the chances of that luck prefix to be the one you're looking for? Or, would the lucky prefix change after every experiment? Or in fact, it won't matter, because you can never know which prefix will appear how many times (0 times up to 9 times, or more in extreme cases).


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on February 16, 2025, 09:03:05 PM
snip~

In summary, to keep it brief, this is a method with a probabilistic approach (PROBABILITIES). It is a good method, especially when working with immensely large search spaces like the puzzle. Ultimately, the software backs up the found private keys and gives you the option to recalculate the database by omitting a smaller percentage in case of failure. I don't see why this isn't more reasonable for you compared to any current method that relies on brute force or scanning the entire range indefinitely.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 01, 2025, 05:02:08 PM
So since you did not answer on the other thread, I thought I would look to see if you had mentioned your method because you said it was better than the other method, and I find it lol.

Here is my point, your method may work better with a lot of luck versus the systematic approach like the other method. I feel it would have to be luck, but without running thousands of tests, I won't know for sure.

Here is what I can tell you:

I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

So if I were to use your method, it would find one of those matches, store it, and never search in that range again, correct? That is what I gather from reading your script. And that is the part that many can't see. Yes, you can run this method and if you are lucky, maybe you find the exact match you are looking for. But you also run the chance of missing the match you are looking for. That would always be my concern if I am searching, a serious search, not just tinkering, to find the key to any of the puzzles / challenges.

Where as the other method, chooses random ranges, runs them 100%, so no keys or ranges are skipped, and on average, finds the exact match / key at 50%.

So if one has or is going to invest some decent $ into the challenge, what method would you suggest they use??


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on March 01, 2025, 05:27:01 PM
So since you did not answer on the other thread, I thought I would look to see if you had mentioned your method because you said it was better than the other method, and I find it lol.

Here is my point, your method may work better with a lot of luck versus the systematic approach like the other method. I feel it would have to be luck, but without running thousands of tests, I won't know for sure.

Here is what I can tell you:

I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

So if I were to use your method, it would find one of those matches, store it, and never search in that range again, correct? That is what I gather from reading your script. And that is the part that many can't see. Yes, you can run this method and if you are lucky, maybe you find the exact match you are looking for. But you also run the chance of missing the match you are looking for. That would always be my concern if I am searching, a serious search, not just tinkering, to find the key to any of the puzzles / challenges.

Where as the other method, chooses random ranges, runs them 100%, so no keys or ranges are skipped, and on average, finds the exact match / key at 50%.

So if one has or is going to invest some decent $ into the challenge, what method would you suggest they use??

I understand that you have doubts. My current approach is to search for prefixes using a random+sequential method (these ranges go to the database to be omitted). If I find prefixes based on the low probability of the prefix (length), I omit a percentage around the prefix (according to its length) and continue.

Statistically, it is more likely to find the target than to omit it, but if that is the case, the ranges are recalculated to a smaller percentage and continue until it is found.

In the end, you prioritize the mathematical probabilities, but you could also cover the entire range without leaving a gap. It's all a matter of perspectives. You might ask, will the database be overloaded with data? It depends, since the hashtable prioritizes high matches and avoids overlapping ranges.

For example, if a range is 1:100 and another comes as 90:150, the hashtable would unite them in 1:150, so eventually the database will reduce its weight.

In conclusion, we are searching in the entire range while prioritizing the mathematical probabilities first.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 01, 2025, 05:41:51 PM
So since you did not answer on the other thread, I thought I would look to see if you had mentioned your method because you said it was better than the other method, and I find it lol.

Here is my point, your method may work better with a lot of luck versus the systematic approach like the other method. I feel it would have to be luck, but without running thousands of tests, I won't know for sure.

Here is what I can tell you:

I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

So if I were to use your method, it would find one of those matches, store it, and never search in that range again, correct? That is what I gather from reading your script. And that is the part that many can't see. Yes, you can run this method and if you are lucky, maybe you find the exact match you are looking for. But you also run the chance of missing the match you are looking for. That would always be my concern if I am searching, a serious search, not just tinkering, to find the key to any of the puzzles / challenges.

Where as the other method, chooses random ranges, runs them 100%, so no keys or ranges are skipped, and on average, finds the exact match / key at 50%.

So if one has or is going to invest some decent $ into the challenge, what method would you suggest they use??

I understand that you have doubts. My current approach is to search for prefixes using a random+sequential method (these ranges go to the database to be omitted). If I find prefixes based on the low probability of the prefix (length), I omit a percentage around the prefix (according to its length) and continue.

Statistically, it is more likely to find the target than to omit it, but if that is the case, the ranges are recalculated to a smaller percentage and continue until it is found.

In the end, you prioritize the mathematical probabilities, but you could also cover the entire range without leaving a gap. It's all a matter of perspectives. You might ask, will the database be overloaded with data? It depends, since the hashtable prioritizes high matches and avoids overlapping ranges.

For example, if a range is 1:100 and another comes as 90:150, the hashtable would unite them in 1:150, so eventually the database will reduce its weight.

In conclusion, we are searching in the entire range while prioritizing the mathematical probabilities first.

I will give you this, this is a good way to search if you only have a CPU. I like the concept, my concern is how small to make the initial ranges that will be omitted, as to not miss the key in the first run lol. It can be done I guess with large match and smaller omits, i.e. something like 40 bit matches with 24 bit omits, just as an example; probably needs more anaylis over a decent sized range, and then run the same in several other same size ranges, to come up with a better average.

So in the current script, after 100% range search, the database/script auto recalculates, or one would have to manually do that with the create.py script?


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on March 01, 2025, 05:55:11 PM
I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges, having 5 or more matches? :) That would be abnormal.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on March 01, 2025, 06:08:35 PM
I will give you this, this is a good way to search if you only have a CPU. I like the concept, my concern is how small to make the initial ranges that will be omitted, as to not miss the key in the first run lol. It can be done I guess with large match and smaller omits, i.e. something like 40 bit matches with 24 bit omits, just as an example; probably needs more anaylis over a decent sized range, and then run the same in several other same size ranges, to come up with a better average.

So in the current script, after 100% range search, the database/script auto recalculates, or one would have to manually do that with the create.py script?

Yes, you're right, it would be better to search for long matches to avoid omitting at the first try. Although it is less likely, it is possible. This script is basically a prototype. I am now editing it based on Keyhunt to present the complete idea, covering all the previously mentioned points. I haven't yet included automatic percentage reduction; I would do it manually. However, automating it is a good idea. It hadn't occurred to me because the case hasn't presented itself in the tests so far.

example:

./keyhunt -m vanity -t 4 -l compress -R -r 100000000000000000:200000000000000000 -v 61eb8a50c86b0584bb727dd65bed8d2400d6d5aa -n 0x10000000 -pct 30 -pm 8

pct= Percentage to omit according to the length of the prefix.

pm= Minimum number of h160 prefixes to consider.

n= Number of sequential keys starting from the random. (0x1000000....0x1000000000000).



I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges, having 5 or more matches? :) That would be abnormal.

One thing is the probability of longer prefixes sneaking in, and another is the probability of one of those being the target. They are not the same, so we cannot generalize, as it messes up the calculations.

[moderator's note: consecutive posts merged]


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 01, 2025, 06:56:33 PM
I've ran almost 2,000 44 bit ranges, searching for a 44 bit mask, 44 bit leading characters of a h160. So in that aspect, our methods are similar. But, what I can tell you, the probabilities say, there should be 0 hits in a single range, but I have found 4 matches in about 30 of those ranges.

That's not abnormal, it's actually exactly what one would expect.

Code:
# Prob. to get exactly 4 matches, with leading 44 bits of 0, in 2**44 trials
p4 = binom.pmf(4, 2**44, 1/2**44)

# Number of times that would happen, in 2000 experiments
n = p4 * 2000
print(n)            # 30.66

Now, are you sure you haven't also found 6 or 7 ranges, having 5 or more matches? :) That would be abnormal.

Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say, of 2,000 ranges searched, 1,500 of them found nothing. So that could or would put this type of script in an endless loop, possibly.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on March 01, 2025, 07:15:31 PM
Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say, of 2,000 ranges searched, 1,500 of them found nothing. So that could or would put this type of script in an endless loop, possibly.

It would be way, way off if you indeed had 1500 of them with 0 matches. To the point of calling the uniformness of the distribution biased with a very high degree of confidence. For 2000 runs, we should get between 1880 and 2120 matches in total (99.9% confidence). You only have a few hundred? That's infinitely small to even compute the chances lol.

The binomial distribution according to your parameters should look something like this:

0 matches: 736 ranges.
1 matches: 736 ranges
2 matches: 368 ranges
3 matches: 123 ranges
4 matches:  31 ranges
5 matches:  6 ranges
> 5 matches: 1 ranges

Are you sure the mask is correct?


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 01, 2025, 11:29:41 PM
Hmmm, let me rephrase lol.

First, I would have to go back and check if any found 5+ matches.

I guess what I was trying to say, of 2,000 ranges searched, 1,500 of them found nothing. So that could or would put this type of script in an endless loop, possibly.

It would be way, way off if you indeed had 1500 of them with 0 matches. To the point of calling the uniformness of the distribution biased with a very high degree of confidence. For 2000 runs, we should get between 1880 and 2120 matches in total (99.9% confidence). You only have a few hundred? That's infinitely small to even compute the chances lol.

The binomial distribution according to your parameters should look something like this:

0 matches: 736 ranges.
1 matches: 736 ranges
2 matches: 368 ranges
3 matches: 123 ranges
4 matches:  31 ranges
5 matches:  6 ranges
> 5 matches: 1 ranges

Are you sure the mask is correct?

44 and 44, h160 mask and range size. 2,061 ranges ran and only matches found in 620 of the ranges.

Edit: A total of 738 found matches.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: kTimesG on March 02, 2025, 09:00:35 AM
44 and 44, h160 mask and range size. 2,061 ranges ran and only matches found in 620 of the ranges.

Edit: A total of 738 found matches.

That is absolutely unbelievable. I'm not saying it can't happen, though I'd rather think something went wrong with your experiment.

If you ran 2000 different ranges, each with a size of 2**44, and counted the h160 that start with 44 bits of 0 (or whatever other mask), we would expect, on average, a total of 2000 matches.

The chances to get 738 matches (plus all the chances to get anything below 738) is 6.629921781279535e-231. That's 231 fractional digits of 0.

So if your data (and procedure) is correct it would mean the probability to get 44 bits of 0 is not 1 in 2**44. With a confidence of 100% minus the 10**-231 number above.

LE: This 738 number looks awfully close to the expected number of matches for "0 found" and "1 found" above. Are you counting stuff correctly?


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 02, 2025, 09:32:27 PM
44 and 44, h160 mask and range size. 2,061 ranges ran and only matches found in 620 of the ranges.

Edit: A total of 738 found matches.

That is absolutely unbelievable. I'm not saying it can't happen, though I'd rather think something went wrong with your experiment.

If you ran 2000 different ranges, each with a size of 2**44, and counted the h160 that start with 44 bits of 0 (or whatever other mask), we would expect, on average, a total of 2000 matches.

The chances to get 738 matches (plus all the chances to get anything below 738) is 6.629921781279535e-231. That's 231 fractional digits of 0.

So if your data (and procedure) is correct it would mean the probability to get 44 bits of 0 is not 1 in 2**44. With a confidence of 100% minus the 10**-231 number above.

LE: This 738 number looks awfully close to the expected number of matches for "0 found" and "1 found" above. Are you counting stuff correctly?

So in a range of 2,000 different ranges, the average should be as you stated above. If these were consecutive ranges, then I would be surprised as well. But, since they are not consecutive, and spread out all over a larger range, I am not. I imagine if I completed the larger range, the numbers would come close to the average.

That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

I guess my PC's ability to pick random ranges where less matches will be found, is pretty dang good lol.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on March 03, 2025, 05:44:01 PM
That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: WanderingPhilospher on March 03, 2025, 08:37:42 PM
That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.
I do not understand what you are saying, to be honest.

The pool's approach will find the target 100% of the time as well.

I haven't ran the numbers, but I am not so sure your current method would find it 100% of time. I've seen some flaws in the script and have made some adjustments so that it will indeed find the key 100% of the time. I know you will say yours is a prototype, which is fine, this is me just giving some feedback, your thread/script is wasting considerable time as it is currently written and I don't think it will find the key 100% of the time. There are holes, like an abyss, that it could fall into.

The other thing I noticed, with your default settings, you are finding a small prefix (bit wise) and adding and subtracting a much larger range (bit wise) on both sides.

I think you have created a cool script!  It's had me thinking about different ways to approach it/make it better. As I stated, if I was in a CPU only search mode, I would use this script.

My whole point in talking about skipping any range, based on stats or probabilities, is that I would not do it, if I had a lot of $ invested. Especially, investors money. I would do the old boring brute force method, because averages are just that, averages.


Title: Re: A probabilistic prefix search - puzzle btc 32
Post by: mcdouglasx on March 04, 2025, 08:28:26 PM
That is why I try telling people, like Bibli, McD, and others, that averages are just that, averages. Unless you do tests to see if it holds true over x amount of ranges, spread all over a larger range, how can one know how much or less to "skip". I probably could run 2,000 different ranges, and the numbers would be different, maybe even closer to the expected averages.

Your mistake is in minimizing the use of statistical and probabilistic methods. Saying that it’s 'just an average' is catastrophic in mathematics, because searching within random ranges combined with sequential methods (like pools) can lead you to scan 90% of the entire range without hitting the target. What I'm saying is that to validate your approach, you focus on 'the worst-case scenario,' which is highly improbable and a flawed idea. I see nothing wrong with a probabilistic search that covers all failure points, such as losing progress or skipping the target. My approach perfectly covers these points because, in the 'worst-case scenario,' I will always find the target 100% of the time.
I do not understand what you are saying, to be honest.

The pool's approach will find the target 100% of the time as well.

I haven't ran the numbers, but I am not so sure your current method would find it 100% of time. I've seen some flaws in the script and have made some adjustments so that it will indeed find the key 100% of the time. I know you will say yours is a prototype, which is fine, this is me just giving some feedback, your thread/script is wasting considerable time as it is currently written and I don't think it will find the key 100% of the time. There are holes, like an abyss, that it could fall into.

The other thing I noticed, with your default settings, you are finding a small prefix (bit wise) and adding and subtracting a much larger range (bit wise) on both sides.

I think you have created a cool script!  It's had me thinking about different ways to approach it/make it better. As I stated, if I was in a CPU only search mode, I would use this script.

My whole point in talking about skipping any range, based on stats or probabilities, is that I would not do it, if I had a lot of $ invested. Especially, investors money. I would do the old boring brute force method, because averages are just that, averages.

I always do this: I propose an idea and create a script as a kind of sketch or auxiliary material so that the idea has a base or starting point that makes it easier to understand. This one, the published one, does not save the scanned ranges, which should be a fundamental requirement. It also does not recalculate the percentages automatically. My main idea was that, in a dataset of length X, the probabilities of a prefix being close to another increase according to its length. It seems somewhat unreasonable, but it's a fact. It is unlikely that the prefixes you find relatively close are precisely the target. Imagine taking 1000 needles and painting 1 gold, then distributing them uniformly in a haystack. Although we know that one of the needles is closer to another than the average, it does not mean that this happens to the golden needle because it generally has a higher chance of being within the expected average.