Bitcoin Forum
Author Topic: BITCOIN PUZZLE: THE PREFIX DILEMMA  (Read 951 times)
fixedpaul
November 19, 2025, 08:04:47 PM
 #41


It doesn't tell me that the following keys have a lower probability; what it tells me is, "I've already found a promising candidate in this block, and statistically it's optimal to continue searching in another block."


Why is it optimal to skip?
If the probability that the winning key is in one of the next keys in the range is the same as in any other range (1/N), why is it optimal? It seems contradictory to me, in my humble opinion.

One question: in the prefix method I’m currently using blocks of contiguous keys. Would it also work with blocks of non-contiguous keys?
Let me explain better: I build N blocks, each composed of 4096 keys K chosen at random from the space. It should work the same, right? Because the probability of finding at least one prefix in the block is always 63%, so when I find a prefix match I have found a promising candidate. Or do the keys somehow “know” that they’re far apart, so it no longer works?
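For the record, the 63% figure quoted here is easy to sanity-check in Python (a standalone sketch, not anyone's search code): with a 3-hex-char prefix, each key matches with probability 1/16^3 = 1/4096, so a block of 4096 independent keys contains at least one match with probability 1 - (1 - 1/4096)^4096.

```python
# Probability that a block of 4096 keys holds at least one match, when
# each key independently matches a 3-hex-char prefix with prob 1/4096.
p = 1 / 16**3
N = 4096
p_at_least_one = 1 - (1 - p) ** N
print(f"{p_at_least_one:.4f}")  # ≈ 0.632, the "63%" quoted above
```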
kTimesG
November 19, 2025, 08:59:35 PM
Last edit: November 19, 2025, 09:18:11 PM by kTimesG
 #42

If you can't see the contradictions in your post, I can't do anything for you, because the purpose of the post is lost in debate. You've gone from CUDA to random, to postfixes... what's next?

The prefix uses information correlated with the specific objective, while your random method uses statistical noise. In cryptography, that difference is enormous.

You forgot we're on the same page. I was agreeing with you, remember?

Now you're back stating the Prefix method is better than the other 20767445112335933694764617899 technically equivalent methods that all have the same, best, statistical advantage. I was under the impression you already agreed on this fact, or did you change your mind since a few hours ago?!

So, the one contradicting himself here is you. Let me give you a quick example:

You start scanning some block, and you are at index 0 (the very first key).

Now, if we try out, one by one, all of the 20767445112335933694764617900 technically equivalent methods (all of them having the same statistical advantage), there is essentially a 100% chance that at least one of them (if not a great many) will have some set of 24 bits (let's call this set of bits the "method ID" of each technically equivalent method) matching the target, and hence a technically equivalent match that skips the rest of the block.

Now, this same fact applies to key at position 1.

Identically, the same fact applies to key at index 2.

And so on. So, at the end, what we end up with is:

Even though we have 20767445112335933694764617900 technically equivalent methods that all have the same statistical edge, somehow we stop the search either at:

position 0 (for a lot of the methods)

or position 1 (for another lot of the methods)

or position 2 (same principle)

or position 3 (because some 24 bits matched in some quadrillion other methods)

(repeat for any position you want)

So, I am asking you again: which one of these 20767445112335933694764617900 statistically advantageous equivalent best methods is actually the ONE TO USE?

Now, before you have something to complain about, remember: all of the methods that stop at these different positions are the best methods, all of them with the statistical edge. You agreed on that already.
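The "almost 100% chance" above follows from a one-liner, under the simplifying assumption that each equivalent method fires on an independent 24-bit pattern:

```python
import math

# If each of M "technically equivalent" stopping criteria keys off an
# independent 24-bit pattern, the log-probability that NONE of them fires
# on a single key is M * log(1 - 2**-24). For the M quoted above this is
# astronomically negative, so P(at least one fires) is ~1.
M = 20767445112335933694764617900
log_p_none = M * math.log1p(-2**-24)
print(log_p_none < -1e20)  # True
```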

So, since you seem to have an issue with this problem, I encourage you to refer your whatever question to the OP of this thread.

Off the grid, training pigeons to broadcast signed messages.
mcdouglasx (OP)
November 19, 2025, 09:07:24 PM
 #43


It doesn't tell me that the following keys have a lower probability; what it tells me is, "I've already found a promising candidate in this block, and statistically it's optimal to continue searching in another block."


Why is it optimal to skip?
If the probability that the winning key is in one of the next keys in the range is the same as in any other range (1/N), why is it optimal? It seems contradictory to me, in my humble opinion.

One question: in the prefix method I’m currently using blocks of contiguous keys. Would it also work with blocks of non-contiguous keys?
Let me explain better: I build N blocks, each composed of 4096 keys K chosen at random from the space. It should work the same, right? Because the probability of finding at least one prefix in the block is always 63%, so when I find a prefix match I have found a promising candidate. Or do the keys somehow “know” that they’re far apart, so it no longer works?

The jump is optimal (in this case, for searchers with limited resources) because when 16^L ≈ N, you have exhausted the expected statistical return of that block. It's not that the subsequent keys have lower individual probability, but rather that the marginal cost of checking the remaining 37% exceeds the expected benefit of the method. Remember that the statistical advantage lies in efficiently discarding ~37% of the block.
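The "~37% discarded" figure can be derived directly: stopping at the first match in a block of N keys, each matching with probability p = 1/N, costs E[min(Geometric(p), N)] checks on average (a quick sketch, independent of any poster's code):

```python
# Expected keys checked per block when stopping at the first prefix match
# (or exhausting the block): E[min(Geometric(p), N)] = sum_{k<N} (1-p)^k.
p = 1 / 4096
N = 4096
expected = (1 - (1 - p) ** N) / p
print(expected / N)  # ≈ 0.632: about 37% of each block is skipped on average
```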

It's still searching within the same block of 4096, so searching the prefix sequentially or randomly is statistically the same (but when I say random I mean the sequence of the keys, not stopping at a random position independent of the prefix). However, even though it's statistically the same, it requires more unnecessary work because by randomizing, you lose the advantage of elliptic curve optimized operations when dealing with contiguous keys, which is exploited by the computer. Furthermore, it requires a system that prevents repeating keys within that randomness in the block. In other words, their proposal is inefficient, even though it maintains the statistical result.

...So, since you seem to have an issue with this problem, I encourage you to refer your whatever question to the OP of this thread.

Thanks for admitting you use AI after all.
kTimesG
November 19, 2025, 09:32:16 PM
 #44

In other words, their proposal is inefficient, even though it maintains the statistical result.

Who's "they"? GPT post-edit failure?

...So, since you seem to have an issue with this problem, I encourage you to refer your whatever question to the OP of this thread.

Thanks for admitting you use AI after all.

Not really. I was encouraging you to address any issues you had with your self-contradictions to mcdouglasx.

I think 30+ posts that just try to explain why any 24 bits of any strong hash are mathematically equivalent to flipping a fair coin 24 times are enough. All that's left is the OP claiming that the fair coin is rigged and the flips are a sham, while the hash leaks secret information (let me guess: we didn't account for the proximity bias of the keys, did we?).

This got boring 28 posts ago. Back to real work.

mcdouglasx (OP)
November 19, 2025, 09:47:49 PM
 #45

In other words, their proposal is inefficient, even though it maintains the statistical result.

Who's "they"? GPT post-edit failure?

...So, since you seem to have an issue with this problem, I encourage you to refer your whatever question to the OP of this thread.

Thanks for admitting you use AI after all.

Not really. I was encouraging you to address any issues you had with your self-contradictions to mcdouglasx.
...

In case you haven't noticed, I make those mistakes because "mi idioma nativo es el español" ("my native language is Spanish"). I'm sorry for not being perfect, but justifying yourself with a jab at my phrasing is sheer stupidity.
I will simply ignore you in the future!
fixedpaul
November 19, 2025, 10:09:06 PM
Merited by kTimesG (7)
 #46

Okay perfect, now everything is starting to become much clearer to me. To recap, what’s needed is to divide the space into blocks, contiguous, random, multiples of any number, it doesn’t matter. The important thing is to look for a prefix match, or a suffix, or any possible combination of L characters of the hash160. All these methods optimize my search with limited hardware. I can also use concatenated hash functions. Basically, this thread made us realize that there are many ways to make the search more efficient thanks to the cryptographic guidance of hash functions, cleverly exploiting the uniformity of their output.

As you rightly say, it’s better to use sequential keys within the range rather than random ones because I can take advantage of modular batch inversion. Of course, if I stop at some point I would lose some keys for which I computed the inverse of dx almost “for free,” and I’d lose a bit of efficiency, but it’s an acceptable risk.
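For anyone following along, the modular batch inversion mentioned here is Montgomery's trick: one expensive inversion amortized over a whole batch. A minimal Python sketch over the secp256k1 field prime (`batch_inverse` is an illustrative helper name, not library code):

```python
# Montgomery's trick: invert many field elements with a single modular
# inversion plus a few multiplications per element.
P = 2**256 - 2**32 - 977  # secp256k1 base field prime

def batch_inverse(xs, p=P):
    acc, running = [], 1
    for x in xs:                       # forward pass: prefix products
        running = running * x % p
        acc.append(running)
    inv = pow(running, -1, p)          # the one expensive inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, 0, -1):
        out[i] = inv * acc[i - 1] % p  # recover inverse of xs[i]
        inv = inv * xs[i] % p          # peel xs[i] off the running inverse
    out[0] = inv
    return out

invs = batch_inverse([2, 3, 5])
print(all(x * y % P == 1 for x, y in zip([2, 3, 5], invs)))  # True
```

This is why scanning contiguous keys is cheap: the dx inversions across a batch collapse into one.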

But batch inversion also works if I scan keys in descending order, right? So I start from key 4096 in the range and go backwards until I find a prefix match, and this gives me a statistical advantage.

I just wonder which method is better: searching the keys forward or backward? Let me explain better: I build my N blocks of 4096 contiguous keys and I want to decide whether to scan forward or backward. From what I understand, both methods give me the same statistical advantage, so it shouldn’t make any difference.

But when I think about it, these two methods do exactly the opposite of each other and scan completely different keys (except for the individual prefix matches). So how can they give me the same statistical advantage if they behave in exactly opposite ways? My brain is starting to melt.
mcdouglasx (OP)
November 19, 2025, 10:26:47 PM
 #47

Okay perfect, now everything is starting to become much clearer to me. To recap, what’s needed is to divide the space into blocks, contiguous, random, multiples of any number, it doesn’t matter. The important thing is to look for a prefix match, or a suffix, or any possible combination of L characters of the hash160. All these methods optimize my search with limited hardware. I can also use concatenated hash functions. Basically, this thread made us realize that there are many ways to make the search more efficient thanks to the cryptographic guidance of hash functions, cleverly exploiting the uniformity of their output.

As you rightly say, it’s better to use sequential keys within the range rather than random ones because I can take advantage of modular batch inversion. Of course, if I stop at some point I would lose some keys for which I computed the inverse of dx almost “for free,” and I’d lose a bit of efficiency, but it’s an acceptable risk.

But batch inversion also works if I scan keys in descending order, right? So I start from key 4096 in the range and go backwards until I find a prefix match, and this gives me a statistical advantage.

I just wonder which method is better: searching the keys forward or backward? Let me explain better: I build my N blocks of 4096 contiguous keys and I want to decide whether to scan forward or backward. From what I understand, both methods give me the same statistical advantage, so it shouldn’t make any difference.

But when I think about it, these two methods do exactly the opposite of each other and scan completely different keys (except for the individual prefix matches). So how can they give me the same statistical advantage if they behave in exactly opposite ways? My brain is starting to melt.

You're right to note that the omitted subsets are different, but the key is that the position of the first prefix is equally likely anywhere in the block, regardless of direction.
Although you omit different keys in each direction, the expected amount of work per block is identical due to the uniform distribution of the hashes. The statistical advantage depends on the set of keys you examine, not the order in which you examine them.
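A quick Monte-Carlo sketch of this symmetry claim (matches drawn i.i.d. with probability 1/4096 per key, so direction cannot matter by construction):

```python
import random

# Expected keys checked before the first "prefix match" in a 4096-key
# block, scanning forward vs backward; each key matches with prob 1/4096.
def checks_until_match(n, q, backward, rng):
    idx = range(n - 1, -1, -1) if backward else range(n)
    for count, _ in enumerate(idx, start=1):
        if rng.random() < 1 / q:
            return count
    return n  # no match: whole block scanned

rng = random.Random(1)
fwd = sum(checks_until_match(4096, 4096, False, rng) for _ in range(3000)) / 3000
bwd = sum(checks_until_match(4096, 4096, True, rng) for _ in range(3000)) / 3000
print(round(fwd), round(bwd))  # both ≈ 0.632 * 4096 ≈ 2589
```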
kTimesG
November 19, 2025, 11:08:40 PM
 #48

Although you omit different keys in each direction, the expected amount of work per block is identical due to the uniform distribution of the hashes. The statistical advantage depends on the set of keys you examine, not the order in which you examine them.

Dude's tryna' tell you that if the Magic Method has an X% chance of success when scanning Y% of the range (backwards), then you can't also have an X% chance of success when scanning only the other keys (the remaining (100 - Y)%) using the same exact Magic Method forward. This is basic arithmetic, not rocket science.

Just another nonsense effect, using your own weapons, based on the brain-melting idiocies you keep on grindin' on.

Also, your imaginary illusion of any statistical BS falls apart when you actually put into practice what you just said above (that is, finally allowing people to change the order of the scanned keys!!!)

But it's OK, I'm already waiting on seven other responses from you on very on-topic directed questions which you have no answer for simply because you are well aware that no answer can exist, so you prefer to ignore everything that doesn't fit under your self-created fake reality rules.

fixedpaul
November 19, 2025, 11:49:42 PM
Last edit: November 20, 2025, 12:00:59 AM by fixedpaul
 #49


You're right to note that the omitted subsets are different, but the key is that the position of the first prefix is equally likely anywhere in the block, regardless of direction.
Although you omit different keys in each direction, the expected amount of work per block is identical due to the uniform distribution of the hashes. The statistical advantage depends on the set of keys you examine, not the order in which you examine them.

I completely agree with you on this. But this thread is about comparing with the sequential method.

If I run a simulation with N blocks and the forward prefix method “wins” against the sequential method, then the backward method “loses” against the sequential one, and vice versa: if forward loses, backward wins. This is 100% certain, precisely because they behave in exactly opposite ways. If it’s a tie, then it’s a tie for all three.

Your thesis is that by running many simulations with N blocks, the forward prefix method is on average BETTER than the sequential method. This implies that, over many simulations, the backward prefix method is on average WORSE than the sequential method. Exactly because they behave in opposite ways, and when forward wins against the sequential method, backward loses against it.

But earlier you said that forward and backward are equivalent and give me the same statistical advantage when compared to the sequential method.

How can both of these conclusions be true? Isn’t this what in logic is called a contradiction?

Finally, it’s not even clear to me why, in the script, it doesn’t count as a win for the sequential method when the sequential one finds the key and the prefix method does not.
mcdouglasx (OP)
November 20, 2025, 12:15:02 AM
 #50


You're right to note that the omitted subsets are different, but the key is that the position of the first prefix is equally likely anywhere in the block, regardless of direction.
Although you omit different keys in each direction, the expected amount of work per block is identical due to the uniform distribution of the hashes. The statistical advantage depends on the set of keys you examine, not the order in which you examine them.

I completely agree with you on this. But this thread is about comparing with the sequential method.

If I run a simulation with N blocks and the forward prefix method “wins” against the sequential method, then the backward method “loses” against the sequential one, and vice versa: if forward loses, backward wins. This is 100% certain, precisely because they behave in exactly opposite ways. If it’s a tie, then it’s a tie for all three.

Your thesis is that by running many simulations with N blocks, the forward prefix method is on average BETTER than the sequential method. This implies that, over many simulations, the backward prefix method is on average WORSE than the sequential method. Exactly because they behave in opposite ways, and when forward wins against the sequential method, backward loses against it.

But earlier you said that forward and backward are equivalent and give me the same statistical advantage when compared to the sequential method.

How can both of these conclusions be true? Isn’t this what in logic is called a contradiction?

There is no contradiction. You are confusing individual results with statistical averages.
The apparent 'paradox' is resolved by the Law of Large Numbers. Opposing results average toward the same efficiency.
The fact that one method checks more than the other in a specific block does not invalidate the fact that, on average, both (in both directions) are equally better than the sequential method in this context.
Uniform distribution + stopping on first prefix = same expected efficiency in either direction.

There is proven efficiency. Both directions (forward/backward) share the same statistical advantage because both benefit from the same principle: stopping when the block has reached its potential.

The sequential method is only there to demonstrate that the prefix method is more efficient in this context of "resource-constrained searchers," since the sequential method examines entire blocks because it searches for the original target, not a prefix.

Sequential: Only stops when it finds the target.
Prefix: Can stop earlier due to a prefix match, but still checks if it is the target.

As I mentioned when I started the thread, this strategy is useful for "users with limited resources," since scanning the entire block is a waste of time, and prefixes make discarding the rest of the block the rational choice in this context.



And to avoid being redundant about randomness, I'll just quote my earlier post in the thread.

here...
fixedpaul
November 20, 2025, 01:01:59 AM
 #51

Yes sorry, I had forgotten about the law of large numbers.

But, according to this law of large numbers, shouldn’t I get the same results if, for each range, I simply scan the first 63% of the range?
The average number of checks I need to perform would be the same, right?
At least according to the law of large numbers, unless I’m provably missing some other law of probabilistic statistics?
mcdouglasx (OP)
November 20, 2025, 01:16:53 PM
 #52

Yes sorry, I had forgotten about the law of large numbers.

But, according to this law of large numbers, shouldn’t I get the same results if, for each range, I simply scan the first 63% of the range?
The average number of checks I need to perform would be the same, right?
At least according to the law of large numbers, unless I’m provably missing some other law of probabilistic statistics?

In theory, yes, but are you willing to risk excluding the final 37% of the block, which gives you zero chance if the target is at the end of the block, instead of using the prefix-stopping strategy, whose chance of finding the target at the end of the block is maintained in cases where a prefix hasn't been found previously?
In other words, with prefix-stopping, your chance of finding it at the end would no longer be zero, so the prefix strategy would still be superior. Remember, you're trying to solve a puzzle, not check that the probabilities are similar.

That said, if both methods have similar average efficiency, but one is guaranteed to fail in specific regions while the other maintains probability throughout the entire space, why would you choose the one with guaranteed blind spots?
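A toy simulation of this conditional claim, under the simplifying assumption that the target lies somewhere in the block (note that under this assumption the conditional probability is actually 1, which is consistent with, and stronger than, the ">37%" above, because the target's own hash always matches its prefix):

```python
import random

# Given that the target is somewhere in the block and no prefix match
# appeared in the first 63%, where is the target?
rng = random.Random(7)
N, q = 4096, 4096
cutoff = int(N * 0.63)
cond = in_last = 0
for _ in range(5000):
    target = rng.randrange(N)
    # a key yields a prefix match if it IS the target, or by chance 1/q
    clean_front = all(i != target and rng.random() >= 1 / q
                      for i in range(cutoff))
    if clean_front:
        cond += 1
        in_last += target >= cutoff
print(in_last / cond)  # 1.0 under this toy model: well above 37%
```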
fixedpaul
November 20, 2025, 01:52:41 PM
Merited by vapourminer (1)
 #53

Yes sorry, I had forgotten about the law of large numbers.

But, according to this law of large numbers, shouldn’t I get the same results if, for each range, I simply scan the first 63% of the range?
The average number of checks I need to perform would be the same, right?
At least according to the law of large numbers, unless I’m provably missing some other law of probabilistic statistics?

In theory, yes, but are you willing to risk excluding the final 37% of the block, which gives you zero chance if the target is at the end of the block, instead of using the prefix-stopping strategy, whose chance of finding the target at the end of the block is maintained in cases where a prefix hasn't been found previously?
In other words, with prefix-stopping, your chance of finding it at the end would no longer be zero, so the prefix strategy would still be superior. Remember, you're trying to solve a puzzle, not check that the probabilities are similar.

That said, if both methods have similar average efficiency, but one is guaranteed to fail in specific regions while the other maintains probability throughout the entire space, why would you choose the one with guaranteed blind spots?

Aaah now I get it! Of course, with that method I would leave blind spots in the block, how silly of me.
Whereas with the prefix method, you don’t leave many blind spots, or if you do, they’re calculated risk blind spots.
Because in that block you’ve already found a prefix, so let’s say that block is no longer very promising. It’s better to move on, since it already contains a prefix similar to the target, so it’s unlikely you’ll find another one.
We can say that prefixes don’t really like to stay in the same block; generally they prefer to each take their own block, and we’re deliberately exploiting this property. Now it makes sense

But what if it were the opposite? I mean, we have about a 63% chance that a block contains at least one prefix match.
But if I scan the first 63% and don’t find a single prefix match, damn, that would make me think that the block is kind of “defective” (statistically), and that maybe it’s better to move on, as if that block isn’t very promising for hosting a prefix match.
It’s a bit like when you’re looking for mushrooms: you choose a patch of ground, but you don’t check all of it, only part of it. If you don’t find even one mushroom, it’s better to move somewhere else. But maybe I’m talking nonsense
kTimesG
November 20, 2025, 03:28:35 PM
Merited by fixedpaul (1)
 #54

But what if it were the opposite? I mean, we have about a 63% chance that a block contains at least one prefix match.
But if I scan the first 63% and don’t find a single prefix match, damn, that would make me think that the block is kind of “defective” (statistically), and that maybe it’s better to move on, as if that block isn’t very promising for hosting a prefix match.

It doesn't just need to make you think; it's the perfectly reasonable, rational thing to do, because for an independent event with probability 1/N, in N trials, we have:

P(X = 0) == P(X = 1) == 36.8%
P(X > 0) == 1 - P(X = 0) == 63.2%

That is, the unconditional probability of having either exactly 0 or exactly 1 matches in a block of size N is identical. So, if the mushroom isn't there, the chance of not finding one in the rest of the block is identical to the case where, if the mushroom is there, it's the single mushroom in the entire block.
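The two 36.8% figures check out with the exact binomial (a quick sketch, converging to the Poisson limit 1/e):

```python
from math import comb, e

# P(X = 0) and P(X = 1) for N = 4096 trials with per-trial prob p = 1/4096.
N = 4096
p = 1 / N
p0 = (1 - p) ** N                          # exactly zero matches
p1 = comb(N, 1) * p * (1 - p) ** (N - 1)   # exactly one match
print(round(p0, 3), round(p1, 3), round(1 / e, 3))  # 0.368 0.368 0.368
```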

But we're basically discussing how on Earth an algo that has 63% success rate (with a 37% miss rate) with an average workload of 31.5% + additional work is better than one that does 100% success + 0% fails (with an average workload of 50%) so I'm doubtful science matters so much around here.

fixedpaul
November 20, 2025, 04:31:25 PM
 #55

I think I figured out that in the end it depends on what you’re looking for, mushrooms or prefixes.

Mushrooms like to grow close to each other; if you don’t find any, it’s better to move somewhere else. Prefixes, on the other hand, absolutely don’t. They hate each other. If you find one, you can be sure the next one will be far away. Found a prefix? Better savor it and be content with it, because in this block it’s the only one you’re going to get!!

With everything clear now, I can step away from this very constructive thread
mcdouglasx (OP)
November 20, 2025, 05:07:20 PM
 #56

snip

You're correct about the Poisson distribution (36.8% for 0 and 1 prefixes), but your strategic conclusion is incorrect.

You're arguing against a straw man; no one here is suggesting using Sequential. The real debate is which optimized method to choose. Read the thread again.
For those of us with limited resources, the question isn't '100% vs. 63%' but rather, 'Which 63% should we choose?'

You forgot to calculate this probability:

P(target in final 37% | 0 prefixes in first 63%) > 37%, because this probability increases when you don't find any prefixes.

The absence of evidence is not evidence of absence. Not finding prefixes in 63% does not make the block 'faulty'; it simply means you haven't found the evidence yet.


I'm sharing some code with you, modified with AI, since that's your preferred tool for finding unnecessary debates or arguments to invalidate theories; the difference is that I used it as an educational reality check.

Code:
import hashlib
import random
import time
import math
import statistics
import scipy.stats as stats
from math import ceil

# Configuration
TOTAL_SIZE = 100_000
RANGE_SIZE = 4_096
PREFIX_LENGTH = 3
SIMULATIONS = 2000

SECP256K1_ORDER = int("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

print(f"""
=== Configuration ===
Total numbers: {TOTAL_SIZE:,}
Block size: {RANGE_SIZE:,}
Total blocks needed: {ceil(TOTAL_SIZE/RANGE_SIZE)}
Prefix: {PREFIX_LENGTH} characters (16^{PREFIX_LENGTH} = {16**PREFIX_LENGTH:,} combinations)
Simulations per method: {SIMULATIONS}
secp256k1 order: {SECP256K1_ORDER}
""")

def generate_h160(data):
    h = hashlib.new('ripemd160', str(data).encode('utf-8'))
    return h.hexdigest()

def shuffled_blck(total_blocks):
    blocks = list(range(total_blocks))
    random.shuffle(blocks)
    return blocks

def prefix_search(dataset, block_size, target_hash, block_order):
    prefix = target_hash[:PREFIX_LENGTH]
    checks = 0
   
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
        found_prefix = False
       
        for i in range(start, end):
            checks += 1
            current_hash = generate_h160(dataset[i])
           
            if current_hash == target_hash:
                return {"checks": checks, "found": True, "index": i}
           
            if not found_prefix and current_hash.startswith(prefix):
                found_prefix = True
                break
   
    return {"checks": checks, "found": False}

def random_break_probability(dataset, block_size, target_hash, block_order):
    checks = 0
   
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
       
        for i in range(start, end):
            checks += 1

            # check the current key before deciding to break, so the
            # stopping key itself is not silently skipped
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}

            if random.randint(0, 4095) == 0:
                break
           
    return {"checks": checks, "found": False}

def random_break_modulo(dataset, block_size, target_hash, block_order):
    checks = 0
   
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
       
        for i in range(start, end):
            checks += 1

            # check the current key before deciding to break, so the
            # stopping key itself is not silently skipped
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}

            # block-relative index, so the final partial block behaves
            # the same as the full ones
            if (i - start) % 4096 == random.randint(0, 4095):
                break
           
    return {"checks": checks, "found": False}

def fixed_63_percent_search(dataset, block_size, target_hash, block_order):
    checks = 0
    fixed_scan_per_block = int(block_size * 0.63)
   
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
       
        scan_end = min(start + fixed_scan_per_block, end)
        for i in range(start, scan_end):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}
   
    return {"checks": checks, "found": False}

def run_method_simulations(method_func, method_name, simulations):
    results = {
        "checks": [],
        "times": [],
        "successful_runs": 0,
        "total_runs": simulations
    }
    total_blocks = ceil(TOTAL_SIZE / RANGE_SIZE)
   
    for sim in range(simulations):
        max_offset = SECP256K1_ORDER - TOTAL_SIZE - 1
        offset = random.randint(0, max_offset)
        dataset = [offset + i for i in range(TOTAL_SIZE)]
        random.shuffle(dataset)
       
        target_num = random.choice(dataset)
        target_hash = generate_h160(target_num)
       
        block_order = shuffled_blck(total_blocks)
       
        start = time.perf_counter()
        result = method_func(dataset, RANGE_SIZE, target_hash, block_order)
        elapsed = time.perf_counter() - start
       
        if result["found"]:
            results["checks"].append(result["checks"])
            results["times"].append(elapsed)
            results["successful_runs"] += 1
       
        if (sim + 1) % (simulations // 10) == 0:
            success_rate = results['successful_runs'] / (sim + 1) * 100
            print(f"{method_name}: {sim + 1}/{simulations} ({success_rate:.1f}% success)")
   
    return results

def compare_methods():
   
    prefix_results = run_method_simulations(prefix_search, "Prefix", SIMULATIONS) 
    print()
    random_prob_results = run_method_simulations(random_break_probability, "Random Break (Prob)", SIMULATIONS)
    print()
    random_mod_results = run_method_simulations(random_break_modulo, "Random Break (Mod)", SIMULATIONS)
    print()
    fixed_63_results = run_method_simulations(fixed_63_percent_search, "63% Fixed", SIMULATIONS)
    print()
   
    def get_stats(data):
        if not data:
            return {"mean": 0, "min": 0, "max": 0, "median": 0, "stdev": 0}
        return {
            "mean": statistics.mean(data),
            "min": min(data),
            "max": max(data),
            "median": statistics.median(data),
            "stdev": statistics.stdev(data) if len(data) > 1 else 0
        }
   
    pre_stats = get_stats(prefix_results["checks"])
    rand_prob_stats = get_stats(random_prob_results["checks"])
    rand_mod_stats = get_stats(random_mod_results["checks"])
    fixed_stats = get_stats(fixed_63_results["checks"])
   
    def statistical_analysis(list1, list2):
        analysis = {}
        if len(list1) > 1 and len(list2) > 1:
            try:
                analysis['t_test'] = stats.ttest_ind(list1, list2, equal_var=False)
                pooled_std = math.sqrt((statistics.stdev(list1)**2 + statistics.stdev(list2)**2) / 2)
                if pooled_std > 0:
                    analysis['cohen_d'] = (statistics.mean(list1) - statistics.mean(list2)) / pooled_std
                else:
                    analysis['cohen_d'] = 0
            except Exception:
                analysis['t_test'] = None
                analysis['cohen_d'] = 0
        else:
            analysis['t_test'] = None
            analysis['cohen_d'] = 0
        return analysis
   
    stats_fixed_vs_prefix = statistical_analysis(fixed_63_results["checks"], prefix_results["checks"])
    stats_rand_prob_vs_prefix = statistical_analysis(random_prob_results["checks"], prefix_results["checks"])
    stats_rand_mod_vs_prefix = statistical_analysis(random_mod_results["checks"], prefix_results["checks"])
    stats_rand_prob_vs_mod = statistical_analysis(random_prob_results["checks"], random_mod_results["checks"])
   
    print(f"""
=== ANALYSIS ===
Successful runs per method:
Prefix:               {prefix_results['successful_runs']}/{SIMULATIONS} ({prefix_results['successful_runs']/SIMULATIONS:.1%})
Random Break (Prob):  {random_prob_results['successful_runs']}/{SIMULATIONS} ({random_prob_results['successful_runs']/SIMULATIONS:.1%})
Random Break (Mod):   {random_mod_results['successful_runs']}/{SIMULATIONS} ({random_mod_results['successful_runs']/SIMULATIONS:.1%})
63% Fixed:            {fixed_63_results['successful_runs']}/{SIMULATIONS} ({fixed_63_results['successful_runs']/SIMULATIONS:.1%})

[Performance Metrics - Average Checks per Successful Run]
               | Prefix              | Random Break (Prob)| Random Break (Mod) | 63% Fixed
---------------+---------------------+--------------------+--------------------+--------------------
Checks (mean)  | {pre_stats['mean']:>12,.1f} ± {pre_stats['stdev']:,.1f} | {rand_prob_stats['mean']:>12,.1f} ± {rand_prob_stats['stdev']:,.1f} | {rand_mod_stats['mean']:>12,.1f} ± {rand_mod_stats['stdev']:,.1f} | {fixed_stats['mean']:>12,.1f} ± {fixed_stats['stdev']:,.1f}
Min checks     | {pre_stats['min']:>12,} | {rand_prob_stats['min']:>12,} | {rand_mod_stats['min']:>12,} | {fixed_stats['min']:>12,}
Max checks     | {pre_stats['max']:>12,} | {rand_prob_stats['max']:>12,} | {rand_mod_stats['max']:>12,} | {fixed_stats['max']:>12,}

[Efficiency vs Prefix Baseline]
Random Break (Prob):  {((pre_stats['mean'] - rand_prob_stats['mean']) / pre_stats['mean'] * 100 if pre_stats['mean'] > 0 else 0):.1f}% more efficient
Random Break (Mod):   {((pre_stats['mean'] - rand_mod_stats['mean']) / pre_stats['mean'] * 100 if pre_stats['mean'] > 0 else 0):.1f}% more efficient
63% Fixed:            {((pre_stats['mean'] - fixed_stats['mean']) / pre_stats['mean'] * 100 if pre_stats['mean'] > 0 else 0):.1f}% more efficient

[Key Comparisons - Statistical Significance]
63% Fixed vs Prefix:
Welch's t-test: {'t = %.3f, p = %.4f' % (stats_fixed_vs_prefix['t_test'].statistic, stats_fixed_vs_prefix['t_test'].pvalue) if stats_fixed_vs_prefix['t_test'] else 'N/A'}
Effect Size: {stats_fixed_vs_prefix['cohen_d']:.3f}

Random Break (Prob) vs Prefix:
Welch's t-test: {'t = %.3f, p = %.4f' % (stats_rand_prob_vs_prefix['t_test'].statistic, stats_rand_prob_vs_prefix['t_test'].pvalue) if stats_rand_prob_vs_prefix['t_test'] else 'N/A'}
Effect Size: {stats_rand_prob_vs_prefix['cohen_d']:.3f}

Random Break (Mod) vs Prefix:
Welch's t-test: {'t = %.3f, p = %.4f' % (stats_rand_mod_vs_prefix['t_test'].statistic, stats_rand_mod_vs_prefix['t_test'].pvalue) if stats_rand_mod_vs_prefix['t_test'] else 'N/A'}
Effect Size: {stats_rand_mod_vs_prefix['cohen_d']:.3f}

Random Break (Prob) vs (Mod):
Welch's t-test: {'t = %.3f, p = %.4f' % (stats_rand_prob_vs_mod['t_test'].statistic, stats_rand_prob_vs_mod['t_test'].pvalue) if stats_rand_prob_vs_mod['t_test'] else 'N/A'}
Effect Size: {stats_rand_prob_vs_mod['cohen_d']:.3f}

[Success Rate Ranking]
1. Prefix:               {prefix_results['successful_runs']/SIMULATIONS:.1%}
2. Random Break (Prob):  {random_prob_results['successful_runs']/SIMULATIONS:.1%}
3. Random Break (Mod):   {random_mod_results['successful_runs']/SIMULATIONS:.1%}
4. 63% Fixed:            {fixed_63_results['successful_runs']/SIMULATIONS:.1%}
""")

if __name__ == '__main__':
    compare_methods()

result:

Code:
=== ANALYSIS ===
Successful runs per method:
Prefix:               1298/2000 (64.9%)
Random Break (Prob):  1271/2000 (63.5%)
Random Break (Mod):   1246/2000 (62.3%)
63% Fixed:            1249/2000 (62.5%)

[Performance Metrics - Average Checks per Successful Run]
               | Prefix              | Random Break (Prob)| Random Break (Mod) | 63% Fixed
---------------+---------------------+--------------------+--------------------+--------------------
Checks (mean)  |     32,481.9 ± 18,538.5 |     32,941.2 ± 19,381.0 |     31,509.0 ± 18,870.2 |     31,371.8 ± 18,240.5
Min checks     |           70 |          205 |           49 |           48
Max checks     |       77,300 |       81,094 |       77,133 |       63,542

[Efficiency vs Prefix Baseline]
Random Break (Prob):  -1.4% more efficient
Random Break (Mod):   3.0% more efficient
63% Fixed:            3.4% more efficient

[Key Comparisons - Statistical Significance]
63% Fixed vs Prefix:
Welch's t-test: t = -1.523, p = 0.1278
Effect Size: -0.060

Random Break (Prob) vs Prefix:
Welch's t-test: t = 0.614, p = 0.5395
Effect Size: 0.024

Random Break (Mod) vs Prefix:
Welch's t-test: t = -1.311, p = 0.1899
Effect Size: -0.052

Random Break (Prob) vs (Mod):
Welch's t-test: t = 1.878, p = 0.0604
Effect Size: 0.075

[Success Rate Ranking]
1. Prefix:               64.9%
2. Random Break (Prob):  63.5%
3. Random Break (Mod):   62.3%
4. 63% Fixed:            62.5%


Prefix wins because:

- It stops early when it finds evidence (prefixes), intelligently and with reliable cryptographic validity.

- It keeps searching even when it doesn't find evidence.

63% Fixed loses because:

- It assumes there are no targets without evidence.

- It never looks at the final 37%, no matter what.

Random loses because:

It's not a strategy with a solid mathematical foundation; it's just noise.

To put it simply: For limited resources, Prefixes offers the best balance: Maximum effectiveness (64.9%) + adaptive intelligence + no blind spots.

The other options, although statistically similar, have design flaws that make Prefix the more viable choice, even if we would like to believe that identical statistics mean it doesn't matter which search method we use.

For an intelligent user, the consistency of the method must always prevail over statistical similarity: 63% Fixed is more likely to skip the target by always discarding the end of every block, Random is more likely to skip it by stopping without any logic, while Prefix skips it only when a block happens to contain two prefixes of the same length and the target sits behind the first one.
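The 63% figure that all of these stopping rules orbit around can be checked directly: for a block of 4096 keys where each key matches a given prefix with probability 1/4096, the chance of at least one match in the block is 1 - (1 - 1/4096)^4096, which is within a fraction of a percent of 1 - 1/e. A standalone sketch (not part of the script above):

```python
import math

BLOCK_SIZE = 4096          # keys per block
p_match = 1 / BLOCK_SIZE   # per-key probability of a prefix hit

# P(at least one hit) = 1 - P(no hit in BLOCK_SIZE independent trials)
p_at_least_one = 1 - (1 - p_match) ** BLOCK_SIZE

# The limiting value as the block grows: 1 - 1/e ~= 63.21%
limit = 1 - 1 / math.e

print(f"P(>=1 hit in block): {p_at_least_one:.4%}")
print(f"1 - 1/e:             {limit:.4%}")
```

For BLOCK_SIZE = 4096 the two values agree to about four decimal places, which is why every method in the comparison converges to roughly the same success rate.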



I think I figured out that in the end it depends on what you’re looking for, mushrooms or prefixes.

Mushrooms like to grow close to each other: if you don't find any, it's better to move somewhere else. Prefixes, on the other hand, absolutely don't. They hate each other. If you find one, you can be sure the next one will be far away. Found a prefix? Better savor it and be content with it, because in this block it's the only one you're going to get!!

With everything clear now, I can step away from this very constructive thread

Exactly, the structure of the search space determines the optimal strategy.
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 242


View Profile
November 20, 2025, 05:34:31 PM
 #57

I'm sharing code with you, modified with AI, since it's your preferred tool for finding unnecessary debates or thoughts to invalidate theories, but with the difference that I used it as an educational reality tool.

AI failed you. Here's what my non-AI brain has to say about it:

1. random_break_probability is bugged: it introduces bias by potentially discarding the target match. Where in the world have you ever seen the Scooby Doo method doing such a thing?

Fix: first check for target match, and only then discard.

2. random_break_modulo is bugged: same exact issue as above.

3. fixed_63_percent_search is bugged: the scanned amount is computed from a low-precision (only 2 decimals) value of 1 - 1/e, which creates a small bias because slightly fewer of the block's 4096 keys are scanned than intended.

Now, if you fix these 3 issues and do a 10,000-iteration comparison, what you will find is that:

All the 4 compared methods yield identical (below the error margin) results

.... which is 63.21% for all of them.

You're welcome. I'm out as well.

Off the grid, training pigeons to broadcast signed messages.
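For illustration, a minimal sketch of the two fixes described above (hypothetical, simplified loop: `keys`, `target`, and `break_prob` are stand-ins, not the actual variables of the script earlier in the thread). The target check moves before the random discard, and the scanned fraction uses 1 - 1/e at full precision instead of a 2-decimal 0.63:

```python
import math
import random

def random_break_fixed(keys, target, break_prob=0.01, rng=random.random):
    """Scan a block; check the target BEFORE any random early exit,
    so the discard can never throw away the winning key."""
    checks = 0
    for key in keys:
        checks += 1
        if key == target:          # check first...
            return True, checks
        if rng() < break_prob:     # ...only then consider breaking
            return False, checks
    return False, checks

# Full-precision scan length for the "63% fixed" rule:
BLOCK_SIZE = 4096
scan_exact = math.ceil(BLOCK_SIZE * (1 - 1 / math.e))   # 2590 keys
scan_rounded = int(BLOCK_SIZE * 0.63)                   # 2580 keys: 10 short per block
```

The 10-key gap per block is the small bias pointed out in item 3; over many blocks it slightly depresses the 63% Fixed success rate.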
mcdouglasx (OP)
Hero Member
*****
Offline Offline

Activity: 952
Merit: 532



View Profile WWW
November 20, 2025, 05:57:02 PM
 #58

AI failed you. Here's what my non-AI brain has to say about it:

[...]

All the 4 compared methods yield identical (below the error margin) results

.... which is 63.21% for all of them.

You're welcome. I'm out as well.

Even if all methods converge to the theoretical 63.21%, the strategic difference remains:

63% Fixed: Never considers the final 36.79%
Random: Fails arbitrarily; it's an unpredictable risk
Prefix: Fails only in statistically rare cases

Do you prefer a method that fails intelligently or one that fails randomly/arrogantly?

Your attempt to treat all options as equal in this context is absurd. Bon voyage!