Probabilistic search of prefixes vs random+sequential(solved)

This script compares two ways of searching for a single element (identified by its RIPEMD-160 hash) within a list of numbers. The idea is to see which of the two methods is faster and more efficient at finding that element.

To ensure a fair comparison, several simulations are run, always using the same rules:

- First, the list is divided into "blocks".
- The blocks are analyzed in random order.
- Within that list, the number we want to find is randomly selected.
- The process is run multiple times to gather statistics: we count how many checks each method makes and how many times each method "wins" (i.e., finds the target with the fewest checks).

Two methods:

1. Sequential Search:This method is very straightforward. It goes through each block from beginning to end, checking one by one if the element's hash matches the one we are looking for. It's like reviewing each page of a book without skipping a single one. Although it's simple and serves as a reference, if the element is at the end, the method ends up doing a lot of checking.

2. Prefix-Based Search:This method does something more clever. Although it also goes block by block in random order, before checking the entire block, it first checks if the hash of any element begins with the same first three characters as the hash we're searching for. In hexadecimal, each character has 16 options, which means that with three characters, you have 16³, or 4,096 possible combinations.

Ideally, if the block had 4,096 elements, each combination would appear, on average, only once. Therefore, if it finds an element that has the correct prefix, it focuses on that part instead of checking the entire block. This saves a lot of checking.

In short, while the sequential method checks each element unconditionally, the prefix-based method filters the input and focuses only on where it's most likely to find the target. Thus, at the end of all the simulations, an "average success rate" is calculated (total number of checks divided by the number of times each method won) to see which method uses the least effort to find the desired element.

The goal is to demonstrate that prefix-based search, thanks to this intelligent filtering, is generally more efficient than traditional sequential search, since fewer checks are performed when it wins.

script

Code:

import hashlib
import random
import time

# Configuration
TOTAL_SIZE = 100_000
RANGE_SIZE = 4_096
PREFIX_LENGTH = 3
SIMULATIONS = 500
SECP256K1_ORDER = int("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

print(f"""
=== Configuration ===
Total numbers: {TOTAL_SIZE:,}
Block size: {RANGE_SIZE:,}
Prefix: {PREFIX_LENGTH} characters (16^{PREFIX_LENGTH} combinations)
Simulations: {SIMULATIONS}
secp256k1 Order: {SECP256K1_ORDER}
""")

def generate_h160(data):
    h = hashlib.new('ripemd160', str(data).encode('utf-8'))
    return h.hexdigest()

def shuffled_range(n):
    arr = list(range(n + 1))
    random.shuffle(arr)
    return arr

def sequential_search(dataset, block, target_hash, order):
    checks = 0
    for idx in order:
        start = idx * block
        end = start + block
        for i in range(start, end):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True}
    return {"checks": checks, "found": False}

def precise_search(dataset, block, prefix, target_hash, order):
    prefix_hash = target_hash[:prefix]
    checks = 0
    ranges = []
    for idx in order:
        start = idx * block
        end = start + block
        found_prefix = False
        for i in range(start, end):
            checks += 1
            h = generate_h160(dataset[i])
            if h == target_hash:
                return {"checks": checks, "found": True}
            if not found_prefix and h.startswith(prefix_hash):
                found_prefix = True
                ranges.append({"start": i + 1, "end": end})
                break
    for r in ranges:
        for i in range(r["end"] - 1, r["start"] - 1, -1):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True}
    return {"checks": checks, "found": False}

def compare_methods():
    results = {
        "sequential": {"wins": 0, "total_checks": 0, "total_time": 0},
        "precise": {"wins": 0, "total_checks": 0, "total_time": 0},
        "ties": 0
    }

    for i in range(SIMULATIONS):
        max_range = SECP256K1_ORDER - TOTAL_SIZE - 1
        random_offset = random.randrange(max_range)
        R = 1 + random_offset

        dataset = [R + i for i in range(TOTAL_SIZE)]
        target_num = random.choice(dataset)
        target_hash = generate_h160(target_num)
        blocks = TOTAL_SIZE // RANGE_SIZE
        order = shuffled_range(blocks - 1)

        start_time = time.perf_counter()
        seq_result = sequential_search(dataset, RANGE_SIZE, target_hash, order)
        end_time = time.perf_counter()
        seq_time = end_time - start_time

        start_time = time.perf_counter()
        pre_result = precise_search(dataset, RANGE_SIZE, PREFIX_LENGTH, target_hash, order)
        end_time = time.perf_counter()
        pre_time = end_time - start_time

        results["sequential"]["total_checks"] += seq_result["checks"]
        results["precise"]["total_checks"] += pre_result["checks"]
        results["sequential"]["total_time"] += seq_time
        results["precise"]["total_time"] += pre_time

        if seq_result["checks"] < pre_result["checks"]:
            results["sequential"]["wins"] += 1
        elif seq_result["checks"] > pre_result["checks"]:
            results["precise"]["wins"] += 1
        else:
            results["ties"] += 1

        print(f"Simulation {i + 1}: Sequential = {seq_result['checks']} checks in {seq_time:.6f}s | Prefix = {pre_result['checks']} checks in {pre_time:.6f}s")

    avg_success_rate_sequential = (results["sequential"]["total_checks"] / results["sequential"]["wins"]
                                   if results["sequential"]["wins"] > 0 else float('inf'))
    avg_success_rate_precise = (results["precise"]["total_checks"] / results["precise"]["wins"]
                                if results["precise"]["wins"] > 0 else float('inf'))
    avg_time_sequential = (results["sequential"]["total_time"] / results["sequential"]["wins"]
                           if results["sequential"]["wins"] > 0 else float('inf'))
    avg_time_precise = (results["precise"]["total_time"] / results["precise"]["wins"]
                        if results["precise"]["wins"] > 0 else float('inf'))

    print(f"""
=== FINAL RESULTS ===
Wins:
Sequential: {results['sequential']['wins']}
Prefix: {results['precise']['wins']}
Ties: {results['ties']}

Total Checks:

Sequential: {results['sequential']['total_checks']}
Prefix: {results['precise']['total_checks']}
Total Time:

Sequential: {results['sequential']['total_time']:.6f} seconds
Prefix: {results['precise']['total_time']:.6f} seconds

Averages (Total Time / Wins):

Sequential : {avg_time_sequential:.6f} seconds/victory
Prefix : {avg_time_precise:.6f} seconds/victory

Checks per Win:
Sequential : {avg_success_rate_sequential:.2f} checks/win
Prefix : {avg_success_rate_precise:.2f} checks/win
""")

if __name__ == '__main__':
    compare_methods()

Results

Code:

=== FINAL RESULTS ===
Wins:
Sequential: 194
Prefix: 286
Ties: 20
Total Checks:
Sequential: 26611276
Prefix: 25342870
Average Success Rates:
Total Avg / Wins
Sequential(1 victory for each): 137171.53
Prefix(1 1 victory for each): 88611.43

Let's analyze what these results show:

In a total of 500 simulations, the sequential method won 194 times, while the prefix-based method won 286 times (with 20 ties). This already tells us that, in more than half of the cases, the prefix method found the target using fewer checks than the sequential method.

In addition, the total number of verifications performed by each method was measured:

Code:

Sequential: 26,611,276 verifications

Code:

Prefixes: 25,342,870 verifications

Now, the key factor is the so-called "average success rate", which in this case is calculated by dividing the total verifications by the number of wins. This gives:

Code:

Sequential: 137,171.53 verifications per win

Code:

Prefixes: 88,611.43 verifications per win

This means that, on average, each time the prefix method "wins" (i.e.finds the target with fewer verifications), significantly less computational effort was used than in the case of the sequential method.

Why is this? The trick lies in the relationship between prefix size and block size. When using a 3-character prefix in hexadecimal, there are 16³ = 4096 possible combinations. If the block fits this size (or is close enough), each prefix combination will appear on average only once in the block.

Thus, instead of having to exhaustively check each element (as the sequential method does), the prefix-based method "gets ahead" by searching for prefix matches. This allows it to focus its efforts only on the part of the block where an exact match is most likely to be found, thus avoiding unnecessary checks.

In short, these numbers tell us that the prefix-based method is more efficient because, thanks to this smart filter, it reduces the number of checks required to find the target.

Recommendation: Reproduce the test with the number of simulations you deem appropriate.

others test

Code:

=== FINAL RESULTS ===
Wins:
Sequential: 345
Prefix: 607
Ties: 48

Total Checks:

Sequential: 51230824
Prefix: 49793474
Total Time:

Sequential: 1257.816749 seconds
Prefix: 1254.539876 seconds

Averages (Total Time / Wins):

Sequential : 3.645846 seconds/victory
Prefix : 2.066787 seconds/victory

Checks per Win:
Sequential : 148495.14 checks/win
Prefix : 82032.08 checks/win

Code:

=== FINAL RESULTS ===
Wins:
Sequential: 358
Prefix: 577
Ties: 65

Total Checks:

Sequential: 50308208
Prefix: 50190906
Total Time:

Sequential: 1056.564673 seconds
Prefix: 1077.734207 seconds

Averages (Total Time / Wins):

Sequential : 2.951298 seconds/victory
Prefix : 1.867824 seconds/victory

Checks per Win:
Sequential : 140525.72 checks/win
Prefix : 86985.97 checks/win

Another test simulating mining

script

Code:

import hashlib
import random
import time
from typing import List, Dict

# ==============================
# Simulation Configuration
# ==============================
TOTAL_NONCES = 100_000      # Total number of nonces simulated
BLOQUE_SIZE = 5_000         # Size of each block (sequential nonces)
SIMULATIONS = 500           # Number of simulations to run
TARGET_PREFIX = "0000"      # Full target (difficulty)
PREFIX_LENGTH = 3           # Length of the short prefix used as a signal
HEADER_LIST_SIZE = 10       # Number of headers to generate per simulation

# ===================================
# Class to store statistics, including a success flag.
# ===================================
class Statistics:
    def __init__(self, name: str, checks: int = 0, time: float = 0.0, success: bool = False):
        self.name = name
        self.checks = checks
        self.time = time
        self.success = success

# ============================================
# Blockheader Generator (simple simulation)
# ============================================
def generate_blockheader() -> bytes:
    return b''.join([
        random.getrandbits(32).to_bytes(4, 'little'),
        random.randbytes(32),
        random.randbytes(32),
        int(time.time()).to_bytes(4, 'big'),
        bytes.fromhex("1d00ffff")
    ])

# ============================================
# Double SHA-256 (Bitcoin-style calculation)
# ============================================
def double_sha256(header: bytes, nonce: int) -> str:
    nonce_bytes = nonce.to_bytes(4, 'little')
    block = header + nonce_bytes
    hash1 = hashlib.sha256(block).digest()
    return hashlib.sha256(hash1).hexdigest()

# =====================================================
# Sequential Method (original mining, sequential order)
# =====================================================
def sequential_method(header: bytes, nonces: List[int]) -> Statistics:
    stats = Statistics("Sequential")
    start = time.perf_counter()
    for nonce in nonces:
        stats.checks += 1
        h = double_sha256(header, nonce)
        if h.startswith(TARGET_PREFIX):
            stats.time = time.perf_counter() - start
            stats.success = True
            return stats
    stats.time = time.perf_counter() - start
    return stats

# =====================================================
# Prefix Method (optimized: skip remaining nonces in a block on short prefix)
# =====================================================
def prefix_method(header: bytes, nonces: List[int]) -> Statistics:
    stats = Statistics("Prefix")
    blocks = [nonces[i:i+BLOQUE_SIZE] for i in range(0, len(nonces), BLOQUE_SIZE)]
    random.shuffle(blocks)
    short_prefix = TARGET_PREFIX[:PREFIX_LENGTH]

    start = time.perf_counter()
    for block in blocks:
        for nonce in block:
            stats.checks += 1
            h = double_sha256(header, nonce)
            if h.startswith(TARGET_PREFIX):
                stats.time = time.perf_counter() - start
                stats.success = True
                return stats
            if h.startswith(short_prefix):
                break
    stats.time = time.perf_counter() - start
    return stats

# =====================================
# Run a simulation using a common list of headers.
# =====================================
def run_simulation_with_header_list() -> Dict[str, Statistics]:
    header_list = [generate_blockheader() for _ in range(HEADER_LIST_SIZE)]
    nonces = list(range(TOTAL_NONCES))

    seq_total_checks, seq_total_time, seq_success = 0, 0.0, False
    for header in header_list:
        stats = sequential_method(header, nonces)
        seq_total_checks += stats.checks
        seq_total_time += stats.time
        if stats.success:
            seq_success = True
            break

    pre_total_checks, pre_total_time, pre_success = 0, 0.0, False
    for header in header_list:
        stats = prefix_method(header, nonces)
        pre_total_checks += stats.checks
        pre_total_time += stats.time
        if stats.success:
            pre_success = True
            break

    return {
        "sequential": Statistics("Sequential", seq_total_checks, seq_total_time, seq_success),
        "prefix": Statistics("Prefix", pre_total_checks, pre_total_time, pre_success)
    }

# =====================================
# Main function: runs all simulations and prints results.
# =====================================
def main():
    print(f"""
=== Configuration ===
Total Nonces: {TOTAL_NONCES:,}
Block size: {BLOQUE_SIZE:,}
Complete Target: "{TARGET_PREFIX}"
Short Prefix: "{TARGET_PREFIX[:PREFIX_LENGTH]}"
Simulations: {SIMULATIONS}
Header List Size: {HEADER_LIST_SIZE}
""")
   
    wins = {"sequential": 0, "prefix": 0}
    total_checks = {"sequential": 0, "prefix": 0}
    total_time = {"sequential": 0.0, "prefix": 0.0}
   
    for i in range(SIMULATIONS):
        result = run_simulation_with_header_list()
        seq_stats = result["sequential"]
        pre_stats = result["prefix"]

        total_checks["sequential"] += seq_stats.checks
        total_checks["prefix"] += pre_stats.checks
        total_time["sequential"] += seq_stats.time
        total_time["prefix"] += pre_stats.time

        winner = "None"
        if seq_stats.success and not pre_stats.success:
            wins["sequential"] += 1
            winner = "Sequential"
        elif pre_stats.success and not seq_stats.success:
            wins["prefix"] += 1
            winner = "Prefix"
        elif seq_stats.success and pre_stats.success:
            if seq_stats.checks < pre_stats.checks:
                wins["sequential"] += 1
                winner = "Sequential"
            elif pre_stats.checks < seq_stats.checks:
                wins["prefix"] += 1
                winner = "Prefix"
            else:
                winner = "Tie"

        print(f"\nSimulation {i+1}:")
        print(f"  Sequential: Checks = {seq_stats.checks} | Time = {seq_stats.time:.4f}s | Success = {seq_stats.success}")
        print(f"  Prefix:     Checks = {pre_stats.checks} | Time = {pre_stats.time:.4f}s | Success = {pre_stats.success}")
        print(f"  Winner: {winner}")
   
    print("\n=== Final Results ===")
    print(f"Sequential Wins: {wins['sequential']}/{SIMULATIONS}")
    print(f"Prefix Wins:     {wins['prefix']}/{SIMULATIONS}")
   
    print("\nAverage Metrics:")
    for method in ["sequential", "prefix"]:
        avg_checks = total_checks[method] / max(wins[method], 1)
        avg_time = total_time[method] / max(wins[method], 1)
        print(f"\nMethod {method.capitalize()}:")
        print(f"  Average Checks: {avg_checks:,.0f}")
        print(f"  Average Time: {avg_time:.4f}s")

if __name__ == "__main__":
    main()

test #1

Code:

=== Final Results ===
Sequential Wins: 211/500
Prefix Wins:     287/500

Average Metrics:

Method Sequential:
  Average Checks: 149,921
  Average Time: 1.3175s

Method Prefix:
  Average Checks: 109,233
  Average Time: 0.9764s

test #2

Code:

=== Final Results ===
Sequential Wins: 209/500
Prefix Wins:     290/500

Average Metrics:

Method Sequential:
  Average Checks: 161,961
  Average Time: 1.3663s

Method Prefix:
  Average Checks: 114,693
  Average Time: 0.9794s

update:

Conclusion

code:

Code:

import hashlib
import random
import time
import math
import statistics
import scipy.stats as st
from math import ceil

# Configuration
TOTAL_SIZE = 100_000
RANGE_SIZE = 4_096
PREFIX_LENGTH = 3
SIMULATIONS = 1000
SECP256K1_ORDER = int("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

print(f"""
=== Configuration ===
Total numbers: {TOTAL_SIZE:,}
Block size: {RANGE_SIZE:,}
Total blocks needed: {ceil(TOTAL_SIZE/RANGE_SIZE)}
Prefix: {PREFIX_LENGTH} characters (16^{PREFIX_LENGTH} = {16**PREFIX_LENGTH:,} combinations)
Simulations: {SIMULATIONS}
secp256k1 order: {SECP256K1_ORDER}
""")

def generate_h160(data):
    h = hashlib.new('ripemd160', str(data).encode('utf-8'))
    return h.hexdigest()

def shuffled_block_order(total_blocks):
    blocks = list(range(total_blocks))
    random.shuffle(blocks)
    return blocks

def sequential_search(dataset, block_size, target_hash, block_order):
    checks = 0
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
        for i in range(start, end):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}
    return {"checks": checks, "found": False}

def prefix_search(dataset, block_size, prefix_len, target_hash, block_order):
    prefix_hash = target_hash[:prefix_len]
    checks = 0
    ranges_to_scan = []
    skip_counter = 0
    scan_increment = 1

    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))  # Límite seguro
        found_prefix = False

        for i in range(start, end):
            checks += 1
            h = generate_h160(dataset[i])
            if h == target_hash:
                return {"checks": checks, "found": True, "index": i}
            if not found_prefix and h.startswith(prefix_hash):
                found_prefix = True
                ranges_to_scan.append({"start": i + 1, "end": end})
                skip_counter += 1
                break

        if skip_counter >= 4 and ranges_to_scan:
            for _ in range(min(scan_increment, len(ranges_to_scan))):
                r = ranges_to_scan.pop(0)
                for i in range(r["start"], r["end"]):
                    checks += 1
                    if generate_h160(dataset[i]) == target_hash:
                        return {"checks": checks, "found": True, "index": i}
            skip_counter = 0
            scan_increment += 1

    for r in ranges_to_scan:
        for i in range(r["start"], r["end"]):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}

    return {"checks": checks, "found": False}

def compute_cohens_d(list1, list2):
    if len(list1) < 2 or len(list2) < 2:
        return float('nan')
    n1, n2 = len(list1), len(list2)
    m1, m2 = statistics.mean(list1), statistics.mean(list2)
    s1, s2 = statistics.stdev(list1), statistics.stdev(list2)
    
    pooled_std = math.sqrt(((n1-1)*s1**2 + (n2-1)*s2**2) / (n1+n2-2))
    if pooled_std == 0:
        return float('nan')
    return (m1 - m2) / pooled_std

def correct_coefficient_of_variation(data):
    if not data or statistics.mean(data) == 0:
        return float('nan')
    return (statistics.stdev(data) / statistics.mean(data)) * 100

def longest_streak(outcomes, letter):
    max_streak = current = 0
    for o in outcomes:
        current = current + 1 if o == letter else 0
        max_streak = max(max_streak, current)
    return max_streak

def ascii_bar(label, value, max_value, bar_length=50):
    bar_count = int((value / max_value) * bar_length) if max_value > 0 else 0
    return f"{label:12}: {'#' * bar_count} ({value})"

def compare_methods():
    results = {
        "sequential": {"wins": 0, "success": 0, "checks": [], "times": []},
        "prefix": {"wins": 0, "success": 0, "checks": [], "times": []},
        "ties": 0
    }
    outcome_history = []
    total_blocks = ceil(TOTAL_SIZE / RANGE_SIZE)

    for _ in range(SIMULATIONS):
        max_offset = SECP256K1_ORDER - TOTAL_SIZE - 1
        offset = random.randint(0, max_offset)
        dataset = [offset + i for i in range(TOTAL_SIZE)]
        target_num = random.choice(dataset)
        target_hash = generate_h160(target_num)
        block_order = shuffled_block_order(total_blocks)

        start = time.perf_counter()
        seq_res = sequential_search(dataset, RANGE_SIZE, target_hash, block_order)
        seq_time = time.perf_counter() - start

        start = time.perf_counter()
        pre_res = prefix_search(dataset, RANGE_SIZE, PREFIX_LENGTH, target_hash, block_order)
        pre_time = time.perf_counter() - start

        for method, res, t in [("sequential", seq_res, seq_time), ("prefix", pre_res, pre_time)]:
            if res["found"]:
                results[method]["success"] += 1
                results[method]["checks"].append(res["checks"])
                results[method]["times"].append(t)

        if seq_res["found"] and pre_res["found"]:
            if seq_res["checks"] < pre_res["checks"]:
                results["sequential"]["wins"] += 1
                outcome_history.append("S")
            elif pre_res["checks"] < seq_res["checks"]:
                results["prefix"]["wins"] += 1
                outcome_history.append("P")
            else:
                results["ties"] += 1
                outcome_history.append("T")

    def get_stats(data):
        if not data:
            return {"mean": 0, "min": 0, "max": 0, "median": 0, "stdev": 0}
        return {
            "mean": statistics.mean(data),
            "min": min(data),
            "max": max(data),
            "median": statistics.median(data),
            "stdev": statistics.stdev(data) if len(data) > 1 else 0
        }

    seq_stats = get_stats(results["sequential"]["checks"])
    pre_stats = get_stats(results["prefix"]["checks"])
    seq_time_stats = get_stats(results["sequential"]["times"])
    pre_time_stats = get_stats(results["prefix"]["times"])

    seq_success_rate = results["sequential"]["success"] / SIMULATIONS
    pre_success_rate = results["prefix"]["success"] / SIMULATIONS

    total_comparisons = results["sequential"]["wins"] + results["prefix"]["wins"] + results["ties"]
    seq_win_rate = results["sequential"]["wins"] / total_comparisons if total_comparisons > 0 else 0
    pre_win_rate = results["prefix"]["wins"] / total_comparisons if total_comparisons > 0 else 0

    cv_seq = correct_coefficient_of_variation(results["sequential"]["checks"])
    cv_pre = correct_coefficient_of_variation(results["prefix"]["checks"])

    effect_size = compute_cohens_d(results["sequential"]["checks"], results["prefix"]["checks"])
    if len(results["sequential"]["checks"]) > 1 and len(results["prefix"]["checks"]) > 1:
        t_test = st.ttest_ind(results["sequential"]["checks"], results["prefix"]["checks"], equal_var=False)
    else:
        t_test = None

    print(f"""
=== FINAL ANALYSIS (CORRECTED) ===

[Success Rates]
Sequential: {seq_success_rate:.1%} ({results['sequential']['success']}/{SIMULATIONS})
Prefix:    {pre_success_rate:.1%} ({results['prefix']['success']}/{SIMULATIONS})

[Performance Metrics]
               | Sequential          | Prefix
---------------+---------------------+--------------------
Checks (mean)  | {seq_stats['mean']:>12,.1f} ± {seq_stats['stdev']:,.1f} | {pre_stats['mean']:>12,.1f} ± {pre_stats['stdev']:,.1f}
Time (mean ms) | {seq_time_stats['mean']*1000:>12.2f} ± {seq_time_stats['stdev']*1000:.2f} | {pre_time_stats['mean']*1000:>12.2f} ± {pre_time_stats['stdev']*1000:.2f}
Min checks     | {seq_stats['min']:>12,} | {pre_stats['min']:>12,}
Max checks     | {seq_stats['max']:>12,} | {pre_stats['max']:>12,}
Coef. Variation| {cv_seq:>11.1f}% | {cv_pre:>11.1f}%

[Comparison When Both Succeed]
Sequential wins: {results['sequential']['wins']} ({seq_win_rate:.1%})
Prefix wins:    {results['prefix']['wins']} ({pre_win_rate:.1%})
Ties:          {results['ties']}

[Statistical Significance]
Cohen's d: {effect_size:.3f}
Welch's t-test: {'t = %.3f, p = %.4f' % (t_test.statistic, t_test.pvalue) if t_test else 'Insufficient data'}
""")

    non_tie_outcomes = [o for o in outcome_history if o != "T"]
    streak_analysis = f"""
=== STREAK ANALYSIS ===
Longest Sequential streak: {longest_streak(outcome_history, 'S')}
Longest Prefix streak:    {longest_streak(outcome_history, 'P')}
Expected max streak:      {math.log(len(non_tie_outcomes), 2):.1f} (for {len(non_tie_outcomes)} trials)
"""
    print(streak_analysis)

    max_wins = max(results["sequential"]["wins"], results["prefix"]["wins"], results["ties"])
    print("=== WIN DISTRIBUTION ===")
    print(ascii_bar("Sequential", results["sequential"]["wins"], max_wins))
    print(ascii_bar("Prefix", results["prefix"]["wins"], max_wins))
    print(ascii_bar("Ties", results["ties"], max_wins))

if __name__ == '__main__':
    compare_methods()

results:

Code:

=== Configuration ===
Total numbers: 100,000
Block size: 4,096
Total blocks needed: 25
Prefix: 3 characters (16^3 = 4,096 combinations)
Simulations: 1000
secp256k1 order: 115792089237316195423570985008687907852837564279074904382605163141518161494337


=== FINAL ANALYSIS (CORRECTED) ===

[Success Rates]
Sequential: 100.0% (1000/1000)
Prefix:    100.0% (1000/1000)

[Performance Metrics]
               | Sequential          | Prefix
---------------+---------------------+--------------------
Checks (mean)  |     49,886.2 ± 28,945.2 |     49,104.3 ± 28,915.5
Time (mean ms) |       128.96 ± 75.96 |       130.12 ± 77.58
Min checks     |          160 |          160
Max checks     |       99,942 |       99,977
Coef. Variation|        58.0% |        58.9%

[Comparison When Both Succeed]
Sequential wins: 328 (32.8%)
Prefix wins:    609 (60.9%)
Ties:          63

[Statistical Significance]
Cohen's d: 0.027
Welch's t-test: t = 0.604, p = 0.5456


=== STREAK ANALYSIS ===
Longest Sequential streak: 6
Longest Prefix streak:    14
Expected max streak:      9.9 (for 937 trials)

=== WIN DISTRIBUTION ===
Sequential  : ########################## (328)
Prefix      : ################################################## (609)
Ties        : ##### (63)

The analysis performed both in terms of implementation and empirical results demonstrates that both search strategies (sequential and prefix-based) effectively accomplish the goal of finding a hash in a brute-force scenario, but their differences in efficiency are marginal.

Configuration and Execution: The simulation used 100,000 numbers divided into 25 blocks (each containing 4,096 elements) and was run over 1,000 trials. Both methods achieved a 100% success rate in every trial, confirming that the dataset structure and search strategy meet the basic requirement of exhaustiveness.

Performance Metrics: The metrics obtained (such as the average number of checks and execution time) are very similar between the two methods. Although the prefix search "wins" more often in individual comparisons (with 609 wins compared to 328 wins for the sequential method, and 63 ties), a comparison of the average number of checks shows a difference that is statistically insignificant (Cohen’s d of 0.027 and a Welch t-test with a p-value greater than 0.5). This indicates that the reduction in the number of checks achieved by the prefix search in some trials is offset in others, and on average both methods evaluate almost the same number of candidates.

Implications of the Prefix Strategy: The concept behind the prefix search is to early on "filter" blocks that have the potential to contain the target hash, thereby potentially speeding up the search process in some cases. However, when assessing a brute-force search where the primary cost is in calculating the hash and scanning the dataset, this additional filtering does not lead to a substantial improvement in overall performance. Essentially, the structure and randomness of the dataset mean that both strategies, on average, inspect roughly half of the search space before locating the target.

Statistical and Practical Aspects: The statistical tests confirm that the difference observed between the two methods is minimal. The closeness of the average values and the high dispersion of results (as reflected in the standard deviations) suggest that, from a practical standpoint, the benefit of using a prefix filter does not translate into significant improvements in brute-force scenarios.

In conclusion, in a brute-force environment where every possibility is evaluated to find a hash, neither the sequential search nor the prefix-based search demonstrates a significant overall efficiency advantage. The choice between them will therefore depend more on practical and design considerations rather than on pure performance metrics.

This is a self-moderated post meant to maintain a strict conversation, one that is respectful and based on arguments supported by verifiable evidence in practice.

	Author	Topic: Probabilistic search of prefixes vs random+sequential(solved) (Read 635 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.