Author Topic: Probabilistic search of prefixes vs random+sequential (solved)  (Read 635 times)
mcdouglasx (OP)
Sr. Member
Activity: 770 | Merit: 408
April 21, 2025, 05:13:49 PM
Last edit: May 02, 2025, 05:04:42 PM by mcdouglasx
Merited by ABCbits (8), RetiredCoder (5), vapourminer (1)
#1

This script compares two ways of searching for a single element (identified by its RIPEMD-160 hash) within a list of numbers. The idea is to see which of the two methods is faster and more efficient at finding that element.

To ensure a fair comparison, several simulations are run, always using the same rules:


- First, the list is divided into "blocks".
- The blocks are visited in random order (the same random order is used by both methods).
- The target number to find is selected at random from the list.
- The process is repeated many times to gather statistics: we count how many checks each method makes and how many times each method "wins" (i.e., finds the target with fewer checks).

Two methods:

1. Sequential search: This method is the baseline. It visits the blocks in the shared random order and, within each block, checks every element one by one, comparing its hash against the one we are looking for. It's like reading every page of a book without skipping a single one. It is simple and serves as a reference, but if the target sits near the end of the visiting order, it ends up doing a lot of checking.

2. Prefix-based search: This method does something cleverer. It walks the same blocks in the same random order, but while scanning a block it also checks whether an element's hash begins with the same first three characters as the hash we are searching for. In hexadecimal each character has 16 options, so three characters give 16³ = 4,096 possible combinations.

With a block of 4,096 elements, each prefix combination therefore appears, on average, only once per block. So when the method hits an element with the correct prefix (but not the full target hash), it sets the rest of that block aside and jumps to the next block, betting that a second occurrence of the same prefix in that block is unlikely; the deferred tails are only scanned at the end if the target has not turned up earlier. This saves a lot of checking in the runs where the bet pays off.
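For reference, a quick back-of-the-envelope check of the figures above (illustrative only; the constants mirror the script's configuration):

Code:
# Sanity check of the prefix math: 3 hex characters span 4,096 prefixes,
# so a 4,096-element block is expected to contain about one match.
PREFIX_LENGTH = 3
BLOCK_SIZE = 4_096

prefix_space = 16 ** PREFIX_LENGTH          # 4,096 possible 3-character prefixes
p_match = 1 / prefix_space                  # chance a random hash starts with a given prefix
expected_per_block = BLOCK_SIZE * p_match   # expected prefix matches per block

print(prefix_space, p_match, expected_per_block)   # 4096 0.000244140625 1.0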

In short, the sequential method checks every element unconditionally, while the prefix-based method defers the parts of the search space it considers least promising. At the end of all the simulations, an "average success rate" is calculated for each method (its total number of checks divided by its number of wins) to see which method spends the least effort per victory.

The goal is to demonstrate that prefix-based search, thanks to this intelligent filtering, is generally more efficient than traditional sequential search, since fewer checks are performed when it wins.

script

Code:
import hashlib
import random
import time

# Configuration
TOTAL_SIZE = 100_000
RANGE_SIZE = 4_096
PREFIX_LENGTH = 3
SIMULATIONS = 500
SECP256K1_ORDER = int("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

print(f"""
=== Configuration ===
Total numbers: {TOTAL_SIZE:,}
Block size: {RANGE_SIZE:,}
Prefix: {PREFIX_LENGTH} characters (16^{PREFIX_LENGTH} combinations)
Simulations: {SIMULATIONS}
secp256k1 Order: {SECP256K1_ORDER}
""")

def generate_h160(data):
    h = hashlib.new('ripemd160', str(data).encode('utf-8'))
    return h.hexdigest()

def shuffled_range(n):
    arr = list(range(n + 1))
    random.shuffle(arr)
    return arr

def sequential_search(dataset, block, target_hash, order):
    checks = 0
    for idx in order:
        start = idx * block
        end = start + block
        for i in range(start, end):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True}
    return {"checks": checks, "found": False}

def precise_search(dataset, block, prefix, target_hash, order):
    prefix_hash = target_hash[:prefix]
    checks = 0
    ranges = []
    for idx in order:
        start = idx * block
        end = start + block
        found_prefix = False
        for i in range(start, end):
            checks += 1
            h = generate_h160(dataset[i])
            if h == target_hash:
                return {"checks": checks, "found": True}
            if not found_prefix and h.startswith(prefix_hash):
                found_prefix = True
                ranges.append({"start": i + 1, "end": end})
                break
    for r in ranges:
        for i in range(r["end"] - 1, r["start"] - 1, -1):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True}
    return {"checks": checks, "found": False}

def compare_methods():
    results = {
        "sequential": {"wins": 0, "total_checks": 0, "total_time": 0},
        "precise": {"wins": 0, "total_checks": 0, "total_time": 0},
        "ties": 0
    }

    for i in range(SIMULATIONS):
        max_range = SECP256K1_ORDER - TOTAL_SIZE - 1
        random_offset = random.randrange(max_range)
        R = 1 + random_offset

        dataset = [R + i for i in range(TOTAL_SIZE)]
        target_num = random.choice(dataset)
        target_hash = generate_h160(target_num)
        blocks = TOTAL_SIZE // RANGE_SIZE  # integer division: the final partial block (last 1,696 numbers) is never scanned here; the updated script below handles it with ceil()
        order = shuffled_range(blocks - 1)

        start_time = time.perf_counter()
        seq_result = sequential_search(dataset, RANGE_SIZE, target_hash, order)
        end_time = time.perf_counter()
        seq_time = end_time - start_time

        start_time = time.perf_counter()
        pre_result = precise_search(dataset, RANGE_SIZE, PREFIX_LENGTH, target_hash, order)
        end_time = time.perf_counter()
        pre_time = end_time - start_time

        results["sequential"]["total_checks"] += seq_result["checks"]
        results["precise"]["total_checks"] += pre_result["checks"]
        results["sequential"]["total_time"] += seq_time
        results["precise"]["total_time"] += pre_time

        if seq_result["checks"] < pre_result["checks"]:
            results["sequential"]["wins"] += 1
        elif seq_result["checks"] > pre_result["checks"]:
            results["precise"]["wins"] += 1
        else:
            results["ties"] += 1

        print(f"Simulation {i + 1}: Sequential = {seq_result['checks']} checks in {seq_time:.6f}s | Prefix = {pre_result['checks']} checks in {pre_time:.6f}s")

    avg_success_rate_sequential = (results["sequential"]["total_checks"] / results["sequential"]["wins"]
                                   if results["sequential"]["wins"] > 0 else float('inf'))
    avg_success_rate_precise = (results["precise"]["total_checks"] / results["precise"]["wins"]
                                if results["precise"]["wins"] > 0 else float('inf'))
    avg_time_sequential = (results["sequential"]["total_time"] / results["sequential"]["wins"]
                           if results["sequential"]["wins"] > 0 else float('inf'))
    avg_time_precise = (results["precise"]["total_time"] / results["precise"]["wins"]
                        if results["precise"]["wins"] > 0 else float('inf'))

    print(f"""
=== FINAL RESULTS ===
Wins:
Sequential: {results['sequential']['wins']}
Prefix: {results['precise']['wins']}
Ties: {results['ties']}

Total Checks:

Sequential: {results['sequential']['total_checks']}
Prefix: {results['precise']['total_checks']}
Total Time:

Sequential: {results['sequential']['total_time']:.6f} seconds
Prefix: {results['precise']['total_time']:.6f} seconds

Averages (Total Time / Wins):

Sequential : {avg_time_sequential:.6f} seconds/victory
Prefix : {avg_time_precise:.6f} seconds/victory

Checks per Win:
Sequential : {avg_success_rate_sequential:.2f} checks/win
Prefix : {avg_success_rate_precise:.2f} checks/win
""")

if __name__ == '__main__':
    compare_methods()


Results


Code:
=== FINAL RESULTS ===
Wins:
Sequential: 194
Prefix: 286
Ties: 20
Total Checks:
Sequential: 26611276
Prefix: 25342870
Average Success Rate (total checks / wins):
Sequential: 137171.53
Prefix: 88611.43


Let's analyze what these results show:

In a total of 500 simulations, the sequential method won 194 times, while the prefix-based method won 286 times (with 20 ties). This already tells us that, in more than half of the cases, the prefix method found the target using fewer checks than the sequential method.

In addition, the total number of verifications performed by each method was measured:


Code:
Sequential: 26,611,276 verifications
Prefix:     25,342,870 verifications


Now, the key factor is the so-called "average success rate", which in this case is calculated by dividing the total verifications by the number of wins. This gives:


Code:
Sequential: 137,171.53 verifications per win
Prefix:     88,611.43 verifications per win


This means that, on average, each time the prefix method "wins" (i.e. finds the target with fewer verifications), significantly less computational effort was spent than in the case of the sequential method.
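As a quick arithmetic check, the per-win averages can be re-derived from the totals and win counts reported above (illustrative only):

Code:
# Re-deriving the "checks per win" figures from the totals reported above.
seq_total, seq_wins = 26_611_276, 194
pre_total, pre_wins = 25_342_870, 286

print(f"Sequential: {seq_total / seq_wins:,.2f} checks/win")  # 137,171.53
print(f"Prefix:     {pre_total / pre_wins:,.2f} checks/win")  # 88,611.43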


Why is this? The trick lies in the relationship between prefix size and block size. With a 3-character hexadecimal prefix there are 16³ = 4,096 possible combinations. If the block size matches this figure (or comes close), each prefix combination appears, on average, only once per block.
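That "on average only once" can be made a little more precise: with 4,096 elements and a per-element match probability of 1/4,096, the number of prefix matches per block is approximately Poisson with mean 1, so roughly a third of the blocks contain no match at all and about a quarter contain two or more. A small sketch of that calculation (my own addition, not part of the script):

Code:
import math

# Distribution of 3-character prefix matches in one 4,096-element block,
# approximated as Poisson with mean 4096 * (1/16**3) = 1.
lam = 4_096 * (1 / 16**3)        # expected matches per block = 1.0
p0 = math.exp(-lam)              # P(no match)      ~ 0.368
p1 = lam * math.exp(-lam)        # P(exactly one)   ~ 0.368
p_many = 1 - p0 - p1             # P(two or more)   ~ 0.264
print(f"P(0)={p0:.3f}  P(1)={p1:.3f}  P(>=2)={p_many:.3f}")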

Thus, instead of exhaustively checking every element of every block up front (as the sequential method does), the prefix-based method treats a prefix match as a signal: it defers the rest of that block and moves on, postponing checks that it judges unlikely to pay off immediately.

In short, these numbers tell us that the prefix-based method is more efficient because, thanks to this smart filter, it reduces the number of checks required to find the target.

Recommendation: Reproduce the test with the number of simulations you deem appropriate.





Other tests

Code:
=== FINAL RESULTS ===
Wins:
Sequential: 345
Prefix: 607
Ties: 48

Total Checks:

Sequential: 51230824
Prefix: 49793474
Total Time:

Sequential: 1257.816749 seconds
Prefix: 1254.539876 seconds

Averages (Total Time / Wins):

Sequential : 3.645846 seconds/victory
Prefix : 2.066787 seconds/victory

Checks per Win:
Sequential : 148495.14 checks/win
Prefix : 82032.08 checks/win


Code:
=== FINAL RESULTS ===
Wins:
Sequential: 358
Prefix: 577
Ties: 65

Total Checks:

Sequential: 50308208
Prefix: 50190906
Total Time:

Sequential: 1056.564673 seconds
Prefix: 1077.734207 seconds

Averages (Total Time / Wins):

Sequential : 2.951298 seconds/victory
Prefix : 1.867824 seconds/victory

Checks per Win:
Sequential : 140525.72 checks/win
Prefix : 86985.97 checks/win





Another test simulating mining



script


Code:
import hashlib
import random
import time
from typing import List, Dict

# ==============================
# Simulation Configuration
# ==============================
TOTAL_NONCES = 100_000      # Total number of nonces simulated
BLOQUE_SIZE = 5_000         # Size of each block (sequential nonces)
SIMULATIONS = 500           # Number of simulations to run
TARGET_PREFIX = "0000"      # Full target (difficulty)
PREFIX_LENGTH = 3           # Length of the short prefix used as a signal
HEADER_LIST_SIZE = 10       # Number of headers to generate per simulation

# ===================================
# Class to store statistics, including a success flag.
# ===================================
class Statistics:
    def __init__(self, name: str, checks: int = 0, time: float = 0.0, success: bool = False):
        self.name = name
        self.checks = checks
        self.time = time
        self.success = success

# ============================================
# Blockheader Generator (simple simulation)
# ============================================
def generate_blockheader() -> bytes:
    return b''.join([
        random.getrandbits(32).to_bytes(4, 'little'),
        random.randbytes(32),
        random.randbytes(32),
        int(time.time()).to_bytes(4, 'big'),
        bytes.fromhex("1d00ffff")
    ])

# ============================================
# Double SHA-256 (Bitcoin-style calculation)
# ============================================
def double_sha256(header: bytes, nonce: int) -> str:
    nonce_bytes = nonce.to_bytes(4, 'little')
    block = header + nonce_bytes
    hash1 = hashlib.sha256(block).digest()
    return hashlib.sha256(hash1).hexdigest()

# =====================================================
# Sequential Method (original mining, sequential order)
# =====================================================
def sequential_method(header: bytes, nonces: List[int]) -> Statistics:
    stats = Statistics("Sequential")
    start = time.perf_counter()
    for nonce in nonces:
        stats.checks += 1
        h = double_sha256(header, nonce)
        if h.startswith(TARGET_PREFIX):
            stats.time = time.perf_counter() - start
            stats.success = True
            return stats
    stats.time = time.perf_counter() - start
    return stats

# =====================================================
# Prefix Method (optimized: skip remaining nonces in a block on short prefix)
# =====================================================
def prefix_method(header: bytes, nonces: List[int]) -> Statistics:
    stats = Statistics("Prefix")
    blocks = [nonces[i:i+BLOQUE_SIZE] for i in range(0, len(nonces), BLOQUE_SIZE)]
    random.shuffle(blocks)
    short_prefix = TARGET_PREFIX[:PREFIX_LENGTH]

    start = time.perf_counter()
    for block in blocks:
        for nonce in block:
            stats.checks += 1
            h = double_sha256(header, nonce)
            if h.startswith(TARGET_PREFIX):
                stats.time = time.perf_counter() - start
                stats.success = True
                return stats
            if h.startswith(short_prefix):
                break
    stats.time = time.perf_counter() - start
    return stats

# =====================================
# Run a simulation using a common list of headers.
# =====================================
def run_simulation_with_header_list() -> Dict[str, Statistics]:
    header_list = [generate_blockheader() for _ in range(HEADER_LIST_SIZE)]
    nonces = list(range(TOTAL_NONCES))

    seq_total_checks, seq_total_time, seq_success = 0, 0.0, False
    for header in header_list:
        stats = sequential_method(header, nonces)
        seq_total_checks += stats.checks
        seq_total_time += stats.time
        if stats.success:
            seq_success = True
            break

    pre_total_checks, pre_total_time, pre_success = 0, 0.0, False
    for header in header_list:
        stats = prefix_method(header, nonces)
        pre_total_checks += stats.checks
        pre_total_time += stats.time
        if stats.success:
            pre_success = True
            break

    return {
        "sequential": Statistics("Sequential", seq_total_checks, seq_total_time, seq_success),
        "prefix": Statistics("Prefix", pre_total_checks, pre_total_time, pre_success)
    }

# =====================================
# Main function: runs all simulations and prints results.
# =====================================
def main():
    print(f"""
=== Configuration ===
Total Nonces: {TOTAL_NONCES:,}
Block size: {BLOQUE_SIZE:,}
Complete Target: "{TARGET_PREFIX}"
Short Prefix: "{TARGET_PREFIX[:PREFIX_LENGTH]}"
Simulations: {SIMULATIONS}
Header List Size: {HEADER_LIST_SIZE}
""")
  
    wins = {"sequential": 0, "prefix": 0}
    total_checks = {"sequential": 0, "prefix": 0}
    total_time = {"sequential": 0.0, "prefix": 0.0}
  
    for i in range(SIMULATIONS):
        result = run_simulation_with_header_list()
        seq_stats = result["sequential"]
        pre_stats = result["prefix"]

        total_checks["sequential"] += seq_stats.checks
        total_checks["prefix"] += pre_stats.checks
        total_time["sequential"] += seq_stats.time
        total_time["prefix"] += pre_stats.time

        winner = "None"
        if seq_stats.success and not pre_stats.success:
            wins["sequential"] += 1
            winner = "Sequential"
        elif pre_stats.success and not seq_stats.success:
            wins["prefix"] += 1
            winner = "Prefix"
        elif seq_stats.success and pre_stats.success:
            if seq_stats.checks < pre_stats.checks:
                wins["sequential"] += 1
                winner = "Sequential"
            elif pre_stats.checks < seq_stats.checks:
                wins["prefix"] += 1
                winner = "Prefix"
            else:
                winner = "Tie"

        print(f"\nSimulation {i+1}:")
        print(f"  Sequential: Checks = {seq_stats.checks} | Time = {seq_stats.time:.4f}s | Success = {seq_stats.success}")
        print(f"  Prefix:     Checks = {pre_stats.checks} | Time = {pre_stats.time:.4f}s | Success = {pre_stats.success}")
        print(f"  Winner: {winner}")
  
    print("\n=== Final Results ===")
    print(f"Sequential Wins: {wins['sequential']}/{SIMULATIONS}")
    print(f"Prefix Wins:     {wins['prefix']}/{SIMULATIONS}")
  
    print("\nAverage Metrics:")
    for method in ["sequential", "prefix"]:
        avg_checks = total_checks[method] / max(wins[method], 1)
        avg_time = total_time[method] / max(wins[method], 1)
        print(f"\nMethod {method.capitalize()}:")
        print(f"  Average Checks: {avg_checks:,.0f}")
        print(f"  Average Time: {avg_time:.4f}s")

if __name__ == "__main__":
    main()



test #1

Code:
=== Final Results ===
Sequential Wins: 211/500
Prefix Wins:     287/500

Average Metrics:

Method Sequential:
  Average Checks: 149,921
  Average Time: 1.3175s

Method Prefix:
  Average Checks: 109,233
  Average Time: 0.9764s




test #2


Code:
=== Final Results ===
Sequential Wins: 209/500
Prefix Wins:     290/500

Average Metrics:

Method Sequential:
  Average Checks: 161,961
  Average Time: 1.3663s

Method Prefix:
  Average Checks: 114,693
  Average Time: 0.9794s





Update:

script

Code:
import hashlib
import random
import time
import math
import statistics
import scipy.stats as st
from math import ceil

# Configuration
TOTAL_SIZE = 100_000
RANGE_SIZE = 4_096
PREFIX_LENGTH = 3
SIMULATIONS = 1000
SECP256K1_ORDER = int("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

print(f"""
=== Configuration ===
Total numbers: {TOTAL_SIZE:,}
Block size: {RANGE_SIZE:,}
Total blocks needed: {ceil(TOTAL_SIZE/RANGE_SIZE)}
Prefix: {PREFIX_LENGTH} characters (16^{PREFIX_LENGTH} = {16**PREFIX_LENGTH:,} combinations)
Simulations: {SIMULATIONS}
secp256k1 order: {SECP256K1_ORDER}
""")

def generate_h160(data):
    h = hashlib.new('ripemd160', str(data).encode('utf-8'))
    return h.hexdigest()

def shuffled_block_order(total_blocks):
    blocks = list(range(total_blocks))
    random.shuffle(blocks)
    return blocks

def sequential_search(dataset, block_size, target_hash, block_order):
    checks = 0
    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))
        for i in range(start, end):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}
    return {"checks": checks, "found": False}

def prefix_search(dataset, block_size, prefix_len, target_hash, block_order):
    prefix_hash = target_hash[:prefix_len]
    checks = 0
    ranges_to_scan = []
    skip_counter = 0
    scan_increment = 1

    for block_idx in block_order:
        start = block_idx * block_size
        end = min(start + block_size, len(dataset))  # safe upper bound for the final partial block
        found_prefix = False

        for i in range(start, end):
            checks += 1
            h = generate_h160(dataset[i])
            if h == target_hash:
                return {"checks": checks, "found": True, "index": i}
            if not found_prefix and h.startswith(prefix_hash):
                found_prefix = True
                ranges_to_scan.append({"start": i + 1, "end": end})
                skip_counter += 1
                break

        if skip_counter >= 4 and ranges_to_scan:
            for _ in range(min(scan_increment, len(ranges_to_scan))):
                r = ranges_to_scan.pop(0)
                for i in range(r["start"], r["end"]):
                    checks += 1
                    if generate_h160(dataset[i]) == target_hash:
                        return {"checks": checks, "found": True, "index": i}
            skip_counter = 0
            scan_increment += 1

    for r in ranges_to_scan:
        for i in range(r["start"], r["end"]):
            checks += 1
            if generate_h160(dataset[i]) == target_hash:
                return {"checks": checks, "found": True, "index": i}

    return {"checks": checks, "found": False}

def compute_cohens_d(list1, list2):
    if len(list1) < 2 or len(list2) < 2:
        return float('nan')
    n1, n2 = len(list1), len(list2)
    m1, m2 = statistics.mean(list1), statistics.mean(list2)
    s1, s2 = statistics.stdev(list1), statistics.stdev(list2)
    
    pooled_std = math.sqrt(((n1-1)*s1**2 + (n2-1)*s2**2) / (n1+n2-2))
    if pooled_std == 0:
        return float('nan')
    return (m1 - m2) / pooled_std

def correct_coefficient_of_variation(data):
    if not data or statistics.mean(data) == 0:
        return float('nan')
    return (statistics.stdev(data) / statistics.mean(data)) * 100

def longest_streak(outcomes, letter):
    max_streak = current = 0
    for o in outcomes:
        current = current + 1 if o == letter else 0
        max_streak = max(max_streak, current)
    return max_streak

def ascii_bar(label, value, max_value, bar_length=50):
    bar_count = int((value / max_value) * bar_length) if max_value > 0 else 0
    return f"{label:12}: {'#' * bar_count} ({value})"

def compare_methods():
    results = {
        "sequential": {"wins": 0, "success": 0, "checks": [], "times": []},
        "prefix": {"wins": 0, "success": 0, "checks": [], "times": []},
        "ties": 0
    }
    outcome_history = []
    total_blocks = ceil(TOTAL_SIZE / RANGE_SIZE)

    for _ in range(SIMULATIONS):
        max_offset = SECP256K1_ORDER - TOTAL_SIZE - 1
        offset = random.randint(0, max_offset)
        dataset = [offset + i for i in range(TOTAL_SIZE)]
        target_num = random.choice(dataset)
        target_hash = generate_h160(target_num)
        block_order = shuffled_block_order(total_blocks)

        start = time.perf_counter()
        seq_res = sequential_search(dataset, RANGE_SIZE, target_hash, block_order)
        seq_time = time.perf_counter() - start

        start = time.perf_counter()
        pre_res = prefix_search(dataset, RANGE_SIZE, PREFIX_LENGTH, target_hash, block_order)
        pre_time = time.perf_counter() - start

        for method, res, t in [("sequential", seq_res, seq_time), ("prefix", pre_res, pre_time)]:
            if res["found"]:
                results[method]["success"] += 1
                results[method]["checks"].append(res["checks"])
                results[method]["times"].append(t)

        if seq_res["found"] and pre_res["found"]:
            if seq_res["checks"] < pre_res["checks"]:
                results["sequential"]["wins"] += 1
                outcome_history.append("S")
            elif pre_res["checks"] < seq_res["checks"]:
                results["prefix"]["wins"] += 1
                outcome_history.append("P")
            else:
                results["ties"] += 1
                outcome_history.append("T")

    def get_stats(data):
        if not data:
            return {"mean": 0, "min": 0, "max": 0, "median": 0, "stdev": 0}
        return {
            "mean": statistics.mean(data),
            "min": min(data),
            "max": max(data),
            "median": statistics.median(data),
            "stdev": statistics.stdev(data) if len(data) > 1 else 0
        }

    seq_stats = get_stats(results["sequential"]["checks"])
    pre_stats = get_stats(results["prefix"]["checks"])
    seq_time_stats = get_stats(results["sequential"]["times"])
    pre_time_stats = get_stats(results["prefix"]["times"])

    seq_success_rate = results["sequential"]["success"] / SIMULATIONS
    pre_success_rate = results["prefix"]["success"] / SIMULATIONS

    total_comparisons = results["sequential"]["wins"] + results["prefix"]["wins"] + results["ties"]
    seq_win_rate = results["sequential"]["wins"] / total_comparisons if total_comparisons > 0 else 0
    pre_win_rate = results["prefix"]["wins"] / total_comparisons if total_comparisons > 0 else 0

    cv_seq = correct_coefficient_of_variation(results["sequential"]["checks"])
    cv_pre = correct_coefficient_of_variation(results["prefix"]["checks"])

    effect_size = compute_cohens_d(results["sequential"]["checks"], results["prefix"]["checks"])
    if len(results["sequential"]["checks"]) > 1 and len(results["prefix"]["checks"]) > 1:
        t_test = st.ttest_ind(results["sequential"]["checks"], results["prefix"]["checks"], equal_var=False)
    else:
        t_test = None

    print(f"""
=== FINAL ANALYSIS (CORRECTED) ===

[Success Rates]
Sequential: {seq_success_rate:.1%} ({results['sequential']['success']}/{SIMULATIONS})
Prefix:    {pre_success_rate:.1%} ({results['prefix']['success']}/{SIMULATIONS})

[Performance Metrics]
               | Sequential          | Prefix
---------------+---------------------+--------------------
Checks (mean)  | {seq_stats['mean']:>12,.1f} ± {seq_stats['stdev']:,.1f} | {pre_stats['mean']:>12,.1f} ± {pre_stats['stdev']:,.1f}
Time (mean ms) | {seq_time_stats['mean']*1000:>12.2f} ± {seq_time_stats['stdev']*1000:.2f} | {pre_time_stats['mean']*1000:>12.2f} ± {pre_time_stats['stdev']*1000:.2f}
Min checks     | {seq_stats['min']:>12,} | {pre_stats['min']:>12,}
Max checks     | {seq_stats['max']:>12,} | {pre_stats['max']:>12,}
Coef. Variation| {cv_seq:>11.1f}% | {cv_pre:>11.1f}%

[Comparison When Both Succeed]
Sequential wins: {results['sequential']['wins']} ({seq_win_rate:.1%})
Prefix wins:    {results['prefix']['wins']} ({pre_win_rate:.1%})
Ties:          {results['ties']}

[Statistical Significance]
Cohen's d: {effect_size:.3f}
Welch's t-test: {'t = %.3f, p = %.4f' % (t_test.statistic, t_test.pvalue) if t_test else 'Insufficient data'}
""")

    non_tie_outcomes = [o for o in outcome_history if o != "T"]
    streak_analysis = f"""
=== STREAK ANALYSIS ===
Longest Sequential streak: {longest_streak(outcome_history, 'S')}
Longest Prefix streak:    {longest_streak(outcome_history, 'P')}
Expected max streak:      {math.log(len(non_tie_outcomes), 2):.1f} (for {len(non_tie_outcomes)} trials)
"""
    print(streak_analysis)

    max_wins = max(results["sequential"]["wins"], results["prefix"]["wins"], results["ties"])
    print("=== WIN DISTRIBUTION ===")
    print(ascii_bar("Sequential", results["sequential"]["wins"], max_wins))
    print(ascii_bar("Prefix", results["prefix"]["wins"], max_wins))
    print(ascii_bar("Ties", results["ties"], max_wins))

if __name__ == '__main__':
    compare_methods()


results:

Code:
=== Configuration ===
Total numbers: 100,000
Block size: 4,096
Total blocks needed: 25
Prefix: 3 characters (16^3 = 4,096 combinations)
Simulations: 1000
secp256k1 order: 115792089237316195423570985008687907852837564279074904382605163141518161494337


=== FINAL ANALYSIS (CORRECTED) ===

[Success Rates]
Sequential: 100.0% (1000/1000)
Prefix:    100.0% (1000/1000)

[Performance Metrics]
               | Sequential          | Prefix
---------------+---------------------+--------------------
Checks (mean)  |     49,886.2 ± 28,945.2 |     49,104.3 ± 28,915.5
Time (mean ms) |       128.96 ± 75.96 |       130.12 ± 77.58
Min checks     |          160 |          160
Max checks     |       99,942 |       99,977
Coef. Variation|        58.0% |        58.9%

[Comparison When Both Succeed]
Sequential wins: 328 (32.8%)
Prefix wins:    609 (60.9%)
Ties:          63

[Statistical Significance]
Cohen's d: 0.027
Welch's t-test: t = 0.604, p = 0.5456


=== STREAK ANALYSIS ===
Longest Sequential streak: 6
Longest Prefix streak:    14
Expected max streak:      9.9 (for 937 trials)

=== WIN DISTRIBUTION ===
Sequential  : ########################## (328)
Prefix      : ################################################## (609)
Ties        : ##### (63)



Conclusion

Both the implementation and the empirical results show that the two search strategies (sequential and prefix-based) accomplish the goal of finding a hash in a brute-force scenario, and that the differences in efficiency between them are marginal.

Configuration and Execution: The simulation used 100,000 numbers divided into 25 blocks (each containing 4,096 elements) and was run over 1,000 trials. Both methods achieved a 100% success rate in every trial, confirming that the dataset structure and search strategy meet the basic requirement of exhaustiveness.

Performance Metrics: The metrics obtained (such as the average number of checks and execution time) are very similar between the two methods. Although the prefix search "wins" more often in individual comparisons (with 609 wins compared to 328 wins for the sequential method, and 63 ties), a comparison of the average number of checks shows a difference that is statistically insignificant (Cohen’s d of 0.027 and a Welch t-test with a p-value greater than 0.5). This indicates that the reduction in the number of checks achieved by the prefix search in some trials is offset in others, and on average both methods evaluate almost the same number of candidates.
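For reference, the reported effect size can be reproduced directly from the summary statistics in the table above (means, standard deviations, n = 1000 per group); a quick verification sketch, not part of the original script:

Code:
import math

# Cohen's d recomputed from the reported summary statistics (n = 1000 each).
n1 = n2 = 1000
m1, s1 = 49_886.2, 28_945.2   # sequential: mean checks, stdev
m2, s2 = 49_104.3, 28_915.5   # prefix: mean checks, stdev

pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
print(f"Cohen's d = {(m1 - m2) / pooled:.3f}")   # ~ 0.027, matching the script output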

Implications of the Prefix Strategy: The concept behind the prefix search is to filter blocks early, flagging those that show a hint of containing the target hash and thereby potentially speeding up the search in some cases. However, in a brute-force search where the dominant cost is computing the hash for every candidate and scanning the dataset, this additional filtering does not lead to a substantial improvement in overall performance. Essentially, the structure and randomness of the dataset mean that both strategies, on average, inspect roughly half of the search space before locating the target.

Statistical and Practical Aspects: The statistical tests confirm that the difference observed between the two methods is minimal. The closeness of the average values and the high dispersion of results (as reflected in the standard deviations) suggest that, from a practical standpoint, the benefit of using a prefix filter does not translate into significant improvements in brute-force scenarios.



In conclusion, in a brute-force environment where every possibility is evaluated to find a hash, neither the sequential search nor the prefix-based search demonstrates a significant overall efficiency advantage. The choice between them will therefore depend more on practical and design considerations rather than on pure performance metrics.

This is a self-moderated topic, intended to keep the conversation strict: respectful, and based on arguments supported by verifiable evidence.

RetiredCoder
Full Member
Activity: 134 | Merit: 120
April 22, 2025, 01:53:41 PM
Merited by mcdouglasx (6), ABCbits (1)
#2

Yeah, finally I see something interesting here instead of endless blah-blah from people who think they know everything, have >100 years of expertise, etc.
To scan the entire range (to get 100% probability of success), your method requires the same number of ops as brute force (so it does not break any math laws), but it has better chances of finding the key faster than brute force because it has a non-linear probability of success (higher at the beginning and lower at the end compared to brute force). Nice!
I think the results can be improved a bit by using more complex logic, but your script demonstrates the main idea very well.
Thanks for the fun! :) I donated some BTC to your address.

mcdouglasx (OP)
Sr. Member
Activity: 770 | Merit: 408
April 22, 2025, 02:40:32 PM
#3

Quote from: RetiredCoder
Yeah, finally I see something interesting here instead of endless blah-blah from people who think they know everything, have >100 years of expertise, etc.
To scan the entire range (to get 100% probability of success), your method requires the same number of ops as brute force (so it does not break any math laws), but it has better chances of finding the key faster than brute force because it has a non-linear probability of success (higher at the beginning and lower at the end compared to brute force). Nice!
I think the results can be improved a bit by using more complex logic, but your script demonstrates the main idea very well.
Thanks for the fun! :) I donated some BTC to your address.

Thank you, Rc, it is a very noble gesture on your part, especially during this time in my life. I truly appreciate it infinitely.

kTimesG
Full Member
Activity: 602 | Merit: 206
April 22, 2025, 03:22:42 PM
#4

Quote from: RetiredCoder
To scan the entire range (to get 100% probability of success), your method requires the same number of ops as brute force (so it does not break any math laws), but it has better chances of finding the key faster than brute force because it has a non-linear probability of success (higher at the beginning and lower at the end compared to brute force). Nice!

So if I have N differently colored balls in an urn, and it takes a hash(ball color) op to label each ball after every extraction, in order to compare the label with the target (or even just to check the prefix), you are saying that the prefix method will extract the winning ball sooner, because you're stashing groups of balls separately.

If the prefix check were a free or cheap operation, I would agree. However, the balls carry no markings or clues about their eventual label, and the average costs of both methods are the same, so the fact that the prefix method loses more badly (more operations) in the cases where sequential wins is not accounted for at all. Otherwise, the averages would not be identical. There's also no "loser" here, since both methods account for the actual total number of ops to solve each hidden value.

To each his own, though.
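To illustrate the point about identical averages, here is a tiny toy simulation (my own sketch, not code from either poster): as long as every extracted ball must be hashed before anything is known about it, any visiting order chosen independently of the hidden position needs (N + 1) / 2 evaluations on average.

Code:
import random

N = 1_000        # balls in the urn
TRIALS = 20_000  # hidden targets drawn uniformly at random

def avg_cost(order_fn):
    total = 0
    for _ in range(TRIALS):
        target = random.randrange(N)
        order = order_fn()
        total += order.index(target) + 1   # evaluations until the target is drawn
    return total / TRIALS

in_order = lambda: list(range(N))              # plain sequential visiting order
shuffled = lambda: random.sample(range(N), N)  # any target-independent reordering

print(avg_cost(in_order))   # ~ 500.5
print(avg_cost(shuffled))   # ~ 500.5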

RetiredCoder
Full Member
Activity: 134 | Merit: 120
April 22, 2025, 05:22:55 PM
Last edit: April 22, 2025, 05:37:50 PM by RetiredCoder
#5

Hey, don't spoil the magic with your balls! :D
I adore sources that demonstrate something incredible, like fighting brute force; it brings so much fun, I'm going to support it!
Because reality is so boring...

btc11235
Jr. Member
Activity: 35 | Merit: 1
April 23, 2025, 05:36:02 PM
#6

Thank you for starting a new thread for this, and providing a full write-up of the method, etc 👍
viceversas
Newbie
Activity: 8 | Merit: 0
April 23, 2025, 05:48:43 PM
#7

Great!

IMO, in the context of prefix filtering for Bitcoin private key recovery, a Hamming-distance tolerance could be used to widen the match range without heavily sacrificing efficiency. Maybe combine not just the address prefix or the hash160 prefix, but both. :)
DannyHamilton
Legendary
Activity: 3640 | Merit: 5177
April 23, 2025, 08:21:33 PM
Merited by vapourminer (1)
#8

Quote from: mcdouglasx
- snip -
it first checks if the hash of any element begins with the same first three characters as the hash we're searching for.
- snip -

While I see a few things about this technique that are ignored and not included in the "findings", and I believe significantly better search algorithms already exist, I'm most curious why you chose THREE, specifically, as the number of "characters" to check?  Is there something about THREE that makes it better than TWO or FOUR (or any other number)?  Have you tried your simulations with other prefix sizes to find the optimal prefix size?

Also, it seems very inefficient to take a hash (which is a large binary number), first convert it to a string representation of its value in hexadecimal, and then use string manipulation to compare "characters". Wouldn't it be much faster and simpler to perform a bitwise XOR between some predetermined prefix of bits of the hash computed from the list element and the same prefix of bits of the target hash?

And, if you ARE going to use XOR, is there a benefit to taking the extra steps to generate a bitwise prefix since XOR is so incredibly fast as a comparison method? In the time you spend separating the prefix from the initial hash, the XOR between the full hash values would likely already have completed, wouldn't it?
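For readers following along, here is a minimal sketch of the kind of comparison being suggested (my own illustration, not DannyHamilton's code); note that a 3-hex-character prefix is 12 bits, which does not fall on a byte boundary:

Code:
import hashlib

def h160(data: int) -> bytes:
    return hashlib.new('ripemd160', str(data).encode()).digest()

target = h160(123_456)      # illustrative values only
candidate = h160(654_321)

# Hex-string prefix check, as in the original script (3 hex chars = 12 bits):
hex_match = candidate.hex().startswith(target.hex()[:3])

# Equivalent byte-level check without any string conversion:
raw_match = candidate[0] == target[0] and (candidate[1] >> 4) == (target[1] >> 4)

# XOR flavour: the leading bits match when the XOR of those bits is zero.
xor_match = (candidate[0] ^ target[0]) == 0 and ((candidate[1] ^ target[1]) >> 4) == 0

print(hex_match, raw_match, xor_match)   # all three agree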
mcdouglasx (OP)
Sr. Member
Activity: 770 | Merit: 408
April 23, 2025, 10:09:54 PM
#9

Quote from: DannyHamilton
- snip -
it first checks if the hash of any element begins with the same first three characters as the hash we're searching for.
- snip -

While I see a few things about this technique that are ignored and not included in the "findings", and I believe significantly better search algorithms already exist, I'm most curious why you chose THREE, specifically, as the number of "characters" to check?  Is there something about THREE that makes it better than TWO or FOUR (or any other number)?  Have you tried your simulations with other prefix sizes to find the optimal prefix size?

A specific hexadecimal prefix of length N occurs with probability 1/16**N. Therefore, if you divide the range to be explored into blocks of size 16**N (or close to it), then once you find a prefix match, the probability of finding another one within the same block is low.

Consequently, you can stop scanning that block and set the remaining range aside. (If you use this for puzzle searches, you can explore it in a second pass if necessary.)

Example: If you use a prefix "aabbccde" you should use blocks close to 16**8 or 4,294,967,296.

If you find the prefix at 2,000,000,000, it is very unlikely (though not impossible) to find another in the rest of the block.

Statistically, leaving that segment for a second pass, if necessary, is the most convenient approach: it saves computational power that would otherwise be spent searching the rest of the block, where the prefix is unlikely to appear again.
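A small helper table of the rule of thumb described here (illustrative; the 16**8 figure matches the example above):

Code:
# Block size that "pairs" with a given hex-prefix length (one expected match per block).
for n in range(2, 9):
    print(f"prefix length {n}: block size ~ 16**{n} = {16**n:,}")
# ...
# prefix length 8: block size ~ 16**8 = 4,294,967,296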


Quote
Also, it seems very inefficient to take a hash (which is a large binary number), and first convert it to a string representation of its value as represented in hexadecimal, and then use string manipulation to compare "characters"?  Wouldn't it be much faster and simpler to perform a bitwise XOR between some predetermined prefix of bits of the hash computed from the list element and the same prefix of bits on the target hash.

And, if you ARE going to use XOR, is there a benefit to taking the extra steps to generate a bitwise prefix since XOR is so incredibly fast as a comparison method? In the time you spend separating the prefix from the initial hash, the XOR between the full hash values would likely already have completed, wouldn't it?

Yes, but since my intention was to compare the two methods, the important thing is that the conditions remain identical for both. Your advice is well worth following, though, if one intends to implement this in a real scenario.


Quote
I believe significantly better search algorithms already exist

Out of curiosity, which algorithms are you referring to?

LeTH3knXoDArzm
Newbie
Activity: 11 | Merit: 10
April 24, 2025, 05:39:41 AM
#10

I'm on my phone right now and can't scroll the code, but as far as I understood you are scanning for "aabbcc" in a block, and once it's found you move to the next one. Right? If yes, it's like the "?" checksum in minikeys: basically you get one valid RIPEMD starting with "00" about once every ~256 hashes. It's interesting; it's like kangaroo 🦘 for hashes, if I got it right.
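A quick empirical check of the "about one in ~256" remark (my own sketch; any source of distinct inputs would do):

Code:
import hashlib, random

# Fraction of RIPEMD-160 hex digests that start with "00" (two hex chars = 8 bits).
SAMPLES = 100_000
hits = sum(
    hashlib.new('ripemd160', str(random.getrandbits(64)).encode()).hexdigest().startswith("00")
    for _ in range(SAMPLES)
)
print(hits / SAMPLES)   # ~ 1/256 ~ 0.0039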
Bram24732
Member
Activity: 182 | Merit: 18
April 24, 2025, 07:32:53 AM
#11

Quote from: LeTH3knXoDArzm
- snip -
it's like kangaroo 🦘 for hashes, if I got it right.

Kangaroos work completely differently from what is explained there.
There is a dedicated thread from RetiredCoder with details about the method.
mcdouglasx (OP)
Sr. Member
Activity: 770 | Merit: 408
May 02, 2025, 05:04:03 PM
#12

Quote from: RetiredCoder
- snip -
Thanks for the fun! :) I donated some BTC to your address.

Quote from: mcdouglasx
Thank you, Rc, it is a very noble gesture on your part, especially during this time in my life. I truly appreciate it infinitely.

Why did you close the thread? The prefix method is better.

Make the donation worthwhile. Stay strong.

We sought to reach the truth, and we did.

If you have a better variation, publish it.

The donation has done its part; thanks to RC, I now have half the components for my new desktop. ;) :D
