farou9
Newbie
Offline
Activity: 89
Merit: 0
|
 |
April 12, 2025, 06:34:23 PM |
|
I am curious why the curve parameters were chosen with different values for the scalar order n and the coordinate field prime p.
|
|
|
|
|
zahid888
Member

Offline
Activity: 335
Merit: 24
the right steps towards the goal
|
 |
April 12, 2025, 06:44:23 PM |
|
Right now I am working on a GPU Cyclone version; it does hashing and comparison against the target hash160. On an RTX 4060 it reaches 4.3 Ghash/s, which is faster than BitCrack, VanitySearch, etc.
Hey, I’ve got the SHA-256 part nailed down in sha256_gpu.cu, but I’m hitting a wall with RIPEMD-160 in ripemd160_gpu.cu. Right now, my RIPEMD-160 code outputs 499d42dd5724b7522a1c4bd895876e1232957cbe for SHA-256("abc"), but it should be 8eb208f7e05d987a9b044a8e98c6b087f15a0bfc. I’ve been digging into BitCrack’s ripemd160.cuh for clues—tweaking byte orders and round functions—but I’m still stuck. Would you be willing to share how you’ve implemented RIPEMD-160 in CUDA? I’d love to see a working ripemd160_gpu.cu
|
1BGvwggxfCaHGykKrVXX7fk8GYaLQpeixA
|
|
|
|
kTimesG
|
 |
April 12, 2025, 08:11:47 PM |
|
Right now I am working on a GPU Cyclone version; it does hashing and comparison against the target hash160. On an RTX 4060 it reaches 4.3 Ghash/s, which is faster than BitCrack, VanitySearch, etc.
Hey, I’ve got the SHA-256 part nailed down in sha256_gpu.cu, but I’m hitting a wall with RIPEMD-160 in ripemd160_gpu.cu. Right now, my RIPEMD-160 code outputs 499d42dd5724b7522a1c4bd895876e1232957cbe for SHA-256("abc"), but it should be 8eb208f7e05d987a9b044a8e98c6b087f15a0bfc. I’ve been digging into BitCrack’s ripemd160.cuh for clues—tweaking byte orders and round functions—but I’m still stuck. Would you be willing to share how you’ve implemented RIPEMD-160 in CUDA? I’d love to see a working ripemd160_gpu.cu
Both of those hashes are incorrect. Why don't you simply print the inputs to debug where the problem is? For example, to check whether the SHA output is correct (I assume it's not):

if (!(threadIdx.x || blockIdx.x))
    printf("%08x %08x %08x %08x\n", buf[0], buf[1], buf[2], buf[3]);
RIPEMD-160 takes a 64-byte input block (16 uint32 words), but you need to change the endianness of the eight 32-bit words that form the SHA-256 output.
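A minimal host-side sketch of that word swap, in Python with only the standard library (the variable names are mine; on the GPU the same swap is a bswap of each 32-bit register before the RIPEMD-160 rounds):

```python
import hashlib
import struct

# SHA-256 emits its state as eight big-endian 32-bit words, while
# RIPEMD-160 reads its message block as little-endian 32-bit words.
# Each word's bytes must therefore be swapped between the two hashes.
sha = hashlib.sha256(b"abc").digest()
words = struct.unpack(">8I", sha)          # big-endian words out of SHA-256
ripemd_input = struct.pack("<8I", *words)  # little-endian words into RIPEMD-160
```

Feeding `ripemd_input` (plus padding) through the RIPEMD-160 compression function is what yields the expected hash160 of 8eb208f7e05d987a9b044a8e98c6b087f15a0bfc.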
|
Off the grid, training pigeons to broadcast signed messages.
|
|
|
Desyationer
Jr. Member
Offline
Activity: 64
Merit: 2
|
 |
April 12, 2025, 10:34:59 PM |
|
I'm not a programmer at all — the AI writes all the code for me. But it still couldn't give me a clear answer to my question: is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself? Or does the CUDA architecture require a strictly linear brute-force approach with no filtering, in order to maintain high performance — making any real-time filtering too resource-heavy due to the sheer size of the keyspace? I couldn't even implement this in Python with the help of the AI: writing a basic key iterator is easy, but as soon as I add even the simplest filter, the script stops working properly. Despite many attempts, I couldn’t get anywhere with the AI's help.
For example, if we start scanning a range from 0x10000000000 to 0x1FFFFFFFFFF, it's obvious that many keys like 0x10000000001, 0x10000001002, and so on are extremely unlikely to be "golden" keys. So applying a filter that aggressively excludes clearly implausible keys could potentially reduce the effective range by up to 30%.
|
|
|
|
|
AlanJohnson
Member

Offline
Activity: 185
Merit: 11
|
 |
April 13, 2025, 05:08:35 AM |
|
I'm not a programmer at all — the AI writes all the code for me. But it still couldn't give me a clear answer to my question: is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself? Or does the CUDA architecture require a strictly linear brute-force approach with no filtering, in order to maintain high performance — making any real-time filtering too resource-heavy due to the sheer size of the keyspace? I couldn't even implement this in Python with the help of the AI: writing a basic key iterator is easy, but as soon as I add even the simplest filter, the script stops working properly. Despite many attempts, I couldn’t get anywhere with the AI's help.
For example, if we start scanning a range from 0x10000000000 to 0x1FFFFFFFFFF, it's obvious that many keys like 0x10000000001, 0x10000001002, and so on are extremely unlikely to be "golden" keys. So applying a filter that aggressively excludes clearly implausible keys could potentially reduce the effective range by up to 30%.
I was thinking about it too. But after some checking, it turned out it won't help you much. The range still remains enormous.
|
|
|
|
|
Bram24732
Member

Offline
Activity: 322
Merit: 28
|
 |
April 13, 2025, 05:34:39 AM |
|
I'm not a programmer at all — the AI writes all the code for me. But it still couldn't give me a clear answer to my question: is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself? Or does the CUDA architecture require a strictly linear brute-force approach with no filtering, in order to maintain high performance — making any real-time filtering too resource-heavy due to the sheer size of the keyspace? I couldn't even implement this in Python with the help of the AI: writing a basic key iterator is easy, but as soon as I add even the simplest filter, the script stops working properly. Despite many attempts, I couldn’t get anywhere with the AI's help.
For example, if we start scanning a range from 0x10000000000 to 0x1FFFFFFFFFF, it's obvious that many keys like 0x10000000001, 0x10000001002, and so on are extremely unlikely to be "golden" keys. So applying a filter that aggressively excludes clearly implausible keys could potentially reduce the effective range by up to 30%.
I will try to answer in a way that's friendly to non-programmers. Let me know if I get too technical. CUDA executes threads in lockstep groups (warps) of 32. So if you don't need one of those 32 keys, you still have to wait for the "legit" keys to be processed before the next batch of 32 starts. You might as well use that time to do the actual calculation instead of leaving the core idle. That's not to say you can't code your idea efficiently in CUDA, but you need to think about it a little. You would need to order the keys so that all the "bad" keys sit at the end of the range and all the "good" keys up front. That way you only ever launch "good", dense blocks of 32 keys, which execute in full without losing speed.
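That reordering idea can be sketched on the host side in a few lines of Python. The filter here is a made-up placeholder, and `WARP`, `passes_filter`, and `dense_batches` are names I'm assuming for illustration:

```python
WARP = 32  # CUDA executes threads in warps of 32

def passes_filter(k):
    # Hypothetical plausibility filter: reject keys whose low
    # 16 bits are all zero. Any cheap predicate works here.
    return (k & 0xFFFF) != 0

def dense_batches(start, end):
    """Pack surviving keys into dense groups of WARP so that every
    lane in a launched batch does useful work, instead of some lanes
    idling on filtered-out keys."""
    batch = []
    for k in range(start, end):
        if passes_filter(k):
            batch.append(k)
            if len(batch) == WARP:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch

batches = list(dense_batches(0x10000, 0x10100))
```

The point is that the filtering cost moves to the CPU (or a separate compaction pass), while the GPU only ever sees full, dense batches.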
|
I solved 67 and 68 using custom software distributing the load across ~25k GPUs. 4090 stock speed: ~8.1 Bkeys/sec. Don’t challenge me technically if you know shit about fuck, I’ll ignore you. Same goes if all you can do is LLM reply.
|
|
|
|
nomachine
|
 |
April 13, 2025, 06:22:19 AM |
|
is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself?
NO. It's not possible to pre-filter or skip private keys in any meaningful way that speeds up Bitcoin private-key brute-forcing. Every integer in the range [1, n-1] (where n is the secp256k1 curve order) is a valid private key. There is no cryptographic method to predict whether a key will generate a used or empty address without fully computing its public key and address. There is no "magic circle" or remote viewer who can predict this. Even if you were God the Father, you couldn't bypass the mathematics.

There is no mathematical shortcut to determine which keys map to specific address patterns without performing the full computation. Scalar arithmetic must be done modulo n (the curve order), and the coordinate arithmetic modulo p (the secp256k1 field prime); both reductions must be applied to every candidate, which is non-negotiable for valid elliptic curve cryptography (ECC) operations. The relationship between input numbers and their reduced forms is non-linear. The real computational bottleneck isn't in checking keys but in the elliptic curve multiplication (ModMulK) required to derive public keys from private keys. Even with highly optimized libraries like secp256k1, you're limited to roughly 6 million keys per second per CPU core (e.g., using Cyclone) on high-end hardware.
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 420
Merit: 8
|
 |
April 13, 2025, 06:36:54 AM |
|
is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself?
NO..... Even if you were God the Father, you couldn’t bypass the mathematics. The real computational bottleneck isn’t in checking keys but in the elliptic curve multiplication (ModMulK) required to derive public keys from private keys.
So it’s futile for me to pray to God to guess the puzzle. Without an accelerated ModMulK, is there really nothing that can be done?
|
|
|
|
|
|
nomachine
|
 |
April 13, 2025, 07:04:32 AM |
|
So it’s futile for me to pray to God to guess the puzzle. Without an accelerated ModMulK, is there really nothing that can be done?
To speed up ModMulK, you generally have a few options. There is no new math available that could solve this. If the algorithm is already near-optimal (e.g., using Montgomery reduction), then hardware is the only way to go faster. Rewriting ModMulK in assembly can improve performance, but only if the compiler's output is suboptimal (e.g., missed register allocations, unnecessary spills, bugs). A rough sketch:

modMulK:
    mov  rax, rdi      ; x
    mul  rsi           ; x * K (rdx:rax = full 128-bit product)
    mov  rcx, rdx      ; save high bits for reduction
    ; ... Montgomery reduction steps ...
    ret

Replacing div with Montgomery/Barrett reduction can give a 5-10x speedup. These methods replace division with multiplications, shifts, and additions, which are much faster (often 1-5 cycles per operation). However, if the compiler already used Montgomery or Barrett reduction, the improvement might be smaller, perhaps just 10-20% from fine-tuning. You can also exploit CPU-specific features like AVX2/AVX-512, carry-less multiplication, or fused operations, similar to the techniques used in Cyclone.

Alternatively, 3,000 GPUs would indeed speed things up, assuming you have enough money to throw out the window with a shovel. In other words, you'd already need to be rich to participate in this puzzle. This is the "brute-force" approach: scaling horizontally with hardware and a lot of cash.
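For intuition, here is what Montgomery reduction buys you, sketched in plain Python over the secp256k1 field prime. The names (`to_mont`, `mont_mul`, etc.) are mine; real implementations do the same steps on 64-bit limbs, with no division by p anywhere:

```python
# Montgomery arithmetic sketch over the secp256k1 field prime.
p = 2**256 - 2**32 - 977
R = 2**256                         # Montgomery radix, a power of two
R_INV = pow(R, -1, p)
NEG_P_INV = (-pow(p, -1, R)) % R   # -p^-1 mod R, precomputed once

def to_mont(a):
    return a * R % p               # enter Montgomery form (one-time cost)

def from_mont(a):
    return a * R_INV % p           # leave Montgomery form

def mont_mul(a_mont, b_mont):
    """REDC: computes a*b*R^-1 mod p using only multiplies, adds,
    and shifts; the expensive division by p becomes a 256-bit shift."""
    t = a_mont * b_mont
    m = (t * NEG_P_INV) % R        # mod R is just masking the low 256 bits
    u = (t + m * p) >> 256         # exact: the low 256 bits cancel to zero
    return u - p if u >= p else u
```

Because `% R` and `>> 256` are bit masks and shifts when R is a power of two, the only real work left is integer multiplication, which is why a Montgomery-based ModMulK avoids the slow div entirely.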
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
JackMazzoni
Jr. Member
Offline
Activity: 207
Merit: 7
|
 |
April 13, 2025, 07:20:30 AM |
|
|
Need Wallet Recovery? PM ME. 100% SAFE
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 420
Merit: 8
|
 |
April 13, 2025, 07:58:52 AM |
|
you should already be rich enough to participate in this puzzle.
So, honestly speaking, this puzzle was made by a rich person for rich people—the rich will get richer, and the poor will get poorer. Right? 
|
|
|
|
|
|
nomachine
|
 |
April 13, 2025, 08:07:12 AM |
|
you should already be rich enough to participate in this puzzle.
So, honestly speaking, this puzzle was made by a rich person for rich people—the rich will get richer, and the poor will get poorer. Right?
Yep... Or maybe you manage to use vast.ai for free by hacking their system—exploiting 3,000 GPUs, stolen credit cards, or even the puzzle creator himself, as you’ve already written somewhere. The problem is, you could end up in prison. It’s better to just go fishing.
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 420
Merit: 8
|
 |
April 13, 2025, 08:21:16 AM |
|
you should already be rich enough to participate in this puzzle.
So, honestly speaking, this puzzle was made by a rich person for rich people—the rich will get richer, and the poor will get poorer. Right?
Yep... Or maybe you manage to use vast.ai for free by hacking their system—exploiting 3,000 GPUs, stolen credit cards, or even the puzzle creator himself, as you’ve already written somewhere. The problem is, you could end up in prison. It’s better to just go fishing.
Do you really think these puzzles are solved fairly? With savings? Who even has that much in savings? Come on!
|
|
|
|
|
|
nomachine
|
 |
April 13, 2025, 08:26:32 AM |
|
you should already be rich enough to participate in this puzzle.
So, honestly speaking, this puzzle was made by a rich person for rich people—the rich will get richer, and the poor will get poorer. Right?
Yep... Or maybe you manage to use vast.ai for free by hacking their system—exploiting 3,000 GPUs, stolen credit cards, or even the puzzle creator himself, as you’ve already written somewhere. The problem is, you could end up in prison. It’s better to just go fishing.
Do you really think these puzzles are solved fairly? With savings? Who even has that much in savings? Come on!
Here at my place, you can borrow plenty of money from the mafia. The problem is, if you don't pay it back, you'll end up missing some body parts or family members. It’s better to just go fishing.
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
Bram24732
Member

Offline
Activity: 322
Merit: 28
|
 |
April 13, 2025, 08:29:41 AM |
|
you should already be rich enough to participate in this puzzle.
So, honestly speaking, this puzzle was made by a rich person for rich people—the rich will get richer, and the poor will get poorer. Right?
Yep... Or maybe you manage to use vast.ai for free by hacking their system—exploiting 3,000 GPUs, stolen credit cards, or even the puzzle creator himself, as you’ve already written somewhere. The problem is, you could end up in prison. It’s better to just go fishing.
Do you really think these puzzles are solved fairly? With savings? Who even has that much in savings? Come on!
Or you know, you crowdfund money with people and split the reward. You use the money you collect to make deals with GPU farms and benefit from economies of scale. Just an idea, it might just work.
|
I solved 67 and 68 using custom software distributing the load across ~25k GPUs. 4090 stock speed: ~8.1 Bkeys/sec. Don’t challenge me technically if you know shit about fuck, I’ll ignore you. Same goes if all you can do is LLM reply.
|
|
|
|
nomachine
|
 |
April 13, 2025, 08:45:32 AM |
|
Alternatively, you can borrow the full amount from the bank. However, they will require your house and property as collateral. 
|
BTC: bc1qdwnxr7s08xwelpjy3cc52rrxg63xsmagv50fa8
|
|
|
Bram24732
Member

Offline
Activity: 322
Merit: 28
|
 |
April 13, 2025, 08:47:24 AM |
|
Alternatively, you can borrow the full amount from the bank. However, they will require your house and property as collateral.  Better make sure there's no bug in your code
|
I solved 67 and 68 using custom software distributing the load across ~25k GPUs. 4090 stock speed: ~8.1 Bkeys/sec. Don’t challenge me technically if you know shit about fuck, I’ll ignore you. Same goes if all you can do is LLM reply.
|
|
|
Akito S. M. Hosana
Jr. Member
Offline
Activity: 420
Merit: 8
|
 |
April 13, 2025, 08:51:39 AM Last edit: April 13, 2025, 09:02:48 AM by Akito S. M. Hosana |
|
you can borrow plenty of money from the mafia. The problem is, if you don't pay it back, you'll end up missing some body parts or family members.
Alternatively, you can borrow the full amount from the bank. However, they will require your house and property as collateral.
There's no big difference here. If you screw up, you can just hang yourself. And all that for 3,000 GPUs? No chance.
Or you know, you crowdfund money with people and split the reward. You use the money you collect to make deals with GPU farms and benefit from economies of scale. Just an idea, it might just work.
This sounds reasonable.
|
|
|
|
|
|
kTimesG
|
 |
April 13, 2025, 09:13:01 AM |
|
I'm not a programmer at all — the AI writes all the code for me. But it still couldn't give me a clear answer to my question: is it possible to iterate through private keys within a given range while instantly filtering out "unreliable" keys, without affecting the speed of the iteration itself?
Here's the non-AI (correct) answer. Once you filter out a single key, computing the very next key requires one of these two options:

1. Multiplication with G.
2. Addition with a precomputed delta*G public key.

Option 2 is faster, but you need to know in advance how many precomputed deltas you'll ever need, or come up with a complicated strategy to optimize the "jump" using your precomputed delta keys without falling back to option 1. Also, the CUDA kernel would need to know which private key it is computing, which, without the filtering concept, is not needed at all. The fastest kernels simply go from pubKey A to pubKey B without ever caring which private key is associated with each result. The CPU handles this association (and maybe also the initial starting-point multiplications, since those are needed only once), freeing up computational time on the GPU.

Oh, and it's also a fallacy to think some pattern or other can't produce the solution. There are multiple puzzles whose keys had long runs of 1s or 0s in a row, which would have been skipped had this pattern filtering been used.

LE: there's also option 3: compute the public key as usual, but skip the hashing part if you don't like how the private key looks. However, because of the need to do this check, I don't think it will result in a higher speed than the usual way. Needs checking, I cannot be certain about this.
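The sequential-scan trick behind option 2 can be illustrated with a toy secp256k1 implementation in Python (big-integer field arithmetic, far too slow for real searching; the helper names `ec_add` and `ec_mul` are mine):

```python
# Toy secp256k1: enough to show that stepping to the next key, or
# jumping by a precomputed delta, costs only one point addition.
P = 2**256 - 2**32 - 977   # field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(A, B):
    """Affine point addition; None represents the point at infinity."""
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % P == 0:
        return None
    if A == B:
        lam = 3 * A[0] * A[0] * pow(2 * A[1], -1, P) % P
    else:
        lam = (B[1] - A[1]) * pow(B[0] - A[0], -1, P) % P
    x = (lam * lam - A[0] - B[0]) % P
    return (x, (lam * (A[0] - x) - A[1]) % P)

def ec_mul(k, Q=G):
    """Double-and-add scalar multiplication: the expensive operation
    (~256 doublings plus adds), needed only to enter the range."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, Q)
        Q = ec_add(Q, Q)
        k >>= 1
    return R
```

One `ec_mul` enters the range; after that, (k+1)*G = k*G + G, and a jump over filtered keys by delta is one `ec_add` with a precomputed delta*G, which is exactly why the kernel never needs to know the private key it is on.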
|
Off the grid, training pigeons to broadcast signed messages.
|
|
|
Desyationer
Jr. Member
Offline
Activity: 64
Merit: 2
|
 |
April 13, 2025, 09:53:10 AM |
|
kTimesG, nomachine, Bram24732, AlanJohnson: thanks for the clarification. It's now clear that if the CUDA algorithm achieves its high speed by computing the delta from the previous key, then applying a filter would likely be ineffective. Even a 30% reduction in the range probably wouldn't offset the performance loss caused by filtering, assuming it's even possible to implement. On standard CPUs, filtering might be slightly more reasonable, but their speed is vastly inferior to GPUs'. Overall, even if filtering could be implemented perfectly on a GPU without any performance drop, a 30% reduction in the key space wouldn't make much of a difference for anyone without access to thousands of GPUs. The real benefit would come not from a 30% reduction, but from cutting the range by several orders of magnitude.
|
|
|
|
|
|