Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

Bitcoin Forum

May 26, 2026, 06:59:23 PM


Topic: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo (Read 18457 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (27 posts by 5+ users deleted.)

mcdouglasx

Hero Member

Activity: 1008
Merit: 601

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 27, 2026, 10:35:43 PM

#441

RC, don't you think it would be better to change this line in RCGpuCore.cu:

Code:

jmp_ind = x[0] % JMP_CNT;

to:

Code:

jmp_ind = x[1] % JMP_CNT;

I mean, because in the middle of `x` you'd have a better, more uniform distribution of jumps and less correlation. Or even use the combination `x[1] ^ x[0]`.

I think this would improve uniformity and reduce the loop rate.

I'm only showing that one line as a guide, since there are other places in the code that refer to this.

However, if you already have a generated database, this change could break compatibility with it. For those starting a new search, I think it would be worthwhile.
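The three variants under discussion can be sketched side by side. This is a minimal host-side illustration, not code from RCGpuCore.cu: the table size of 512 and the layout of `x` as four 64-bit limbs (least significant first) are assumptions, and `JumpSource` and `jump_index` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the jump table holds 512 entries (a power of two), so the
// modulo keeps only the low 9 bits of whatever value is fed into it.
constexpr uint32_t JMP_CNT = 512;

enum class JumpSource { Limb0, Limb1, XorLimbs };

// x is the affine X coordinate as four 64-bit limbs, least significant first
// (an assumed layout for this sketch).
inline uint32_t jump_index(const uint64_t x[4], JumpSource src) {
    assert(x != nullptr);
    uint64_t v = 0;
    switch (src) {
        case JumpSource::Limb0:    v = x[0];        break;  // line as posted
        case JumpSource::Limb1:    v = x[1];        break;  // proposed alternative
        case JumpSource::XorLimbs: v = x[0] ^ x[1]; break;  // proposed combination
    }
    return static_cast<uint32_t>(v % JMP_CNT);
}
```

Whichever source is chosen, every walk that contributes to the same database has to use the same one, because two kangaroos landing on the same point must take the same next jump for their paths to merge.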


Bram24732

Member

Activity: 322
Merit: 28

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 01:31:29 PM

#442

Quote from: mcdouglasx on March 27, 2026, 10:35:43 PM

Rc, don't you think it would be better to change in RCGpuCore.cu:

Code:

jmp_ind = x[0] % JMP_CNT;

Code:

jmp_ind = x[1] % JMP_CNT;

Why do you think x[1] has a better distribution than x[0] ?

I solved 67 and 68 using custom software distributing the load across ~25k GPUs. 4090 stocks speeds : ~8.1Bkeys/sec. Don’t challenge me technically if you know shit about fuck, I’ll ignore you. Same goes if all you can do is LLM reply.

kTimesG

Sr. Member

Activity: 840
Merit: 251

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 03:41:37 PM

#443

Quote from: Bram24732 on March 28, 2026, 01:31:29 PM

Why do you think x[1] has a better distribution than x[0] ?

No, he is correct. And it's because when the DP uses the same bits as the jump index function, and you get to select a better X between two options, this adds bias for the even indices in the jump table, increasing their probability, hence losing the overall selection uniformity for the pseudo-random walk. This is easily proven (and very visible) by plotting the frequencies of the used jump indices - it's not uniform at all.

Even worse when selecting a DP based on more than one bit (powers of two indices get biased the more bits you get to compare between two X candidates).
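The effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions: random 64-bit values stand in for X limbs, "better X" is modelled as preferring a candidate whose lowest bit is 0, and the table size of 512 is assumed. `simulate_selection` and `ParityCounts` are hypothetical names, not part of RCKangaroo.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

constexpr uint32_t JMP_CNT = 512;  // assumed jump table size

struct ParityCounts { uint64_t even = 0; uint64_t odd = 0; };

// Draw two candidates, keep the one whose selection limb has lowest bit 0
// when possible, then derive the jump index. If shared_bits is true the index
// comes from the same limb the selection looked at; otherwise it comes from
// an independent limb of the chosen candidate.
inline ParityCounts simulate_selection(uint64_t trials, bool shared_bits, uint64_t seed) {
    assert(trials > 0);
    std::mt19937_64 rng(seed);
    ParityCounts counts;
    for (uint64_t i = 0; i < trials; ++i) {
        const uint64_t a_sel = rng();
        const uint64_t a_other = rng();
        const uint64_t b_sel = rng();
        const uint64_t b_other = rng();
        // Prefer a if its low bit is 0; fall back to b only if b is better.
        const bool pick_a = (a_sel & 1) == 0 || (b_sel & 1) != 0;
        const uint64_t sel = pick_a ? a_sel : b_sel;
        const uint64_t other = pick_a ? a_other : b_other;
        const uint32_t idx = static_cast<uint32_t>((shared_bits ? sel : other) % JMP_CNT);
        if (idx % 2 == 0) ++counts.even; else ++counts.odd;
    }
    return counts;
}
```

With shared bits, an even index is produced whenever at least one of the two candidates has a zero low bit, so about 3/4 of the indices are even. With an independent limb the split stays near 1/2.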

Off the grid, training pigeons to broadcast signed messages.

mcdouglasx

Hero Member

Activity: 1008
Merit: 601

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 04:57:27 PM

#444

Quote from: Bram24732 on March 28, 2026, 01:31:29 PM

Why do you think x[1] has a better distribution than x[0] ?

The central bits like x[1] are usually freer from algebraic biases derived from the modular arithmetic of the curve.


RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 05:55:12 PM

#445

Quote from: mcdouglasx on March 27, 2026, 10:35:43 PM

Rc, don't you think it would be better to change in RCGpuCore.cu:

Code:

jmp_ind = x[0] % JMP_CNT;

Code:

jmp_ind = x[1] % JMP_CNT;

So you think that the lowest bits of the X of secp256k1 points are not uniformly distributed, right? Why do you think so? I see the same uniform distribution for any bits.
Anyway, it's very easy to check, and I don't see any difference in my tests.

Quote from: kTimesG on March 28, 2026, 03:41:37 PM

No, he is correct. And it's because when the DP uses the same bits as the jump index function, and you get to select a better X between two options, this adds bias for the even indices in the jump table, increasing their probability, hence losing the overall selection uniformity for the pseudo-random walk. This is easily proven (and very visible) by plotting the frequencies of the used jump indices - it's not uniform at all.
Even worse when selecting a DP based on more than one bit (powers of two indices get biased the more bits you get to compare between two X candidates).

Just don't use the same bits for DP, jumps and point selection, and you avoid such issues.

I've solved #120, #125, #130. How: https://github.com/RetiredC

kTimesG

Sr. Member

Activity: 840
Merit: 251

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 06:13:29 PM

#446

Quote from: RetiredCoder on March 28, 2026, 05:55:12 PM

Just don't use the same bits for DP, jumps and point selection, and you avoid such issues.

Exactly what I'm doing, the DP bits never intersect the jump function bits. Anyway, I thought this was the "point" behind McD's suggestion, but I guess he had something else in mind.

Off the grid, training pigeons to broadcast signed messages.

mcdouglasx

Hero Member

Activity: 1008
Merit: 601

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 28, 2026, 07:33:50 PM

#447

Quote from: RetiredCoder on March 28, 2026, 05:55:12 PM

RC, I did a quick test with 4 million real pubkeys, replicating the GPU logic exactly (Little-Endian and Y parity).

PUZZLE 67 (low range):

Code:

x[0]: chi2=519.37, bias=0.46%

x[1]: chi2=502.23, bias=0.46%

xor: chi2=499.66, bias=0.45%

Result: xor wins by a narrow margin.

PUZZLE 135 (mid range):

Code:

x[0]: chi2=507.51, bias=0.46%

x[1]: chi2=514.07, bias=0.48%

xor: chi2=524.48, bias=0.45%

Result: x[0] wins.

My suggestion to use x[0] ^ x[1] is valid "on paper" because it reduces the inversion bias (inv_flag) and improves uniformity at low ranges, such as Puzzle 67.

However, in practice, your current implementation (x[0]) achieves near-perfect entropy at Puzzle 135 (Chi2: 507.51), which demonstrates that the bias naturally disappears as the search space grows.

Your current code is statistically sound for large puzzles. My proposal to combine fragments acts as a safety net to avoid correlations with the Y coordinate, but it is not strictly necessary for the current kernel performance.
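For readers who want to repeat this kind of check, here is a minimal sketch of the statistic itself. It assumes the test bins the jump indices into equally likely buckets; the function names are hypothetical. One fact worth keeping in mind when reading the numbers above: for k bins a uniform source gives a chi-square value with mean k - 1 and standard deviation sqrt(2(k - 1)). If the table has 512 entries (an assumption, though the reported values cluster around 511), that is 511 with a standard deviation of about 32, so values between 499 and 525 are all within ordinary sampling noise of each other.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pearson chi-square statistic of observed bin counts against a uniform
// expectation. For k bins it has k - 1 degrees of freedom.
inline double chi_square_uniform(const std::vector<uint64_t>& counts) {
    assert(!counts.empty());
    uint64_t total = 0;
    for (uint64_t c : counts) total += c;
    assert(total > 0);
    const double expected = static_cast<double>(total) / static_cast<double>(counts.size());
    double chi2 = 0.0;
    for (uint64_t c : counts) {
        const double d = static_cast<double>(c) - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}

// Standard deviation of the statistic under a truly uniform source.
inline double chi_square_sigma(std::size_t bins) {
    assert(bins > 1);
    return std::sqrt(2.0 * static_cast<double>(bins - 1));
}
```

A difference between two variants only becomes meaningful when it is large compared with `chi_square_sigma(bins)`, or when it persists across many independent samples.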


Bram24732

Member

Activity: 322
Merit: 28

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

March 29, 2026, 11:05:30 AM

#448

Quote from: kTimesG on March 28, 2026, 03:41:37 PM

Quote from: Bram24732 on March 28, 2026, 01:31:29 PM

Why do you think x[1] has a better distribution than x[0] ?

Oh it’s not a matter of curve distribution but more how you use it then. I didn’t read RC’s code so I didn’t spot this behaviour.

olnev

Newbie

Activity: 3
Merit: 0

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 03, 2026, 05:36:58 PM

#449

Quote from: castonrecovery on March 24, 2026, 07:02:54 AM

Thanks for testing it out and I am glad to hear it's working with your 6700. Please let me know if you wish to commit your improvements.

It's working on the RX6400 (RDNA 2). Speed is ~226 MKey/s.
However, I can't get it working on the RX7600XT (RDNA 3):

Code:

DBG: RndPnts tame x_init=00000000000000000000000000000000 wild x_init=e25e16cabde26a99bb29878863c7e237
BENCH: Speed: 0 MKeys/s, Err: 0, DPs: 0K/9646K, Time: 0d:00h:00m/213503982334601d:07h:00m
BENCH: Speed: 0 MKeys/s, Err: 0, DPs: 0K/9646K, Time: 0d:00h:00m/213503982334601d:07h:00m
BENCH: Speed: 0 MKeys/s, Err: 0, DPs: 0K/9646K, Time: 0d:00h:00m/213503982334601d:07h:00m
BENCH: Speed: 0 MKeys/s, Err: 0, DPs: 0K/9646K, Time: 0d:00h:00m/213503982334601d:07h:00m

kTimesG

Sr. Member

Activity: 840
Merit: 251

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 04, 2026, 06:15:03 PM

#450

Quote from: Ykra on March 10, 2026, 10:54:19 PM

Quote from: kTimesG on March 10, 2026, 09:52:19 AM

How many raw FE mul/second are you getting on a stock RTX 4090 @450 W?

RawFE:: regs=40 threads=1024 block=256 throughput= 87487.5 Mmul/s

This is pre altering SASS.
I'll play around and update after some SASS alterations.

115 Gmul/s @ 413 W .... after some better carry chaining.

Code:

  int32 perf: 19912 G xAdd / s, 19912 G xMul / s
FE limb size: 32
blockDim: 256
Computing FE result [CPU]
        a = 0x3be98b46a0db920cfae7933e4bee03a58d393f4b3a6e8e1b3af8f4a1eed5c226
        b = 0xe6aeff98065a7a533a9705e4c9d949b357df42916c1c3bc1f2c20fe97552baa3
        Total ops: 16777216
        r = 0x183f14a5c2e101b5666fbca1d032228a4d2784225255aeccc96eab8578aa790a
[CPU] Total ops: 16777216 Speed: 40.85 Mo/s
Computing FE result [GPU]
TestFE kernel attributes:
                  registers: 46
        max threads / block: 1024
               local memory: 0 bytes
               const memory: 0 bytes
Launch parameters
         gridDim: 128
        blockDim: 1024
r = 183f14a5 c2e101b5 666fbca1 d032228a 4d278422 5255aecc c96eab85 78aa790a
Total ops: 2199023255552; Wall clock speed: 114399.89 Mo/s (19222250 ticks)
GPU speed: 114482.21 Mo/s (19208.43 ms); Kernel ticks: 51523633975
AVG ticks/op: 3071.05
SUCCESS!
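The figures in this log are consistent with each other, which can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical, and the reading of "AVG ticks/op" as kernel ticks divided by operations per thread is an assumption that happens to reproduce the logged value.

```cpp
#include <cassert>
#include <cstdint>

// Operations per second, in millions, from a total op count and a duration
// given in milliseconds.
inline double mops_per_second(uint64_t total_ops, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    return static_cast<double>(total_ops) / (elapsed_ms * 1e-3) / 1e6;
}

// Average clock ticks one thread spends per operation, assuming the total op
// count is split evenly across all launched threads.
inline double ticks_per_op(uint64_t kernel_ticks, uint64_t total_ops, uint64_t threads) {
    assert(threads > 0 && total_ops >= threads);
    const double ops_per_thread = static_cast<double>(total_ops) / static_cast<double>(threads);
    return static_cast<double>(kernel_ticks) / ops_per_thread;
}
```

With the logged values, 2199023255552 ops over 19208.43 ms comes to roughly 114482 Mo/s, and with 128 x 1024 threads each thread performs 16777216 ops, the same count as the CPU run.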

Off the grid, training pigeons to broadcast signed messages.

Bilmehdi93

Newbie

Activity: 1
Merit: 0

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 10, 2026, 03:28:47 PM

#451

Hello dear RetiredCoder, could I please get the tame file for puzzle 120 or 125? If possible, please share it to my email bilmehdi93@gmail.com. Thank you so much.

RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 08:02:09 AM
Last edit: April 22, 2026, 01:18:25 PM by RetiredCoder

Merited by kTimesG (10), Ykra (10), Cricktor (4)

#452

Finally I had some time to prepare RCAsm for people who are brave enough to make their CUDA kernels faster

https://github.com/RetiredC/RCAsm

Why ASM? PTX is not powerful enough:

- You still cannot control register usage.
- PTX does not provide all instructions, and some of the missing ones can be really important if you want really fast code.
- There is no way to declare fast functions: if you define an "inline" function, its code is simply pasted in, so the main code grows every time you call it. If it's not inline, calls are very slow.
- There is no way to use uniform registers and instructions directly.
- There is no way to specify control codes.
- There is no good management of carry flags, and some carry-related instructions are missing.
- You have to check the generated SASS every time, spend time convincing the compiler to produce what you want, etc. As a result, ASM is often really faster if you know what you are doing.
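To make the carry-flag point concrete, here is what a carry chain looks like in portable C++. This is a sketch for illustration only, not code from RCAsm or RCKangaroo; the 32-bit limb layout (least significant first) and the name `add256` are assumptions. In PTX the same chain is written with the extended-precision instructions such as `add.cc` and `addc`, and how it ends up scheduled in SASS is left to the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Adds two 256-bit integers stored as eight 32-bit limbs (least significant
// first) and returns the carry out of the top limb.
inline uint32_t add256(const uint32_t a[8], const uint32_t b[8], uint32_t r[8]) {
    assert(a != nullptr && b != nullptr && r != nullptr);
    uint64_t carry = 0;
    for (int i = 0; i < 8; ++i) {
        // Each limb depends on the carry of the previous one: this serial
        // dependency is the "carry chain".
        const uint64_t s = static_cast<uint64_t>(a[i]) + b[i] + carry;
        r[i] = static_cast<uint32_t>(s);
        carry = s >> 32;
    }
    return static_cast<uint32_t>(carry);
}
```

A field multiplication contains many such chains interleaved with multiply-adds, which is why control over carry handling matters for the kind of kernel discussed here.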

RCAsm features:

- sm89 and sm120 support.
- variables for R, UR, P.
- asm functions (include/call).
- supports constants and math expressions.
- automatic kernel injection into the .cuasm file.
- simple but convenient editor for asm sources.
- #IF #ELSEIF #ENDIF support.
- open source, written in Python.

I hope you will have a lot of fun with this tool and SASS :D

Also check Kernel01 sample for SASS implementation of MulMod256.

I've solved #120, #125, #130. How: https://github.com/RetiredC

Torin Keepler

Newbie

Activity: 43
Merit: 0

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 04:46:58 PM
Last edit: April 22, 2026, 04:58:57 PM by Torin Keepler

#453

Quote from: RetiredCoder on April 22, 2026, 08:02:09 AM

Finally I had some time to prepare RCAsm for people who are brave enough to make their CUDA kernels faster

https://github.com/RetiredC/RCAsm

Why ASM? PTX is not powerful enough:

Huge thanks for the provided tools and code example!
The project is simply super. I managed to compile everything:
the assembler and the injection went absolutely successfully and without any errors.

Could you please tell me if you plan to release instructions or examples of assembly kernels specifically for the RCKangaroo program,
for different architectures, in the future?

Compiling code...
4 public units found:
  main.asm: CONST STRIDE
  main.asm: CONST INT_SIZE
  main.asm: KERNEL mulKernel
  mul.asm: FUNCTION MulMod256
  kernel mulKernel: compiled, regcnt: 255, asm_lines: 138
   appended functions:
injecting SUCCESSFUL
Done(372ms): compiling+injecting SUCCESSFUL

RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 06:19:22 PM

#454

Quote from: Torin Keepler on April 22, 2026, 04:46:58 PM

Could you please tell me if you plan to release instructions or examples of assembly kernels specifically for the RCKangaroo program,
for different architectures, in the future?

Yes I will publish asm sources for my turbo kernels (both sm89 and sm120) for RCKangaroo as soon as #135 is solved (no matter who solves it).
It will happen soon, so you won’t have to wait long.

I've solved #120, #125, #130. How: https://github.com/RetiredC

Torin Keepler

Newbie

Activity: 43
Merit: 0

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 07:17:51 PM
Last edit: April 26, 2026, 06:27:11 PM by Torin Keepler

#455

Quote from: RetiredCoder on April 22, 2026, 06:19:22 PM

Quote from: Torin Keepler on April 22, 2026, 04:46:58 PM

Could you please tell me if you plan to release instructions or examples of assembly kernels specifically for the RCKangaroo program,
for different architectures, in the future?

Yes I will publish asm sources for my turbo kernels (both sm89 and sm120) for RCKangaroo as soon as #135 is solved (no matter who solves it).
It will happen soon, so you won’t have to wait long.

I have questions regarding the current implementation of RCKangaroo. Could you please provide some clarification? Thank you very much.

Potential race condition in KernelB
Also, I noticed a potential issue in KernelB when writing to LoopTable. The code currently uses BLOCK_X at the end of the index:

Kparams.LoopTable[MD_LEN * BLOCK_SIZE * PNT_GROUP_CNT * BLOCK_X + 2 * MD_LEN * BLOCK_SIZE * gr_ind2 + ind * BLOCK_SIZE + BLOCK_X] = RegsA;

Doesn't this cause all threads in the block to overwrite the exact same memory address at the same time? I changed the last BLOCK_X to THREAD_X so that each thread writes to its own unique column, and it seems to work perfectly. Was BLOCK_X just a typo here, or is there a specific reason for it?

Thanks in advance!

RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 08:42:53 PM

#456

Quote from: Torin Keepler on April 22, 2026, 07:17:51 PM

1. About jump sizes and Out-Of-Bounds drifting
When working with large search ranges, the jumps from Table 2 accumulate over a long period, which causes the kangaroos to eventually drift far outside the boundaries of the search range.
Because of this, do you think it would be better to use smaller jumps in Table 2 to prevent them from drifting out of bounds?

Jumps for all tables can be right or left (depends on Y), so I don't expect any serious drifts. Also I don't like the idea of reducing these jumps, but you can try it.

Quote from: Torin Keepler on April 22, 2026, 07:17:51 PM

2. Potential race condition in KernelB
Also, I noticed a potential issue in KernelB when writing to LoopTable. The code currently uses BLOCK_X at the end of the index:
Kparams.LoopTable[MD_LEN * BLOCK_SIZE * PNT_GROUP_CNT * BLOCK_X + 2 * MD_LEN * BLOCK_SIZE * gr_ind2 + ind * BLOCK_SIZE + BLOCK_X] = RegsA;
Doesn't this cause all threads in the block to overwrite the exact same memory address at the same time? I changed the last BLOCK_X to THREAD_X so that each thread writes to its own unique column, and it seems to work perfectly. Was BLOCK_X just a typo here, or is there a specific reason for it?

Oh, it's a bug, it must be THREAD_X of course! I have version 4.0 with a lot of changes to support asm kernels and some interesting ideas implemented, but I will upload it later (as I said above).
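The collision is easy to see by evaluating the index expression on the host. This is a sketch with made-up constants: the values of `MD_LEN`, `BLOCK_SIZE` and `PNT_GROUP_CNT` below are hypothetical and chosen only to keep the example small, and `loop_table_index` and `distinct_slots` are illustrative helpers, not functions from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical values; the real constants are defined in the RCKangaroo sources.
constexpr std::size_t MD_LEN = 10;
constexpr std::size_t BLOCK_SIZE = 8;
constexpr std::size_t PNT_GROUP_CNT = 4;

// Index expression from the post; `column` is the final term, which was
// BLOCK_X in the buggy version and THREAD_X in the fix.
inline std::size_t loop_table_index(std::size_t block_x, std::size_t gr_ind2,
                                    std::size_t ind, std::size_t column) {
    return MD_LEN * BLOCK_SIZE * PNT_GROUP_CNT * block_x
         + 2 * MD_LEN * BLOCK_SIZE * gr_ind2
         + ind * BLOCK_SIZE
         + column;
}

// Counts how many distinct slots the threads of one block write to for a
// fixed (gr_ind2, ind) pair.
inline std::size_t distinct_slots(std::size_t block_x, std::size_t gr_ind2,
                                  std::size_t ind, bool use_thread_x) {
    assert(ind < MD_LEN);
    std::set<std::size_t> slots;
    for (std::size_t thread_x = 0; thread_x < BLOCK_SIZE; ++thread_x) {
        slots.insert(loop_table_index(block_x, gr_ind2, ind,
                                      use_thread_x ? thread_x : block_x));
    }
    return slots.size();
}
```

With `BLOCK_X` as the last term, nothing in the expression depends on the thread, so every thread of the block targets one slot. With `THREAD_X`, each thread gets its own column within the `BLOCK_SIZE`-wide row.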

I've solved #120, #125, #130. How: https://github.com/RetiredC

RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 09:37:48 PM

#457

Quote from: Torin Keepler on April 22, 2026, 09:11:37 PM

Regarding the jumps, it is clear that their direction depends on the Y-coordinate, and in most cases,
they tend to bounce around approximately within their own local area (halo).
However, please observe this closely. After just a month of running,
a large number of kangaroos will end up in a space that is 4 bits larger. This is a serious problem.

If you see this issue, probably the best solution is just to restart a kangaroo after it hits a DP. Or you can reduce the jumps and hope that it won't cause other issues.
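Both observations in this exchange fit the behaviour of a symmetric random walk: there is no systematic drift in either direction, yet the typical distance from the start still grows with the square root of the number of jumps. The sketch below is a simplified model, not the RCKangaroo walk: every jump has unit size and its direction is a fair coin flip standing in for the Y-dependent sign. `rms_displacement` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Root-mean-square displacement of `walkers` independent walks of `steps`
// unit jumps, each jump going left or right with equal probability.
inline double rms_displacement(uint32_t walkers, uint32_t steps, uint64_t seed) {
    assert(walkers > 0 && steps > 0);
    std::mt19937_64 rng(seed);
    double sum_sq = 0.0;
    for (uint32_t w = 0; w < walkers; ++w) {
        int64_t pos = 0;
        for (uint32_t s = 0; s < steps; ++s) {
            pos += (rng() & 1) ? 1 : -1;
        }
        sum_sq += static_cast<double>(pos) * static_cast<double>(pos);
    }
    return std::sqrt(sum_sq / static_cast<double>(walkers));
}
```

In this model, quadrupling the number of steps roughly doubles the spread. Restarting a kangaroo resets its accumulated displacement to zero, while shrinking the jumps scales the spread down proportionally; whether either is worth the cost in a real search is a separate question.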

Quote from: Torin Keepler on April 22, 2026, 09:11:37 PM

By the way, I added an implementation that restricts kangaroo landings to even X-coordinates 75% of the time.

SOTA+ does this trick with the cheap point. If you managed to do it without using a cheap point, you found a security issue in secp256k1.

I've solved #120, #125, #130. How: https://github.com/RetiredC

Torin Keepler

Newbie

Activity: 43
Merit: 0

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 22, 2026, 10:03:55 PM
Last edit: April 23, 2026, 01:57:12 PM by Torin Keepler

#458

Quote from: RetiredCoder on April 22, 2026, 09:37:48 PM

Quote from: Torin Keepler on April 22, 2026, 09:11:37 PM

If you see this issue, probably the best solution is just to restart a kangaroo after it hits a DP. Or you can reduce the jumps and hope that it won't cause other issues.

Quote from: Torin Keepler on April 22, 2026, 09:11:37 PM

By the way, I added an implementation that restricts kangaroo landings to even X-coordinates 75% of the time.

SOTA+ does this trick with the cheap point. If you managed to do it without using a cheap point, you found a security issue in secp256k1.

Yes, I have implemented the restart method, but I'm currently using DP30. By the time a kangaroo hits a distinguished point (DP),
it ends up doing quite a bit of redundant work. Therefore, I think it is more optimal to use a different loop exit value.

Regarding the X-coordinate parity, that is exactly how I implemented it - using a "cheap" second point - but for now,
I'm stuck on implementing a cheap loop detector.

RetiredCoder (OP)

Full Member

Activity: 168
Merit: 171

No pain, no gain!

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 23, 2026, 06:04:21 AM

#459

Quote from: Torin Keepler on April 22, 2026, 10:03:55 PM

Could you briefly explain why you consider the coefficient in the SOTA+ method to be slightly better?

Because I have proofs: https://github.com/RetiredC/Kang-1

I've solved #120, #125, #130. How: https://github.com/RetiredC

Ykra

Newbie

Activity: 16
Merit: 32

Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

April 23, 2026, 02:40:31 PM

Merited by RetiredCoder (30)

#460

Quote from: kTimesG on April 04, 2026, 06:15:03 PM

Quote from: Ykra on March 10, 2026, 10:54:19 PM

Quote from: kTimesG on March 10, 2026, 09:52:19 AM

How many raw FE mul/second are you getting on a stock RTX 4090 @450 W?

RawFE:: regs=40 threads=1024 block=256 throughput= 87487.5 Mmul/s

This is pre altering SASS.
I'll play around and update after some SASS alterations.

115 Gmul/s @ 413 W .... after some better carry chaining.

Oh, huge improvement! I did go ahead with some SASS tuning and testing on the 4090 I was renting.
The best result I achieved was 102318.5 Mmul/s, but it was hard limited by the 300 W power ceiling on that specific card. I've been a bit consumed by other things as of late, but I'm always interested in new developments here.

Quote from: RetiredCoder on April 22, 2026, 08:02:09 AM

Finally I had some time to prepare RCAsm for people who are brave enough to make their CUDA kernels faster

https://github.com/RetiredC/RCAsm

This is great, really appreciate you sharing your work. Would +merit but too newbie to do so, so pretend I did.
This would have saved some of my sanity when I was making my own IDE (and CuAsm with fixes) for my 5090 work. Then again, part of the struggle is part of the fun, I guess.

I learnt a lot from redplait's blog and his deep dives into SASS, amongst other interesting things. You can see some of his work here: https://github.com/redplait/denvdis
