Bitcoin Forum
March 27, 2026, 01:59:36 PM *
News: Latest Bitcoin Core release: 30.2 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 [16] 17 18 19 20 21 22 »  All
  Print  
Author Topic: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo  (Read 16914 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (11 posts by 1+ user deleted.)
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
October 18, 2025, 03:08:20 PM
 #301

k = 0.625  Shocked benchmark test?

 -- Wilds are always Even but the Tames are not

Relax, Biff! k's down to 0.6 only when all points are golden and stuff's jumpin' both left AND right via the cheapos. So, hash table's up 2x to 1.25 sqrt.

By the same logic BSGS's actually faster AND better, bcz the cheapos are in full effect over there and we might as well say that 1.06 sqrt is actually k = 0.53. Bonus: it eats up less space as well (0.71 sqrt).

If you stick your head up from the established norms you might understand why Wildos aren't really just at an even distance (and nah, having jump deltas even becomes irrelevant at that time). The perspective of how you look at them plays a big role. I'm not gonna say more ATM because then it becomes obvious.

Off the grid, training pigeons to broadcast signed messages.
RetiredCoder (OP)
Full Member
***
Offline Offline

Activity: 162
Merit: 141


No pain, no gain!


View Profile WWW
October 18, 2025, 08:40:00 PM
 #302

But you know what? Cycles don't really need so much exhaustive burnout if you mastermind the behavioral headaches of roundabouts in a much more simpler way. No hoverboard needed.

I'm glad that you finally like cycles Smiley

I've solved #120, #125, #130. How: https://github.com/RetiredC
vdog99
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
October 24, 2025, 03:20:28 PM
 #303

Hey RC

How do you go about incorporating SOTA+ with the GPU Version of SOTA In the RCKangaroo implementation of your Code (SOTA+ calculates of P - J and P + J causes a loop: A -> B (Add) and B -> A (Sub)) especially the Loop detection and Escape in Kernel A?

Should we just rely on KernelB Loop D&E ((average latency of STEP_CNT/2 = 500 steps)) and remove the L1S2 which would then Lower K value correct since Kernel B is expensive?

In Kang-2 you used a multi-List method but that would not make sense for a GPU Implementation?

Is there a way to integrate SOTA+ in RCKangaroo while doing accurate and correct Loop Detection?
Efim_Wr
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
October 25, 2025, 11:24:59 PM
 #304

Hi all. Here are the performance figures for my video cards.
CMP 70HX  ~1711 MKeys/s
RTX 5070   ~3400 MKeys/s
CMP 40HX  ~1500 MKeys/s

RetiredCoder, you're a genius. Thanks for all your hard work.
I have some small developments in the field of EC. You're probably familiar with them, but if you have anything, please send me a private message.
Cricktor
Legendary
*
Offline Offline

Activity: 1442
Merit: 3801



View Profile
October 26, 2025, 10:56:11 AM
 #305

Sounds like your "small developments in the field of EC" are not for the public? If so, then why talk about it?

███████████████████████████
███████▄████████████▄██████
████████▄████████▄████████
███▀█████▀▄███▄▀█████▀███
█████▀█▀▄██▀▀▀██▄▀█▀█████
███████▄███████████▄███████
███████████████████████████
███████▀███████████▀███████
████▄██▄▀██▄▄▄██▀▄██▄████
████▄████▄▀███▀▄████▄████
██▄███▀▀█▀██████▀█▀███▄███
██▀█▀████████████████▀█▀███
███████████████████████████
.
.Duelbits PREDICT..
█████████████████████████
█████████████████████████
███████████▀▀░░░░▀▀██████
██████████░░▄████▄░░████
█████████░░████████░░████
█████████░░████████░░████
█████████▄▀██████▀▄████
████████▀▀░░░▀▀▀▀░░▄█████
██████▀░░░░██▄▄▄▄████████
████▀░░░░▄███████████████
█████▄▄█████████████████
█████████████████████████
█████████████████████████
.
.WHERE EVERYTHING IS A MARKET..
█████
██
██







██
██
██████
Will Bitcoin hit $200,000
before January 1st 2027?

    No @1.15         Yes @6.00    
█████
██
██







██
██
██████

  CHECK MORE > 
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
October 26, 2025, 06:15:32 PM
 #306

Is there a way to integrate SOTA+ in RCKangaroo while doing accurate and correct Loop Detection?

I think SOTA+ simply means always computing both P+J and P-J and choosing the best route of the two. What stops you from doing that, if it's not already in the code?

IMHO there is no such thing as an accurate and correct loop detection. 2-cycles are totally trivial. But what if the loop is of size 1024? Good luck detecting that when you're only tracking the last 8 or 16 distances, even if it's close to impossible for something like that to ever occur during the lifespan of our galaxy. Are there algos to detect very complicated cycles? I bet there must be, but it's most likely complicated useless overkill. If you do the math, you'll find out that from a point on, cycles longer than some very short length don't even matter anymore, compared to the huge amount of walks you're supposed to create and maintain. They're the disposable 0.001% of the entire amount of power needed to solve some DLP. Also choosing very fine-tuned jump distances can avoid a lot of the cycles from the very start.

Off the grid, training pigeons to broadcast signed messages.
RetiredCoder (OP)
Full Member
***
Offline Offline

Activity: 162
Merit: 141


No pain, no gain!


View Profile WWW
October 28, 2025, 06:49:52 AM
 #307

How do you go about incorporating SOTA+ with the GPU Version of SOTA In the RCKangaroo implementation of your Code

It's not difficult to upgrade SOTA to SOTA+ in RCKangaroo, but the group size is 24 or 64 there, so
cheap point is not so cheap and therefore it won't be faster.
In general, SOTA+ is faster only when group size is about 12-16 or less.

I've solved #120, #125, #130. How: https://github.com/RetiredC
vdog99
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
October 28, 2025, 11:29:34 PM
Last edit: October 29, 2025, 02:59:01 PM by vdog99
 #308

How do you go about incorporating SOTA+ with the GPU Version of SOTA In the RCKangaroo implementation of your Code

It's not difficult to upgrade SOTA to SOTA+ in RCKangaroo, but the group size is 24 or 64 there, so
cheap point is not so cheap and therefore it won't be faster.

Oh that makes sense I couldn't get to improve its K than the current one, in spite of everything I did.
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
October 29, 2025, 06:38:40 PM
 #309

But you know what? Cycles don't really need so much exhaustive burnout if you mastermind the behavioral headaches of roundabouts in a much more simpler way. No hoverboard needed.

I'm glad that you finally like cycles Smiley

Yo Doc. I'm honestly in a loop on loop handling. I frankesteined my way to do a sorta' but not identical SOTA+ (let's call it SOTA++) in CUDA because I want that 1.0 sqrt after all. Was seriously thinking this is what your RCKang does already (based on some code comments in Kang1 or Kang2 that sounded like "on a GPU, we always compute both of x3, because no benefits etc.". My bad. I'm using 8 as group size so basically I'm jumping 80% faster than a forward-only creature.

But it all got nuked when I got to the point of having to escape cycles. Detecting them was a breeze. Escaping them? It's simpler to get out of a Guantanamo Bay prison in one piece.

Check this out: a simple 4 cycle:

A -> B -> C -> B -> A -> D

Now, the problem with this circuit is that B and C are in a 2-cycle that needs to exit at B (to keep up symmetric walk), but also B and A are in a 2-cycle as well (this dude's exiting on A), so D's the exit point because B's just been visited. Crap.

Yeah, the 4-cycle auto-exited by itself during the second 2-cycle handling. That's totally f***d up, but also so beautiful.

I hate cycles.

Off the grid, training pigeons to broadcast signed messages.
vdog99
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
October 29, 2025, 07:52:07 PM
Last edit: October 29, 2025, 08:05:53 PM by vdog99
 #310

But you know what? Cycles don't really need so much exhaustive burnout if you mastermind the behavioral headaches of roundabouts in a much more simpler way. No hoverboard needed.

I'm glad that you finally like cycles Smiley

Yo Doc. I'm honestly in a loop on loop handling. I frankesteined my way to do a sorta' but not identical SOTA+ (let's call it SOTA++) in CUDA because I want that 1.0 sqrt after all. Was seriously thinking this is what your RCKang does already (based on some code comments in Kang1 or Kang2 that sounded like "on a GPU, we always compute both of x3, because no benefits etc.". My bad. I'm using 8 as group size so basically I'm jumping 80% faster than a forward-only creature.

But it all got nuked when I got to the point of having to escape cycles. Detecting them was a breeze. Escaping them? It's simpler to get out of a Guantanamo Bay prison in one piece.

Check this out: a simple 4 cycle:

A -> B -> C -> B -> A -> D

Now, the problem with this circuit is that B and C are in a 2-cycle that needs to exit at B (to keep up symmetric walk), but also B and A are in a 2-cycle as well (this dude's exiting on A), so D's the exit point because B's just been visited. Crap.

Yeah, the 4-cycle auto-exited by itself during the second 2-cycle handling. That's totally f***d up, but also so beautiful.

I hate cycles.


Yeah K.G  It all comes down to Cycles lol From my understanding so far, note it has been less than a month before looking at this and I have a full-time job and school, So take it with a grain of salt but here we go:

1) What is SOTA++ and What the Heuristic you are choosing aka what route are you assigning P+J or P-J?

2) What tracking method are using to Detect Cycles.. RC uses Distance Tracking instead of Point Tracking for efficiency.
     -- So assuming you are using that
     Kang Travelling -> A -> B  -> C
                       Gets assigned P-J. Goes to B
                        Hence A -> B -> C -> B -> C
        In Kernel A if you do Continue on, instead of Escaping, your K value will improve Significant but you also have to Pay Kernel B Cost of Escape_Detection.        


I had lots of these issues as well, couldn't really overcome it. Ill have to Calculate Group Operations and find a way to reduce it as RC mentioned but I never went below 1.37.
I just went back to SOTA, and improved efficiency there. I am getting K=1.01 after statistical Significant sampling but more work needs to be done.

I really like SOTA+ but I cant really get it to work maybe you will have better luck

kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
October 30, 2025, 09:50:20 AM
 #311

1) What is SOTA++ and What the Heuristic you are choosing aka what route are you assigning P+J or P-J?

I compute both X3(P+J) and X3(P-J) and their slopes, before Y3. The route is chosen based on what X3 is better. This keeps symmetry without having to do weird checks for Y parity and conditional branching. I have a lot of other tweaks though for faster computations. Also an important idea is that walks are only recreated if a useless DP collision occurs (otherwise a low K but lots of EC multiplications is called cheating the definition of a low complexity).

                       Hence A -> B -> C -> B -> C

Not really. 2-cycles are very frequent so if there are a lot of jumps and no exiting from them until later, K goes higher. Keeping track of whether the next jump goes to the previous point (storing the last jump index and direction) allows to exit 2-cycles while jumping.

A is start point/distance. Goes to B. B goes to C. C detects a 2-cycle but is not the exit point (because in a B <-> C cycle, the end point must be deterministic otherwise two walks might diverge) so it goes back to B anyway. B detects a 2-cycle, is the exit point, so goes in the other direction, which happens to be A. A detects a 2-cycle, and is the exit point (in the A <-> B cycle) so it goes to D and cycle is exited.

Main problem: cycle detection must keep up with whether the walk exited a cycle. If some long walk is in the middle of some cycle that might exit by itself, the long cycle exiting kinda breaks because the deterministic exit point of a long cycle might never be reached a second time. Sure, recreating the walk is an option that affects speed, but is it lowering K or increasing K?

I had lots of these issues as well, couldn't really overcome it. Ill have to Calculate Group Operations and find a way to reduce it as RC mentioned but I never went below 1.37.
I just went back to SOTA, and improved efficiency there. I am getting K=1.01 after statistical Significant sampling but more work needs to be done.

You must be doing something wrong if you get a K = 1.37. Make sure all jump points are distinct and DP overhead is correct. I get K ~= 1.01 even with a very high DP. But this only when exiting all cycles as soon as they get detected (e.g. on the CPU).

Off the grid, training pigeons to broadcast signed messages.
vdog99
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
October 30, 2025, 04:50:04 PM
Last edit: October 30, 2025, 05:08:43 PM by vdog99
 #312

Quote
I compute both X3(P+J) and X3(P-J) and their slopes, before Y3. The route is chosen based on what X3 is better.

What is the Route you are choosing? How are you doing? this might where I am getting things wrong
I am assuming the idea is to use the 2 Point at the same time to get to K = 0.99 so how do you decide?


Quote
A is start point/distance. Goes to B. B goes to C. C detects a 2-cycle but is not the exit point (because in a B <-> C cycle, the end point must be deterministic otherwise two walks might diverge) so it goes back to B anyway. B detects a 2-cycle, is the exit point, so goes in the other direction, which happens to be A. A detects a 2-cycle, and is the exit point (in the A <-> B cycle) so it goes to D and cycle is exited.

Main problem: cycle detection must keep up with whether the walk exited a cycle. If some long walk is in the middle of some cycle that might exit by itself, the long cycle exiting kinda breaks because the deterministic exit point of a long cycle might never be reached a second time. Sure, recreating the walk is an option that affects speed, but is it lowering K or increasing K?


I am not a good CUDA programmer but do you know a way to debug this. Write it to a file or something else. I can do mathematical analysis and figure out what is happening here, i am in the same boat as well, hating on these cycles.
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
October 30, 2025, 06:16:28 PM
 #313

What is the Route you are choosing? How are you doing? this might where I am getting things wrong
I am assuming the idea is to use the 2 Point at the same time to get to K = 0.99 so how do you decide?

Obviously, the best/default route is going in the direction that has the higher chances of being a DP. If the DP criteria is on X, then you know what to pick. I assume this is why RC's code tries to skip X3's that are odd, maybe I'm wrong, don't really care, I know my way works.

Note that this removes the need of checking Y parity since it will always be the same best point for any type of walk (tame, wild1, wild2). If you don't compute both points, like RC, and use Y parity and the X bit, or some 2-cycle is detected and need to reverse, then the 2nd point is needed anyway, so not sure why RC says that the cheapo is too slow on GPU. I get almost the same speed whether I compute just one X or both, so maybe it depends on the rest of the code, IDK Smiley

Also note that this means DP density doubles, so the overhead is half then normal. The "worse" point can never be a DP if the "best" point is not a DP itself, so no need to check on it (less than 0.0...01% chances of ever hitting such an alternate DP from some other walk, and chances go lower as DP increases). Even for DP 0, the collision of some other walk straight into a weak DP is less than 1%.

I am not a good CUDA programmer but do you know a way to debug this. Write it to a file or something else. I can do mathematical analysis and figure out what is happening here, i am in the same boat as well, hating on these cycles.

In CUDA we can only printf, and its tough to catch a naughty cycle when 40.000 threads are each jumping multiple walks. But there is no bug here, except that some cycles might exit at 2 different points depending on when they're handled (during jumps or at the end). That's rare though, just a fun fact. Of course, assuming cycles aren't simply nuked and recreated, which kinda sucks Smiley

Off the grid, training pigeons to broadcast signed messages.
Tony8989
Newbie
*
Offline Offline

Activity: 37
Merit: 0


View Profile
November 04, 2025, 02:56:55 PM
 #314

testing bsgs on Puzzle 135
PVK not found. 100.00000 PetaKeys scanned in 11.15 sec. New range
k1
RetiredCoder (OP)
Full Member
***
Offline Offline

Activity: 162
Merit: 141


No pain, no gain!


View Profile WWW
November 19, 2025, 08:09:16 AM
Merited by Cricktor (2), kTimesG (1)
 #315

RCKangaroo v3.1:

- fixed "gpu illegal memory access" bug.
- some small improvements.

This time I've included binary for Windows only, too lazy to build it for Linux Smiley But of course you can build it from sources, for both Windows and Linux.

I've solved #120, #125, #130. How: https://github.com/RetiredC
Ykra
Newbie
*
Offline Offline

Activity: 14
Merit: 2


View Profile
November 21, 2025, 03:59:14 PM
 #316

RCKangaroo v3.1:

- fixed "gpu illegal memory access" bug.
- some small improvements.

This time I've included binary for Windows only, too lazy to build it for Linux Smiley But of course you can build it from sources, for both Windows and Linux.

Thank you for releasing RCKangaroo and the recent fixes. Your sources have been extremely helpful to study.

I’ve built a full-featured implementation using your SOTA approach with some optimizations, giving roughly a 10-12% improvement in Gkeys/s over the public RCKangaroo.
I still can’t reach the higher performance figures you quoted for the 4090 with your private solver, so I would be interested in seeing the level of optimization you applied once you’ve completed 135 should you be willing to share.

On that note, how is progress on 135?
kTimesG
Full Member
***
Offline Offline

Activity: 784
Merit: 240


View Profile
November 21, 2025, 10:57:37 PM
 #317

I’ve built a full-featured implementation using your SOTA approach with some optimizations, giving roughly a 10-12% improvement in Gkeys/s over the public RCKangaroo.
I still can’t reach the higher performance figures you quoted for the 4090 with your private solver

I got to 11.6 GK/s with custom SOTA+ (computing both X3 points) on a 4090 with DP checks and updating jump distances. But just the kernel alone (without any of the EC math) is 3500 lines, full of compilation flags and different things to use (constant memory, shared memory, texture cache, prefetch sizes...) so yeah.

Still nowhere close to the 12.8 GK/s suggested by RC. I don't think that speed can be achieved without hardcore hand-written PTX (the only thing I didn't get into so far), a much more optimized inversion, and probably some big architectural changes.

Off the grid, training pigeons to broadcast signed messages.
Ykra
Newbie
*
Offline Offline

Activity: 14
Merit: 2


View Profile
November 22, 2025, 04:40:51 PM
 #318

I got to 11.6 GK/s with custom SOTA+ (computing both X3 points) on a 4090 with DP checks and updating jump distances. But just the kernel alone (without any of the EC math) is 3500 lines, full of compilation flags and different things to use (constant memory, shared memory, texture cache, prefetch sizes...) so yeah.

Still nowhere close to the 12.8 GK/s suggested by RC. I don't think that speed can be achieved without hardcore hand-written PTX (the only thing I didn't get into so far), a much more optimized inversion, and probably some big architectural changes.

11.6 GKeys/s on a 4090 is impressive. I'm currently getting ~11.5 GKeys/s on a 5090, so I'm clearly behind/missing a lot of things on the optimisation front. My kernel is still a frankenstein build with borrowed RC code and definitely needs a refactor before I take on new improvements.
Thank you for the reply, it's given me some more areas to think on.
Efim_Wr
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
November 25, 2025, 03:07:02 PM
 #319

Hi all. Here are the performance figures for my video cards.
CMP 70HX  ~1711 MKeys/s
RTX 5070   ~3400 MKeys/s
CMP 40HX  ~1500 MKeys/s
CMP 90HX  ~2700-3000 MKeys/s + shunt mod
I was just conducting some observations on the 85th puzzle. The classic Kangaroo didn't finish in 50 minutes, so I guess I need to wait longer, but the SOTA+ finished in 3 minutes. All experiments were run at 9800 MKeys/s.
Uncle Mark
Newbie
*
Offline Offline

Activity: 1
Merit: 0


View Profile
November 26, 2025, 08:06:32 PM
 #320

How long or how many machines do I need to process bit 115 in hours or day using this code?
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 [16] 17 18 19 20 21 22 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!