Author Topic: Pollard's kangaroo ECDLP solver  (Read 58442 times)
G34hj56
Newbie
*
Offline Offline

Activity: 2
Merit: 0


View Profile
April 23, 2024, 08:39:10 PM
 #2821

...

I don't think anybody proficient enough for such a port would encounter your issue, because they would take care of the differences between Unix/Linux and Windows. On Linux, almost everything is a file, and /dev/urandom is the non-blocking random number device (file). Reading from this file gives you random numbers without blocking.

You would have to port everything that's specific to Linux to the way it's done on Windows. Are you sure you have the knowledge to do this? Based on your question, I highly doubt it.

I solved it; it's working now with VS2022 and CUDA 12.1. I had made some mistakes in the linking.

My question was a bit short, sorry.
Baboshka
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
May 02, 2024, 10:44:40 PM
 #2822

It is possible to reduce the complexity of the classic kangaroo algorithm a little by spreading the starting kangaroos in a non-uniform manner.
Roughly speaking, if the private key lies in certain areas it will be found faster; as a counterpart, if it lies in certain other areas it will be found more slowly.
But on average, over the whole range, the complexity is indeed reduced.
If you read the reference on the GitHub page carefully, you will find the trick Wink



First, welcome back master @Jean_Luc .. so happy to see your post.

You can't compare your deep knowledge to others', as you developed those tools and have a deep knowledge of the math used.

I consider myself good at programming (I have many years of experience in C++, but I'm very new to GPU) .. I have a good math background, but that part about "elliptic curve cryptography" is another world.

So it's not easy .. yes, I downloaded your source code, but it's not easy at all to understand the magic happening inside.

I am learning a lot, and I must say many thanks to @WanderingPhilospher, as he is a big help answering my doubts ...


I don't know if you do stuff on YouTube, but I imagine some simple videos explaining some of those aspects would be gorgeous .. maybe I wish for too much ..


Regards
Black_Pawn
Newbie
*
Offline Offline

Activity: 9
Merit: 5


View Profile
May 15, 2024, 10:23:51 AM
 #2823

Does anyone know if pubKeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college and I remember reading about this, but can't find the post anymore. A reward is possible for verified files.
graphite
Jr. Member
*
Offline Offline

Activity: 44
Merit: 27


View Profile
May 30, 2024, 03:54:30 PM
 #2824

I'm having a hard time understanding how distinguished points (DP) work. From what I know, you only save the start point, trail length, and the DP of each trail, and you start a new trail when a DP is found. You repeat this until you find a colliding DP and somehow recover the "actual" collision for the pubkey.

I don't understand how finding and storing only DPs would lead to finding the pubkey collision. I fully understand how the kangaroo method creates two herds to find a collision using the birthday paradox, but I'm not sure how you can do this without storing the whole set of points.
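A minimal toy sketch (all parameters made up, and a deliberately trivial toy group rather than secp256k1) of the mechanism in question: once two trails land on the same element they take identical jumps from then on, so they also reach the same next distinguished point, and the two stored distances give back the key.

Code:
// Toy sketch (my own illustration, NOT secp256k1): a kangaroo walk with
// distinguished points over a deliberately trivial group (k -> k*Gs mod M),
// just to show the DP bookkeeping. Key idea: two trails that ever land on the
// same element take identical jumps from then on, so they also reach the same
// NEXT distinguished point; storing only (DP -> distance, herd) is enough to
// detect the collision, and the two stored distances give back the key.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <utility>

static const uint64_t M  = 2305843009213693951ULL;  // 2^61 - 1 (prime), toy modulus
static const uint64_t Gs = 1234567890123456789ULL;  // toy "generator"
static const uint64_t RANGE   = 1ULL << 20;         // key known to lie in [0, RANGE)
static const uint64_t DP_MASK = (1ULL << 8) - 1;    // distinguished = low 8 bits zero

static uint64_t elem(uint64_t k) {                   // toy stand-in for "k * G"
    return (uint64_t)((unsigned __int128)(k % M) * Gs % M);
}

int main() {
    uint64_t secret = 0x9F3C7;                       // the key we pretend not to know
    uint64_t target = elem(secret);                  // the "public key"

    uint64_t jump[16];                               // jump table, sizes 1 .. 2^15
    for (int i = 0; i < 16; i++) jump[i] = 1ULL << i;

    // DP table: element -> (distance walked when it was stored, which herd)
    std::unordered_map<uint64_t, std::pair<uint64_t, int>> dp;

    uint64_t pos[2]  = { elem(RANGE), target };      // herd 0 = tame, herd 1 = wild
    uint64_t dist[2] = { 0, 0 };

    for (uint64_t step = 0; step < 200000; step++) {
        int h = step & 1;                            // alternate tame / wild
        uint64_t j = jump[pos[h] & 15];              // jump depends on the element only
        pos[h] = (uint64_t)(((unsigned __int128)pos[h] + (unsigned __int128)j * Gs) % M);
        dist[h] += j;

        if (pos[h] & DP_MASK) continue;              // not a distinguished point: store nothing

        auto it = dp.find(pos[h]);
        if (it == dp.end()) { dp[pos[h]] = { dist[h], h }; continue; }
        if (it->second.second == h) continue;        // same herd: not a useful collision

        // tame walked from exponent RANGE, wild from exponent 'secret', and they met:
        //   RANGE + dTame == secret + dWild   =>   secret = RANGE + dTame - dWild
        uint64_t dTame = (h == 0) ? dist[0] : it->second.first;
        uint64_t dWild = (h == 1) ? dist[1] : it->second.first;
        std::cout << "collision after " << step << " steps, only "
                  << dp.size() << " DPs stored\n";
        std::cout << "recovered key = 0x" << std::hex << (RANGE + dTame - dWild)
                  << " (expected 0x9f3c7)\n";
        return 0;
    }
    std::cout << "unlucky run: no collision within the step cap\n";
}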
madogss
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
June 08, 2024, 04:49:39 AM
 #2825

Are there any places where save files are shared for puzzle #130, using either JeanLuc's Kangaroo or Etar's kangaroo?
COBRAS
Member
**
Offline Offline

Activity: 995
Merit: 23


View Profile
June 08, 2024, 07:03:45 AM
 #2826

Does anyone know if pubKeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college and I remember reading about this, but can't find the post anymore. A reward is possible for verified files.

You can subtract values from 0 to 2^20 from the pubkey and divide each result by 2^20; you get 2^20 pubkeys, and one of them will correspond to a ~2^100 key, if you are talking about a 2^120 pubkey.
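(In other words, assuming the usual split k = q·2^20 + s with 0 ≤ s < 2^20: subtract s·G from the pubkey for every s and multiply each result by the modular inverse of 2^20. That gives 2^20 candidate pubkeys, and the one built with the correct s equals q·G, where q is roughly a 100-bit key when k is roughly a 120-bit key; the catch is that the search then has to cover all 2^20 candidates.)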

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 08, 2024, 09:50:47 AM
 #2827

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?
Baskentliia
Jr. Member
*
Offline Offline

Activity: 64
Merit: 1

34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ


View Profile WWW
June 27, 2024, 01:53:30 PM
 #2828

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

sdhgfdsjfgsiujdgf
Newbie
*
Offline Offline

Activity: 8
Merit: 0


View Profile
June 27, 2024, 07:50:33 PM
 #2829

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

It is like getting into a taxi and asking the driver to go in multiple directions at the same time.
Chail35
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
June 27, 2024, 10:14:45 PM
 #2830

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

You would have to run on 500 different machines if you wanted to execute the search in parallel with the same amount of power per key, depending on the bit range. Think of it like this: say you wanted to search 500 pubkeys in the 125-bit range all at the same time. The computing power would be spread too thin; it would be more efficient to just run a single-pubkey search on 500 different machines concurrently, even though that sounds crazy. I think this is where FUTURE quantum computers would come in. Now, if you wanted to search for 500 pubkeys in, say, the 50-bit range, you could do it relatively quickly on one computer with a modern GPU.
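(Rough numbers, my own back-of-the-envelope estimate using the usual ~2*sqrt(W) expected group operations per key, where W is the interval width: for W = 2^125 that is about 2^63.5 jumps per key, and at roughly 2000 MK/s, about 2^31 jumps per second (the kind of speed reported for an RTX 4090 later in the thread), a single key already needs on the order of 2^32.5 seconds. Splitting the same hardware across 500 keys simply multiplies the total wall-clock time by 500. For W = 2^50 the per-key cost is only about 2^26 jumps, so even 500 keys finish within seconds on one modern GPU.)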
nomachine
Member
**
Offline Offline

Activity: 476
Merit: 35


View Profile
June 28, 2024, 01:12:33 PM
 #2831

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because for one, I don't need to prove anything to anyone except myself, and secondly, it's really ok if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats on JLP's program are biased, the real speed is slower than the one printed on the screen (there's some non-sense smoothing speed computation in there, and the computed time durations are bugged between worker threads and main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 28, 2024, 10:16:10 PM
Last edit: June 28, 2024, 10:43:07 PM by kTimesG
Merited by citb0in (1)
 #2832

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because for one, I don't need to prove anything to anyone except myself, and secondly, it's really ok if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats on JLP's program are biased, the real speed is slower than the one printed on the screen (there's some non-sense smoothing speed computation in there, and the computed time durations are bugged between worker threads and main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin

This coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for some tools to help me recover a wallet.dat password from 2016.

It's not about "believing", and I truly don't care about making an impression. I already indicated multiple times what the bottlenecks are:

- use of GPU global memory, due to the use of local stacks, due to the chosen GPU group size, and multiple places where you don't even need a stack at all. This is the #1 slowdown.
- parts of the algorithm can be merged together, and some other parts are really just textbook methods, unoptimized for CUDA hardware and the parallel paradigm.

Using registers only, a smaller group size, actually using the fast on-chip shared memory for inter-thread calculations, and thinking harder about what the algorithm implies when run on a massive number of threads do make an implementation run 6x faster. This is not simply a set of optimizations to JLP's Kangaroo; I am talking about a completely from-scratch, well-thought-out system here. It's very nice that this software was used to break the 115-bit puzzle, but I will again have my suspicions that it was used for 120 or 125. It is just not up to that level of competence, from a design perspective.

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.
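(To see the effect concretely without 256-bit numbers, here is a small brute-force check with made-up toy parameters, R = 2^16 and r = 2^4 + 3 instead of secp256k1's R = 2^256 and r = 2^32 + 977. It counts how often a carry still survives the 2nd fold, which is exactly the case a truncating 2-step implementation gets wrong.)

Code:
// Toy check of the "sloppy reduction" carry (my own small parameters, not
// secp256k1's R = 2^256, r = 2^32+977): modulus N = R - r with R = 2^16,
// r = 2^4 + 3. A value x < R^2 is reduced by folding the high part twice
// (hi*r + lo, using R ≡ r mod N). We count how often a carry above R still
// survives the 2nd fold -- exactly the case where a truncating implementation
// would silently produce a wrong residue unless a 3rd (conditional) step is done.
#include <cstdint>
#include <iostream>
#include <random>

static const uint32_t RBITS = 16;
static const uint32_t R = 1u << RBITS;       // toy 2^16
static const uint32_t r = (1u << 4) + 3;     // toy 19
static const uint32_t N = R - r;             // toy modulus

static uint32_t fold(uint32_t x) {           // one reduction step: hi*r + lo
    return (x >> RBITS) * r + (x & (R - 1));
}

int main() {
    std::mt19937 rng(1);
    const uint64_t samples = 20000000;
    uint64_t carries = 0;
    for (uint64_t i = 0; i < samples; i++) {
        uint32_t x = rng();                  // uniform in [0, R^2 - 1]
        uint32_t p2 = fold(fold(x));         // the 2-step "sloppy" reduction
        // p2 is always congruent to x mod N, but it may still be >= R:
        if (p2 >= R) carries++;              // dropping that carry => wrong result
    }
    std::cout << "carry after 2 folds: " << carries << " / " << samples
              << " = " << (double)carries / samples << "\n";
    std::cout << "paper's estimate r(r-1)/2N = "
              << (double)r * (r - 1) / (2.0 * N) << "\n";
}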
nomachine
Member
**
Offline Offline

Activity: 476
Merit: 35


View Profile
June 28, 2024, 10:39:04 PM
 #2833

This coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for some tools to help me recover a wallet.dat password from 2016.

Look, I'm old enough to remember when floppy disks were cutting-edge technology. Back in my day, we didn't have these fancy-schmancy high-resolution screens, and I didn't always need glasses to tell the difference between people. So forgive me if I mix up faces or names now and then. At this rate, I'll be mistaking my cat for the vacuum cleaner  Grin

NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 29, 2024, 09:24:32 AM
 #2834

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.

I do recall JLP himself writing that it's using projective coordinates in the ecmath. So I guess for his modular multiplication implementation too (this is the same for VanitySearch).

In such a coordinate system, a different variable holds the divisor that's accumulated during the multiplication steps of the other two variables, so that reduction can be done much much later (as it is still expensive, even if you use heavily optimized algos of binary GCD and 2x52 legs).

Since the multiplication can be done by repeated addition, this page might come useful: https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

casinotester0001
Member
**
Offline Offline

Activity: 196
Merit: 67


View Profile
June 29, 2024, 10:39:10 AM
 #2835

Testing JLP's kangaroo with an RTX 4090 and getting a speed of 2230 MK/s.
What speeds are you getting? E.g. for an RTX 3080, RTX 3080 Ti, RTX 3090, or RTX 4080.

...
NotATether, what is your speed experience for GPUs?
kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 29, 2024, 01:06:16 PM
 #2836

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.

I do recall JLP himself writing that it's using projective coordinates in the ecmath. So I guess for his modular multiplication implementation too (this is the same for VanitySearch).

In such a coordinate system, a different variable holds the divisor that's accumulated during the multiplication steps of the other two variables, so that reduction can be done much much later (as it is still expensive, even if you use heavily optimized algos of binary GCD and 2x52 legs).

Since the multiplication can be done by repeated addition, this page might come useful: https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

I was referring to modular multiplication, not point multiplication.

E.g. when we add point A to point B it is done with affine (not projective) coordinates, but this is not relevant here.

So there are around 3 modular multiplications for a single point addition:

m = (y1 - y2) / (x1 - x2)    // multiply by the inverse of (x1 - x2)
x3 = m ** 2 - x1 - x2        // multiply m by itself (squaring)
y3 = m * (x2 - x3) - y2      // multiplication #3

There are also 3 more multiplications per addition when computing a batched inverse.

So a total of 6 multiplications for each jump of each kangaroo.
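(For reference, a minimal sketch of the batched inverse mentioned above, using Montgomery's trick over a toy 64-bit prime field rather than JLP's 256-bit field code; it shows where the ~3 extra multiplications per element come from, and also why a single bad multiplication poisons every inverse in the batch.)

Code:
// Sketch of Montgomery's batch-inversion trick over a toy 64-bit prime field
// (my own example, not JLP's 256-bit code). One expensive inversion serves the
// whole batch; the price is roughly 3 extra multiplications per element
// (one while building prefix products, two while unwinding). Note that a single
// wrong multiplication anywhere in the running product corrupts 'inv' and
// therefore every inverse returned for the batch.
#include <cstdint>
#include <iostream>
#include <vector>

static const uint64_t P = 0xFFFFFFFFFFFFFFC5ULL;     // 2^64 - 59, a prime (toy field)

static uint64_t mulmod(uint64_t a, uint64_t b) {
    return (uint64_t)((unsigned __int128)a * b % P);
}

static uint64_t powmod(uint64_t a, uint64_t e) {      // used once, for the single inversion
    uint64_t r = 1;
    while (e) { if (e & 1) r = mulmod(r, a); a = mulmod(a, a); e >>= 1; }
    return r;
}

// Replace every element of 'v' by its modular inverse, with only ONE inversion.
static void batch_inverse(std::vector<uint64_t>& v) {
    std::vector<uint64_t> prefix(v.size());
    uint64_t acc = 1;
    for (size_t i = 0; i < v.size(); i++) {           // 1 multiplication per element
        prefix[i] = acc;
        acc = mulmod(acc, v[i]);
    }
    uint64_t inv = powmod(acc, P - 2);                // the single expensive inversion (Fermat)
    for (size_t i = v.size(); i-- > 0; ) {            // 2 multiplications per element
        uint64_t vi = v[i];
        v[i] = mulmod(inv, prefix[i]);                // inverse of the original v[i]
        inv = mulmod(inv, vi);                        // drop v[i] from the running inverse
    }
}

int main() {
    std::vector<uint64_t> xs = { 3, 7, 123456789ULL, 0xDEADBEEFULL };
    std::vector<uint64_t> inv = xs;
    batch_inverse(inv);
    for (size_t i = 0; i < xs.size(); i++)            // each product should print 1
        std::cout << xs[i] << " * inv = " << mulmod(xs[i], inv[i]) << " (mod P)\n";
}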

Problematic lines are these ones reducing the 256x256 product back to mod N:

https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L856
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L905
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L1017

2-step reduction is only guaranteed to be correct for Mersenne primes where r = 1

For secp256k1 r is 2**32 + 977, not 1. So 2 steps are not enough.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.
NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 30, 2024, 05:22:50 AM
Last edit: June 30, 2024, 08:05:12 AM by NotATether
 #2837

...
So a total of 6 multiplications for each jump of each kangaroo.

Problematic lines are these ones reducing the 256x256 product back to mod N:

https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L856
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L905
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L1017

2-step reduction is only guaranteed to be correct for Mersenne primes where r = 1

For secp256k1 r is 2**32 + 977, not 1. So 2 steps are not enough.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.

That error rate would be approximately 2**32 squared, i.e. 2**64, divided by 2**256 * 2 so 2**-193 per kangaroo.

If we want to be pragmatic, then approximately 2**-191 something per kangaroo or a 1/2**191 chance it messes up, when we account the 6 multiplications per kangaroo.

But it's still a really small number.

Almost as infrequent as generating a specific 192-bit elliptic private key. Not that anyone would do that, though.

The specific condition that would happen, I found in the paper you linked



We can make a lot of pairs for which they are equal to the un-reduced product. But it is much harder to satisfy the second condition, and I don't know if anyone has generated any counter-examples.

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 30, 2024, 09:54:50 AM
Last edit: June 30, 2024, 10:21:20 AM by kTimesG
 #2838

...
So a total of 6 multiplications for each jump of each kangaroo.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.

That error rate would be approximately 2**32 squared, i.e. 2**64, divided by 2**256 * 2 so 2**-193 per kangaroo.

If we want to be pragmatic, then approximately 2**-191 something per kangaroo or a 1/2**191 chance it messes up, when we account the 6 multiplications per kangaroo.

But it's still a really small number.

We can make a lot of pairs for which they are equal to the un-reduced product. But it is much harder to satisfy the second condition, and I don't know if anyone has generated any counter-examples.
That's not really correct.

Each kangaroo contributes to a running product for a single inversion.

So when a multiply is incorrect, the root product to invert is wrong, and this results in a wrong root inverse. That wrong inverse is then propagated back to each kangaroo.

So then the whole batch gets corrupted from that point on.

Let's say 512 kangs/batch. The error goes up from 2**-191 to 2**-183.

Imagine then that the distinguished points that get saved are redistributed to other work items, into a batch with other, correct kangaroos. Then that batch gets corrupted right after the first batch jump as well, since one of its points was computed wrong.

And you have 512 "sick" (invented name) kangaroos already. Maybe each of them ends up in its own batch, infecting its herd. Pandemic follows.

Regarding the counterexample test vector to prove the incorrectness... well, unless there's some part of the code that actually checks whether a distinguished point is indeed correct, it's a silent bug. If there is such a check, great, but how is it handled? I don't think it's possible to calculate what values you need to trigger the bug, due to the huge numbers involved. Suffice it to say that it's proven it can happen, and, as the universe goes, if something can happen, it will happen. In this case, who can ever know if their results are in a valid state or not? Smiley

A proper fix would be to check if we had a carry at the end, instead of ignoring it, and add "r" (2**32 + 977) to the result in that case. Of course, with proper carry propagation again (so one check, e.g. an addition + 4 more additions = 5 more 64-bit additions). Only then can we be guaranteed the result is correct during the computations, not after we have collected lots of invalid outputs.
NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 30, 2024, 12:26:22 PM
 #2839

A proper fix would be to check if we had a carry at the end, instead of ignoring it, and add "r" (2**32 + 977) to the result in that case. Of course, with proper carry propagation again (so one check, e.g. an addition + 4 more additions = 5 more 64-bit additions). Only then can we be guaranteed the result is correct during the computations, not after we have collected lots of invalid outputs.

That shouldn't incur much overhead, in fact we might be able to do it with 4 lines of code for each occurrence:

Code:
// I think we just discard the carry from the previous mod reduce
UADDO(r[0],r[0],0x1000003D1ULL); // 2**32 + 977
UADDC(r[1],r[1],0);
UADDC(r[2],r[2],0);
UADDC(r[3],r[3],0);
// And this carry too

I *think* we are keeping everything mod 256 bits so the highest carries can just be discarded. Or maybe it's actually mod N and it's making assurances above that the product is always less than N.

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 30, 2024, 02:24:46 PM
Merited by NotATether (2)
 #2840

That shouldn't incur much overhead, in fact we might be able to do it with 4 lines of code for each occurrence:

Code:
// I think we just discard the carry from the previous mod reduce
UADDO(r[0],r[0],0x1000003D1ULL); // 2**32 + 977
UADDC(r[1],r[1],0);
UADDC(r[2],r[2],0);
UADDC(r[3],r[3],0);
// And this carry too

I *think* we are keeping everything mod 256 bits so the highest carries can just be discarded. Or maybe it's actually mod N and it's making assurances above that the product is always less than N.

The 3rd reduction ensures it is impossible to have a carry into limb 5. But I don't agree with "we might be able to do it with 4 lines of code", because it must be a conditional 3rd addition:

If no carry after the 2nd reduction = all good;
else add "r".

But to check the carry we must do an addition with 0, so we have 5 operations if the carry is set, or 1 if it is unset (plus the condition check).

This is equivalent to always performing the 3rd reduction (it would add either 0 or r), but it avoids the multiplications, since the multiply factor would be either 0 or 1.
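(A plain host-side sketch of that conditional step, my own illustration with 4 x 64-bit limbs; the GPU version would express the same thing with the UADDO/UADDC carry macros quoted above.)

Code:
// Sketch of the conditional 3rd reduction step (my own illustration, host C++,
// 4 x 64-bit limbs). 'hi' is the carry limb left over after the 2nd folding
// step; in the scheme discussed above it is either 0 or 1.
#include <cstdint>

static const uint64_t R977 = 0x1000003D1ULL;      // 2^32 + 977  ==  2^256 mod p

// r[0..3] holds the low 256 bits after the 2nd fold, 'hi' the leftover carry limb.
static void reduce_step3(uint64_t r[4], uint64_t hi) {
    if (hi == 0) return;                           // no carry: nothing to do (the cheap path)
    // carry set: fold it back in, i.e. add hi * (2^32 + 977), with carry propagation
    unsigned __int128 t = (unsigned __int128)hi * R977 + r[0];
    r[0] = (uint64_t)t;
    uint64_t c = (uint64_t)(t >> 64);
    for (int i = 1; i < 4; i++) {                  // propagate the carry through the limbs
        t = (unsigned __int128)r[i] + c;
        r[i] = (uint64_t)t;
        c = (uint64_t)(t >> 64);
    }
    // in the sloppy-reduction setting the low limbs are tiny whenever a carry
    // occurred, so no further overflow is possible here
}

int main() {
    uint64_t r[4] = { 0x123, 0, 0, 0 };            // pretend result of the 2nd fold
    reduce_step3(r, 1);                            // pretend the fold left a carry
    return r[0] == 0x123 + R977 ? 0 : 1;           // expect 0x123 + (2^32 + 977)
}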
