Author Topic: Pollard's kangaroo ECDLP solver  (Read 58442 times)
G34hj56
Newbie
*
Offline Offline

Activity: 2
Merit: 0


View Profile
April 23, 2024, 08:39:10 PM
 #2821

...

I don't think anybody proficient enough for such a port would encounter your issue, because they would take care of the differences between Unix/Linux and Windows. On Linux, almost everything is a file, and /dev/urandom is the non-blocking random number device (file). Reading from this file gives you random numbers without blocking.

You would have to port everything that's specific to Linux to the way it's done on Windows. Are you sure you have the knowledge to do this? Based on your question, I highly doubt it.

I solved it; it's working now with VS2022 and CUDA 12.1. I had made some mistakes in the linking.

My question was a bit short, sorry.
Baboshka
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
May 02, 2024, 10:44:40 PM
 #2822

It is possible to reduce the complexity of the classic kangaroo algorithm a little by spreading the starting kangaroos in a non-uniform manner.
Roughly speaking, if the private key lies in certain areas it will be found faster; as a counterpart, if it lies in certain other areas it will be found more slowly.
But on average, over the whole range, the complexity is indeed reduced.
If you read the reference on the GitHub page carefully, you will find the trick Wink



First, welcome back master @Jean_Luc .. so happy to see your post.

You can't compare your deep knowledge to others', as you developed those tools and have a deep knowledge of the math used.

I consider myself good at programming (I have many years of experience in C++, but I'm very new to GPU) .. I have a good math background, but that part about "elliptic curve cryptography" is another world.

So it's not easy .. yes, I downloaded your source code, but it's not easy at all to understand the magic happening inside.

I am learning a lot, and I must say many thanks to @WanderingPhilospher, as he is a big help answering my doubts ...


I don't know if you do stuff on YouTube, but I imagine some simple videos explaining some of those aspects would be gorgeous .. maybe I wish for too much ..


Regards
Black_Pawn
Newbie
*
Offline Offline

Activity: 9
Merit: 5


View Profile
May 15, 2024, 10:23:51 AM
 #2823

Does anyone know if pubKeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college and I remember reading about this, but can't find the post anymore. A reward is possible for verified files.
graphite
Jr. Member
*
Offline Offline

Activity: 44
Merit: 27


View Profile
May 30, 2024, 03:54:30 PM
 #2824

I'm having a hard time understanding how distinguished points (DP) work. From what I know, you only save the start point, trail length, and the DP of each trail, and you start a new trail when a DP is found. You repeat this until you find a colliding DP and somehow recover the "actual" collision for the pubkey.

I don't understand how finding and storing only DPs would lead to finding the pubkey collision. I fully understand how the kangaroo method creates two herds to find a collision using the birthday paradox, but I'm not sure how you can do this without storing the whole set of points.
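A minimal toy sketch (all parameters made up, and a deliberately trivial toy group rather than secp256k1) of the mechanism in question: once two trails land on the same element they take identical jumps from then on, so they also reach the same next distinguished point, and the two stored distances give back the key.

Code:
// Toy sketch (my own illustration, NOT secp256k1): a kangaroo walk with
// distinguished points over a deliberately trivial group (k -> k*Gs mod M),
// just to show the DP bookkeeping. Key idea: two trails that ever land on the
// same element take identical jumps from then on, so they also reach the same
// NEXT distinguished point; storing only (DP -> distance, herd) is enough to
// detect the collision, and the two stored distances give back the key.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <utility>

static const uint64_t M  = 2305843009213693951ULL;  // 2^61 - 1 (prime), toy modulus
static const uint64_t Gs = 1234567890123456789ULL;  // toy "generator"
static const uint64_t RANGE   = 1ULL << 20;         // key known to lie in [0, RANGE)
static const uint64_t DP_MASK = (1ULL << 8) - 1;    // distinguished = low 8 bits zero

static uint64_t elem(uint64_t k) {                   // toy stand-in for "k * G"
    return (uint64_t)((unsigned __int128)(k % M) * Gs % M);
}

int main() {
    uint64_t secret = 0x9F3C7;                       // the key we pretend not to know
    uint64_t target = elem(secret);                  // the "public key"

    uint64_t jump[16];                               // jump table, sizes 1 .. 2^15
    for (int i = 0; i < 16; i++) jump[i] = 1ULL << i;

    // DP table: element -> (distance walked when it was stored, which herd)
    std::unordered_map<uint64_t, std::pair<uint64_t, int>> dp;

    uint64_t pos[2]  = { elem(RANGE), target };      // herd 0 = tame, herd 1 = wild
    uint64_t dist[2] = { 0, 0 };

    for (uint64_t step = 0; step < 200000; step++) {
        int h = step & 1;                            // alternate tame / wild
        uint64_t j = jump[pos[h] & 15];              // jump depends on the element only
        pos[h] = (uint64_t)(((unsigned __int128)pos[h] + (unsigned __int128)j * Gs) % M);
        dist[h] += j;

        if (pos[h] & DP_MASK) continue;              // not a distinguished point: store nothing

        auto it = dp.find(pos[h]);
        if (it == dp.end()) { dp[pos[h]] = { dist[h], h }; continue; }
        if (it->second.second == h) continue;        // same herd: not a useful collision

        // tame walked from exponent RANGE, wild from exponent 'secret', and they met:
        //   RANGE + dTame == secret + dWild   =>   secret = RANGE + dTame - dWild
        uint64_t dTame = (h == 0) ? dist[0] : it->second.first;
        uint64_t dWild = (h == 1) ? dist[1] : it->second.first;
        std::cout << "collision after " << step << " steps, only "
                  << dp.size() << " DPs stored\n";
        std::cout << "recovered key = 0x" << std::hex << (RANGE + dTame - dWild)
                  << " (expected 0x9f3c7)\n";
        return 0;
    }
    std::cout << "unlucky run: no collision within the step cap\n";
}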
madogss
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
June 08, 2024, 04:49:39 AM
 #2825

Are there any places where save files are shared for puzzle #130, using either JeanLuc's Kangaroo or Etar's kangaroo?
COBRAS
Member
**
Offline Offline

Activity: 995
Merit: 23


View Profile
June 08, 2024, 07:03:45 AM
 #2826

Does anyone know if pubKeys in the range 2^120 - 2^121 have been saved somewhere? Even lower ranges like 2^118 - 2^119, if available. I understand there might be Kangaroo work files already saved in this range? We are trialling something at the college and I remember reading about this, but can't find the post anymore. A reward is possible for verified files.

You can subtract values from 0 to 2^20 from the pubkey and divide each result by 2^20; you get 2^20 pubkeys, and one of them will correspond to a ~2^100 key, if you are talking about a 2^120 pubkey.
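(In other words, assuming the usual split k = q·2^20 + s with 0 ≤ s < 2^20: subtract s·G from the pubkey for every s and multiply each result by the modular inverse of 2^20. That gives 2^20 candidate pubkeys, and the one built with the correct s equals q·G, where q is roughly a 100-bit key when k is roughly a 120-bit key; the catch is that the search then has to cover all 2^20 candidates.)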

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 08, 2024, 09:50:47 AM
 #2827

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?
Baskentliia
Jr. Member
*
Offline Offline

Activity: 64
Merit: 1

34Sf4DnMt3z6XKKoWmZRw2nGyfGkDgNJZZ


View Profile WWW
June 27, 2024, 01:53:30 PM
 #2828

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

sdhgfdsjfgsiujdgf
Newbie
*
Offline Offline

Activity: 8
Merit: 0


View Profile
June 27, 2024, 07:50:33 PM
 #2829

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

It is like getting into a taxi and asking the driver to go in multiple directions at the same time.
Chail35
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
June 27, 2024, 10:14:45 PM
 #2830

I have a question
It is for people who know the Kangaroo program well.

The Kangaroo program has a multiple-pubkey search feature, but it searches sequentially: key1, key2, key3. It does not search for key2 before key1 is found; likewise, it does not search for key3 unless key2 is found. I AM GIVING AN EXAMPLE. We have created 500 pubkeys. Is it possible to search all of them at the same time? Is anyone working on this? Can you share?

You would have to run on 500 different machines if you wanted to execute the search in parallel with the same amount of power per key, depending on the bit range. Think of it like this: say you wanted to search 500 pubkeys in the 125-bit range all at the same time. The computing power would be spread too thin; it would be more efficient to just run a single-pubkey search on 500 different machines concurrently, even though that sounds crazy. I think this is where FUTURE quantum computers would come in. Now, if you wanted to search for 500 pubkeys in, say, the 50-bit range, you could do it relatively quickly on one computer with a modern GPU.
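(Rough numbers, my own back-of-the-envelope estimate using the usual ~2*sqrt(W) expected group operations per key, where W is the interval width: for W = 2^125 that is about 2^63.5 jumps per key, and at roughly 2000 MK/s, about 2^31 jumps per second (the kind of speed reported for an RTX 4090 later in the thread), a single key already needs on the order of 2^32.5 seconds. Splitting the same hardware across 500 keys simply multiplies the total wall-clock time by 500. For W = 2^50 the per-key cost is only about 2^26 jumps, so even 500 keys finish within seconds on one modern GPU.)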
nomachine
Member
**
Offline Offline

Activity: 476
Merit: 35


View Profile
June 28, 2024, 01:12:33 PM
 #2831

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because for one, I don't need to prove anything to anyone except myself, and secondly, it's really ok if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats on JLP's program are biased, the real speed is slower than the one printed on the screen (there's some non-sense smoothing speed computation in there, and the computed time durations are bugged between worker threads and main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 28, 2024, 10:16:10 PM
Last edit: June 28, 2024, 10:43:07 PM by kTimesG
Merited by citb0in (1)
 #2832

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

I could brag that I managed to write from scratch a completely working Kangaroo that currently works 6x (six times) faster than both the original and all the n00b clones out there. I'm not gonna sell it or release it because for one, I don't need to prove anything to anyone except myself, and secondly, it's really ok if no one believes me that I can squeeze out 900 million jumps/s on an underclocked RTX 3050 laptop GPU that can never reach more than 200 Mops/s with JLP's app. BTW, the stats on JLP's program are biased, the real speed is slower than the one printed on the screen (there's some non-sense smoothing speed computation in there, and the computed time durations are bugged between worker threads and main thread). Real speed is 10% slower due to these bugs. Whatever.


It's impressive that you believe you've developed a version of the Kangaroo that outperforms both the original and all the clones by such a significant margin. Achieving 900 million jumps per second on an underclocked RTX 3050 laptop GPU is indeed a remarkable accomplishment!

Achieving a 6x speed improvement is no small feat, especially considering the limitations you’ve highlighted in the original program. If you truly have a solution that performs this well, sharing it on GitHub isn't just about proving anything to anyone. It's about fostering innovation, collaboration, and growth within the community. By making your work public, you can inspire others, drive progress, and leave a lasting impact on the field. Plus, being part of a collaborative effort can lead to new insights and improvements that benefit everyone.

So, why not contribute to the community and share your impressive work?  Grin

This coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for some tools to help me recover a wallet.dat password from 2016.

It's not about "believing", and I truly don't care about making an impression. I already indicated multiple times what the bottlenecks are:

- use of GPU global memory, due to the use of local stacks, due to the chosen GPU group size, and multiple places where you don't even need a stack at all. This is the #1 slowdown.
- parts of the algorithm can be merged together, and some other parts are really just textbook methods, unoptimized for CUDA hardware and the parallel paradigm.

Using registers only, a smaller group size, actually using the fast on-chip shared memory for inter-thread calculations, and thinking harder about what the algorithm implies when run on a massive number of threads do make an implementation run 6x faster. This is not simply a set of optimizations to JLP's Kangaroo; I am talking about a completely from-scratch, well-thought-out system here. It's very nice that this software was used to break the 115-bit puzzle, but I will again have my suspicions that it was used for 120 or 125. It is just not up to that level of competence, from a design perspective.

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.
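(To see the effect concretely without 256-bit numbers, here is a small brute-force check with made-up toy parameters, R = 2^16 and r = 2^4 + 3 instead of secp256k1's R = 2^256 and r = 2^32 + 977. It counts how often a carry still survives the 2nd fold, which is exactly the case a truncating 2-step implementation gets wrong.)

Code:
// Toy check of the "sloppy reduction" carry (my own small parameters, not
// secp256k1's R = 2^256, r = 2^32+977): modulus N = R - r with R = 2^16,
// r = 2^4 + 3. A value x < R^2 is reduced by folding the high part twice
// (hi*r + lo, using R ≡ r mod N). We count how often a carry above R still
// survives the 2nd fold -- exactly the case where a truncating implementation
// would silently produce a wrong residue unless a 3rd (conditional) step is done.
#include <cstdint>
#include <iostream>
#include <random>

static const uint32_t RBITS = 16;
static const uint32_t R = 1u << RBITS;       // toy 2^16
static const uint32_t r = (1u << 4) + 3;     // toy 19
static const uint32_t N = R - r;             // toy modulus

static uint32_t fold(uint32_t x) {           // one reduction step: hi*r + lo
    return (x >> RBITS) * r + (x & (R - 1));
}

int main() {
    std::mt19937 rng(1);
    const uint64_t samples = 20000000;
    uint64_t carries = 0;
    for (uint64_t i = 0; i < samples; i++) {
        uint32_t x = rng();                  // uniform in [0, R^2 - 1]
        uint32_t p2 = fold(fold(x));         // the 2-step "sloppy" reduction
        // p2 is always congruent to x mod N, but it may still be >= R:
        if (p2 >= R) carries++;              // dropping that carry => wrong result
    }
    std::cout << "carry after 2 folds: " << carries << " / " << samples
              << " = " << (double)carries / samples << "\n";
    std::cout << "paper's estimate r(r-1)/2N = "
              << (double)r * (r - 1) / (2.0 * N) << "\n";
}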
nomachine
Member
**
Offline Offline

Activity: 476
Merit: 35


View Profile
June 28, 2024, 10:39:04 PM
 #2833

This coming from a guy who confuses me with some digaran dude. I came across this puzzle in January this year while searching for some tools to help me recover a wallet.dat password from 2016.

Look, I'm old enough to remember when floppy disks were cutting-edge technology. Back in my day, we didn't have these fancy-schmancy high-resolution screens, and I didn't always need glasses to tell the difference between people. So forgive me if I mix up faces or names now and then. At this rate, I'll be mistaking my cat for the vacuum cleaner  Grin

NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 29, 2024, 09:24:32 AM
 #2834

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.

I do recall JLP himself writing that it's using projective coordinates in the ecmath. So I guess for his modular multiplication implementation too (this is the same for VanitySearch).

In such a coordinate system, a different variable holds the divisor that's accumulated during the multiplication steps of the other two variables, so that reduction can be done much much later (as it is still expensive, even if you use heavily optimized algos of binary GCD and 2x52 legs).

Since the multiplication can be done by repeated addition, this page might come useful: https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

casinotester0001
Member
**
Offline Offline

Activity: 196
Merit: 67


View Profile
June 29, 2024, 10:39:10 AM
 #2835

Testing JLP's kangaroo with an RTX 4090 and getting a speed of 2230 MK/s.
What speeds are you getting? E.g. for an RTX 3080, RTX 3080 Ti, RTX 3090, or RTX 4080.

...
NotATether, what is your speed experience for GPUs?
kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 29, 2024, 01:06:16 PM
 #2836

Can anyone explain why a sloppy modular multiplication is used? The 3rd step is missing, otherwise there exists a probability of an incorrect result. Or is it proven that only 2 steps are sufficient (I doubt it)?

Since no one answered, I will assume it's taken for granted that the multiplication used by JLP is correct (2-folding reduction steps). However, according to page 15 of this research paper:

https://eprint.iacr.org/2021/1151.pdf

under the analysis of "Sloppy reduction" it is proven that, unless a correctness check is performed, you need a 3rd reduction step.

Otherwise, the modular multiplication reduction has a chance (around one in 2**193) to be incorrect, because:

Quote
Thus, if sloppy reduction is applied to a randomly chosen number in the interval [0, R^2 − 1], the probability of an incorrect result is about r(r−1) / 2N

There is a missing potential carry that is assumed to never happen after the 2nd reduction. And no one can explain it.

I do recall JLP himself writing that it's using projective coordinates in the ecmath. So I guess for his modular multiplication implementation too (this is the same for VanitySearch).

In such a coordinate system, a different variable holds the divisor that's accumulated during the multiplication steps of the other two variables, so that reduction can be done much much later (as it is still expensive, even if you use heavily optimized algos of binary GCD and 2x52 legs).

Since the multiplication can be done by repeated addition, this page might come useful: https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

I was referring to modular multiplication, not point multiplication.

E.g. when we add point A to point B it is done with affine (not projective) coordinates, but this is not relevant here.

So there are around 3 modular multiplications for a single point addition:

m = (y1 - y2) / (x1 - x2)    // multiply by the inverse of (x1 - x2)
x3 = m ** 2 - x1 - x2        // multiply m by itself (squaring)
y3 = m * (x2 - x3) - y2      // multiplication #3

There are also 3 more multiplications per addition when computing a batched inverse.

So a total of 6 multiplications for each jump of each kangaroo.
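(For reference, a minimal sketch of the batched inverse mentioned above, using Montgomery's trick over a toy 64-bit prime field rather than JLP's 256-bit field code; it shows where the ~3 extra multiplications per element come from, and also why a single bad multiplication poisons every inverse in the batch.)

Code:
// Sketch of Montgomery's batch-inversion trick over a toy 64-bit prime field
// (my own example, not JLP's 256-bit code). One expensive inversion serves the
// whole batch; the price is roughly 3 extra multiplications per element
// (one while building prefix products, two while unwinding). Note that a single
// wrong multiplication anywhere in the running product corrupts 'inv' and
// therefore every inverse returned for the batch.
#include <cstdint>
#include <iostream>
#include <vector>

static const uint64_t P = 0xFFFFFFFFFFFFFFC5ULL;     // 2^64 - 59, a prime (toy field)

static uint64_t mulmod(uint64_t a, uint64_t b) {
    return (uint64_t)((unsigned __int128)a * b % P);
}

static uint64_t powmod(uint64_t a, uint64_t e) {      // used once, for the single inversion
    uint64_t r = 1;
    while (e) { if (e & 1) r = mulmod(r, a); a = mulmod(a, a); e >>= 1; }
    return r;
}

// Replace every element of 'v' by its modular inverse, with only ONE inversion.
static void batch_inverse(std::vector<uint64_t>& v) {
    std::vector<uint64_t> prefix(v.size());
    uint64_t acc = 1;
    for (size_t i = 0; i < v.size(); i++) {           // 1 multiplication per element
        prefix[i] = acc;
        acc = mulmod(acc, v[i]);
    }
    uint64_t inv = powmod(acc, P - 2);                // the single expensive inversion (Fermat)
    for (size_t i = v.size(); i-- > 0; ) {            // 2 multiplications per element
        uint64_t vi = v[i];
        v[i] = mulmod(inv, prefix[i]);                // inverse of the original v[i]
        inv = mulmod(inv, vi);                        // drop v[i] from the running inverse
    }
}

int main() {
    std::vector<uint64_t> xs = { 3, 7, 123456789ULL, 0xDEADBEEFULL };
    std::vector<uint64_t> inv = xs;
    batch_inverse(inv);
    for (size_t i = 0; i < xs.size(); i++)            // each product should print 1
        std::cout << xs[i] << " * inv = " << mulmod(xs[i], inv[i]) << " (mod P)\n";
}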

Problematic lines are these ones reducing the 256x256 product back to mod N:

https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L856
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L905
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L1017

2-step reduction is only guaranteed to be correct for Mersenne primes where r = 1

For secp256k1 r is 2**32 + 977, not 1. So 2 steps are not enough.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.
NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 30, 2024, 05:22:50 AM
Last edit: June 30, 2024, 08:05:12 AM by NotATether
 #2837

...
So a total of 6 multiplications for each jump of each kangaroo.

Problematic lines are these ones reducing the 256x256 product back to mod N:

https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L856
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L905
https://github.com/JeanLucPons/Kangaroo/blob/37576c82d198c20fca65b14da74138ae6153a446/GPU/GPUMath.h#L1017

2-step reduction is only guaranteed to be correct for Mersenne primes where r = 1

For secp256k1 r is 2**32 + 977, not 1. So 2 steps are not enough.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.

That error rate would be approximately 2**32 squared, i.e. 2**64, divided by 2**256 * 2 so 2**-193 per kangaroo.

If we want to be pragmatic, then approximately 2**-191 something per kangaroo or a 1/2**191 chance it messes up, when we account the 6 multiplications per kangaroo.

But it's still a really small number.

Almost as infrequent as generating a specific 192-bit elliptic private key. Not that anyone would do that, though.

The specific condition that would happen, I found in the paper you linked



We can make a lot of pairs for which they are equal to the un-reduced product. But it is much harder to satisfy the second condition, and I don't know if anyone has generated any counter-examples.

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 30, 2024, 09:54:50 AM
Last edit: June 30, 2024, 10:21:20 AM by kTimesG
 #2838

...
So a total of 6 multiplications for each jump of each kangaroo.

So, since there's an error ratio from using only 2-step folding: r(r - 1) / 2N
and you have 6 multiplications / jump
and you do a lot of jumps deterministically

at some point, that error will exhibit itself and all future computations get screwed up.

That error rate would be approximately 2**32 squared, i.e. 2**64, divided by 2**256 * 2 so 2**-193 per kangaroo.

If we want to be pragmatic, then approximately 2**-191 something per kangaroo or a 1/2**191 chance it messes up, when we account the 6 multiplications per kangaroo.

But it's still a really small number.

We can make a lot of pairs for which they are equal to the un-reduced product. But it is much harder to satisfy the second condition, and I don't know if anyone has generated any counter-examples.
That's not really correct.

Each kangaroo contributes to a running product for a single inversion.

So when a multiply is incorrect, the root product to invert is wrong, and this results in a wrong root inverse. That wrong inverse is then propagated back to each kangaroo.

So then the whole batch gets corrupted from that point on.

Let's say 512 kangs/batch. The error goes up from 2**-191 to 2**-183.

Imagine then that the distinguished points that get saved are redistributed to other work items, into a batch with other, correct kangaroos. Then that batch gets corrupted right after the first batch jump as well, since one of its points was computed wrong.

And you have 512 "sick" (invented name) kangaroos already. Maybe each of them ends up in its own batch, infecting its herd. Pandemic follows.

Regarding the counterexample test vector to prove the incorrectness... well, unless there's some part of the code that actually checks whether a distinguished point is indeed correct, it's a silent bug. If there is such a check, great, but how is it handled? I don't think it's possible to calculate what values you need to trigger the bug, due to the huge numbers involved. Suffice it to say that it's proven it can happen, and, as the universe goes, if something can happen, it will happen. In this case, who can ever know if their results are in a valid state or not? Smiley

A proper fix would be to check if we had a carry at the end, instead of ignoring it, and add "r" (2**32 + 977) to the result in that case. Of course, with proper carry propagation again (so one check, e.g. an addition + 4 more additions = 5 more 64-bit additions). Only then can we be guaranteed the result is correct during the computations, not after we have collected lots of invalid outputs.
NotATether
Legendary
*
Offline Offline

Activity: 1764
Merit: 7330




View Profile WWW
June 30, 2024, 12:26:22 PM
 #2839

A proper fix would be to check if we had a carry at the end, instead of ignoring it, and add "r" (2**32 + 977) to the result in that case. Of course, with proper carry propagation again (so one check, e.g. an addition + 4 more additions = 5 more 64-bit additions). Only then can we be guaranteed the result is correct during the computations, not after we have collected lots of invalid outputs.

That shouldn't incur much overhead, in fact we might be able to do it with 4 lines of code for each occurrence:

Code:
// I think we just discard the carry from the previous mod reduce
UADDO(r[0],r[0],0x1000003D1ULL); // 2**32 + 977
UADDC(r[1],r[1],0);
UADDC(r[2],r[2],0);
UADDC(r[3],r[3],0);
// And this carry too

I *think* we are keeping everything mod 256 bits so the highest carries can just be discarded. Or maybe it's actually mod N and it's making assurances above that the product is always less than N.

kTimesG
Member
**
Offline Offline

Activity: 228
Merit: 29


View Profile
June 30, 2024, 02:24:46 PM
Merited by NotATether (2)
 #2840

That shouldn't incur much overhead, in fact we might be able to do it with 4 lines of code for each occurrence:

Code:
// I think we just discard the carry from the previous mod reduce
UADDO(r[0],r[0],0x1000003D1ULL); // 2**32 + 977
UADDC(r[1],r[1],0);
UADDC(r[2],r[2],0);
UADDC(r[3],r[3],0);
// And this carry too

I *think* we are keeping everything mod 256 bits so the highest carries can just be discarded. Or maybe it's actually mod N and it's making assurances above that the product is always less than N.

The 3rd reduction ensures it is impossible to have a carry into limb 5. But I don't agree with "we might be able to do it with 4 lines of code", because it must be a conditional 3rd addition:

If no carry after the 2nd reduction = all good;
else add "r".

But to check the carry we must do an addition with 0, so we have 5 operations if the carry is set, or 1 if it is unset (plus the condition check).

This is equivalent to always performing the 3rd reduction (it would add either 0 or r), but it avoids the multiplications, since the multiply factor would be either 0 or 1.
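(A plain host-side sketch of that conditional step, my own illustration with 4 x 64-bit limbs; the GPU version would express the same thing with the UADDO/UADDC carry macros quoted above.)

Code:
// Sketch of the conditional 3rd reduction step (my own illustration, host C++,
// 4 x 64-bit limbs). 'hi' is the carry limb left over after the 2nd folding
// step; in the scheme discussed above it is either 0 or 1.
#include <cstdint>

static const uint64_t R977 = 0x1000003D1ULL;      // 2^32 + 977  ==  2^256 mod p

// r[0..3] holds the low 256 bits after the 2nd fold, 'hi' the leftover carry limb.
static void reduce_step3(uint64_t r[4], uint64_t hi) {
    if (hi == 0) return;                           // no carry: nothing to do (the cheap path)
    // carry set: fold it back in, i.e. add hi * (2^32 + 977), with carry propagation
    unsigned __int128 t = (unsigned __int128)hi * R977 + r[0];
    r[0] = (uint64_t)t;
    uint64_t c = (uint64_t)(t >> 64);
    for (int i = 1; i < 4; i++) {                  // propagate the carry through the limbs
        t = (unsigned __int128)r[i] + c;
        r[i] = (uint64_t)t;
        c = (uint64_t)(t >> 64);
    }
    // in the sloppy-reduction setting the low limbs are tiny whenever a carry
    // occurred, so no further overflow is possible here
}

int main() {
    uint64_t r[4] = { 0x123, 0, 0, 0 };            // pretend result of the 2nd fold
    reduce_step3(r, 1);                            // pretend the fold left a carry
    return r[0] == 0x123 + R977 ? 0 : 1;           // expect 0x123 + (2^32 + 977)
}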
