mahurovihamilo
Jr. Member

Activity: 50
Merit: 2
October 27, 2025, 06:49:10 PM
Does anyone know how to enumerate the GPUs in Linux when using Jean-Luc's script? I am using this, but getting an error:
./kangaroo -gpu gpuId0,gpuId1,gpuId2,gpuId3,gpuId4,gpuId5,gpuId6,gpuId7 -ws savedKangs.txt -wi 30 in.txt
Kangaroo v2.2
Unexpected gpuId0,gpuId1,gpuId2,gpuId3,gpuId4,gpuId5,gpuId6,gpuId7 argument
So the question is how to enumerate the 8 GPUs on the script command line.
Thank you!
ABCbits
Legendary

Activity: 3612
Merit: 10050
October 28, 2025, 08:16:31 AM
Does anyone know how to enumerate the GPUs in Linux when using Jean-Luc's script? I am using this, but getting an error:
./kangaroo -gpu gpuId0,gpuId1,gpuId2,gpuId3,gpuId4,gpuId5,gpuId6,gpuId7 -ws savedKangs.txt -wi 30 in.txt
Kangaroo v2.2
Unexpected gpuId0,gpuId1,gpuId2,gpuId3,gpuId4,gpuId5,gpuId6,gpuId7 argument
So the question is how to enumerate the 8 GPUs on the script command line.
Thank you!
Based on the GitHub README file, how about this? ./kangaroo -gpu -gpuId 0,1,2,3,4,5,6,7
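The suggestion above can be sketched as a small helper that builds the GPU part of the command line. This is only an illustration: `build_gpu_args` is a hypothetical name, and it assumes the devices are simply numbered 0 to 7. The only flags used are the `-gpu` and `-gpuId` options quoted in the reply.

```python
def build_gpu_args(gpu_count):
    """Return the GPU arguments for gpu_count devices, assumed numbered from 0."""
    if gpu_count < 1:
        raise ValueError("need at least one GPU")
    # -gpuId takes bare numeric indices separated by commas,
    # not the literal placeholder text "gpuId0,gpuId1,...".
    ids = ",".join(str(i) for i in range(gpu_count))
    return ["-gpu", "-gpuId", ids]


gpu_args = build_gpu_args(8)
print("./kangaroo " + " ".join(gpu_args))
```

Any remaining options and the input file would follow these arguments.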
songokuj5
Newbie

Activity: 2
Merit: 0
January 15, 2026, 07:46:12 PM
Does anyone know how fast the RTX 5090 is running puzzle 135?
MB2AA5RR
Newbie

Activity: 9
Merit: 0
January 20, 2026, 05:34:37 AM
Hello. For puzzle 135 I use Collider bsgs cuda which provides me with a good scanning speed of 60-65 Exa key/sec. I adapted the software for RTX5090 from the source: https://github.com/Etayson/BSGS-cuda. The software is optimized, does not give errors and does not miss keys (in tests on valid addresses). To generate the executable, PureBasic with a license is required. Below I put an example of scanning for Puzzle 135.

C:\Users\NN\Desktop\COLLIDER>bsgscudaHT_1_9_7file -t 256 -b 256 -p 914 -w 32 -htsz 31 -pk 6cf4feb12b75e8e00fffffffffffffffff -pke 6cf4feb12b75e8eFFFFFFFFFFFFFFFFFFF -infile Puzle135
Number of GPU threads set to #256
Number of GPU blocks set to #256
Number of pparam set to #914
Items number set to 2^32=4294967296
HT size set to 2^31
Range begin: 0x6cf4feb12b75e8e00fffffffffffffffff
Range end: 0x6cf4feb12b75e8efffffffffffffffffff
Will be used file: Puzle135
Found 1 Cuda device.
Cuda device:NVIDIA GeForce RTX 5090 (30840.000/32606MB)
Current config hash[]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000200000000
GiantSUBpubkey: 038c0989f2ceb5c771a8415dff2b4c4199d8d9c8f9237d08084b05284f1e4df706
*******************************
Total GPU Memory Need: 30060.000MB
*******************************
Both HT files exist
Load BIN file:256_256_914_4294967296_g2.BIN
- chunk:1073741824b
[1] chunk:1073741824b [2] chunk:1073741824b
Last chunk:612368384b
[3] chunk:612368384b
Done in 00:00:00s
Gstep: e48000000000000
GPU count #1
GPU #0 launched
GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB
_A size:120
GPU #0 copied giant array
Remove Giant array, freed memory: 3656.000 MB
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_4294967296_2147483648_htGPUv0.BIN
- chunk:1073741824b
[1] chunk:1073741824b [2] chunk:1073741824b ....................................... [23] chunk:1073741824b
Last chunk:4b
[24] chunk:4b
Done in 00:00:03s
GPU #0 copied hash table
Remove HT for GPU, freed memory: 24576.000 MB
Random verify packed HTCPU items in file...ok
START RANGE= 0000000000000000000000000000006cf4feb12b75e8e00fffffffffffffffff
END RANGE= 0000000000000000000000000000006cf4feb12b75e8efffffffffffffffffff
WIDTH RANGE= 000000000000000000000000000000000000000000000ff00000000000000000 = 2^76
SUBpoint= (afaacd852045a0e036d93ee350283936b312b379f0f1e04bf35565897ecaa282, 8a334cf89c64444f69049c40d563f435209697a9a7b92b38bd59a02b44db2556)
Save work every 180 seconds
Checker thread started
Findpubkey : 02145d2611c823a396ef6712ce0f712f09b9b4f3135e3e0aa3230fb9b6d08d1e16
Searchpubkey: 03235dada82c3477f7b249b6c7660b84b664d490465f98afd5efcc2b8c5c074c97
Cnt:fea5718000000000001 [1][ 7161 ] = 7161 MKeys/s x2^33.0=2^65.81 Jt:00:19:27
Reached end of space
GPU#0 job finished
GPU#0 thread finished
cuda finished ok
Press Enter to exit ...

Speed calculation
Total RANGE = ff00000000000000000 (hex) => 75262715820734970593280 (decimal)
Working time = 00:19:27 = 1167 sec
Average working speed = 75262715820734970593280 : 1167 = 64,492,472,854,100,231,870 => ~ 64.49 Exa key/sec
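The speed calculation at the end of this post can be reproduced with a few lines of Python. This is a minimal sketch of the post's own arithmetic (range width divided by elapsed time), using only the `-pk`/`-pke` values and the `Jt` time from the log; the variable names are illustrative. It shows how much of the range is covered per second, which is not necessarily the number of group operations the GPU performs.

```python
# Range bounds taken from the -pk / -pke arguments in the log.
range_begin = int("6cf4feb12b75e8e00fffffffffffffffff", 16)
range_end = int("6cf4feb12b75e8eFFFFFFFFFFFFFFFFFFF", 16)
range_width = range_end - range_begin

# The log reports the same width directly as WIDTH RANGE.
assert range_width == int("ff00000000000000000", 16)

# Elapsed time from "Jt:00:19:27".
hours, minutes, seconds = (int(part) for part in "00:19:27".split(":"))
elapsed_s = hours * 3600 + minutes * 60 + seconds

# Integer division keeps full precision on the very large width.
avg_keys_per_s = range_width // elapsed_s

print("width (decimal):", range_width)
print("elapsed seconds:", elapsed_s)
print(f"range covered: {avg_keys_per_s / 1e18:.2f} exa keys/s")
```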
kTimesG
January 20, 2026, 12:42:21 PM
Speed calculation Total RANGE = ff00000000000000000 (hex) => 75262715820734970593280 (decimal) Working time = 00:19:27 = 1167 sec Average working speed = 75262715820734970593280 : 1167 = 64,492,472,854,100,231,870 => ~ 64.49 Exa key/sec
Let me fix your math:
Optimistically assume the solution is found in sqrt(n) steps (though it's larger for BSGS). sqrt(n) / 1167 = 235 Mkeys/s, not galactic exakeys.
32 GB hashtable: let's assume all bits are raw data and lookup is O(1) instant (this is of course impossible, but let's just assume). So, maybe 1 billion items in the table. This means the number of giant steps is around 2**135 / 2**30 = 2**105 steps.
Total time until you find a solution: 2**105 / 235M seconds, which is 5473625442223861 years.
Good luck!
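The estimate above can be checked numerically. The sketch below follows the reply's own assumptions (sqrt(n) steps for the 2^76-wide test range, a table of about 2^30 items, a 2^135 search space, and a 365-day year); the variable names are illustrative only.

```python
import math

# Width of the tested range and run time, both from the quoted log.
range_width = int("ff00000000000000000", 16)
elapsed_s = 1167

# Optimistic assumption from the reply: about sqrt(n) operations were needed.
ops_per_s = math.isqrt(range_width) / elapsed_s

# About one billion table items, so 2**135 / 2**30 = 2**105 giant steps.
table_items = 2**30
giant_steps = 2**135 // table_items

# Time at the rounded 235 M operations per second used in the reply.
total_seconds = giant_steps / 235e6
total_years = total_seconds / (365 * 24 * 3600)

print(f"effective rate: {ops_per_s / 1e6:.0f} M operations/s")
print(f"giant steps: 2**{giant_steps.bit_length() - 1}")
print(f"estimated time: {total_years:.3e} years")
```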
Off the grid, training pigeons to broadcast signed messages.
MB2AA5RR
Newbie

Activity: 9
Merit: 0
January 20, 2026, 01:44:24 PM
Hello. <<songokuj5>> wanted to know how fast the RTX 5090 can run, and I answered with an example. If I did not do the speed calculation correctly, calculate it differently: 7161 MKeys/s x 2^33.0 = 2^65.81 => 64,682,085,578,665,827,982 => ~ 64 Exa key/sec. I made a more accurate calculation by dividing the space covered by the time the run took, which gives the average speed. In practice, the instantaneous speed displayed by the program fluctuates depending on which other applications the PC is running at that moment, especially video applications. Even simply moving the mouse affects the displayed speed. Again, the space in which I tested is << WIDTH RANGE= 000000000000000000000000000000000000000000000ff00000000000000000 = 2^76 >>. Below are other details that may help:
Be careful when setting the parameters: -t 256 -b 256 -p 914 -w 32 -htsz 31. Watch for this line at the start of the program: GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB. The required (Need) memory must not exceed the free memory. If this condition is not met, stop the program and adjust the parameters; otherwise you will get an error and waste your time.
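The memory check described above could be automated by parsing that log line. This is a rough sketch that assumes the line always has the `Free/Total/Need memory: a/b/cMB` shape shown in the example; `memory_fits` is a hypothetical helper, not part of the BSGS program.

```python
import re


def memory_fits(line):
    """Return (fits, headroom_mb) for a 'Free/Total/Need memory' log line."""
    match = re.search(
        r"Free/Total/Need memory:\s*([\d.]+)/([\d.]+)/([\d.]+)\s*MB", line
    )
    if match is None:
        raise ValueError("not a Free/Total/Need memory line")
    free_mb, _total_mb, need_mb = (float(group) for group in match.groups())
    # The run is only safe when the needed memory does not exceed the free memory.
    return need_mb <= free_mb, free_mb - need_mb


log_line = "GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB"
fits, headroom_mb = memory_fits(log_line)
print("fits:", fits, "headroom MB:", round(headroom_mb, 3))
```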
To generate the bin files you need at least 128-256 GB of RAM at 5600 MHz and a processor with at least 16 cores at ~5 GHz; a fast NVMe SSD helps a lot. Do not use hard disk drives, or generating the bin files will take a very long time. The motherboard should have PCIe generation 5 slots and 64-128 lanes. Once the bin files are generated, the processor and memory are no longer used intensively. Bin file 1 = 41943041 KB. Bin file 2 = 25165825 KB.
kTimesG
January 20, 2026, 02:56:39 PM
<<songokuj5>> wanted to know how fast the RTX5090 can run, and I answered with an example. If I didn't do the speed calculations correctly, calculate differently: 7161 MKeys/s x 2^33.0 = 2^65.81 => 64,682,085,578,665,827,982 => ~ 64 Exa key/sec.
Except this topic is about JLP's Kangaroo solver, not about BSGS nonsense. An RTX 5090 can easily do 15+ GKeys/s in this context. Your code, whatever it is, only reached 200 or so Mops/s, which is about 80 times slower, and trillions of trillions of times slower at actually solving the puzzle overall.
MB2AA5RR
Newbie

Activity: 9
Merit: 0
January 20, 2026, 04:39:56 PM
<<kTimesG>> If you want to use Pollard's kangaroo, for puzzle 135 ... good luck and sorry for the inconvenience. It's a waste of time.
snaz3d
Newbie

Activity: 19
Merit: 0
January 28, 2026, 05:50:07 PM
<<kTimesG>> If you want to use Pollard's kangaroo, for puzzle 135 ... good luck and sorry for the inconvenience. It's a waste of time.
It's way faster than any BSGS nonsense.
Ryan121001
Newbie

Activity: 2
Merit: 0
May 11, 2026, 01:43:31 PM
Hello everyone,
I’m looking for a developer who can modify or extend https://github.com/pscamillo/PSCKangaroo into a distributed client/server version.
Goal:
* Central server managing work ranges
* Multiple remote clients/workers
* Multi-GPU and multi-machine support
* Shared checkpoint/database
* Automatic reconnect and resume after crash
* Linux and/or Windows support
The current version appears optimized for single GPU/local use. I would like something similar to a distributed cracking/search framework where workers connect to a server and receive tasks dynamically.
If anyone has experience with CUDA, secp256k1, Pollard Kangaroo, or distributed GPU computing and is interested, please reply or contact me.
Thanks.
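To make the first and fifth goals concrete, here is a minimal in-memory sketch of a server that hands out sub-ranges and re-queues any range whose worker goes silent. Everything in it is hypothetical (the `RangeServer` class, the lease timeout, the equal-sized split); it has no networking and no connection to PSCKangaroo's code, and only illustrates the bookkeeping such a server would need.

```python
import time
from collections import deque


class RangeServer:
    """Hands out sub-ranges and re-queues those not completed before the lease expires."""

    def __init__(self, start, end, chunks, lease_seconds=600):
        step = -(-(end - start) // chunks)  # ceiling division
        self.pending = deque(
            (lo, min(lo + step, end)) for lo in range(start, end, step)
        )
        self.leased = {}  # sub-range -> (worker id, lease expiry time)
        self.lease_seconds = lease_seconds

    def next_task(self, worker_id, now=None):
        now = time.time() if now is None else now
        # Reclaim ranges whose worker crashed or disconnected.
        for rng, (_, expiry) in list(self.leased.items()):
            if expiry <= now:
                del self.leased[rng]
                self.pending.append(rng)
        if not self.pending:
            return None
        rng = self.pending.popleft()
        self.leased[rng] = (worker_id, now + self.lease_seconds)
        return rng

    def complete(self, rng):
        # Called when a worker reports that it finished its sub-range.
        self.leased.pop(rng, None)


server = RangeServer(0, 4096, chunks=4, lease_seconds=60)
task_a = server.next_task("worker-a", now=0)
task_b = server.next_task("worker-b", now=1)
server.complete(task_b)                        # worker-b finishes normally
task_c = server.next_task("worker-c", now=100) # worker-a's lease has expired
print(task_a, task_b, task_c, list(server.pending))
```

A real deployment would also need persistent storage for the checkpoint data and a network protocol between server and workers, which this sketch leaves out.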
blankx4729
Newbie

Activity: 9
Merit: 0
May 11, 2026, 06:00:37 PM
Hey, check this: https://github.com/waze4729/PSCKangaroo-Sneaky-Cousin-Expansion. I added a "sneaky cousin expansion" as a better, working cheap point, and it leads to collisions that can resolve the private key. It was vibe coded with Claude/DeepSeek, and it wouldn't have been possible without the main authors, so many thanks to them. I am new to the puzzle and to crypto, and this is a great way to learn how a CUDA kernel and this kangaroo algorithm work. I still don't understand it 100%, but I am here to prove that the cheap point will work and might improve k. About your last question: I would be happy to join you, Ryan121001, let's do it.
ldabasim
Newbie

Activity: 19
Merit: 0
May 12, 2026, 04:16:13 AM Last edit: May 13, 2026, 11:08:35 AM by ldabasim
Newb question: does anyone know why my kangaroo is no faster with a work file? I do 700 Mkeys/s. I let it work for an hour or so in a roughly 75-bit range, creating a work file, and then fed that file back in with a different key (same range), but the work file doesn't seem to help it go faster. My DP is low enough (15 bits maybe) that it should take a few seconds to a minute to detect the collision, but it still takes as long as if there were no DP database. The file is a few gigs. EDIT: yeah, I'm stupid: that work file was not actually a DP database and kept working on the old key, so I had to rewrite some of the code. Now the work file behaves like a tame database.
blankx4729
Newbie

Activity: 9
Merit: 0
May 12, 2026, 02:32:57 PM
The TL;DR Cousin expansion / Walkers

Walker: walks 1000 random jumps → checks 1 point against table
Cousin: takes 1 base distance → checks 32 points against table

Both are just: "generate a point, check if it's in the table". Same coin flip, different method of picking the coin.

Why "Same Shit"

Both methods produce independent Bernoulli trials:

Method       Points checked per effort    Probability per point
GPU Walker   1 point per ~1000 jumps      1/2^20
CPU Cousin   32 points per MultiplyG      1/2^20

The probability of finding the key is:

P(success) = 1 - (1 - 1/2^20)^(total_points_checked)

Only total_points_checked matters. How you generate those points (walking vs cousin) is irrelevant to the math.

What Actually Matters

The cousin system has ONE advantage: speed per point checked.

GPU Walker: 1000 jumps × 256 threads / 0.17 GKeys/s = ~1.5ms per DP
CPU Cousin: 1 MultiplyG + 31 AddPoints = ~100µs for 32 checks = ~3µs per check (500x faster per point!)

The 14,500 GPU walkers produce DPs at a fixed rate limited by GPU speed. CPU cousins are essentially free extra samples that don't slow down the GPU pipeline.

The Real Optimization

Since both are "same shit," the only thing that matters is:

MAXIMIZE: total_points_checked_per_second / cost

And the cheapest points are cousins (CPU, AddPoints, ~3µs each). So:
GPU walkers: Keep doing their thing (they're already maxed out)
CPU cousins: Go HARD - 1024 steps instead of 16, exponential spread
Don't overthink it: It's all just independent coin flips
Final Answer

Yes. Redirecting GPU walkers to "focus zones" doesn't help because:
Each walker already has 1/2^20 chance per DP regardless of starting position
Moving walkers costs more than just letting them walk
CPU cousins give you the same checks at 500x lower cost
The only improvement is more cousin checks per collision
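The formula and cost figures in this post can be evaluated directly. The sketch below takes the post's numbers and its independence assumption at face value (1/2^20 per point, 1000 jumps × 256 threads at 0.17 GKeys/s, ~100µs per 32 cousin checks) and does not validate them; the function names are illustrative.

```python
import math

P_POINT = 1 / 2**20  # per-point hit probability, as stated in the post


def p_success(total_points_checked, p=P_POINT):
    """P(success) = 1 - (1 - p)^N, computed stably for tiny p."""
    return -math.expm1(total_points_checked * math.log1p(-p))


def points_needed(target_probability, p=P_POINT):
    """Smallest N with P(success) >= target_probability under the same model."""
    return math.ceil(math.log1p(-target_probability) / math.log1p(-p))


# Per-point cost figures quoted in the post.
gpu_seconds_per_dp = 1000 * 256 / 0.17e9
cpu_seconds_per_check = 100e-6 / 32
cost_ratio = gpu_seconds_per_dp / cpu_seconds_per_check

print(f"P(success) after 2^20 points: {p_success(2**20):.4f}")
print("points for a 50% chance:", points_needed(0.5))
print(f"GPU time per DP: {gpu_seconds_per_dp * 1e3:.2f} ms")
print(f"CPU time per check: {cpu_seconds_per_check * 1e6:.2f} us")
print(f"ratio: about {cost_ratio:.0f}x")
```

With unrounded inputs the ratio comes out somewhat below the post's rounded "500x", since that figure uses ~1.5ms and ~3µs.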
Ryan121001
Newbie

Activity: 2
Merit: 0
May 13, 2026, 07:15:58 AM
For 135 bits, is a minimum of DP=12+ at 120 GB safely enough? Does DP12 on PSCKangaroo v60 need a lot of RAM, or a higher DP? And for 140 bits, is DP14 enough? Is the 120 GB the minimum RAM or the minimum SSD space?
kTimesG
May 13, 2026, 03:08:00 PM
The TL;DR Cousin expansion / Walkers
The only reason I'm glad more people stupidly swear by such LLM bullshit (disguised as scientific certainty) is that it makes the competition smaller. If you actually understood some basics, you'd know that the only place the crappy AI's advice and vibe-coding leads is to 50x slower shit-code and useless dumb ideas (e.g. the entirety of your LLM copy-paste). Unfortunately, I've actually read it, so please don't even try to defend it; it's simply idiotic at best.