Hello
For puzzle 135 I use Collider BSGS CUDA, which gives me a good scanning speed of 60-65 Exa keys/sec.
I adapted the software for the RTX 5090 from this source:
https://github.com/Etayson/BSGS-cuda
The software is optimized, runs without errors, and does not miss keys (verified in tests on valid addresses).
Building the executable requires a licensed copy of PureBasic. Below is an example scan for Puzzle 135.
C:\Users\NN\Desktop\COLLIDER>bsgscudaHT_1_9_7file -t 256 -b 256 -p 914 -w 32 -htsz 31 -pk 6cf4feb12b75e8e00fffffffffffffffff -pke 6cf4feb12b75e8eFFFFFFFFFFFFFFFFFFF -infile Puzle135
Number of GPU threads set to #256
Number of GPU blocks set to #256
Number of pparam set to #914
Items number set to 2^32=4294967296
HT size set to 2^31
Range begin: 0x6cf4feb12b75e8e00fffffffffffffffff
Range end: 0x6cf4feb12b75e8efffffffffffffffffff
Will be used file: Puzle135
Found 1 Cuda device.
Cuda device:NVIDIA GeForce RTX 5090 (30840.000/32606MB)
Current config hash[]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000200000000
GiantSUBpubkey: 038c0989f2ceb5c771a8415dff2b4c4199d8d9c8f9237d08084b05284f1e4df706
*******************************
Total GPU Memory Need: 30060.000MB
*******************************
Both HT files exist
Load BIN file:256_256_914_4294967296_g2.BIN
- chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
Last chunk:612368384b
[3] chunk:612368384b
Done in 00:00:00s
Gstep: e48000000000000
GPU count #1
GPU #0 launched
GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB
_A size:120
GPU #0 copied giant array
Remove Giant array, freed memory: 3656.000 MB
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_4294967296_2147483648_htGPUv0.BIN
- chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
.......................................
[23] chunk:1073741824b
Last chunk:4b
[24] chunk:4b
Done in 00:00:03s
GPU #0 copied hash table
Remove HT for GPU, freed memory: 24576.000 MB
Random verify packed HTCPU items in file...ok
START RANGE= 0000000000000000000000000000006cf4feb12b75e8e00fffffffffffffffff
END RANGE= 0000000000000000000000000000006cf4feb12b75e8efffffffffffffffffff
WIDTH RANGE= 000000000000000000000000000000000000000000000ff00000000000000000 = 2^76
SUBpoint= (afaacd852045a0e036d93ee350283936b312b379f0f1e04bf35565897ecaa282, 8a334cf89c64444f69049c40d563f435209697a9a7b92b38bd59a02b44db2556)
Save work every 180 seconds
Checker thread started
Findpubkey : 02145d2611c823a396ef6712ce0f712f09b9b4f3135e3e0aa3230fb9b6d08d1e16
Searchpubkey: 03235dada82c3477f7b249b6c7660b84b664d490465f98afd5efcc2b8c5c074c97
Cnt:fea5718000000000001 [1][ 7161 ] = 7161 MKeys/s x2^33.0=2^65.81 Jt:00:19:27
Reached end of space
GPU#0 job finished
GPU#0 thread finished
cuda finished ok
Press Enter to exit
...
Speed calculation
Total RANGE = ff00000000000000000 (hex) => 75262715820734970593280 (decimal)
Working time = 00:19:27 = 1167 sec
Average working speed = 75262715820734970593280 / 1167 = 64,492,472,854,100,231,870 keys/sec => ~ 64.49 Exa keys/sec
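
The arithmetic above can be re-checked with a few lines of Python. This is only a sketch: the range bounds and the run time are copied from the log, while the variable names are made up for illustration. The last part cross-checks the "2^65.81" figure from the "Cnt:" line under the assumption that "MKeys" there means 2^20 keys; the log itself does not say this, so treat it as a guess that happens to fit the numbers.

```python
import math

# Values copied from the log above.
range_begin = 0x6cf4feb12b75e8e00fffffffffffffffff
range_end = 0x6cf4feb12b75e8efffffffffffffffffff
elapsed = 19 * 60 + 27  # "Jt:00:19:27" converted to seconds

# Width of the scanned range and the average speed over the whole run.
width = range_end - range_begin
avg_speed = width / elapsed

print(f"width     = {width:#x} = {width} (~2^{math.log2(width):.2f})")
print(f"elapsed   = {elapsed} s")
print(f"avg speed = {avg_speed / 1e18:.2f} Exa keys/s (~2^{math.log2(avg_speed):.2f})")

# Assumption (not stated in the log): 1 MKey = 2^20 keys, and the "x2^33"
# factor is the multiplier the program applies to the raw GPU rate.
reported = 7161 * 2**20 * 2**33
print(f"reported  = ~2^{math.log2(reported):.2f} keys/s")
```

If the assumption holds, the figure reported by the program and the average computed by hand from the range width and run time agree to within rounding.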