Hello
I resolved the problem with -w 32.
For Puzzle 135 I use Collider BSGS-cuda, which gives me a good scanning speed of 60-65 Exa keys/sec.
I adapted the software for the RTX 5090 from this source:
https://github.com/Etayson/BSGS-cuda
The software is optimized, runs without errors, and does not miss keys (in tests on valid addresses).
Building the executable requires a licensed copy of PureBasic. Below is an example scan for Puzzle 135.
C:\Users\NN\Desktop\COLLIDER>bsgscudaHT_1_9_7file -t 256 -b 256 -p 914 -w 32 -htsz 31 -pk 6cf4feb12b75e8e00fffffffffffffffff -pke 6cf4feb12b75e8eFFFFFFFFFFFFFFFFFFF -infile Puzle135
Number of GPU threads set to #256
Number of GPU blocks set to #256
Number of pparam set to #914
Items number set to 2^32=4294967296
HT size set to 2^31
Range begin: 0x6cf4feb12b75e8e00fffffffffffffffff
Range end: 0x6cf4feb12b75e8efffffffffffffffffff
Will be used file: Puzle135
Found 1 Cuda device.
Cuda device:NVIDIA GeForce RTX 5090 (30840.000/32606MB)
Current config hash[]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000200000000
GiantSUBpubkey: 038c0989f2ceb5c771a8415dff2b4c4199d8d9c8f9237d08084b05284f1e4df706
*******************************
Total GPU Memory Need: 30060.000MB
*******************************
Both HT files exist
Load BIN file:256_256_914_4294967296_g2.BIN
- chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
Last chunk:612368384b
[3] chunk:612368384b
Done in 00:00:00s
Gstep: e48000000000000
GPU count #1
GPU #0 launched
GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB
_A size:120
GPU #0 copied giant array
Remove Giant array, freed memory: 3656.000 MB
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_4294967296_2147483648_htGPUv0.BIN
- chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
.......................................
[23] chunk:1073741824b
Last chunk:4b
[24] chunk:4b
Done in 00:00:03s
GPU #0 copied hash table
Remove HT for GPU, freed memory: 24576.000 MB
Random verify packed HTCPU items in file...ok
START RANGE= 0000000000000000000000000000006cf4feb12b75e8e00fffffffffffffffff
END RANGE= 0000000000000000000000000000006cf4feb12b75e8efffffffffffffffffff
WIDTH RANGE= 000000000000000000000000000000000000000000000ff00000000000000000 = 2^76
SUBpoint= (afaacd852045a0e036d93ee350283936b312b379f0f1e04bf35565897ecaa282, 8a334cf89c64444f69049c40d563f435209697a9a7b92b38bd59a02b44db2556)
Save work every 180 seconds
Checker thread started
Findpubkey : 02145d2611c823a396ef6712ce0f712f09b9b4f3135e3e0aa3230fb9b6d08d1e16
Searchpubkey: 03235dada82c3477f7b249b6c7660b84b664d490465f98afd5efcc2b8c5c074c97
Cnt:fea5718000000000001 [1][ 7161 ] = 7161 MKeys/s x2^33.0=2^65.81 Jt:00:19:27
Reached end of space
GPU#0 job finished
GPU#0 thread finished
cuda finished ok
Press Enter to exit
................................................
Speed calculation
Total RANGE = ff00000000000000000 (hex) => 75262715820734970593280 (decimal)
Working time = 00:19:27 = 1167 sec
Average working speed = 75262715820734970593280 / 1167 = 64,492,472,854,100,231,870 keys/sec => ~64.49 Exa keys/sec
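For anyone who wants to redo this arithmetic for their own run, here is a minimal Python sketch. It is not part of Collider; the range bounds and the run time are copied from the log above, and the helper name hms_to_seconds is just something made up for this example.

```python
# Range bounds copied from the "Range begin" / "Range end" lines of the log.
range_begin = 0x6cf4feb12b75e8e00fffffffffffffffff
range_end = 0x6cf4feb12b75e8efffffffffffffffffff


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (as shown after 'Jt:') to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Width of the scanned range, the same value the log prints as WIDTH RANGE.
width = range_end - range_begin
seconds = hms_to_seconds("00:19:27")

# Python integers are arbitrary precision, so the division is exact.
keys_per_sec = width // seconds
exa_keys_per_sec = width / seconds / 10**18

print(hex(width))
print(width)
print(keys_per_sec)
print(f"{exa_keys_per_sec:.2f} Exa keys/sec")
```

Swap in your own range and run time to compare speeds between parameter sets.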
...
Be careful when setting the parameters: -t 256 -b 256 -p 914 -w 32 -htsz 31
Watch for this line when the program starts: GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB
The needed memory must not exceed the free memory.
If it does, stop the program and adjust the parameters.
Otherwise you will get an error and waste your time.
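If you save the program output to a file, the memory check can be automated. The sketch below is only an illustration in Python: it assumes the line keeps exactly the "Free/Total/Need memory: a/b/cMB" shape shown above, and memory_fits is a hypothetical helper name, not something provided by Collider.

```python
import re


def memory_fits(log_line: str) -> bool:
    """Return True when the needed GPU memory does not exceed the free memory.

    Assumes a line shaped like:
    'GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB'
    """
    match = re.search(
        r"Free/Total/Need memory:\s*([\d.]+)/([\d.]+)/([\d.]+)\s*MB", log_line
    )
    if match is None:
        raise ValueError("no Free/Total/Need memory line found")
    free_mb, _total_mb, need_mb = (float(value) for value in match.groups())
    return need_mb <= free_mb


# Line copied from the log above: 30060.002 MB needed, 30838 MB free.
line = "GPU #0 Free/Total/Need memory: 30838/32606/30060.002MB"
print(memory_fits(line))
```

If it returns False, stop the run and adjust the parameters before starting again.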
To generate the bin files you need at least 128-256 GB of RAM at 5600 MHz and a processor with at least 16 cores at ~5 GHz; a fast NVMe SSD helps a lot.
Do not use hard disk drives; generating the bin files on them will take a very long time.
The motherboard should have PCIe Gen 5 slots and 64-128 lanes.
After the bin files are generated, the processor and memory are no longer heavily loaded.
Bin File 1 = 41943041 KB. Bin File 2 = 25165825 KB.