WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
December 12, 2023, 05:54:29 AM |
|
While running some performance tests with Rotor-Cuda I've noticed that when I assign a monstrous grid size to my GPU, I get more speed.
If I set it like this --gpux 18000,512 I get a steady 4.62 GK/s peaking at 6.90 GK/s for a few seconds.
Can this cause any problems, like skipping keys during the search? If so, can anyone recommend a good grid-size for a 3080ti?
Thanks in advance!
The best thing to do is to test your grid size by running through a small range, something like a 2^40 range, and see whether it finds the key or not. I use a similar version of KeyHunt Cuda / Rotor and haven't missed a key with a large grid size. But seriously, run a simple test to know for sure with your card and setup.
|
|
|
|
CY4NiDE
Member
Offline
Activity: 63
Merit: 14
|
|
December 12, 2023, 06:38:59 AM |
|
Hey there, thanks for your reply. Much appreciated.
So if it can pass the 2^40 test without skipping any keys can I deem it safe?
So far no problems with 2^35, 2^38, 2^39, 2^40.
--gpux 32000,512
|
1CY4NiDEaNXfhZ3ndgC2M2sPnrkRhAZhmS
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
December 12, 2023, 06:45:40 AM |
|
Hey there, thanks for your reply. Much appreciated.
So if it can pass the 2^40 test without skipping any keys can I deem it safe?
So far no problems with 2^35, 2^38, 2^39, 2^40.
--gpux 32000,512
I would run a few tests. Put some keys at the beginning of the range, the middle and the end. I've used some large grid sizes with no issues. If you are using the rekey option, you will get fluctuation in your speed no matter the grid size, as it spins up to rekey and then picks back up.
|
|
|
|
CY4NiDE
Member
Offline
Activity: 63
Merit: 14
|
|
December 14, 2023, 08:20:06 PM |
|
I definitely got ahead of myself. I ran X-Point mode using --gpux 32000,512 against 20 keys spread over the 2^40 range and only 2 of those keys were being found. Same with --gpux 18000,512 and anything in between. In the end only with --gpux 1024,512 the program was able to find all 20 keys without skipping. I'll run 2^50 next against more keys to see if this effect gets mitigated as the space increases. Haven't checked Address mode or Hash160 mode yet.
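A minimal sketch of how such a test set could be laid out, assuming the "2^40 range" means the 40-bit keys from 2^39 to 2^40 - 1 and that the keys the program reports are collected by hand from its output. The helper names `spread_keys` and `missed_keys` are made up for illustration and are not part of Rotor-Cuda; deriving the public keys or addresses to feed the program is left out.

```python
def spread_keys(start: int, end: int, count: int) -> list[int]:
    """Return `count` private-key scalars evenly spread over [start, end],
    including both endpoints (beginning, middle and end of the range)."""
    if count < 2:
        raise ValueError("need at least two keys to cover both ends")
    span = end - start
    return [start + (span * i) // (count - 1) for i in range(count)]


def missed_keys(planted: list[int], found: list[int]) -> list[int]:
    """Return the planted keys that the search did not report."""
    found_set = set(found)
    return [k for k in planted if k not in found_set]


# Assumption: the 40-bit range runs from 2^39 to 2^40 - 1.
start, end = 2**39, 2**40 - 1
planted = spread_keys(start, end, 20)

# Hypothetical outcome: only the first two planted keys were reported.
found = planted[:2]
missed = missed_keys(planted, found)
print(f"{len(missed)} of {len(planted)} planted keys were skipped")
```

If `missed` is not empty for a given --gpux setting, that grid size is skipping keys on that card and setup.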
|
1CY4NiDEaNXfhZ3ndgC2M2sPnrkRhAZhmS
|
|
|
digaran
Copper Member
Hero Member
Offline
Activity: 1330
Merit: 899
🖤😏
|
|
December 14, 2023, 08:51:53 PM |
|
You might be interested to read and learn more about grid sizes. There are more stats on this nvidia page, including technical stats on what the acceptable grid size is for different applications. You can't simply use any arbitrary grid size.
|
🖤😏
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
December 14, 2023, 11:08:07 PM Last edit: December 14, 2023, 11:24:02 PM by WanderingPhilospher |
|
You might be interested to read and learn more about grid sizes. There are more stats on this nvidia page, including technical stats on what the acceptable grid size is for different applications. You can't simply use any arbitrary grid size.
Digger doesn't even own a GPU or PC, lol. He's doing all of his tests from an old Blackberry flip phone.
I've had some large grid sizes, CY4NiDE, but I keep them multiples. If the card's stock grid is 38,128, then I will keep a front grid that is a multiple of 38. I normally run a multiple of 38x256. I've used 760x512 with no issues, 1520x512 with no issues, and I think one multiple higher. Those are tests with KeyHunterGPU; Rotor has some flaws in it, especially if using the continue option. I stopped using/testing all versions of Rotor after I discovered a bug with the continue option, because it was causing keys to be skipped.
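As a rough illustration of keeping the first grid value a multiple of the stock value, here is a small Python sketch. The stock value 38 and the factors 20 and 40 come from the numbers above (760 = 38 x 20, 1520 = 38 x 40); the function names are hypothetical and nothing here talks to the GPU.

```python
def grid_candidates(stock_x: int, threads: int, factors: list[int]) -> list[tuple[int, int]]:
    """Build (grid_x, threads) pairs whose first value is a multiple of stock_x."""
    return [(stock_x * f, threads) for f in factors]


def is_stock_multiple(grid_x: int, stock_x: int) -> bool:
    """True when grid_x is an exact multiple of the card's stock grid value."""
    return grid_x % stock_x == 0


# Assumption: stock grid of 38,128 as in the example above; 512 threads per block.
for grid_x, threads in grid_candidates(38, 512, [20, 40]):
    print(f"--gpux {grid_x},{threads}")
```

The stock value differs from card to card, so 38 would have to be replaced with whatever the program picks by default on the card being tested.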
|
|
|
|
CY4NiDE
Member
Offline
Activity: 63
Merit: 14
|
|
December 15, 2023, 12:37:07 AM Last edit: December 15, 2023, 12:54:04 AM by CY4NiDE Merited by Halab (2), JayJuanGee (1) |
|
Yeah, I was testing different grids for my card the other day, within reasonable bounds, keeping it a small multiple of the original grid. After a while I decided to play with it a bit and increased the grid by larger factors, thus arriving at numbers like 18000 and 32000. They are not arbitrary. I thought the program wouldn't even start. To my surprise, not only did it run, but the speed increased. Then it occurred to me that it was probably jumping over a bunch of keys. It was too good to be true. I was getting a constant 5 GK/s with sudden peaks of 9 GK/s every few seconds running sequential X-Point mode with a grid size like --gpux 36000x512 against #130. Raise it much further than that and the speed will keep dropping to 0.00 MK/s for a few seconds at a time throughout the search.
Anyway, if going for random mode, I guess this issue could be overlooked? One can have more threads and thus search faster, the trade-off being skipping some keys...
About this other flaw [in a scenario where the grid size is not causing it to skip keys]: could I avoid it by updating the lower range in my .bat file to be the last key shown in the counter before terminating the session, instead of using continue.bat? Thanks!
|
1CY4NiDEaNXfhZ3ndgC2M2sPnrkRhAZhmS
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
December 15, 2023, 04:16:33 AM Merited by JayJuanGee (1) |
|
Yeah, I was testing different grids for my card the other day, within reasonable bounds, keeping it a small multiple of the original grid. After a while I decided to play with it a bit and increased the grid by larger factors, thus arriving at numbers like 18000 and 32000. They are not arbitrary. I thought the program wouldn't even start. To my surprise, not only did it run, but the speed increased. Then it occurred to me that it was probably jumping over a bunch of keys. It was too good to be true. I was getting a constant 5 GK/s with sudden peaks of 9 GK/s every few seconds running sequential X-Point mode with a grid size like --gpux 36000x512 against #130. Raise it much further than that and the speed will keep dropping to 0.00 MK/s for a few seconds at a time throughout the search.
Anyway, if going for random mode, I guess this issue could be overlooked? One can have more threads and thus search faster, the trade-off being skipping some keys...
About this other flaw [in a scenario where the grid size is not causing it to skip keys]: could I avoid it by updating the lower range in my .bat file to be the last key shown in the counter before terminating the session, instead of using continue.bat? Thanks!
Yes, you could. I implemented a total key counter in mine. It would print to file, total # of keys, that way, even if power went out, I'd have a good starting point.
If you do it this way, make sure you keep/know start and end range. You then need to take total keys ran, divide by the number of threads (grid size) and then take that number and add it to your initial/last start AND end range.
Example: If you had a start range of 0 and an end range of 1000000 (keep it small for this purpose) and your grid size was 10x10. The program says you have ran/checked 10,000 keys total. Take 10,000 (total keys) and divide by 10x10=100 (grid size); 10,000 / 100 = 100. So each gpu thread checked 100 keys. So for your next batch file, you would have a start/end range of 100:1000100.
If you only change the start range by 100, then you are overlapping/possibly missing keys checked on the other threads. If you stop and think about it, or do the math, it'll make sense. Your first thread checked 0-100 (now on second run it should start at 100 and be on the hook to check up to 10,100); the last thread checked 990,000-990,100. If you don't adjust the end range as well, your last thread will now be checking 999,900 instead of starting where it left off at 990,100. Lol, again, if you do the math you'll understand. Hope it made some sense.
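The arithmetic in the example can be sketched in a few lines of Python. This assumes the program splits the range evenly across grid_x * grid_y threads and that every thread has checked the same number of keys; the function names `resume_range` and `thread_starts` are made up for illustration and are not part of Rotor-Cuda or KeyHunt-Cuda.

```python
def resume_range(start: int, end: int, total_keys: int,
                 grid_x: int, grid_y: int) -> tuple[int, int]:
    """Shift BOTH ends of the range by the number of keys each thread checked."""
    threads = grid_x * grid_y
    per_thread = total_keys // threads  # assumes an even split across threads
    return start + per_thread, end + per_thread


def thread_starts(start: int, end: int, threads: int) -> list[int]:
    """First key of each thread when the range is cut into equal slices."""
    width = (end - start) // threads
    return [start + i * width for i in range(threads)]


new_start, new_end = resume_range(0, 1_000_000, 10_000, 10, 10)
print(new_start, new_end)            # 100 1000100

shifted = thread_starts(new_start, new_end, 100)
print(shifted[0], shifted[-1])       # 100 990100

# Moving only the start changes the slice width, so later threads drift.
start_only = thread_starts(new_start, 1_000_000, 100)
print(start_only[-1])                # 990001
```

Under this even-split assumption, shifting both ends puts the last thread back at 990,100, where it stopped. Shifting only the start shrinks each slice to 9,999 keys, so every thread after the first starts at a slightly different place than where it left off; the exact figures depend on how the program actually divides the range.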
|
|
|
|
CY4NiDE
Member
Offline
Activity: 63
Merit: 14
|
|
December 16, 2023, 08:49:17 AM |
|
Yes, you could. I implemented a total key counter in mine. It would print to file, total # of keys, that way, even if power went out, I'd have a good starting point.
If you do it this way, make sure you keep/know start and end range. You then need to take total keys ran, divide by the number of threads (grid size) and then take that number and add it to your initial/last start AND end range.
Example: If you had a start range of 0 and an end range of 1000000 (keep it small for this purpose) and your grid size was 10x10. The program says you have ran/checked 10,000 keys total. Take 10,000 (total keys) and divide by 10x10=100 (grid size); 10,000 / 100 = 100. So each gpu thread checked 100 keys. So for your next batch file, you would have a start/end range of 100:1000100. If you only change the start range by 100, then you are overlapping/possibly missing keys checked on the other threads. If you stop and think about it, or do the math, it'll make sense. Your first thread checked 0-100 (now on second run it should start at 100 and be on the hook to check up to 10,100); the last thread checked 990,000-990,100. If you don't adjust the end range as well, your last thread will now be checking 999,900 instead of starting where it left off at 990,100. Lol, again, if you do the math you'll understand. Hope it made some sense.
That's actually a very good explanation, really appreciate it man!
|
1CY4NiDEaNXfhZ3ndgC2M2sPnrkRhAZhmS
|
|
|
3dmlib
Jr. Member
Offline
Activity: 44
Merit: 2
|
|
December 16, 2023, 10:18:51 PM |
|
Those are tests with KeyHunterGPU; Rotor has some flaws in it, especially if using the continue option. I stopped using/testing all versions of Rotor after I discovered a bug with the continue option, because it was causing keys to be skipped.
Hello. What is the continue option? What bug exactly? Also, can somebody explain to me what the maxFound option is and how it is used in the code? Thanks.
|
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
December 17, 2023, 05:22:22 AM |
|
Those are tests with KeyHunterGPU; Rotor has some flaws in it, especially if using the continue option. I stopped using/testing all versions of Rotor after I discovered a bug with the continue option, because it was causing keys to be skipped.
Hello. What is the continue option? What bug exactly? Also, can somebody explain to me what the maxFound option is and how it is used in the code? Thanks.
The continue option was an option in rotorcuda that would save how many keys were searched and the grid size, and readjust the range on a restart. It had flaws, as in sometimes it would not adjust correctly, or the total keys searched line would be blank. The maxFound option was the max keys the program could find in a single kernel call. I don't remember that being in keyhuntcuda or rotorcuda but more of the vanitysearch/forks of vanitysearch.
|
|
|
|
3dmlib
Jr. Member
Offline
Activity: 44
Merit: 2
|
|
December 17, 2023, 07:08:32 AM Merited by JayJuanGee (1) |
|
Those are tests with KeyHunterGPU; Rotor has some flaws in it, especially if using the continue option. I stopped using/testing all versions of Rotor after I discovered a bug with the continue option, because it was causing keys to be skipped.
Hello. What is the continue option? What bug exactly? Also, can somebody explain to me what the maxFound option is and how it is used in the code? Thanks.
The continue option was an option in rotorcuda that would save how many keys were searched and the grid size, and readjust the range on a restart. It had flaws, as in sometimes it would not adjust correctly, or the total keys searched line would be blank. The maxFound option was the max keys the program could find in a single kernel call. I don't remember that being in keyhuntcuda or rotorcuda but more of the vanitysearch/forks of vanitysearch.
Thanks. I had some thoughts about optimizing this program as well.
1. It uses global device memory access even when searching for a single key. Why can't it fit the searched Bitcoin address's ripemd160 hash, the public key increment function, and the ripemd160(sha256) functions in cache?
2. As I understand it, the CPU thread launches the kernel several times over the range. Why not launch it just once for the entire range supplied to the kernel?
3. Could ripemd160(sha256) use Tensor cores?
|
|
|
|
fecell
Jr. Member
Offline
Activity: 136
Merit: 2
|
|
January 05, 2024, 01:25:46 AM Last edit: January 05, 2024, 06:14:29 AM by fecell |
|
If you can do that, congratulations because you just partially broke elliptic curve.
No, I mean I can reduce the generator range to skip non-random values, so the time to brute-force is reduced too. For example, a 23-bit key to test (python 3.11 + ice_secp256k1.dll).
with secret algo: GOT: KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rVkthFNsQ6i7 10.363348245620728 s
with usual range (2^22 ... 2^23-1): GOT: KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rVkthFNsQ6i7 16.832353353500366 s
With big values, like 66 bit, a lot of values are just skipped as NOT random binary values, because they can't be randomly generated by the author (by wallet software). For example, the first value for the 66-bit range is 100000100100100101010011001011000111000111001011000111000111001011; all values less than that fail. This is the first value the generator gives with the random rules applied.
Anyway, pure Python is not a good instrument to get a result. I want to use numba cuda.jit, but I'm still learning how to.
|
|
|
|
Baboshka
Newbie
Offline
Activity: 7
Merit: 0
|
|
January 11, 2024, 08:20:01 PM |
|
If you can do that, congratulations because you just partially broke elliptic curve.
No, I mean I can reduce the generator range to skip non-random values, so the time to brute-force is reduced too. For example, a 23-bit key to test (python 3.11 + ice_secp256k1.dll).
with secret algo: GOT: KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rVkthFNsQ6i7 10.363348245620728 s
with usual range (2^22 ... 2^23-1): GOT: KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rVkthFNsQ6i7 16.832353353500366 s
With big values, like 66 bit, a lot of values are just skipped as NOT random binary values, because they can't be randomly generated by the author (by wallet software). For example, the first value for the 66-bit range is 100000100100100101010011001011000111000111001011000111000111001011; all values less than that fail. This is the first value the generator gives with the random rules applied.
Anyway, pure Python is not a good instrument to get a result. I want to use numba cuda.jit, but I'm still learning how to.
Hi fecell, can you please explain more why values less than "100000100100100101010011001011000111000111001011000111000111001011" will fail? Thanks and regards.
|
|
|
|
fecell
Jr. Member
Offline
Activity: 136
Merit: 2
|
|
January 12, 2024, 04:19:16 PM |
|
can you please explain
NO. excusema
|
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
January 12, 2024, 10:37:04 PM |
|
can you please explain
NO. excusema
He can't lol. Really, what do you mean? How is 0x10492A658E39638E5 the first value for the 66 bit range? Maybe I am misreading your statement(s).
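For anyone comparing the two notations, a quick Python check of the binary string against the hex value may help. This only converts the numbers quoted above and assumes the "66 bit range" means 2^65 to 2^66 - 1; it says nothing about how that first value was chosen.

```python
# Binary string exactly as posted earlier in the thread.
bits = "100000100100100101010011001011000111000111001011000111000111001011"
value = int(bits, 2)

print(len(bits))                    # 66
print(hex(value))                   # 0x209254cb1c72c71cb
print(hex(value >> 1))              # 0x10492a658e39638e5

# Assumption: the 66-bit range is [2^65, 2^66 - 1].
print(hex(2**65))                   # 0x20000000000000000
print(2**65 <= value < 2**66)       # True
```

As written, the 66-character string converts to 0x209254cb1c72c71cb. The hex value 0x10492A658E39638E5 corresponds to the same string with its last bit dropped, and the lowest value in the range under the stated assumption is 0x20000000000000000.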
|
|
|
|
Emmanuelex
Jr. Member
Offline
Activity: 137
Merit: 2
|
|
January 13, 2024, 12:09:05 AM |
|
Meehn... Seems like this is a game for programmers?🤦🏾♂️😁 I'm outta here I guess
|
|
|
|
3dmlib
Jr. Member
Offline
Activity: 44
Merit: 2
|
|
January 20, 2024, 10:29:13 PM Last edit: January 20, 2024, 10:43:07 PM by 3dmlib |
|
Here are some tips to speed up keyhunt-cuda (rotor-cuda):
Apply this, then you need a smaller grid size; something like 4096x512 will be enough for a 4090: https://bitcointalk.org/index.php?topic=5244940.msg63526413#msg63526413
Also change this:
    __device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
    {
        switch (mode) {
        case SEARCH_COMPRESSED:
            CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
            break;
        case SEARCH_UNCOMPRESSED:
            CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
            break;
        case SEARCH_BOTH:
            CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
            CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
            break;
        }
    }
to this, because doing switch-case in a kernel is a very bad idea:
    __device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
    {
        CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
    }
Also, maxFound can be completely removed when searching the puzzle, because we need only one result returned anyway.
Rotor-cuda speed with these mods:
[00:17:10] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.453247 %] [R: 0] [T: 6,412,923,043,840 (43 bit)] [F: 0]
[00:17:11] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.500549 %] [R: 0] [T: 6,421,244,542,976 (43 bit)] [F: 0]
[00:17:12] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.547852 %] [R: 0] [T: 6,429,566,042,112 (43 bit)] [F: 0]
[00:17:13] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.595154 %] [R: 0] [T: 6,437,887,541,248 (43 bit)] [F: 0]
[00:17:15] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.642456 %] [R: 0] [T: 6,446,209,040,384 (43 bit)] [F: 0]
[00:17:16] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.689758 %] [R: 0] [T: 6,454,530,539,520 (43 bit)] [F: 0]
[00:17:17] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.737061 %] [R: 0] [T: 6,462,852,038,656 (43 bit)] [F: 0]
[00:17:18] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.784363 %] [R: 0] [T: 6,471,173,537,792 (43 bit)] [F: 0]
[00:17:20] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.831665 %] [R: 0] [T: 6,479,495,036,928 (43 bit)] [F: 0]
[00:17:21] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.878967 %] [R: 0] [T: 6,487,816,536,064 (43 bit)] [F: 0]
Thanks.
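For reading these progress lines, a small Python sketch can pull out the numbers and estimate the range size and remaining time. It assumes that C is the percentage of the range completed and T is the total number of keys checked so far, which is an interpretation of the output rather than something confirmed from the source code; the regular expression and the function name `parse_progress` are made up for this example.

```python
import math
import re

# First progress line from the output above.
LINE = ("[00:17:10] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.453247 %] "
        "[R: 0] [T: 6,412,923,043,840 (43 bit)] [F: 0]")

PATTERN = re.compile(
    r"\[GPU: ([\d.]+) Gk/s\] \[C: ([\d.]+) %\] \[R: \d+\] \[T: ([\d,]+) "
)


def parse_progress(line: str) -> tuple[float, float, int]:
    """Return (keys per second, fraction completed, keys checked) from one line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a progress line")
    speed = float(match.group(1)) * 1e9          # Gk/s -> keys/s
    completed = float(match.group(2)) / 100.0    # percent -> fraction
    checked = int(match.group(3).replace(",", ""))
    return speed, completed, checked


speed, completed, checked = parse_progress(LINE)
range_size = checked / completed                 # implied size of the whole range
range_bits = round(math.log2(range_size))
eta_seconds = (range_size - checked) / speed     # time left at the current speed

print(f"range is about 2^{range_bits} keys, roughly {eta_seconds:.0f} s left")
```

Under that reading, the numbers are consistent with a test range of about 2^44 keys.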
|
|
|
|
WanderingPhilospher
Full Member
Offline
Activity: 1204
Merit: 237
Shooters Shoot...
|
|
January 20, 2024, 10:57:36 PM |
|
Here are some tips to speed up keyhunt-cuda (rotor-cuda):
Apply this, then you need a smaller grid size; something like 4096x512 will be enough for a 4090: https://bitcointalk.org/index.php?topic=5244940.msg63526413#msg63526413
Also change this:
    __device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
    {
        switch (mode) {
        case SEARCH_COMPRESSED:
            CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
            break;
        case SEARCH_UNCOMPRESSED:
            CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
            break;
        case SEARCH_BOTH:
            CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
            CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
            break;
        }
    }
to this, because doing switch-case in a kernel is a very bad idea:
    __device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
    {
        CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
    }
Also, maxFound can be completely removed when searching the puzzle, because we need only one result returned anyway.
Rotor-cuda speed with these mods:
[00:17:10] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.453247 %] [R: 0] [T: 6,412,923,043,840 (43 bit)] [F: 0]
[00:17:11] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.500549 %] [R: 0] [T: 6,421,244,542,976 (43 bit)] [F: 0]
[00:17:12] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.547852 %] [R: 0] [T: 6,429,566,042,112 (43 bit)] [F: 0]
[00:17:13] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.595154 %] [R: 0] [T: 6,437,887,541,248 (43 bit)] [F: 0]
[00:17:15] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.642456 %] [R: 0] [T: 6,446,209,040,384 (43 bit)] [F: 0]
[00:17:16] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.689758 %] [R: 0] [T: 6,454,530,539,520 (43 bit)] [F: 0]
[00:17:17] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.737061 %] [R: 0] [T: 6,462,852,038,656 (43 bit)] [F: 0]
[00:17:18] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.784363 %] [R: 0] [T: 6,471,173,537,792 (43 bit)] [F: 0]
[00:17:20] [CPU+GPU: 6.72 Gk/s] [GPU: 6.72 Gk/s] [C: 36.831665 %] [R: 0] [T: 6,479,495,036,928 (43 bit)] [F: 0]
[00:17:21] [CPU+GPU: 6.71 Gk/s] [GPU: 6.71 Gk/s] [C: 36.878967 %] [R: 0] [T: 6,487,816,536,064 (43 bit)] [F: 0]
Thanks.
What was the speed before and which version of Rotor-cuda are you using? One checks symmetry/endos, etc.; one does not. The one that checks endos is not good for the puzzle, and the speed is misleading.
|
|
|
|
3dmlib
Jr. Member
Offline
Activity: 44
Merit: 2
|
|
January 21, 2024, 09:28:53 AM |
|
What was the speed before and which version of Rotor-cuda are you using? One checks symmetry/endos, etc. one does not. The one that checks endos, is not good for the puzzle and the speed is misleading.
Speed before my mods was about 6.38 Gk/s. I think I used this one: https://github.com/Vladimir855/Rotor-Cuda
Is any better version available? Thanks.
|
|
|
|
|