krnlx
|
|
November 20, 2016, 09:41:16 PM |
|
Why not use a vector store... damn it, a few more weeks and I'll be up to speed with OpenCL.
A vector store address must be aligned to 16 bytes, which is not possible in any round because of the different offsets.
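To make the alignment constraint concrete, here is a minimal C sketch. It assumes the store goes through a 16-byte vector type (such as OpenCL's uint4), and the slot length and payload offsets are hypothetical values chosen for illustration, not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when a byte offset is a multiple of 16, as a 16-byte vector
   store through a uint4-style pointer would require. */
static bool is_aligned_16(size_t byte_offset) {
    return (byte_offset & 15u) == 0;
}

/* Hypothetical layout: fixed-length slots, with the payload starting
   at a per-round offset inside the slot. */
static size_t store_offset(size_t slot_index, size_t slot_len,
                           size_t payload_offset) {
    assert(payload_offset < slot_len);
    return slot_index * slot_len + payload_offset;
}
```

With 32-byte slots, a payload at offset 0 or 16 is aligned, but one that starts 4, 8, or 12 bytes into the slot is not, so a round with such an offset cannot use the vector store directly.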
|
|
|
|
zawawa
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
November 20, 2016, 09:57:52 PM |
|
I have been tweaking the disassembled GCN code of SA's kernels, and there seems to be quite a bit of room for performance enhancements, especially by optimizing global memory access by reordering flat_store_dword and s_waitcnt in ht_store(). @eXtremal, how is your next batch of optimizations coming along? If they are almost ready, I will wait for them. Otherwise, I will optimize the OpenCL kernel myself and then tweak the GCN code.
xor_and_store and ht_store must be rewritten and joined into one function. The current path:
unaligned 32-bit reads in xor_and_store -> join into 64 bits in half_aligned_long -> 64-bit xor in xor_and_store -> on rounds 2, 4, 6, 8, a 256-bit shift on xi0, xi1, xi2, xi3 in xor_and_store -> 256-bit shift again in ht_store -> split into 32 bits and write in ht_store
must be rewritten to one of:
- unaligned 32-bit reads -> 32-bit xor -> 256-bit shift -> 32-bit, 64-bit, or vector store
- 64-bit reads -> 64-bit xor -> 256-bit shift on 64-bit words -> 64-bit or vector store
- 64-bit and 32-bit reads -> 64-bit and 32-bit xor -> mixed 256-bit shift -> 64-bit, 32-bit, or vector store, depending on the round
Excellent suggestions! Let me get to them ASAP.
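As a rough illustration of the first proposed variant (unaligned 32-bit reads, 32-bit xor, 256-bit shift, 32-bit store) fused into a single function, here is a sketch in plain C rather than OpenCL. The function names, the little-endian word order (word 0 is least significant), and the shift amount are assumptions for illustration; this is not the code of the actual kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS 8 /* 8 x 32 bits = 256 bits */

/* Read one 32-bit word from a possibly unaligned address. */
static uint32_t load_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* XOR two 256-bit values read from unaligned addresses, shift the
   result right by shift_bits (0..31) across word boundaries, and
   write it out as eight 32-bit words. */
static void xor_shift_store(const uint8_t *a, const uint8_t *b,
                            unsigned shift_bits, uint32_t out[WORDS]) {
    uint32_t x[WORDS];
    assert(shift_bits < 32);
    for (int i = 0; i < WORDS; i++)
        x[i] = load_u32(a + 4 * i) ^ load_u32(b + 4 * i);
    for (int i = 0; i < WORDS; i++) {
        uint32_t lo = x[i] >> shift_bits;
        /* Bits carried in from the next, more significant word. */
        uint32_t hi = (shift_bits != 0 && i + 1 < WORDS)
                          ? x[i + 1] << (32 - shift_bits)
                          : 0;
        out[i] = lo | hi;
    }
}
```

Doing the xor and the shift in one pass means the 256-bit value is shifted once instead of once in each of two functions, which is the point of the proposed merge.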
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
eXtremal
|
|
November 20, 2016, 10:04:06 PM Last edit: November 20, 2016, 10:15:20 PM by eXtremal |
|
Speedup of +10-15%, for NVIDIA only:
http://coinsforall.io/distr/nvidia/input.cl
http://coinsforall.io/distr/nvidia/param.h
Sorry, but I can't work on the miner for more than 1 hour a day right now. For other developers, you need to:
- Decrease NR_ROWS_LOG to 13 or 12. The ht_store function works much faster with low NR_ROWS values, and when you decrease NR_ROWS you also decrease the total number of slots, because you can use smaller values for the OVERHEAD constant.
- Optimize the equihash round for big NR_SLOTS values. I began doing this in the last NVIDIA release, but it needs much more work.
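The relationship between the row count, the overhead, and the total number of slots can be sketched in C as follows. The only fixed input is that Equihash with n=200, k=9 starts from 2^21 elements. The formula (slots per row = average elements per row times an overhead factor) and the overhead values used in the usage note are assumptions for illustration, not the contents of param.h.

```c
#include <assert.h>
#include <stdint.h>

/* Equihash with n=200, k=9 starts from 2^21 candidate elements. */
#define INITIAL_ELEMS_LOG 21u

/* Average number of elements that land in one row. */
static uint32_t avg_elems_per_row(uint32_t nr_rows_log) {
    assert(nr_rows_log <= INITIAL_ELEMS_LOG);
    return 1u << (INITIAL_ELEMS_LOG - nr_rows_log);
}

/* Slots reserved per row: the average occupancy scaled by an overhead
   factor, given in quarters (overhead_x4 = 12 means 3x). */
static uint32_t slots_per_row(uint32_t nr_rows_log, uint32_t overhead_x4) {
    return avg_elems_per_row(nr_rows_log) * overhead_x4 / 4u;
}

/* Slots reserved across the whole table. */
static uint64_t total_slots(uint32_t nr_rows_log, uint32_t overhead_x4) {
    return ((uint64_t)1 << nr_rows_log) *
           slots_per_row(nr_rows_log, overhead_x4);
}
```

Under this model the total is 2^21 times the overhead regardless of the row count, so the saving comes entirely from the smaller overhead that fuller rows tolerate. With made-up overheads, 2^18 rows at 3x reserve 6,291,456 slots, while 2^12 rows at 1.25x reserve 2,621,440.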
|
|
|
|
zawawa
|
|
November 20, 2016, 10:09:08 PM |
|
No prob, thank you for your contributions!
|
|
|
|
laik2
|
|
November 20, 2016, 10:13:02 PM |
|
No prob, thank you for your contributions!
Did you make any progress with AMD?
|
|
|
|
eXtremal
|
|
November 20, 2016, 10:19:33 PM |
|
Did you make any progress with AMD?
The last release doesn't work on AMD, but if I find the reason, it will be the same +10-15% on AMD cards. For more speedup, see my previous post.
|
|
|
|
zawawa
|
|
November 20, 2016, 10:34:04 PM |
|
Did you make any progress with AMD?
The last release doesn't work on AMD, but if I find the reason, it will be the same +10-15% on AMD cards. For more speedup, see my previous post.
Your last release is working on RX 480 with these modifications. Thanks a bunch!
// Number of rows and slots is affected by this. 20 offers the best performance
// but occasionally misses ~1% of solutions.
#ifdef cl_nv_pragma_unroll // NVIDIA
#define NR_ROWS_LOG 16
#else
#define NR_ROWS_LOG 18
#endif

// Setting this to 1 might make SILENTARMY faster, see TROUBLESHOOTING.md
#define OPTIM_SIMPLIFY_ROUND 1

// Number of collision items to track, per thread
#ifdef cl_nv_pragma_unroll // NVIDIA
#define THREADS_PER_ROW 32
#define ROWS_PER_WORKGROUP (64/THREADS_PER_ROW)
#define LDS_COLL_SIZE (NR_SLOTS * 24 * (64 / THREADS_PER_ROW))
#else
#define THREADS_PER_ROW 8
#define ROWS_PER_WORKGROUP (64/THREADS_PER_ROW)
#define LDS_COLL_SIZE (NR_SLOTS * 8 * (64 / THREADS_PER_ROW))
#endif
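To see what the two branches of these macros evaluate to, the arithmetic can be reproduced in a small C sketch. Treating the literal 64 as the work-group size is an assumption, the NR_SLOTS values in the usage note are hypothetical (the post does not give them), and the unit of LDS_COLL_SIZE is whatever the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/* The literal 64 in the macros, assumed here to be the work-group size. */
#define WORKGROUP_SIZE 64u

/* Mirrors ROWS_PER_WORKGROUP: how many rows one work-group handles. */
static uint32_t rows_per_workgroup(uint32_t threads_per_row) {
    assert(threads_per_row > 0 && WORKGROUP_SIZE % threads_per_row == 0);
    return WORKGROUP_SIZE / threads_per_row;
}

/* Mirrors LDS_COLL_SIZE: NR_SLOTS * factor * (64 / THREADS_PER_ROW),
   where factor is 24 in the NVIDIA branch and 8 in the other branch. */
static uint32_t lds_coll_size(uint32_t nr_slots, uint32_t per_slot_factor,
                              uint32_t threads_per_row) {
    return nr_slots * per_slot_factor * rows_per_workgroup(threads_per_row);
}
```

The non-NVIDIA branch (8 threads per row) handles 8 rows per work-group and the NVIDIA branch (32 threads per row) handles 2. For a hypothetical NR_SLOTS of 24, the non-NVIDIA LDS_COLL_SIZE comes to 24 * 8 * 8 = 1536.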
|
|
|
|
zawawa
|
|
November 20, 2016, 10:40:43 PM |
|
I am currently getting 100-114 sol/s with RX 480. This is very nice...
|
|
|
|
laik2
|
|
November 20, 2016, 10:46:59 PM |
|
I am currently getting 100-114 sol/s with RX 480. This is very nice...
102/103 here with 2080 OC. Which card do you have?
|
|
|
|
zawawa
|
|
November 20, 2016, 10:49:10 PM |
|
I pushed recent changes to my repo, including my Win32 multithreading mod: https://github.com/zawawawa/silentarmy
New Windows binaries will be available shortly.
|
|
|
|
zawawa
|
|
November 20, 2016, 10:51:44 PM |
|
I am currently getting 100-114 sol/s with RX 480. This is very nice...
102/103 here with 2080 OC. Which card do you have?
XFX Black Edition with a modded BIOS.
|
|
|
|
zawawa
|
|
November 20, 2016, 10:54:20 PM |
|
I am currently getting 100-114 sol/s with RX 480. This is very nice...
102/103 here with 2080 OC. Which card do you have?
Oh, my numbers are with 4 threads per GPU, too. Multithreading seems to be working well so far.
|
|
|
|
|
ioglnx
Sr. Member
Offline
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
|
|
November 20, 2016, 11:34:29 PM |
|
@eXtremal: Nice optimizations. I was close to hitting 500 sol/s with 2 GTX 1080 and 2 GTX 1070 :-D Thanks
|
GTX 1080Ti rocks da house... seriously... this card is a beast³ Owning by now 18x GTX1080Ti :-D @serious love of efficiency
|
|
|
Amph
Legendary
Offline
Activity: 3248
Merit: 1070
|
|
November 21, 2016, 07:45:39 AM |
|
Does this version include eXtremal's additional code or any hashrate improvements, besides fixing the known bugs?
|
|
|
|
zawawa
|
|
November 21, 2016, 07:53:03 AM |
|
Does this version include eXtremal's additional code or any hashrate improvements, besides fixing the known bugs?
Yes. I haven't uploaded binaries yet, though. I just got new ideas for optimization. Please wait.
|
|
|
|
disman
Newbie
Offline
Activity: 25
Merit: 0
|
|
November 21, 2016, 09:33:28 AM |
|
138 sol/s: not interested anymore... ((
Profit is rock-bottom.
200+ sol/s on a 1070... one might think
I see there are still some altruists doing this for sport)))
|
|
|
|
Venon
Newbie
Offline
Activity: 51
Merit: 0
|
|
November 21, 2016, 02:41:04 PM |
|
138 sol/s: not interested anymore... ((
Profit is rock-bottom.
200+ sol/s on a 1070... one might think
I see there are still some altruists doing this for sport)))
That is the reason why this thread is quiet now. If you pay $0.25/kWh, it is not profitable to mine ZEC.
|
|
|
|
laik2
|
|
November 21, 2016, 02:44:49 PM |
|
138 sol/s: not interested anymore... ((
Profit is rock-bottom.
200+ sol/s on a 1070... one might think
I see there are still some altruists doing this for sport)))
That is the reason why this thread is quiet now. If you pay $0.25/kWh, it is not profitable to mine ZEC.
Yep... switched to Ethereum... more profitable right now.
|
|
|
|
laik2
|
|
November 21, 2016, 08:40:19 PM |
|
Is anyone working on improvements for AMD, especially the newer RX cards?
|
|
|
|
|