Bitcoin Forum
March 26, 2017, 05:13:55 PM *
Topic: Gateless Gate: zawawa's open-source ZEC/ETH/XMR/PASC miner (250 S/s on RX 480)
sp_
Hero Member
Activity: 980
Ccminer developer
March 24, 2017, 06:41:38 AM
 #921

GG is finally running faster with the parallelized Round 0.
I fused Round 0 with Rounds 7 and 8 to alleviate cache contamination and to improve the cache hit ratio for the next Round 1. I could even merge it with the solution-searching kernel for better results.
Good stuff.

So how much faster?

BTC: 1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd
djeZo
Sr. Member
Activity: 350
March 24, 2017, 10:22:20 AM
 #922

GG is finally running faster with the parallelized Round 0.
I fused Round 0 with Rounds 7 and 8 to alleviate cache contamination and to improve the cache hit ratio for the next Round 1. I could even merge it with the solution-searching kernel for better results.
Good stuff.

How much speed did you gain that way?

sp_
Hero Member
Activity: 980
Ccminer developer
March 24, 2017, 10:35:21 AM
 #923

His changes are open source. You can compile and check.


https://github.com/zawawawa/gatelessgate/network


BTC: 1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd
djeZo
Sr. Member
Activity: 350
March 24, 2017, 10:50:07 AM
 #924

His changes are open source. You can compile and check.


https://github.com/zawawawa/gatelessgate/network



Don't have AMD cards so...

zawawa
Full Member
Activity: 196
March 24, 2017, 01:17:39 PM
 #925

GG is finally running faster with the parallelized Round 0.
I fused Round 0 with Rounds 7 and 8 to alleviate cache contamination and to improve the cache hit ratio for the next Round 1. I could even merge it with the solution-searching kernel for better results.
Good stuff.

So how much faster?

Around 5% increase in speed at this point.
I am thinking about dynamic compilation of the kernel to simplify the blake2b calculations as nerdralph suggested.
Probably Optiminer is doing this already anyway.
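A minimal sketch of the idea, not Gateless Gate's actual code: within one block, everything fed to BLAKE2b except the trailing index is fixed, so that part can be absorbed once and reused. The sketch assumes the Zcash Equihash parameters (n=200, k=9), the "ZcashPoW" personalization, a 50-byte digest, and a 4-byte little-endian index appended to the 140-byte header and nonce; the function names are hypothetical. Python's hashlib does not expose the raw compression state, so this only shows the reuse pattern via copy(), not a true GPU midstate.

```python
import hashlib
import struct

# Assumed Zcash Equihash parameters (n=200, k=9).
N, K = 200, 9
PERSON = b"ZcashPoW" + struct.pack("<II", N, K)  # 16-byte personalization
DIGEST_SIZE = 50                                  # two 200-bit strings per hash


def base_state(header_and_nonce: bytes):
    """Absorb the input that stays constant for the current block, once."""
    h = hashlib.blake2b(digest_size=DIGEST_SIZE, person=PERSON)
    h.update(header_and_nonce)
    return h


def round0_hash(state, index: int) -> bytes:
    """Finish the hash for one index from a copy of the shared state."""
    h = state.copy()
    h.update(struct.pack("<I", index))
    return h.digest()


def round0_hash_naive(header_and_nonce: bytes, index: int) -> bytes:
    """Reference version that rehashes the whole input every time."""
    h = hashlib.blake2b(digest_size=DIGEST_SIZE, person=PERSON)
    h.update(header_and_nonce + struct.pack("<I", index))
    return h.digest()
```

Usage: build the state once per header and nonce with base_state(), then call round0_hash(state, i) for each index. A kernel specialized per block would go one step further and bake the precomputed values in as constants.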

BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ (for Gateless Gate, an open-source Zcash miner https://github.com/zawawawa/gatelessgate )
BTC: 1PKWibaoZBKk2ZSqch3gphrv9Fz1wFrfSX (specifically for NVIDIA support of Gateless Gate)
nerdralph
Full Member
Activity: 238
March 24, 2017, 01:46:30 PM
 #926

GG is finally running faster with the parallelized Round 0.
I fused Round 0 with Rounds 7 and 8 to alleviate cache contamination and to improve the cache hit ratio for the next Round 1. I could even merge it with the solution-searching kernel for better results.
Good stuff.
You could also try SLC writes to bypass the L2 cache.
zawawa
Full Member
Activity: 196
March 25, 2017, 01:05:08 PM
 #927

GG is finally running faster with the parallelized Round 0.
I fused Round 0 with Rounds 7 and 8 to alleviate cache contamination and to improve the cache hit ratio for the next Round 1. I could even merge it with the solution-searching kernel for better results.
Good stuff.
You could also try SLC writes to bypass the L2 cache.


I tried various combinations of the SLC/GLC bits, but they didn't work very well.
I think I figured out how to implement dual mining properly, though. We will see in a bit...

BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ (for Gateless Gate, an open-source Zcash miner https://github.com/zawawawa/gatelessgate )
BTC: 1PKWibaoZBKk2ZSqch3gphrv9Fz1wFrfSX (specifically for NVIDIA support of Gateless Gate)
zawawa
Full Member
Activity: 196
March 25, 2017, 04:19:45 PM
 #928

I found that, for an efficient implementation of dual mining with the memory-bound foreground kernel and compute-intense background kernel, you really need to be careful with the number of wavefronts/warps and the timings of kernel launches. The whole point is to keep the foreground and background kernels together on the GPU as long as possible so that they can be switched back and forth without performance penalty, and that would be impossible if there are too many concurrent wavefronts and/or kernel executions are not synchronized properly. Another potential issue to consider is cache pollution by the background kernel as that could also severely degrade the performance of the foreground tasks. I wish I knew all this from the very beginning, but I suppose we all live and learn.
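To make the wavefront budgeting concrete, here is a toy model, not taken from Gateless Gate. It assumes the usual GCN per-SIMD limits of 10 resident wavefronts and 256 VGPRs per lane, and it ignores LDS, SGPRs, and allocation granularity. The function names and the register counts in the usage note are hypothetical.

```python
MAX_WAVES_PER_SIMD = 10  # assumed GCN limit on resident wavefronts per SIMD
VGPRS_PER_SIMD = 256     # assumed vector registers per lane on each SIMD


def fits_together(fg_waves, fg_vgprs, bg_waves, bg_vgprs):
    """True if both kernels' wavefronts can be resident on one SIMD at once."""
    waves = fg_waves + bg_waves
    regs = fg_waves * fg_vgprs + bg_waves * bg_vgprs
    return waves <= MAX_WAVES_PER_SIMD and regs <= VGPRS_PER_SIMD


def max_background_waves(fg_waves, fg_vgprs, bg_vgprs):
    """Most background wavefronts that still leave the foreground resident."""
    free_slots = MAX_WAVES_PER_SIMD - fg_waves
    free_regs = VGPRS_PER_SIMD - fg_waves * fg_vgprs
    if free_slots <= 0 or free_regs <= 0:
        return 0
    return min(free_slots, free_regs // bg_vgprs)
```

For example, with 4 foreground wavefronts at 40 VGPRs each and a background kernel using 32 VGPRs, max_background_waves(4, 40, 32) leaves room for 3 background wavefronts. Launching more than that would force one of the kernels off the SIMD, which is the situation described above.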

BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ (for Gateless Gate, an open-source Zcash miner https://github.com/zawawawa/gatelessgate )
BTC: 1PKWibaoZBKk2ZSqch3gphrv9Fz1wFrfSX (specifically for NVIDIA support of Gateless Gate)
djeZo
Sr. Member
Activity: 350
March 25, 2017, 04:23:29 PM
 #929

I found that, for an efficient implementation of dual mining with the memory-bound foreground kernel and compute-intense background kernel, you really need to be careful with the number of wavefronts/warps and the timings of kernel launches. The whole point is to keep the foreground and background kernels together on the GPU as long as possible so that they can be switched back and forth without performance penalty, and that would be impossible if there are too many concurrent wavefronts and/or kernel executions are not synchronized properly. Another potential issue to consider is cache pollution by the background kernel as that could also severely degrade the performance of the foreground tasks. I wish I knew all this from the very beginning, but I suppose we all live and learn.

So... are you saying that it's not worth it? Because you said you gained 5% speed with it...

zawawa
Full Member
Activity: 196
March 25, 2017, 04:26:56 PM
 #930

I found that, for an efficient implementation of dual mining with the memory-bound foreground kernel and compute-intense background kernel, you really need to be careful with the number of wavefronts/warps and the timings of kernel launches. The whole point is to keep the foreground and background kernels together on the GPU as long as possible so that they can be switched back and forth without performance penalty, and that would be impossible if there are too many concurrent wavefronts and/or kernel executions are not synchronized properly. Another potential issue to consider is cache pollution by the background kernel as that could also severely degrade the performance of the foreground tasks. I wish I knew all this from the very beginning, but I suppose we all live and learn.

So... are you saying that it's not worth it? Because you said you gained 5% speed with it...

Oh, I think it's totally worth it. That number is with my naive initial implementation. I will give you an update shortly with a new number.

BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ (for Gateless Gate, an open-source Zcash miner https://github.com/zawawawa/gatelessgate )
BTC: 1PKWibaoZBKk2ZSqch3gphrv9Fz1wFrfSX (specifically for NVIDIA support of Gateless Gate)
sp_
Hero Member
Activity: 980
Ccminer developer
March 25, 2017, 05:33:04 PM
 #931

Another potential issue to consider is cache pollution by the background kernel as that could also severely degrade the performance of the foreground tasks. I wish I knew all this from the very beginning, but I suppose we all live and learn.

On NVIDIA it's better to move the precalc tables into the instruction cache. That is also better for dual mining/kernels.

BTC: 1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd
sp_
Hero Member
Activity: 980
Ccminer developer
March 25, 2017, 07:19:39 PM
 #932

And since everything but the nonce is constant for ~2.5 minutes, you can probably move some of the calculations to compile time and generate a new kernel for each new block.  Since you're already building a custom llvm, you can probably get the kernel compile and dispatch time down to a few ms.

You don't need a compiler. You can inject the blake2s precalc into the instructions. Dual mine round 0 with self-modifying code. Should be faster than Claymore and Optiminer.
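A rough host-side sketch of what injecting precalculated values into the instructions could look like, under loud assumptions: the kernel binary is built once with unique 32-bit placeholder literals, and for each new block the host overwrites them with that block's constants before upload. The function name and the sentinel scheme are hypothetical, and a real GPU ISA encoding would need more care than a byte search.

```python
import struct


def patch_literals(binary: bytes, patches: dict) -> bytes:
    """Return a copy of `binary` with each 32-bit placeholder literal replaced.

    `patches` maps a sentinel value to the per-block constant that replaces it.
    Each sentinel must occur exactly once in the original binary.
    """
    out = bytearray(binary)
    for sentinel, value in patches.items():
        needle = struct.pack("<I", sentinel)
        pos = binary.find(needle)
        if pos < 0 or binary.find(needle, pos + 1) >= 0:
            raise ValueError(f"sentinel {sentinel:#x} must occur exactly once")
        # Overwrite the placeholder in place; the binary's length is unchanged.
        struct.pack_into("<I", out, pos, value & 0xFFFFFFFF)
    return bytes(out)
```

Because only a few literals change per block, the patch itself is a handful of small writes, so no compile step is involved.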

BTC: 1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd
zawawa
Full Member
Activity: 196
March 26, 2017, 05:46:58 AM
 #933

And since everything but the nonce is constant for ~2.5 minutes, you can probably move some of the calculations to compile time and generate a new kernel for each new block.  Since you're already building a custom llvm, you can probably get the kernel compile and dispatch time down to a few ms.

You don't need a compiler. You can inject the blake2s precalc into the instructions. Dual mine round 0 with self-modifying code. Should be faster than Claymore and Optiminer.


Excellent! I am revising the data structure of the hash table right now.
Let me get to that when I'm done.

BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ (for Gateless Gate, an open-source Zcash miner https://github.com/zawawawa/gatelessgate )
BTC: 1PKWibaoZBKk2ZSqch3gphrv9Fz1wFrfSX (specifically for NVIDIA support of Gateless Gate)