Bitcoin Forum
May 29, 2024, 05:08:49 PM *
Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070 (Read 209264 times)
ioglnx
November 11, 2016, 08:15:44 PM
 #461

The dev said v5 would be a Windows version... any news about a Windows release?

Sorry, I'm still working on more optimizations. Windows support has been delayed for now.

Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less will be left of his efforts.

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
zawawa
November 11, 2016, 08:16:59 PM
 #462

I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal.
If I manage to squeeze the advertised speed out of my RX 480s, I will release Windows binaries.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
hagie
November 11, 2016, 08:17:05 PM
 #463

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal finished before me ;) That's the benefit of open source: anyone can improve the code for all.

Sorry, in my case the new version only gets about 50% of the sol/s. Maybe an SM 3.0 problem?

Code:
./silentarmy --list
Devices on platform "NVIDIA CUDA":
  ID 0: GRID K520


V4 with only param.h changed:

Code:
~/silentarmy.v4$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 18.0 sol/s [dev0 18.0] 0 shares
Total 15.5 sol/s [dev0 15.5] 0 shares
Total 14.3 sol/s [dev0 14.3] 0 shares
Total 14.2 sol/s [dev0 14.2] 0 shares
Total 16.0 sol/s [dev0 16.0] 0 shares
Total 13.8 sol/s [dev0 13.8] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares

~/silentarmy.v4$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 135.0 ms (14.8 Sol/s)

Code:
~/silentarmy$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 7.0 sol/s [dev0 7.0] 0 shares
Total 6.5 sol/s [dev0 6.5] 0 shares
Total 6.3 sol/s [dev0 6.3] 0 shares
Total 8.0 sol/s [dev0 8.0] 0 shares
Total 7.2 sol/s [dev0 7.2] 0 shares
Total 6.8 sol/s [dev0 6.8] 0 shares
Total 7.3 sol/s [dev0 7.3] 0 shares

~/silentarmy$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 220.8 ms (9.1 Sol/s)


Any idea?

Regards
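The announcement quoted above mentions packing 8 atomic row counters per uint. As a rough illustration of that idea, here is a Python sketch assuming 32-bit words split into eight 4-bit fields (so each counter can hold at most 15) and a hypothetical per-row capacity NR_SLOTS = 12. The class and constant names are made up for this example and are not SILENTARMY's actual code. On a GPU, the read-check-increment in claim_slot would have to be one atomic step (for example a compare-and-swap loop in OpenCL); Python simply runs it sequentially here.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: eight 4-bit row counters packed into each 32-bit word.
COUNTER_BITS = 4                        # 32 bits / 8 counters per word
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xF
COUNTERS_PER_WORD = 8
NR_SLOTS = 12                           # hypothetical row capacity; must be <= 15


class PackedRowCounters:
    def __init__(self, nr_rows: int) -> None:
        # One 32-bit word holds the counters of 8 consecutive rows.
        n_words = (nr_rows + COUNTERS_PER_WORD - 1) // COUNTERS_PER_WORD
        self.words: List[int] = [0] * n_words

    def _locate(self, row: int) -> Tuple[int, int]:
        # Word index and bit offset of this row's 4-bit field.
        return row // COUNTERS_PER_WORD, (row % COUNTERS_PER_WORD) * COUNTER_BITS

    def get(self, row: int) -> int:
        idx, shift = self._locate(row)
        return (self.words[idx] >> shift) & COUNTER_MASK

    def claim_slot(self, row: int) -> Optional[int]:
        """Reserve the next slot in `row`; return its index, or None if the row is full.

        On a GPU this read-check-increment must be a single atomic operation
        (e.g. an atomic_cmpxchg loop); here it runs sequentially.
        """
        idx, shift = self._locate(row)
        old = self.words[idx]
        count = (old >> shift) & COUNTER_MASK
        if count >= NR_SLOTS:
            # Refusing here keeps the field from carrying into the neighbouring row.
            return None
        self.words[idx] = old + (1 << shift)
        return count
```

For example, with 16 rows the counters fit in two words; two claim_slot(3) calls return slots 0 and 1 and leave the first word equal to 2 << 12. The catch with packing is that a plain atomic increment can overflow a 4-bit field into its neighbour, which is why the sketch checks the count before incrementing.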
nerdralph
November 11, 2016, 08:21:43 PM
 #464

nerdralph, any way we can get a binary with the slow CPU fixes? I believe someone linked to it earlier. The current one is struggling on my Celeron.

I can't test a fix for a problem I can't reproduce. On Ubuntu 14.04 with fglrx, each sa-solver instance uses less than 2% CPU on my G1840.
yslyung
November 11, 2016, 08:23:09 PM
 #465

Windows version please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb!
adamvp
November 11, 2016, 08:35:40 PM
 #466

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal finished before me ;) That's the benefit of open source: anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

I am looking for a signature campaign ;) PM me
nerdralph
November 11, 2016, 08:35:45 PM
 #467

I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD, it didn't provide any speed increase.
I'm going to try going back to the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.
nerdralph
November 11, 2016, 08:37:15 PM
 #468

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal finished before me ;) That's the benefit of open source: anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

My 380X with modified Hynix memory timings gives me almost 50 sol/s.
mrb (OP)
November 11, 2016, 08:39:09 PM
 #469

I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD, it didn't provide any speed increase.
I'm going to try going back to the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.

Yes, this loop unrolling does not increase performance. I only merged it in the interest of saving time.
adaseb
November 11, 2016, 08:41:11 PM
 #470

Anyone running this on a Tahiti?

mrb (OP)
November 11, 2016, 08:42:51 PM
 #471

The dev said v5 would be a Windows version... any news about a Windows release?

Sorry, I'm still working on more optimizations. Windows support has been delayed for now.

Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less will be left of his efforts.

To my knowledge, his last pull request was breaking things. And neither he nor I had the time to fix them.

I would merge in a heartbeat if someone, anyone, provided a pull request that doesn't break silentarmy.
mrb (OP)
November 11, 2016, 08:45:34 PM
 #472

For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

Let it warm up. AMD cards are sensitive to temperature and seem to need a few minutes to stabilize.
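Following that advice, one way to compare two builds fairly is to average only the readings taken after a warm-up period. Below is a minimal Python sketch, assuming the `Total X sol/s [devN Y, ...] K shares` status-line format seen in the logs in this thread. The function names and the default of 3 discarded warm-up samples are arbitrary choices for illustration, not silentarmy options; for real AMD warm-up you would likely discard a few minutes' worth of lines.

```python
import re
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Matches status lines such as "Total 263.8 sol/s [dev0 88.8, dev1 90.8] 1 share"
TOTAL_RE = re.compile(r"Total ([\d.]+) sol/s \[([^\]]*)\]")
DEV_RE = re.compile(r"(dev\d+) ([\d.]+)")


def parse_status(line: str) -> Optional[Tuple[float, Dict[str, float]]]:
    """Return (total sol/s, {device: sol/s}), or None for non-status lines."""
    m = TOTAL_RE.search(line)
    if m is None:
        return None
    devices = {name: float(rate) for name, rate in DEV_RE.findall(m.group(2))}
    return float(m.group(1)), devices


def steady_state_rate(lines: List[str], warmup: int = 3) -> float:
    """Average the total rate, skipping the first `warmup` status lines."""
    totals = [p[0] for p in map(parse_status, lines) if p is not None]
    if len(totals) <= warmup:
        raise ValueError("not enough samples after warm-up")
    return mean(totals[warmup:])
```

Fed the nine v4 status lines from hagie's log earlier in the thread, steady_state_rate skips the 0.0, 18.0 and 15.5 readings and averages the rest to about 14.4 sol/s. A short log like that is still noisy, so longer runs give a more trustworthy comparison.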
nerdralph
November 11, 2016, 08:48:50 PM
 #473

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck.

So is the L2 cache thrashing in ht_store your next optimization target? I know eXtremal is already working on my idea to improve the read performance in equihash_round by using 4x256-byte strides.
A fully optimized implementation should average one cache-line read per equihash_round and 2-3 cache lines of read/write in ht_store. For an RX 470 with 7 GHz RAM, that's 78 iterations per second, or ~13 ms per iteration. Add ~1 ms for the BLAKE2b initialization in round 0 to get a total of 14 ms, or 71 iterations per second. If you are correct about 1.9 sols/iteration being optimal, that gives a theoretical 135 solutions/sec, almost double the current speed.
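The estimate above can be reproduced with a few lines of Python. This sketch takes nerdralph's figures as given inputs (~13 ms of memory-bound work per iteration on an RX 470, ~1 ms for the round-0 BLAKE2b step, and 1.9 solutions per iteration); it does not derive the 13 ms from memory bandwidth, and the function name is illustrative only.

```python
from typing import Tuple


def theoretical_sol_rate(memory_ms: float, blake_ms: float,
                         sols_per_iter: float) -> Tuple[float, float]:
    """Return (iterations per second, solutions per second) for one GPU."""
    iter_ms = memory_ms + blake_ms      # time for one full solver iteration
    iters_per_s = 1000.0 / iter_ms
    return iters_per_s, iters_per_s * sols_per_iter


iters, sols = theoretical_sol_rate(memory_ms=13.0, blake_ms=1.0, sols_per_iter=1.9)
print(f"{iters:.1f} iterations/s -> {sols:.1f} sol/s")  # prints "71.4 iterations/s -> 135.7 sol/s"
```

Using the unrounded 1000/78 ≈ 12.8 ms instead of 13 ms raises the estimate slightly, to roughly 137 sol/s.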

nevermind41
November 11, 2016, 08:50:08 PM
 #474

Great work, thank you. The only problem is that I can't enable overclocking with this driver. I set Coolbits to 8 but it had no effect. Here are the default speeds on 5x GTX 1070
nerdralph
November 11, 2016, 08:52:24 PM
 #475

Anyone running this on a Tahiti ?

No, but I'm getting 45-50 sol/s on Pitcairn clocked at 1100 MHz core / 1550 MHz memory with the 1375 Samsung strap. It looks like these cards will never go back to ETH mining...
zawawa
November 11, 2016, 08:53:51 PM
 #476

I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.
This is amazing...

Code:
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares

nerdralph
November 11, 2016, 08:54:56 PM
 #477

Windows version please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb!

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.
mrb (OP)
November 11, 2016, 08:56:12 PM
 #478

I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.

Please do submit your changes adding Windows support :)
mrada1204
November 11, 2016, 08:56:30 PM
 #479

I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.
This is amazing...

Code:
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares

Could you please share the Windows version?
Mugatu
November 11, 2016, 08:57:46 PM
 #480

Windows version please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb!

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.


Genoil's miner was/is unstable as hell