ioglnx
Sr. Member
Offline
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
|
|
November 11, 2016, 08:15:44 PM |
|
dev said v5 will be a windows version .... any news about windows release
Sorry I'm still working on more optimizations for now. Windows support has been delayed for now.
Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less is left of his efforts.
|
|
|
|
zawawa
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
November 11, 2016, 08:16:59 PM |
|
I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal. If I manage to squeeze the advertised speed on my RX 480's, I will release Windows binaries.
|
|
|
|
hagie
|
|
November 11, 2016, 08:17:05 PM |
|
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
- 102 sol/s on R9 Nano (up from 54 sol/s)
- 72 sol/s on RX 480
- 64 sol/s on GTX 1070
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me. That's the benefit of open source; anyone can improve the code for all.
Sorry, in my case the new version reaches only about 50% of the sol/s. Maybe an SM 3.0 problem?
./silentarmy --list
Devices on platform "NVIDIA CUDA":
ID 0: GRID K520
V4 with only param.h changed:
~/silentarmy.v4$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 18.0 sol/s [dev0 18.0] 0 shares
Total 15.5 sol/s [dev0 15.5] 0 shares
Total 14.3 sol/s [dev0 14.3] 0 shares
Total 14.2 sol/s [dev0 14.2] 0 shares
Total 16.0 sol/s [dev0 16.0] 0 shares
Total 13.8 sol/s [dev0 13.8] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
~/silentarmy.v4$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 135.0 ms (14.8 Sol/s)
~/silentarmy$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 7.0 sol/s [dev0 7.0] 0 shares
Total 6.5 sol/s [dev0 6.5] 0 shares
Total 6.3 sol/s [dev0 6.3] 0 shares
Total 8.0 sol/s [dev0 8.0] 0 shares
Total 7.2 sol/s [dev0 7.2] 0 shares
Total 6.8 sol/s [dev0 6.8] 0 shares
Total 7.3 sol/s [dev0 7.3] 0 shares
~/silentarmy$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 220.8 ms (9.1 Sol/s)
Any idea? Regards
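For anyone who wants to compare the two runs above numerically rather than by eye, here is a minimal sketch. It is not part of silentarmy; the helper name `mean_rate` and the choice to skip the first reading as warm-up are assumptions made for illustration. It only relies on the "Total N sol/s" status lines shown in the logs above.

```python
import re
from statistics import mean

# Matches status lines such as "Total 14.2 sol/s [dev0 14.2] 0 shares".
TOTAL_RE = re.compile(r"Total (\d+(?:\.\d+)?) sol/s")


def mean_rate(log_text, skip=1):
    """Average the reported 'Total' rates, ignoring the first `skip` readings.

    `skip=1` drops the initial 0.0 sol/s line printed before the miner
    has produced any solutions (an assumption, not a silentarmy rule).
    """
    rates = [float(value) for value in TOTAL_RE.findall(log_text)]
    return mean(rates[skip:])


# Status lines copied from the v4 run in the post above.
v4_log = """
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 18.0 sol/s [dev0 18.0] 0 shares
Total 15.5 sol/s [dev0 15.5] 0 shares
Total 14.3 sol/s [dev0 14.3] 0 shares
Total 14.2 sol/s [dev0 14.2] 0 shares
Total 16.0 sol/s [dev0 16.0] 0 shares
Total 13.8 sol/s [dev0 13.8] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
"""

# Status lines copied from the v5 run in the post above.
v5_log = """
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 7.0 sol/s [dev0 7.0] 0 shares
Total 6.5 sol/s [dev0 6.5] 0 shares
Total 6.3 sol/s [dev0 6.3] 0 shares
Total 8.0 sol/s [dev0 8.0] 0 shares
Total 7.2 sol/s [dev0 7.2] 0 shares
Total 6.8 sol/s [dev0 6.8] 0 shares
Total 7.3 sol/s [dev0 7.3] 0 shares
"""

v4_rate = mean_rate(v4_log)
v5_rate = mean_rate(v5_log)
ratio = v5_rate / v4_rate
print(f"v4: {v4_rate:.1f} sol/s, v5: {v5_rate:.1f} sol/s, v5/v4: {ratio:.0%}")
```

With these few samples v5 comes out at a bit under half of the v4 rate on the GRID K520, which matches the "only about 50%" impression; longer runs would give a steadier average.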
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:21:43 PM |
|
nerdralph, any way we can get a binary with the slow CPU fixes? I believe someone linked to it earlier. The current one is struggling on my celeron.
I can't test a fix for a problem I can't reproduce. Ubuntu 14.04 with fglrx has less than 2% CPU use for each sa-solver instance on my G1840.
|
|
|
|
yslyung
Legendary
Offline
Activity: 1500
Merit: 1002
Mine Mine Mine
|
|
November 11, 2016, 08:23:09 PM |
|
WINDOWS version please . . .
i'm sure you'll get donations instead of a closed source with fixed fees ..
thx for the great work mrb
|
|
|
|
adamvp
|
|
November 11, 2016, 08:35:40 PM |
|
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
- 102 sol/s on R9 Nano (up from 54 sol/s)
- 72 sol/s on RX 480
- 64 sol/s on GTX 1070
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me. That's the benefit of open source; anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on the R9 380X card. With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
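The "packing 8 counters per uint" idea from the quoted announcement can be illustrated with a small sketch. This is not the silentarmy kernel code: the 4-bit field width (32 bits / 8 counters), the helper names, and the saturation check are all assumptions, and plain Python has no atomics, so the read-modify-write below only mimics the old value an atomic add would return on a GPU.

```python
BITS_PER_COUNTER = 4                         # assumption: 32 bits / 8 counters
COUNTERS_PER_WORD = 8
COUNTER_MASK = (1 << BITS_PER_COUNTER) - 1   # 0xF, largest value a field holds


def read_row(words, row):
    """Return the current counter value for `row`."""
    word_idx, pos = divmod(row, COUNTERS_PER_WORD)
    return (words[word_idx] >> (pos * BITS_PER_COUNTER)) & COUNTER_MASK


def increment_row(words, row):
    """Bump the counter for `row` and return its previous value.

    The previous value plays the role of the slot index a kernel would
    get back from an atomic add on the shared 32-bit word.
    """
    word_idx, pos = divmod(row, COUNTERS_PER_WORD)
    shift = pos * BITS_PER_COUNTER
    old = (words[word_idx] >> shift) & COUNTER_MASK
    if old == COUNTER_MASK:
        # Field is full: adding 1 would carry into the neighbouring counter,
        # so leave the word untouched and report the row as saturated.
        return old
    words[word_idx] = (words[word_idx] + (1 << shift)) & 0xFFFFFFFF
    return old


# Two 32-bit words hold the counters for rows 0..15.
words = [0, 0]
for _ in range(3):
    increment_row(words, 0)
increment_row(words, 9)
print(read_row(words, 0), read_row(words, 9), [hex(w) for w in words])
```

The appeal of this layout is that eight row counters share one 32-bit word, so the counter array is eight times smaller than with one uint per row; the cost is that each counter can only count up to 15 and overflow has to be guarded against.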
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:35:45 PM |
|
I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD it didn't provide any speed increase. I'm going to try going back the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:37:15 PM |
|
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
- 102 sol/s on R9 Nano (up from 54 sol/s)
- 72 sol/s on RX 480
- 64 sol/s on GTX 1070
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me. That's the benefit of open source; anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on the R9 380X card. With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
My 380X with modified Hynix timing gives me almost 50.
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 11, 2016, 08:39:09 PM |
|
I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD it didn't provide any speed increase. I'm going to try going back the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.
Yes this loop unrolling does not increase perf. I only merged it in the interest of saving time.
|
|
|
|
adaseb
Legendary
Offline
Activity: 3766
Merit: 1718
CoinPoker.com
|
|
November 11, 2016, 08:41:11 PM |
|
Anyone running this on a Tahiti ?
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 11, 2016, 08:42:51 PM |
|
dev said v5 will be a windows version .... any news about windows release
Sorry I'm still working on more optimizations for now. Windows support has been delayed for now.
Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less is left of his efforts.
To my knowledge, his last pull request was breaking things. And neither he nor I had the time to fix them. I would merge in a heartbeat if someone, anyone, provided a pull request that doesn't break silentarmy.
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 11, 2016, 08:45:34 PM |
|
For me, eXtremal's mod seems to be a little better on the R9 380X card. With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
Let it warm up. AMD cards are sensitive to temperature and seem to need a few minutes to stabilize.
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:48:50 PM |
|
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck.
So is the L2 cache thrashing in ht_store your next optimization target? I know eXtremal is already working on my idea to improve the read performance in equihash_round by using 4x256-byte strides. A fully-optimized implementation should average one cache line read per equihash_round and 2-3 cache lines of read/write in ht_store. For an RX 470 with 7GHz RAM that's 78 iterations per second, or ~13ms per iteration. Add ~1ms for the blake2b initialization for round 0 to get a total of 14ms or 71 iterations per second. If you are correct about 1.9 sols/iteration being optimal, that gives a theoretical 135 solutions/sec, or almost double the current speed.
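The arithmetic in the estimate above can be reproduced with a few lines. This is only a sketch that plugs in the rounded figures from the post (13 ms of memory time, 1 ms of blake2b setup, 1.9 sols per iteration); the variable names are made up for illustration and nothing here is measured.

```python
MEMORY_TIME_MS = 13.0        # per-iteration memory time quoted in the post
BLAKE2B_INIT_MS = 1.0        # round 0 blake2b initialization, per the post
SOLS_PER_ITERATION = 1.9     # assumed optimum quoted in the post


def iterations_per_second(iteration_ms):
    """Convert a per-iteration time in milliseconds to iterations per second."""
    return 1000.0 / iteration_ms


memory_bound_rate = iterations_per_second(MEMORY_TIME_MS)
total_rate = iterations_per_second(MEMORY_TIME_MS + BLAKE2B_INIT_MS)
theoretical_sols = total_rate * SOLS_PER_ITERATION

print(f"memory-bound: {memory_bound_rate:.1f} iterations/s")
print(f"with blake2b: {total_rate:.1f} iterations/s")
print(f"theoretical:  {theoretical_sols:.1f} sol/s")
```

Note that exactly 13 ms gives about 77 iterations per second; the 78 in the post corresponds to a slightly shorter iteration of roughly 12.8 ms, so the figures are rounded estimates rather than exact values.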
|
|
|
|
nevermind41
|
|
November 11, 2016, 08:50:08 PM |
|
Great work. Thank you. The only problem is that I can't enable the overclocking feature with this driver. I set Coolbits to 8, but it had no effect. Here are the default speeds with 5 x GTX 1070
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:52:24 PM |
|
Anyone running this on a Tahiti ?
No, but I'm getting 45-50 on Pitcairn clocked at 1100,1550 on the 1375 Samsung strap. It's looking like these cards will never be going back to eth mining...
|
|
|
|
zawawa
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
November 11, 2016, 08:53:51 PM |
|
I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10. This is amazing...
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
|
|
|
|
nerdralph
|
|
November 11, 2016, 08:54:56 PM |
|
WINDOWS version please . . .
i'm sure you'll get donations instead of a closed source with fixed fees ..
thx for the great work mrb
Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 11, 2016, 08:56:12 PM |
|
I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10.
Please do submit your changes adding Windows support
|
|
|
|
mrada1204
Newbie
Offline
Activity: 28
Merit: 0
|
|
November 11, 2016, 08:56:30 PM |
|
I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10. This is amazing...
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
Could you please share the Windows version?
|
|
|
|
Mugatu
Member
Offline
Activity: 93
Merit: 10
|
|
November 11, 2016, 08:57:46 PM |
|
WINDOWS version please . . .
i'm sure you'll get donations instead of a closed source with fixed fees ..
thx for the great work mrb
Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.
Genoil's miner was/is unstable as hell
|
|
|
|
|