nerdralph, any chance at updated ubuntu binary?
I uploaded one here: https://github.com/mbevand/silentarmy/releases
It should run on most modern 64-bit Linux distros. Download sa-solver.linux64.v5, rename it to sa-solver, and place it in the same directory as the silentarmy script.
|
|
|
@mrb I have waited, but it does not reach more than 44... about 42 approx...
Try running with different number of instances ("--instances 1", or 2 or 3)
|
|
|
I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10.
Please do submit your changes adding Windows support
|
|
|
For me, eXtremal's mod seems to be a little better on the R9 380X card. With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
Let it warm up. AMD cards are sensitive to temperature and seem to need a few minutes to stabilize.
|
|
|
The dev said v5 would be a Windows version... any news about the Windows release?
Sorry, I'm still working on more optimizations for now. Windows support has been delayed.
Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less is left of his efforts.
To my knowledge, his last pull request was breaking things. And neither he nor I had the time to fix them. I would merge in a heartbeat if someone, anyone, provided a pull request that doesn't break silentarmy.
|
|
|
I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD it didn't provide any speed increase. I'm going to try going back the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.
Yes this loop unrolling does not increase perf. I only merged it in the interest of saving time.
|
|
|
Nice work guys... currently this performs better on my 470/480s than even Claymore's miner on the Windows side. Sucks that he will simply rip off your code again and further optimize his Windows miner, while continuing to say "FU" to all his Linux customers that supported him for the past year.
Do you guys have a centralized donation address for this project? I'll donate whatever fees Claymore would have made to you guys.
Just launch "silentarmy" without pool connect (-c) or user (-u) options and by default it will mine on my donation address (t1cVviFvgJinQ4w3C2m2CfRxgP5DnHYaoFC - also in README.md). But don't forget to also give to eXtremal who is responsible for the latest optims!
|
|
|
The dev said v5 would be a Windows version... any news about the Windows release?
Sorry, I'm still working on more optimizations for now. Windows support has been delayed.
|
|
|
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
- 102 sol/s on R9 Nano (up from 54 sol/s)
- 72 sol/s on RX 480
- 64 sol/s on GTX 1070
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me. That's the benefit of open source; anyone can improve the code for all.
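The idea of packing 8 counters per uint can be illustrated with a small sketch. It assumes a 32-bit word split into eight 4-bit fields; the field width, the function names, and the overflow check are assumptions made for illustration, not the layout used in the SILENTARMY kernel. On a GPU the increment could be done as one atomic add of `1 << shift` on the shared word, which is what makes the packing attractive.

```python
# Assumption: 32-bit word / 8 counters = 4 bits per counter (max value 15).
BITS_PER_COUNTER = 4
COUNTERS_PER_WORD = 8
FIELD_MASK = (1 << BITS_PER_COUNTER) - 1  # 0xF


def read(word, slot):
    """Return the value of counter `slot` (0..7) stored in `word`."""
    return (word >> (slot * BITS_PER_COUNTER)) & FIELD_MASK


def increment(word, slot):
    """Add 1 to counter `slot`; return (new_word, previous_count)."""
    if not 0 <= slot < COUNTERS_PER_WORD:
        raise IndexError("slot must be in 0..7")
    previous = read(word, slot)
    if previous == FIELD_MASK:
        # A full field would carry into the neighbouring counter.
        raise OverflowError("counter is full")
    new_word = (word + (1 << (slot * BITS_PER_COUNTER))) & 0xFFFFFFFF
    return new_word, previous


word = 0
word, _ = increment(word, 0)
word, _ = increment(word, 3)
word, previous = increment(word, 3)
print(hex(word), read(word, 3), previous)  # 0x2001 2 1
```

The previous count returned by `increment` plays the role of the slot index a thread would use when appending to a row.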
|
|
|
mrb, did Claymore copy part of the host code and round0 (blake2b)? I think only the kernel rounds and the solution extraction code are valuable, and they were not copied.
I agree, only the kernel rounds and solution extraction are valuable, and there was never any evidence this part of the code was copied. Which is why I keep telling people there is no need to make a fuss about this whole episode.
|
|
|
How does this compare in terms of speed with the current github tip of SA?
|
|
|
FINALLY, ONE THAT WORKS ON NVIDIA!!!
Silentarmy has worked on Nvidia since v4: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
OK, and thanks again. KenshiroTheFist has implemented extranonce subscribe for mrb's silentarmy, and posted the code. The miner is working just fine except for the "#xnsub". If you are contributing to the code, or know who is, please encourage them to merge the available "#xnsub" code.
I am aware of the extranonce patch and the only reason I have not merged it already is because it is not high-priority. Silentarmy works just fine as it is on nicehash.
What is the #xnsub patch?
|
|
|
I'm seeing a 7-9% speed improvement between 2 GPUs. One is in a 16x slot and the other on a 1x riser. For the card on the 1x riser the speed improvement is ~10%, and for the card in the 16x slot ~5%.
Nice, thanks for confirming.
|
|
|
Can the new version, with 45 sol/s per GTX 1070, be run on Windows (10 x64)?
+1 Is it so hard to add Windows support?
Sorry, I am still not working on Windows at the moment, only on optimizations and various improvements.
|
|
|
RX480 with amdgpu-pro 16.30
Total 55.8 sol/s [dev0 54.0] 18 shares
Total 55.3 sol/s [dev0 52.4] 18 shares
Total 55.6 sol/s [dev0 54.7] 18 shares
Total 55.9 sol/s [dev0 55.7] 18 shares
Total 55.0 sol/s [dev0 55.7] 18 shares
Total 55.5 sol/s [dev0 56.2] 18 shares
Total 55.2 sol/s [dev0 56.1] 19 shares
Total 54.6 sol/s [dev0 54.8] 19 shares
Total 54.9 sol/s [dev0 55.3] 19 shares
Total 55.1 sol/s [dev0 53.1] 19 shares
Total 54.4 sol/s [dev0 52.6] 19 shares
Kernel: http://coinsforall.io/distr/input.cl.coll1
NVidia also gets a speedup. I reduced the number of collisions to find from 5 to 1; it seems 5 is too much, need mrb's comments.
Yes, you can reduce collisions from 5 to 1. I meant to do this but forgot about it :-P
For the record, your input.cl.coll1 on my RX 480 with amdgpu-pro 16.40:
Total 40.7 sol/s [dev0 41.9] 3 shares
Total 40.7 sol/s [dev0 40.8] 3 shares
Total 40.7 sol/s [dev0 40.4] 3 shares
Total 41.4 sol/s [dev0 42.8] 3 shares
Total 41.6 sol/s [dev0 43.3] 3 shares
Total 41.8 sol/s [dev0 45.9] 3 shares
Total 42.1 sol/s [dev0 47.7] 3 shares
Total 41.8 sol/s [dev0 45.5] 3 shares
Total 41.1 sol/s [dev0 44.9] 3 shares
Total 41.2 sol/s [dev0 43.5] 4 shares
Total 41.1 sol/s [dev0 44.3] 4 shares
Total 41.4 sol/s [dev0 44.5] 4 shares
Total 41.3 sol/s [dev0 44.7] 4 shares
You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.
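When comparing runs like these, averaging the "Total" readings is less error-prone than eyeballing them. Below is a minimal sketch, assuming the log format shown in this thread ("Total N sol/s [dev0 N] K shares"); the function names are made up for illustration and are not part of silentarmy.

```python
import re

# Matches the leading "Total <rate> sol/s" part of each status line.
TOTAL_RE = re.compile(r"Total\s+([0-9]+(?:\.[0-9]+)?)\s+sol/s")


def total_rates(log_text):
    """Return every 'Total' sol/s reading found in the pasted log text."""
    return [float(value) for value in TOTAL_RE.findall(log_text)]


def mean_rate(log_text):
    """Average the 'Total' readings; raise if the text contains none."""
    rates = total_rates(log_text)
    if not rates:
        raise ValueError("no 'Total ... sol/s' lines found")
    return sum(rates) / len(rates)


sample = """
Total 40.7 sol/s [dev0 41.9] 3 shares
Total 41.4 sol/s [dev0 42.8] 3 shares
"""
print(total_rates(sample))
print(mean_rate(sample))
```

Pasting two complete logs into `mean_rate` and dividing the results gives the relative difference between the two runs.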
|
|
|
But even +4% is something; it just shows that optimizations can still be incorporated.
Absolutely. I even care about +1%, so I'll incorporate the changes!
|
|
|
My latest kernel results (http://coinsforall.io/distr/input.cl); first row: original SA kernel, second row: patched.
Ubuntu 13.10, Catalyst 14.4, Radeon R9 290 900/1250 (downclocked)
Total 29.1 sol/s [dev0 30.2] 4 shares
Total 41.1 sol/s [dev0 42.0] 2 shares
+40%
Ubuntu 16.04, NVidia 367, GeForce GTX1070
Total 196 solutions in 6588.2 ms (29.8 Sol/s)
Total 196 solutions in 5334.1 ms (36.7 Sol/s)
+20%
Ubuntu 16.04, amdgpu-pro 16.30, Radeon RX480
Total 50.4 sol/s [dev0 51.0] 4 shares
Total 53.1 sol/s [dev0 53.2] 14 shares
+4%
FWIW, exact same silentarmy code running on the same machine on an R9 Nano, dual booting into 2 OSes:
* 33.2 sol/s with fglrx 2:15.201-0ubuntu0.14.04.1 on Ubuntu 14.04
* 47.4 sol/s with amdgpu-pro 16.40 on Ubuntu 16.04
So yeah, a +40% difference just by changing drivers... No wonder you found a way to rework the OpenCL code to get a +40% on fglrx, and it gets you only +4% on amdgpu-pro. amdgpu-pro compiles the OpenCL code just really, really well on its own, with almost no need for manual tweaks.
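The percentages quoted in these comparisons are rounded by hand. For anyone who wants to redo the arithmetic from a pair of sol/s readings, here is a minimal sketch; `speedup_percent` is a hypothetical helper name, and the two inputs are the R9 Nano figures from the post above.

```python
def speedup_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("the baseline rate must be positive")
    return (after / before - 1.0) * 100.0


# fglrx on Ubuntu 14.04 vs amdgpu-pro 16.40 on Ubuntu 16.04
print(f"{speedup_percent(33.2, 47.4):+.1f}%")
```

The same function works for the kernel comparisons, as long as both readings are of the same kind (both "Total", or both "dev0").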
|
|
|
|