krnlx
|
|
November 08, 2016, 01:23:05 PM |
|
It really is a shame that there is no open-source miner for NVIDIA, only one with a devfee, which makes 53-54 Sol/s on one 1070 with low CPU usage.
As far as I know, there is no closed-source NVIDIA miner with a devfee that makes 53-54 Sol/s (hopefully you are not referring to this scam: https://github.com/zcminer-dev/zcminer). So for now Tromp-based miners are the only ones available for NVIDIA (for example, https://github.com/nicehash/nheqminer).
Please take a look. It is NOT a scam (or it is as much a scam as Claymore's fee miner). It works well only with the US flypool server.
6x Palit Super JetStream 1070, Driver Version: 367.27
I use this driver version because overclocking works on it. I had to use libcudart from cuda-8-rc2, because the one from the cuda8 release does not work with the 367.27 driver.
Tue Nov 8 16:21:12 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 0000:01:00.0      On |                  N/A |
|100%   51C    P2   139W / 195W |   8004MiB /  8113MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    On   | 0000:03:00.0     Off |                  N/A |
|100%   49C    P2   148W / 195W |   7997MiB /  8113MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    On   | 0000:04:00.0     Off |                  N/A |
|100%   50C    P2   133W / 195W |   7997MiB /  8113MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1070    On   | 0000:05:00.0     Off |                  N/A |
|100%   50C    P2    99W / 195W |   7997MiB /  8113MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    On   | 0000:06:00.0     Off |                  N/A |
|100%   43C    P2    93W / 195W |   7997MiB /  8113MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 1070    On   | 0000:07:00.0     Off |                  N/A |
|100%   36C    P2    98W / 195W |   7995MiB /  8113MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1721    G   /usr/lib/xorg/Xorg                              13MiB |
|    0     24759    C   ./zcminer_pascal_cuda-8.0                     7987MiB |
|    1      1721    G   /usr/lib/xorg/Xorg                               6MiB |
|    1     24759    C   ./zcminer_pascal_cuda-8.0                     7989MiB |
|    2      1721    G   /usr/lib/xorg/Xorg                               6MiB |
|    2     24759    C   ./zcminer_pascal_cuda-8.0                     7989MiB |
|    3      1721    G   /usr/lib/xorg/Xorg                               6MiB |
|    3     24759    C   ./zcminer_pascal_cuda-8.0                     7989MiB |
|    4      1721    G   /usr/lib/xorg/Xorg                               6MiB |
|    4     24759    C   ./zcminer_pascal_cuda-8.0                     7989MiB |
|    5      1721    G   /usr/lib/xorg/Xorg                               6MiB |
|    5     24759    C   ./zcminer_pascal_cuda-8.0                     7987MiB |
+-----------------------------------------------------------------------------+
It is a modified version of Nicehash's CUDA miner; it runs a lot of threads per card (I run 21).
[16:22:17][0x00007ffb74632700] stratum | Accepted share #23366
[16:22:19][0x00007ffb6affd700] stratum | Submitting share #23367, nonce 080000000000000000000000000000000000000000000000000010
[16:22:19][0x00007ffb74632700] stratum | Accepted share #23367
[16:22:20][0x00007ffb407e8700] stratum | Submitting share #23368, nonce 2a0000000000000000000000000000000000000000000000000014
[16:22:20][0x00007ffb74632700] stratum | Accepted share #23368
[16:22:21][0x00007ffb72ffc700] stratum | Submitting share #23369, nonce 000000000000000000000000000000000000000000000000000015
[16:22:21][0x00007ffb74632700] stratum | Accepted share #23369
[16:22:21][0x00007ffafbfcf700] stratum | Submitting share #23370, nonce 700000000000000000000000000000000000000000000000000022
[16:22:22][0x00007ffb74632700] stratum | Accepted share #23370
[16:22:22][0x00007ffb7d23b740] Speed [300 sec]: 179.303 H/s, 306.843 Sol/s
Well, if it's a shame that no one wants to make one free, then make one yourself. People work a LOT on these things, even on free versions, so tell me again, why would anyone give it to you for free?
s/wants/can't/g fixed.
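The speed line in the log above is a 300-second average for the whole six-card rig, so it has to be divided by the card count before it can be compared with a per-card figure. A minimal Python sketch, assuming the line format shown in that log and a hypothetical helper name `per_gpu_sols`, splits the Sol/s figure evenly across the cards:

```python
import re

# Matches the miner's periodic report, e.g.
# "Speed [300 sec]: 179.303 H/s, 306.843 Sol/s"
SPEED_RE = re.compile(r"Speed \[(\d+) sec\]: ([\d.]+) H/s, ([\d.]+) Sol/s")


def per_gpu_sols(log_line, gpu_count):
    """Return the average Sol/s per card, assuming an even split."""
    match = SPEED_RE.search(log_line)
    if match is None:
        raise ValueError("no speed report in line")
    total_sols = float(match.group(3))
    return total_sols / gpu_count


# Line copied from the log above; the rig has six GTX 1070 cards.
line = "[16:22:22][0x00007ffb7d23b740] Speed [300 sec]: 179.303 H/s, 306.843 Sol/s"
average = per_gpu_sols(line, 6)
```

For the quoted line this works out to about 51 Sol/s per card (306.843 / 6).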
|
|
|
|
restless
Legendary
Offline
Activity: 1151
Merit: 1001
|
|
November 08, 2016, 01:26:27 PM |
|
Is there a Windows release or not?! Genoil's link gives source only.
|
|
|
|
ioglnx
Sr. Member
Offline
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
|
|
November 08, 2016, 01:27:30 PM |
|
It really is a shame that there is no open-source miner for NVIDIA, only one with a devfee, which makes 53-54 Sol/s on one 1070 with low CPU usage.
s/wants/can't/g fixed.
Well, the repo has been offline for days; that is what I said. I want a working one, or maybe you can upload your copy. Thanks.
|
|
|
|
|
krnlx
|
|
November 08, 2016, 01:40:31 PM |
|
Well, the repo has been offline for days; that is what I said. I want a working one, or maybe you can upload your copy. Thanks.
I PM'ed u.
|
|
|
|
restless
|
|
November 08, 2016, 01:52:27 PM Last edit: November 08, 2016, 02:26:59 PM by restless |
|
Python is in D:\PY; the SA zip was extracted to d:\silentarmy.
D:\silentarmy>python silentarmy
Connecting to us1-zcash.flypool.org:3333
Could not find 'D:\PY\python36.zip\sa-solver' binary; make sure to run 'make' to compile it
Uninstalling Python and doing a fresh install solved the error.
|
|
|
|
marvykkio
|
|
November 08, 2016, 03:09:58 PM |
|
From how I see the Zcash trend, I do not think it is worth continuing much longer.
|
|
|
|
mo35
Member
Offline
Activity: 142
Merit: 10
|
|
November 08, 2016, 03:22:06 PM |
|
Well, it works, but speeds are awful: 25 Sol/s on a 1080.
|
|
|
|
eXtremal
|
|
November 08, 2016, 03:56:41 PM |
|
This patch gives +19% on NVidia cards:
diff --git a/input.cl b/input.cl
index 91b7021..60a3ffe 100644
--- a/input.cl
+++ b/input.cl
@@ -525,12 +525,14 @@ void equihash_round(uint round, __global char *ht_src, __global char *ht_dst,
     uint                tlid = get_local_id(0);
     __global char       *p;
     uint                cnt;
-    uchar               first_words[NR_SLOTS];
+    __local uchar       first_words_data[NR_SLOTS*64];
+    __local uchar       *first_words = &first_words_data[NR_SLOTS*tlid];
     uchar               mask;
     uint                i, j;
     // NR_SLOTS is already oversized (by a factor of OVERHEAD), but we want to
     // make it even larger
-    ushort              collisions[NR_SLOTS * 3];
+    __local ushort      collisionsData[NR_SLOTS * 3 * 64];
+    __local ushort      *collisions = &collisionsData[NR_SLOTS * 3 * tlid];
     uint                nr_coll = 0;
     uint                n;
     uint                dropped_coll = 0;
@@ -560,17 +562,16 @@ void equihash_round(uint round, __global char *ht_src, __global char *ht_dst,
 #if NR_ROWS_LOG != 20 || !OPTIM_SIMPLIFY_ROUND
     p += xi_offset;
     for (i = 0; i < cnt; i++, p += SLOT_LEN)
-        first_words[i] = *(__global uchar *)p;
+        first_words[i] = (*(__global uchar *)p) & mask;
 #endif
     // find collisions
     for (i = 0; i < cnt; i++)
         for (j = i + 1; j < cnt; j++)
 #if NR_ROWS_LOG != 20 || !OPTIM_SIMPLIFY_ROUND
-            if ((first_words[i] & mask) ==
-                (first_words[j] & mask))
+            if (first_words[i] == first_words[j])
               {
                 // collision!
-                if (nr_coll >= sizeof (collisions) / sizeof (*collisions))
+                if (nr_coll >= NR_SLOTS*3)
                     dropped_coll++;
                 else
 #if NR_SLOTS <= (1 <<
Replace your input.cl file with this: http://coinsforall.io/distr/input.cl
It may help on AMD too; not tested.
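The patch makes two changes: it moves the per-work-item scratch arrays into __local memory, with each work-item taking its own slice indexed by tlid, and it applies mask once when first_words is filled instead of in every comparison. A minimal Python model of both ideas, with hypothetical function names and made-up toy data (this is not silentarmy code), shows that the two collision loops report the same pairs:

```python
def collisions_mask_each_compare(first_words, mask):
    """Original loop: mask both bytes inside every comparison."""
    pairs = []
    cnt = len(first_words)
    for i in range(cnt):
        for j in range(i + 1, cnt):
            if (first_words[i] & mask) == (first_words[j] & mask):
                pairs.append((i, j))
    return pairs


def collisions_premasked(first_words, mask):
    """Patched loop: mask once while filling, then compare directly."""
    masked = [w & mask for w in first_words]
    pairs = []
    cnt = len(masked)
    for i in range(cnt):
        for j in range(i + 1, cnt):
            if masked[i] == masked[j]:
                pairs.append((i, j))
    return pairs


def local_slice(tlid, nr_slots):
    """Index range one work-item owns in the shared first_words_data array."""
    start = nr_slots * tlid
    return start, start + nr_slots


# Toy data (made up for illustration): only the low nibble is compared.
words = [0x1A, 0x2A, 0x3B, 0x4A, 0x5B]
same = collisions_mask_each_compare(words, 0x0F) == collisions_premasked(words, 0x0F)
```

Pre-masking costs one AND per slot instead of two per compared pair, which is presumably part of where the gain comes from; the move to local memory is the other part.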
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
November 08, 2016, 04:03:10 PM |
|
Thanks eXtremal! I should say I put ZERO EFFORT into optimizing for Nvidia. Silentarmy v4 is a straight port to Nvidia and nothing else. I hope to have time to work on optimizations in the near future.
Also, to all those testing Nvidia: I have reports that setting OPTIM_SIMPLIFY_ROUND to 1 increases performance by +25% on some Nvidia GPUs. See https://github.com/mbevand/silentarmy/blob/master/TROUBLESHOOTING.md for instructions.
|
|
|
|
eXtremal
|
|
November 08, 2016, 04:21:24 PM |
|
mrb, AMD is affected too: +5% on RX480, ~53 sols/s now. Look at the GCN disassembly sometimes. In the original code I see:
#if NR_SLOTS <= (1 << 8)
    // note: this assumes slots can be encoded in 8 bits
    collisions[nr_coll++] =
        ((ushort)j << 8) | ((ushort)i & 0xff);
#else
#error "unsupported NR_SLOTS"
#endif
It compiles to:
v_cmp_ge_u32 vcc, 53, v19 // 00000000009C: 7D8C26B5
s_and_saveexec_b64 s[24:25], vcc // 0000000000A0: BE98246A
v_or_b32 v10, v6, v8 // 0000000000A4: 38141106
v_lshlrev_b32 v11, 1, v19 // 0000000000A8: 34162681
buffer_store_short v10, v11, s[16:19], s14 offen glc // 0000000000AC: E0685000 0E040A0B
That store goes to global memory. I changed collisions[] to local memory and got +5% on Polaris and +19% on NV Pascal.
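The collisions[] entry in the snippet above packs two slot indices into one 16-bit value, which is why the code refuses to build when NR_SLOTS exceeds 1 << 8. A small Python sketch of that packing and its inverse makes the 8-bit assumption concrete; `unpack_collision` is a hypothetical helper added for illustration and is not taken from silentarmy:

```python
def pack_collision(i, j):
    """Mirror ((ushort)j << 8) | ((ushort)i & 0xff), truncated to 16 bits."""
    return ((j << 8) | (i & 0xFF)) & 0xFFFF


def unpack_collision(packed):
    """Hypothetical inverse: recover (i, j) from the packed 16-bit value."""
    return packed & 0xFF, packed >> 8


packed = pack_collision(3, 7)          # i in the low byte, j in the high byte
round_trip = unpack_collision(packed)  # valid only while i and j are below 256
```

Any index of 256 or more would lose its upper bits in the pack step, so the round trip only holds for slot indices that fit in 8 bits.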
|
|
|
|
mrb (OP)
|
|
November 08, 2016, 04:40:27 PM |
|
Zero difference with or without OPTIM_SIMPLIFY_ROUND on my test system (RX 480 and R9 Nano).
I already tried putting collisions[] in local memory and saw zero difference there as well...
Weird. What are your OS and drivers?
Pascal is a different story.
|
|
|
|
eXtremal
|
|
November 08, 2016, 04:46:28 PM |
|
Ubuntu 16.04 and amdgpu-pro 16.30
|
|
|
|
mrb (OP)
|
|
November 08, 2016, 04:48:18 PM |
|
And me, Ubuntu 16.04, and amdgpu-pro 16.40. So latest drivers might have slightly degraded performance...
|
|
|
|
doktor83
|
|
November 08, 2016, 04:53:06 PM |
|
keep on this technical chat, i might learn something
|
|
|
|
agis6
Newbie
Offline
Activity: 27
Merit: 0
|
|
November 08, 2016, 04:59:44 PM |
|
Latest version is faster and gives more "stable" hashrates
Tested on the following:
A. 1 x R9 380 -> A lot faster than before (close to 20%), "stable" hash rate
B. 3 x R9 380X -> About 5% faster, "stable" hash rate
C. 5 x RX470 + 1 x RX480 -> Not faster but "stable" hash rate
By stable, I mean there are very small fluctuations
|
|
|
|
Ambros
|
|
November 08, 2016, 05:10:24 PM |
|
Latest version is faster and gives more "stable" hashrates
Tested on the following:
A. 1 x R9 380 -> A lot faster than before (close to 20%), "stable" hash rate
B. 3 x R9 380X -> About 5% faster, "stable" hash rate
C. 5 x RX470 + 1 x RX480 -> Not faster but "stable" hash rate
By stable, I mean there are very small fluctuations
Can you add actual values to your observations? That would be even better. Thank you.
|
|
|
|
mrb (OP)
|
|
November 08, 2016, 05:16:16 PM |
|
As I see FGLRX constant has been removed from params.h. So, for 15.12 fglrx leave params.h as is?
The parameter was renamed. It is now named OPTIM_SIMPLIFY_ROUND.
Everyone should try setting it to 1 to see if it helps, no matter what your drivers or hardware are. It is worth a try. See https://github.com/mbevand/silentarmy/blob/master/TROUBLESHOOTING.md for instructions.
|
|
|
|
agis6
|
|
November 08, 2016, 05:25:14 PM |
|
Can you add actual values to your observations? That would be even better. Thank you.
Sure, here you go:
A. 1 x R9 380
Ubuntu 14.04.4 desktop, fglrx
Was 18-21 sol/s, now is 25-26 sol/s
B. 3 x R9 380X
Ubuntu 16.04.1 desktop, amdgpu-pro
Was 84-89 sol/s, now is 92-93 sol/s
C. 5 x RX470 + 1 x RX480
Ubuntu 16.04.1 server, amdgpu-pro
Was 135-147 sol/s, now is 145-148 sol/s
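To turn those ranges into percentage gains, one option is to compare the midpoints of the before and after ranges. The sketch below does that in Python; using the midpoint is an assumption made here for illustration (the post only gives ranges), and the helper names are hypothetical:

```python
def midpoint(low, high):
    """Middle of a reported range, used as a stand-in for the average."""
    return (low + high) / 2.0


def percent_gain(before, after):
    """Relative speedup in percent."""
    return (after - before) / before * 100.0


# (label, before range, after range) in sol/s, copied from the post above
reports = [
    ("1 x R9 380", (18, 21), (25, 26)),
    ("3 x R9 380X", (84, 89), (92, 93)),
    ("5 x RX470 + 1 x RX480", (135, 147), (145, 148)),
]

gains = {
    label: percent_gain(midpoint(*before), midpoint(*after))
    for label, before, after in reports
}
```

On midpoints this comes to roughly +31%, +7% and +4%. The earlier "close to 20%" estimate for the R9 380 matches the conservative comparison of the old high end against the new low end (21 to 25 sol/s, about +19%).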
|
|
|
|
antantti
Legendary
Offline
Activity: 1176
Merit: 1015
|
|
November 08, 2016, 05:28:11 PM |
|
Is it possible to edit params.h somehow to get lower CPU usage? Most mining rigs run weak CPUs and really struggle to feed multiple GPUs.
Replacing input.cl boosted a GTX 970 to the 30 sol/s area on Windows 7. CPU usage is way too high though. I have no way to test NVIDIA on Linux; how is the CPU usage on Linux with green cards?
|
|
|
|
|