laik2
|
|
November 10, 2016, 04:53:03 PM |
|
I think you should keep improving speed rather than supporting Windows for now, because Windows users already have the Claymore miner with decent speeds,
while those of us on Linux are stuck with half the speed they get on Windows with the Claymore miner; I just got around 45~49 per card for modded RX 470s.
I hope you will consider this in the roadmap.
Just booted my Windows disk and tested the newer Claymore 4.0:

GPU #0: Ellesmere, 8192 MB available, 36 compute units
GPU #1: Ellesmere, 8192 MB available, 36 compute units
ZEC - Total Speed: 135.354 H/s, Total Shares: 151, Rejected: 1, Time: 00:05
ZEC: GPU0 68.841 H/s, GPU1 66.452 H/s
Pool switches: ZEC - 0
Current ZEC pool share target: 0x0025d4c3 (diff: 1732H)
GPU1 t=49C fan=62%, GPU2 t=60C fan=65%

This is with intensity set to 2, and my quad-core AMD APU A8-7600 is 50% loaded. Using -i 0 I get almost equal results with silentarmy's miner.

Until someone finds a different approach for gaining more Sol/s, results will not be much different. I would love to help, but I am not a coder at all. I can otherwise help with packaging and better newbie support. I also don't think that Windows should be considered a priority. If you have 1/2 rigs, OK, but headless rigs are otherwise preferred and easily monitored.
|
|
|
|
Dr_Victor
|
|
November 10, 2016, 05:56:08 PM |
|
When can we get a windows version?
|
|
|
|
osnwt
|
|
November 10, 2016, 06:11:29 PM |
|
Just booted windows disk and tested newer Claymore 4.0

GPU #0: Ellesmere, 8192 MB available, 36 compute units
GPU #1: Ellesmere, 8192 MB available, 36 compute units
ZEC - Total Speed: 135.354 H/s, Total Shares: 151, Rejected: 1, Time: 00:05
ZEC: GPU0 68.841 H/s, GPU1 66.452 H/s
Pool switches: ZEC - 0
Current ZEC pool share target: 0x0025d4c3 (diff: 1732H)
GPU1 t=49C fan=62%, GPU2 t=60C fan=65%

This is with intensity set to 2 and my quad amd APU A8-7600 is 50% loaded. Using -i 0 I get almost equal results with silentarmy's miner.

CZM 4.0: PowerColor 390X Devil (hybrid cooling) OC'ed: 100 Sol/s for the primary card, 93-98 for the others (and I don't know why: all risers are x1, Gen.2 PCI-e mode, CPU load with -i 0 is low, using 1-2 slowdowns with my weak G3240 CPU).

Agreed, we need a Linux miner first; Windows users are happy with CZM.

PS. Host is Windows, I just forgot to update the label in my web monitor config.
|
|
|
|
eXtremal
|
|
November 10, 2016, 07:15:07 PM |
|
I have another idea for optimizing the equihash round kernel; results will be in the next 12-24h.

You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.

Only memory, 1100/2160 and low DRAM timings preset (modded ROM).

Excellent! We believe in you!

Got only 2%, because another place needs optimizing: the function ht_store in the kernel. This one line in the code (line 124):
cnt = atomic_inc((__global uint *)p);
takes half of all iteration time!
|
|
|
|
toptek
Legendary
Offline
Activity: 1274
Merit: 1000
|
|
November 10, 2016, 07:26:48 PM |
|
I hope there will be a Windows version anyway; this can and will run faster than Claymore over time. No comment on why we need Windows; we do.
|
|
|
|
Genoil
|
|
November 10, 2016, 07:35:04 PM |
|
WHOA! Thanks man.
EDIT: I see there is no binary there yet. You got me all excited.

There is a binary solver and a Python script. That's what SA is.
|
|
|
|
nerdralph
|
|
November 10, 2016, 07:48:16 PM |
|
I have another idea for optimize equihash round kernel, results will be in next 12-24h

You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.

Only memory, 1100/2160 and low DRAM timings preset (modded ROM).

Excellent! We believe in you!

Got only 2%, because need optimize another place - function ht_store at kernel. This one row in code (line 124):
cnt = atomic_inc((__global uint *)p);
Takes a half of all iteration time!

Good work, but you're a little behind. Here's part of an email I sent to JW and Marc 4 days ago:

"I think the atomic_inc in ht_store is a bottleneck. As you probably already know, incrementing it non-atomically (even if it is a volatile) fails to maintain data consistency between the threads. On fglrx the atomic_inc compiles to flat_atomic_inc and s_waitcnt:
flat_atomic_inc v24, v[24:25], v26 glc // 000000000270: DD2D0000 18001A18
s_waitcnt vmcnt(0) & lgkmcnt(0) // 000000000278: BF8C0070"
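
To illustrate the data-consistency point in the email above, here is a minimal, deterministic sketch in Python (not OpenCL). It assumes, hypothetically, that ht_store uses the value returned by atomic_inc as the slot index within a hash-table row; the row layout and the function names store_racy and store_atomic are illustrative only and are not taken from the miner's source.

```python
def store_racy(items):
    """Non-atomic increment: every 'thread' reads cnt before any writes it back."""
    row = {"cnt": 0, "slots": {}}
    reads = [row["cnt"] for _ in items]   # all threads observe cnt == 0
    for seen, item in zip(reads, items):
        row["slots"][seen] = item         # the same slot is overwritten each time
        row["cnt"] = seen + 1             # a stale value is written back
    return row


def store_atomic(items):
    """Atomic increment: each read-modify-write finishes before the next starts."""
    row = {"cnt": 0, "slots": {}}
    for item in items:
        idx = row["cnt"]                  # like atomic_inc, hand back the old value
        row["cnt"] = idx + 1
        row["slots"][idx] = item          # every thread gets its own slot
    return row


racy = store_racy(["a", "b", "c"])
atomic = store_atomic(["a", "b", "c"])
print(racy["cnt"], len(racy["slots"]))      # 1 1
print(atomic["cnt"], len(atomic["slots"]))  # 3 3
```

In the racy version two of the three stores are lost, which is why simply dropping the atomic would not be a safe way to remove the bottleneck.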
|
|
|
|
laik2
|
|
November 10, 2016, 07:54:34 PM |
|
WHOA! Thanks man EDIT: I see there is no binary there yet. You got me all excited.

There is a binary solver and a python script. That's what SA is.

I think there should be a roadmap for app enhancement and better usability on Linux, Windows, and even OSX + embedded hardware: a unified C/C++ app with daemonize, syslog, and API/CLI support for command/monitor, plus a simplified OpenCL switch between AMD/Intel/nVidia with hwmon support.
|
|
|
|
|
eXtremal
|
|
November 10, 2016, 08:12:01 PM |
|
nerdralph, did you try using local memory for the atomic increment (store all data to global memory and walk through the data in a separate kernel)?
|
|
|
|
panv
Newbie
Offline
Activity: 5
Merit: 0
|
|
November 10, 2016, 08:19:25 PM |
|
How do I run this windows version on windows?
|
|
|
|
nerdralph
|
|
November 10, 2016, 08:36:06 PM |
|
nerdralph Did you try to use local memory for atomic increment (store all data to global memory and walk through data in separate kernel) ?

Each compute unit has 64KB of LDS, so an RX 470 with 32 CUs has 2MB of LDS. 1 million (2^20) 32-bit counters need 4MB. atomic_inc works only with ints, so even if the counters are packed into 8 bits each so they'll all fit in LDS, there doesn't seem to be a way in OpenCL to atomically increment them.
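
The arithmetic above can be checked with a few lines of Python. The figures (64KB per CU, 32 CUs, 2^20 counters) come from the post; the variable names are only for illustration. Note that the 2MB figure is a sum across compute units, and in OpenCL each work-group can only address the local memory of the CU it runs on.

```python
KIB = 1024
MIB = 1024 * KIB

lds_per_cu = 64 * KIB     # LDS per compute unit, from the post
compute_units = 32        # RX 470, from the post
counters = 2 ** 20        # number of counters, from the post

total_lds = lds_per_cu * compute_units   # summed over all CUs
need_32bit = counters * 4                # 4 bytes per 32-bit counter
need_8bit = counters * 1                 # 1 byte per packed 8-bit counter

print(total_lds // MIB, need_32bit // MIB, need_8bit // MIB)  # 2 4 1
```

So 32-bit counters need twice the combined LDS, while 8-bit counters would fit in half of it, leaving the lack of a byte-wide atomic increment as the obstacle the post describes.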
|
|
|
|
nerdralph
|
|
November 10, 2016, 08:48:25 PM |
|
I'm seeing a 7-9% speed improvement between 2 GPUs. One is in a 16x slot and the other on a 1x riser. For the card on the 1x riser the speed improvement is ~10%, and for the card in the 16x slot ~5%.
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 10, 2016, 08:51:59 PM |
|
Can the new version with 45 Sol/s per GTX 1070 be run on Windows (10 x64)?

+1. Is it so hard to add Windows support?

Sorry, I am still not, at the moment, working on Windows, but on optimizations and various improvements.
|
|
|
|
mrb (OP)
Legendary
Offline
Activity: 1512
Merit: 1027
|
|
November 10, 2016, 08:55:06 PM |
|
I'm seeing a 7-9% speed improvement between 2 GPUs. One is in a 16x slot and the other on a 1x riser. For the card on the 1x riser the speed improvement is ~10%, and for the card in the 16x slot ~5%.
Nice, thanks for confirming.
|
|
|
|
eXtremal
|
|
November 10, 2016, 09:03:12 PM |
|
nerdralph Did you try to use local memory for atomic increment (store all data to global memory and walk through data in separate kernel) ?

Each compute unit has 64KB of LDS, so an RX 470 with 32 CUs has 2MB of LDS. 1 million (2^20) 32-bit counters need 4MB. atomic_inc works only with ints, so even if the counters are packed into 8 bits each so they'll all fit in LDS, there doesn't seem to be a way in OpenCL to atomically increment them.

See PM.
|
|
|
|
hypercrypto
Newbie
Offline
Activity: 19
Merit: 0
|
|
November 10, 2016, 09:28:15 PM |
|
I just tried this update and I can confirm that CPU usage is near zero. I also see a 2~5% speed improvement on my 8 rigs of RX 470s. Good work.
|
|
|
|
scavern
|
|
November 10, 2016, 09:33:00 PM |
|
How close are we to Claymore speeds? I just can't get myself to move my rigs to Windows...
|
|
|
|
jstefanop
Legendary
Offline
Activity: 2098
Merit: 1397
|
|
November 10, 2016, 09:47:17 PM |
|
new version with 45 sols per gtx 1070 can be run on windows(10 x64) ?

+1 Is it so hard to add windows support ?

Sorry I am still not, at the moment, working on Windows. But optimizations and various improvements.

Yeah, not exactly sure why Windows users are coming on here and demanding a Windows version. Linux users got screwed over by this whole shit, and I have rigs that can't be booted under Windows and require Linux. Be happy with your Claymore 100 H/s miner. @mrb I'd rather you get close to Claymore performance, charge a devfee to get that done, and then worry about Windows.
|
|
|
|
krnlx
|
|
November 10, 2016, 09:59:13 PM |
|
Well except Nvidia because their OpenCL implementation implements busy waits, but I'll check in a workaround soon.
Low CPU usage on a Celeron with 6 1070 cards now: https://bitcointalk.org/index.php?topic=1666489.msg16818120#msg16818120

But it is more correct to preload the library from Python:

    @asyncio.coroutine
    def start_solvers(self, devid):
        verbose('Solver %s: launching' % devid)
        os.environ["LD_PRELOAD"] = "./libtime.so"
        # execute "sa-solver --mining --use <id>"
        create = asyncio.create_subprocess_exec(
                self.solver_binary, '--mining', '--use', devid.split('.')[0],
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.STDOUT)
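
A possible variation on the snippet above, offered only as a sketch: instead of changing os.environ for the whole Python process, the preload can be passed to the child alone through the env argument that asyncio.create_subprocess_exec forwards to the subprocess. The helper names solver_env and solver_argv are hypothetical and not part of silentarmy; the library path and the solver arguments are the ones from the post.

```python
import os


def solver_env(preload="./libtime.so", base=None):
    """Return a copy of the environment with LD_PRELOAD set (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = preload   # only the copy is modified
    return env


def solver_argv(solver_binary, devid):
    """Build the 'sa-solver --mining --use <id>' command line from the post."""
    return [solver_binary, "--mining", "--use", devid.split(".")[0]]


# Inside a coroutine, the two helpers could be combined like this:
#   proc = await asyncio.create_subprocess_exec(
#       *solver_argv(self.solver_binary, devid),
#       env=solver_env(),
#       stdin=asyncio.subprocess.PIPE,
#       stdout=asyncio.subprocess.PIPE,
#       stderr=asyncio.subprocess.STDOUT)
```

With this approach any other program launched by the script does not inherit LD_PRELOAD, which may or may not matter for a given setup.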
|
|
|
|
|