Bitcoin Forum
May 17, 2024, 01:26:31 AM *
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 [18] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 ... 91 »
Author Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070  (Read 209263 times)
laik2
November 10, 2016, 04:53:03 PM
 #341

I think you should keep improving speed rather than adding Windows support for now, because Windows users already have the Claymore miner with decent speeds,

while we on Linux are stuck at half the speed they get on Windows with the Claymore miner. I just got around 45~49 per card on a modded RX 470.

I hope you will consider this in the roadmap.

Just booted windows disk and tested newer Claymore 4.0
//
GPU #0: Ellesmere, 8192 MB available, 36 compute units
GPU #1: Ellesmere, 8192 MB available, 36 compute units
ZEC - Total Speed: 135.354 H/s, Total Shares: 151, Rejected: 1, Time: 00:05
ZEC: GPU0 68.841 H/s, GPU1 66.452 H/s
Pool switches: ZEC - 0
Current ZEC pool share target: 0x0025d4c3 (diff: 1732H)
GPU1 t=49C fan=62%, GPU2 t=60C fan=65%
//
This is with intensity set to 2, and my quad-core AMD A8-7600 APU is 50% loaded. Using -i 0 I get results almost equal to silentarmy's miner. Until someone finds a different approach for gaining more Sol/s, results will not be much different. I would love to help, but I am not a coder at all. I can otherwise help with packaging and better support for newbies. I also don't think that Windows should be considered a priority. If you have 1-2 rigs, OK, but headless rigs are preferred and easily monitored.

Dr_Victor
November 10, 2016, 05:56:08 PM
 #342

When can we get a windows version?

osnwt
November 10, 2016, 06:11:29 PM
 #343

Just booted windows disk and tested newer Claymore 4.0
GPU #0: Ellesmere, 8192 MB available, 36 compute units
GPU #1: Ellesmere, 8192 MB available, 36 compute units
ZEC - Total Speed: 135.354 H/s, Total Shares: 151, Rejected: 1, Time: 00:05
ZEC: GPU0 68.841 H/s, GPU1 66.452 H/s
Pool switches: ZEC - 0
Current ZEC pool share target: 0x0025d4c3 (diff: 1732H)
GPU1 t=49C fan=62%, GPU2 t=60C fan=65%
//
This is with intensity set to 2, and my quad-core AMD A8-7600 APU is 50% loaded. Using -i 0 I get results almost equal to silentarmy's miner.

CZM 4.0, PowerColor 390X Devil (hybrid cooling), overclocked: 100 Sol/s for the primary card, 93-98 for the others (and I don't know why: all risers are x1 in Gen.2 PCI-e mode, and CPU load with -i 0 is low; using intensity 1-2 slows things down on my weak G3240 CPU).



Agreed, we need a Linux miner first; Windows users are happy with CZM.

PS. Host is Windows, just forgot to update the label in my web monitor config.
eXtremal
November 10, 2016, 07:15:07 PM
 #344

I have another idea for optimizing the equihash round kernel; results will come in the next 12-24h.

Quote
You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.
Only the memory: 1100/2160 and a low DRAM timings preset (modded ROM).
Excellent! We believe in you! :)
I got only 2%, because another place needs optimizing: the function ht_store in the kernel. This one line of code:
Quote
cnt = atomic_inc((__global uint *)p);
takes half of the whole iteration time! :)

toptek
November 10, 2016, 07:26:48 PM
 #345

I hope there will be a Windows version anyway. This can and will run faster than Claymore over time. No comment on why we need Windows; we do.

Genoil
November 10, 2016, 07:35:04 PM
 #346


WHOA!

Thanks man

EDIT: I see there is no binary there yet. You got me all excited.

There is a binary solver and a python script. That's what SA is.

nerdralph
November 10, 2016, 07:48:16 PM
 #347

I have another idea for optimizing the equihash round kernel; results will come in the next 12-24h.

Quote
You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.
Only the memory: 1100/2160 and a low DRAM timings preset (modded ROM).
Excellent! We believe in you! :)
I got only 2%, because another place needs optimizing: the function ht_store in the kernel. This one line of code:
Quote
cnt = atomic_inc((__global uint *)p);
takes half of the whole iteration time! :)

Good work, but you're a little behind.  Here's part of an email I sent to JW and Marc 4 days ago:
"I think the atomic_inc in ht_store is a bottleneck.  As you probably already know, incrementing it non-atomically (even if it is a volatile) fails to maintain data consistency between the threads.  On fglrx the atomic_inc compiles to flat_atomic_inc and s_waitcnt:
  flat_atomic_inc  v24, v[24:25], v26 glc               // 000000000270: DD2D0000 18001A18
  s_waitcnt     vmcnt(0) & lgkmcnt(0)                   // 000000000278: BF8C0070"
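
The consistency problem described in the email can be illustrated with a small deterministic simulation. This is only a sketch: the row counter, the slot dictionary, and the two work-items "A" and "B" are hypothetical stand-ins, not silentarmy's actual data layout. It shows why a separate read and write of the counter loses an entry, while atomic_inc semantics (return the old value and store old + 1 in one step) do not.

```python
# Deterministic simulation of two work-items appending to one hash-table row.
# The row layout and all names are illustrative, not silentarmy's real code.

def store_non_atomic(schedule):
    """Each work-item reads the counter and later writes it back separately."""
    cnt = 0      # shared per-row slot counter
    slots = {}   # slot index -> work-item that stored there
    seen = {}    # work-item -> counter value it read
    for item, step in schedule:
        if step == "read":
            seen[item] = cnt
        else:
            # "write": use the slot read earlier, then bump the counter
            slots[seen[item]] = item
            cnt = seen[item] + 1
    return cnt, slots

def store_atomic(items):
    """atomic_inc semantics: read-and-increment is one indivisible step."""
    cnt = 0
    slots = {}
    for item in items:
        old, cnt = cnt, cnt + 1   # returns old value, stores old + 1
        slots[old] = item
    return cnt, slots

# Both work-items read the counter before either one writes it back.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
racy_cnt, racy_slots = store_non_atomic(interleaved)
safe_cnt, safe_slots = store_atomic(["A", "B"])

print(racy_cnt, racy_slots)   # A's entry is overwritten by B
print(safe_cnt, safe_slots)   # both entries survive
```

In the interleaved schedule both work-items see the same counter value, so they claim the same slot and one entry is lost. That is the failure the atomic increment prevents, at the cost described above.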
laik2
November 10, 2016, 07:54:34 PM
 #348


WHOA!

Thanks man

EDIT: I see there is no binary there yet. You got me all excited.

There is a binary solver and a python script. That's what SA is.

I think there should be a roadmap for app enhancements and better usability on Linux, Windows, and even OS X plus embedded hardware:
a unified C/C++ app with daemonize, syslog, and API/CLI support for command and monitoring, and a simplified OpenCL switch between AMD/Intel/Nvidia with hwmon support.

mrb (OP)
November 10, 2016, 08:03:29 PM
 #349

Dramatic CPU usage savings and PCIe bandwidth savings are now committed, thanks to on-device filtering of invalid solutions: https://github.com/mbevand/silentarmy/commit/146b8dc0b6618852e2f322fab51f3ed3739da07a

PCIe bandwidth usage dropped from ~100 MB/s to 500 kB/s per GPU! This should really help those with PCIe ×1 risers. MAX_SOLS is now reduced from 2000 to 10 :) CPU usage should also now be close to zero. (Well, except on Nvidia, because their OpenCL implementation uses busy waits, but I'll check in a workaround soon.)

As always, check the changelog which I always update in real-time: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
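
As a quick sanity check on the figures in this post, the arithmetic below compares the reported bandwidth drop with the MAX_SOLS reduction. It is only a sketch: the ~100 MB/s figure is approximate, decimal units are assumed, and nothing here claims that bandwidth scales exactly with MAX_SOLS.

```python
# Ratios taken from the numbers quoted in the post (decimal units assumed).
MB = 1_000_000
KB = 1_000

bw_before = 100 * MB   # ~100 MB/s per GPU before the commit (approximate)
bw_after = 500 * KB    # 500 kB/s per GPU after the commit

max_sols_before = 2000
max_sols_after = 10

bw_ratio = bw_before / bw_after
sols_ratio = max_sols_before / max_sols_after

print(f"bandwidth reduced by a factor of {bw_ratio:.0f}")
print(f"MAX_SOLS reduced by a factor of {sols_ratio:.0f}")
```

Both ratios come out to the same factor, which is at least consistent with the solutions buffer having dominated PCIe traffic.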
eXtremal
November 10, 2016, 08:12:01 PM
 #350

nerdralph
Did you try using local memory for the atomic increment (store all data to global memory and walk through the data in a separate kernel)?

panv
November 10, 2016, 08:19:25 PM
 #351

How do I run this windows version on windows?
nerdralph
November 10, 2016, 08:36:06 PM
 #352

nerdralph
Did you try using local memory for the atomic increment (store all data to global memory and walk through the data in a separate kernel)?

Each compute unit has 64 KB of LDS, so an RX 470 with 32 CUs has 2 MB of LDS. 1 million (2^20) 32-bit counters need 4 MB. atomic_inc works only with ints, so even if the counters are packed into 8 bits each so that they all fit in LDS, there doesn't seem to be a way in OpenCL to atomically increment them.
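
The sizing argument can be written out as a short calculation. This is a sketch that treats LDS as one aggregate pool, as the post does, and uses binary units; the variable names are illustrative.

```python
# LDS budget versus counter storage, using the figures from the post.
KIB = 1024
MIB = 1024 * KIB

lds_per_cu = 64 * KIB        # LDS per compute unit
compute_units = 32           # RX 470, per the post
total_lds = lds_per_cu * compute_units

counters = 2 ** 20           # about 1 million counters
bytes_32bit = counters * 4   # one 32-bit int per counter
bytes_8bit = counters * 1    # counters packed into 8 bits each

fits_32bit = bytes_32bit <= total_lds
fits_8bit = bytes_8bit <= total_lds

print(f"total LDS: {total_lds // MIB} MiB")
print(f"32-bit counters: {bytes_32bit // MIB} MiB, fits: {fits_32bit}")
print(f"8-bit counters: {bytes_8bit // MIB} MiB, fits: {fits_8bit}")
```

The 32-bit layout needs twice the aggregate LDS, while the packed 8-bit layout would fit but runs into the atomic_inc restriction mentioned above.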
nerdralph
November 10, 2016, 08:48:25 PM
 #353

Dramatic CPU usage savings and PCIe bandwidth savings are now committed, thanks to on-device filtering of invalid solutions: https://github.com/mbevand/silentarmy/commit/146b8dc0b6618852e2f322fab51f3ed3739da07a

PCIe bandwidth usage dropped from ~100 MB/s to 500 kB/s per GPU! This should really help those with PCIe ×1 risers. MAX_SOLS is now reduced from 2000 to 10 :) CPU usage should also now be close to zero. (Well, except on Nvidia, because their OpenCL implementation uses busy waits, but I'll check in a workaround soon.)

As always, check the changelog which I always update in real-time: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md

I'm seeing a 7-9% speed improvement between 2 GPUs.  One is in a 16x slot and the other on a 1x riser.  For the card on the 1x riser the speed improvement is ~10%, and for the card in the 16x slot ~5%.
mrb (OP)
November 10, 2016, 08:51:59 PM
 #354

Can the new version with 45 Sol/s per GTX 1070 be run on Windows (10 x64)?

+1

Is it so hard to add Windows support? :'(

Sorry, I am still not working on Windows at the moment, only on optimizations and various improvements.
mrb (OP)
November 10, 2016, 08:55:06 PM
 #355

I'm seeing a 7-9% speed improvement between 2 GPUs.  One is in a 16x slot and the other on a 1x riser.  For the card on the 1x riser the speed improvement is ~10%, and for the card in the 16x slot ~5%.

Nice, thanks for confirming.
eXtremal
November 10, 2016, 09:03:12 PM
 #356

nerdralph
Did you try using local memory for the atomic increment (store all data to global memory and walk through the data in a separate kernel)?

Each compute unit has 64 KB of LDS, so an RX 470 with 32 CUs has 2 MB of LDS. 1 million (2^20) 32-bit counters need 4 MB. atomic_inc works only with ints, so even if the counters are packed into 8 bits each so that they all fit in LDS, there doesn't seem to be a way in OpenCL to atomically increment them.

See pm.

hypercrypto
November 10, 2016, 09:28:15 PM
 #357

Dramatic CPU usage savings and PCIe bandwidth savings are now committed, thanks to on-device filtering of invalid solutions: https://github.com/mbevand/silentarmy/commit/146b8dc0b6618852e2f322fab51f3ed3739da07a

PCIe bandwidth usage dropped from ~100 MB/s to 500 kB/s per GPU! This should really help those with PCIe ×1 risers. MAX_SOLS is now reduced from 2000 to 10 :) CPU usage should also now be close to zero. (Well, except on Nvidia, because their OpenCL implementation uses busy waits, but I'll check in a workaround soon.)

As always, check the changelog which I always update in real-time: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md

I just tried this update and I can confirm that CPU usage is near zero :)

I also see a 2~5% speed improvement on my 8 RX 470 rigs. Good work.
scavern
November 10, 2016, 09:33:00 PM
 #358

How close are we to Claymore speeds? I just can't get myself to move my rigs to Windows...

jstefanop
November 10, 2016, 09:47:17 PM
 #359

Can the new version with 45 Sol/s per GTX 1070 be run on Windows (10 x64)?

+1

Is it so hard to add Windows support? :'(

Sorry, I am still not working on Windows at the moment, only on optimizations and various improvements.

Yeah, I'm not exactly sure why Windows users are coming on here and demanding a Windows version. Linux users got screwed over by this whole shit, and I have rigs that can't be booted under Windows and require Linux. Be happy with your Claymore 100 H/s miner.

@mrb I'd rather you get close to Claymore performance, charge a devfee to get that done, and then worry about Windows.

Project Apollo: A Pod Miner Designed for the Home https://bitcointalk.org/index.php?topic=4974036
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
krnlx
November 10, 2016, 09:59:13 PM
 #360

Well except Nvidia because their OpenCL implementation implements busy waits, but I'll check in a workaround soon.

CPU usage is now low on a Celeron with six 1070 cards:

https://bitcointalk.org/index.php?topic=1666489.msg16818120#msg16818120

But it is more correct to preload the library from Python:

Code:
    @asyncio.coroutine
    def start_solvers(self, devid):
        verbose('Solver %s: launching' % devid)
        # preload libtime.so into the solver; child processes inherit
        # LD_PRELOAD from this process's environment
        os.environ["LD_PRELOAD"] = "./libtime.so"
        # execute "sa-solver --mining --use <id>"
        create = asyncio.create_subprocess_exec(
                self.solver_binary, '--mining', '--use', devid.split('.')[0],
                stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.STDOUT)
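
A possible variation on the snippet above, as a standalone sketch: instead of changing os.environ for the whole Python process, pass a modified copy of the environment to the child only. The helper names solver_env and launch are hypothetical and are not part of silentarmy. The sketch uses async/await (Python 3.5+) rather than the @asyncio.coroutine style, and assumes libtime.so sits in the working directory as in the post.

```python
import asyncio
import os

def solver_env(preload, base=None):
    """Return a copy of the environment with LD_PRELOAD set."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = preload
    return env

async def launch(program, *args, preload="./libtime.so"):
    """Start a solver process that sees LD_PRELOAD; the parent does not."""
    # env= replaces the child's environment, so os.environ stays untouched
    return await asyncio.create_subprocess_exec(
        program, *args,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        env=solver_env(preload))

# Building the environment has no side effects on the parent process.
base = {"PATH": "/usr/bin"}
child = solver_env("./libtime.so", base)
print(child)
```

With this approach a call such as `await launch("./sa-solver", "--mining", "--use", "0")` would preload the library only into that solver, so any other subprocess started by the script keeps a clean environment.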