Bitcoin Forum
Author Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070  (Read 209263 times)
ioglnx
Sr. Member
****
Offline Offline

Activity: 574
Merit: 250

Fighting mob law and inquisition in this forum


View Profile
November 22, 2016, 07:05:58 PM
 #1361

Look at my update :-D
The increase is there: hitting 500 Sol/s on 4 cards (2x GTX 1080 and 2x GTX 1070) @ 601-619 W. Before, with all other miners, it was around 450 Sol/s @ 750 W.

It's instances=2 that is working; threads is not. :-D
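
One way to compare the two setups is sol/s per watt. The following is a minimal sketch using the figures quoted above; the function name `sols_per_watt` is hypothetical, and 610 W is an assumed midpoint of the reported 601-619 W range.

```python
def sols_per_watt(sol_rate: float, watts: float) -> float:
    """Return mining efficiency in sol/s per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return sol_rate / watts


# Figures quoted in the post; 610 W is an assumed midpoint of 601-619 W.
new_eff = sols_per_watt(500.0, 610.0)
old_eff = sols_per_watt(450.0, 750.0)

# Relative efficiency gain of the new setup over the old one, in percent.
gain_pct = (new_eff / old_eff - 1.0) * 100.0

print(f"new: {new_eff:.3f} sol/s/W, old: {old_eff:.3f} sol/s/W, gain: {gain_pct:.1f}%")
```

Wall-socket readings vary with the PSU and the rest of the system, so this is only a rough comparison.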

Code:
Total 504.4 sol/s [dev0 117.7, dev1 132.8, dev2 124.3, dev3 134.0] 90 shares
Total 504.3 sol/s [dev0 117.1, dev1 133.1, dev2 124.1, dev3 135.2] 90 shares
Total 504.5 sol/s [dev0 117.1, dev1 133.4, dev2 124.6, dev3 135.5] 90 shares
Total 504.3 sol/s [dev0 117.8, dev1 132.7, dev2 125.1, dev3 134.2] 92 shares
Total 504.2 sol/s [dev0 118.7, dev1 132.4, dev2 124.4, dev3 134.6] 93 shares
Total 504.3 sol/s [dev0 119.5, dev1 131.8, dev2 125.7, dev3 134.7] 93 shares
Total 504.2 sol/s [dev0 119.7, dev1 131.6, dev2 126.3, dev3 134.0] 93 shares
Total 504.4 sol/s [dev0 119.2, dev1 131.7, dev2 126.7, dev3 133.3] 93 shares
Total 504.6 sol/s [dev0 119.4, dev1 131.6, dev2 127.0, dev3 133.1] 93 shares
Total 504.6 sol/s [dev0 120.5, dev1 131.6, dev2 127.2, dev3 133.6] 93 shares
Total 504.6 sol/s [dev0 120.9, dev1 130.5, dev2 126.0, dev3 132.6] 93 shares
Total 505.0 sol/s [dev0 120.8, dev1 130.3, dev2 126.0, dev3 133.0] 93 shares
Total 505.0 sol/s [dev0 121.1, dev1 130.6, dev2 125.5, dev3 132.7] 93 shares
Total 505.0 sol/s [dev0 120.2, dev1 130.2, dev2 125.6, dev3 133.8] 94 shares
Total 505.1 sol/s [dev0 120.3, dev1 130.1, dev2 125.6, dev3 133.9] 94 shares
Total 504.9 sol/s [dev0 119.7, dev1 131.6, dev2 124.9, dev3 134.4] 94 shares
Total 504.9 sol/s [dev0 119.6, dev1 132.0, dev2 123.9, dev3 133.8] 94 shares
Total 504.8 sol/s [dev0 119.0, dev1 131.5, dev2 123.8, dev3 132.9] 94 shares
Total 505.0 sol/s [dev0 119.9, dev1 132.0, dev2 124.8, dev3 132.2] 94 shares
Total 505.1 sol/s [dev0 119.3, dev1 132.6, dev2 125.3, dev3 132.8] 94 shares
Total 505.1 sol/s [dev0 118.2, dev1 132.6, dev2 125.9, dev3 130.7] 94 shares
Total 505.1 sol/s [dev0 118.8, dev1 132.6, dev2 125.3, dev3 130.6] 95 shares
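
For anyone who wants to track these numbers over time, here is a minimal sketch of parsing such status lines. It assumes the line format is exactly as in the output above; `parse_line` and the two regular expressions are hypothetical helpers, not part of the miner.

```python
import re
from statistics import mean

# Patterns assume the status-line layout shown in the log above.
LINE_RE = re.compile(r"Total\s+([\d.]+)\s+sol/s\s+\[(.*?)\]\s+(\d+)\s+shares")
DEV_RE = re.compile(r"dev(\d+)\s+([\d.]+)")


def parse_line(line):
    """Return (total_rate, {device_index: rate}, shares), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    devices = {int(idx): float(rate) for idx, rate in DEV_RE.findall(m.group(2))}
    return float(m.group(1)), devices, int(m.group(3))


sample = [
    "Total 504.4 sol/s [dev0 117.7, dev1 132.8, dev2 124.3, dev3 134.0] 90 shares",
    "Total 505.1 sol/s [dev0 118.8, dev1 132.6, dev2 125.3, dev3 130.6] 95 shares",
]

parsed = [p for p in map(parse_line, sample) if p is not None]
avg_total = mean(total for total, _, _ in parsed)
print(f"average total over {len(parsed)} lines: {avg_total:.2f} sol/s")
```

Lines that do not match the pattern are skipped rather than raising, which is convenient when the miner prints other messages in between.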

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
TIKCrazy
Member
**
Offline Offline

Activity: 73
Merit: 10


View Profile
November 22, 2016, 07:12:53 PM
 #1362

testing2 is working with 104 sol/s on a 1060
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:13:36 PM
 #1363

Quote from: ioglnx
Look at my update :-D
The increase is there: hitting 500 Sol/s on 4 cards (2x GTX 1080 and 2x GTX 1070) @ 601-619 W. Before, with all other miners, it was around 450 Sol/s @ 750 W.
It's instances=2 that is working; threads is not. :-D

VERY interesting... It's the other way around with AMD's drivers for Windows.
Gotta love those crazy OpenCL implementations...

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:14:40 PM
 #1364

testing2 is a keeper, then. Very well.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:23:07 PM
 #1365

I just added Windows binaries with krnlx's optimized kernel for NVIDIA cards. Thank you, krnlx!

https://github.com/zawawawa/silentarmy/releases/tag/v5-win64standalone-r12

Please note that the new NVIDIA version is not compatible with GPUs from other vendors.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
ioglnx
Sr. Member
****
Offline Offline

Activity: 574
Merit: 250

Fighting mob law and inquisition in this forum


View Profile
November 22, 2016, 07:26:46 PM
 #1366

Lol, do you need to say "not other vendors"? :-D Just saying "NV only" is much shorter :-P
There is basically just AMD left: S3 gone, VIA gone, 3Dfx long gone, SGI gone :-D so...

For the past 5 min the pool side has been reporting 601.30 Sol/s

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:32:43 PM
 #1367

You would be surprised to know that some people asked me in the past if they could use Intel HD Graphics for GPGPU...
You are mostly right about "other vendors," though. I wish these companies were still around.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
ioglnx
Sr. Member
****
Offline Offline

Activity: 574
Merit: 250

Fighting mob law and inquisition in this forum


View Profile
November 22, 2016, 07:36:24 PM
 #1368

Many of us share that dream... but at least some 3Dfx tech saw a revival in Maxwell GPUs, and Pascal again has some additions from 3Dfx :-D
Maybe NVIDIA has started sorting through the idea boards and paper bags from the 3Dfx offices.

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
November 22, 2016, 07:38:07 PM
 #1369

Quote from: ioglnx
Many of us share that dream... but at least some 3Dfx tech saw a revival in Maxwell GPUs, and Pascal again has some additions from 3Dfx :-D
Maybe NVIDIA has started sorting through the idea boards and paper bags from the 3Dfx offices.

I think someone said that on NVIDIA one must use 1 instance; 2 won't work (or will be the same).
@zawawa - tell 'em that they can :) but it's not worth it at all... it doesn't have the capacity of NV/AMD.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
Amph
Legendary
*
Offline Offline

Activity: 3206
Merit: 1069



View Profile
November 22, 2016, 07:38:51 PM
 #1370

That is a good boost over the sp one: I'm getting 120 sol/s per GPU (1070) with my -502 mem setting and zero core. In my case overclocking gives only a very small boost over underclocking, so it's not worth it.
ioglnx
Sr. Member
****
Offline Offline

Activity: 574
Merit: 250

Fighting mob law and inquisition in this forum


View Profile
November 22, 2016, 07:38:56 PM
 #1371

That was weeks ago, before all these optimizations took place.
For me it's working and gives me 10 sol/s more :-D

@Amph: Did you notice lower power consumption too?

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:39:26 PM
 #1372

Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.
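
A windowed average of this kind can be sketched as follows. This is not the code from the Windows port; `WindowedRate` and its methods are hypothetical names, and dividing by the full window length is an assumed design choice that makes the reading start low and climb until the window has filled.

```python
from collections import deque


class WindowedRate:
    """Average solutions per second over a sliding time window."""

    def __init__(self, window_s=300.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp_s, solution_count), oldest first
        self.total = 0         # solutions currently inside the window

    def add(self, now, count):
        """Record `count` solutions found at time `now` (seconds)."""
        self.events.append((now, count))
        self.total += count
        self._expire(now)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_count = self.events.popleft()
            self.total -= old_count

    def rate(self, now):
        """Return solutions per second averaged over the whole window."""
        self._expire(now)
        return self.total / self.window_s


meter = WindowedRate(window_s=300.0)
for second in range(1, 301):
    meter.add(float(second), 500)  # pretend 500 solutions arrive every second
print(meter.rate(300.0))
```

With a steady 500 solutions per second, this sketch reports only 100 sol/s after the first minute and reaches 500 sol/s once five minutes of data are in the window.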

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
November 22, 2016, 07:41:19 PM
 #1373

Quote from: zawawa
Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.

I wish I could do 130S/s with my AMDs.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
Amph
Legendary
*
Offline Offline

Activity: 3206
Merit: 1069



View Profile
November 22, 2016, 07:41:43 PM
 #1374

Quote from: ioglnx
That was weeks ago, before all these optimizations took place.
For me it's working and gives me 10 sol/s more :-D
@Amph: Did you notice lower power consumption too?

Yeah, only 650 W or thereabouts, but without 200 sol/s per GPU it's still not competitive enough against AMD... sadly.
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
November 22, 2016, 07:47:51 PM
 #1375

Quote from: ioglnx
That was weeks ago, before all these optimizations took place.
For me it's working and gives me 10 sol/s more :-D
@Amph: Did you notice lower power consumption too?
Quote from: Amph
Yeah, only 650 W or thereabouts, but without 200 sol/s per GPU it's still not competitive enough against AMD... sadly.

200 with Claymore, and the price is a much higher electricity bill... I can't do more than 110 with my RX 480s, but wattage is only 450 W (4 cards).

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 07:54:13 PM
 #1376

Quote from: zawawa
Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.
Quote from: laik2
I wish I could do 130S/s with my AMDs.

I'm pretty sure we will get there. It is just that there are no "easy" optimizations left for AMD cards because they were the first targets of this miner. The next optimization requires a massive rewrite, but it can be done, methinks.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
November 22, 2016, 08:18:50 PM
 #1377

Quote from: zawawa
Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.
Quote from: laik2
I wish I could do 130S/s with my AMDs.

Here's how to get a lot more than 130:
https://bitcointalk.org/index.php?topic=1679855.msg16957668#msg16957668
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
November 22, 2016, 08:34:02 PM
Last edit: November 22, 2016, 08:48:01 PM by laik2
 #1378

Quote from: zawawa
Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.
Quote from: laik2
I wish I could do 130S/s with my AMDs.
Quote from: nerdralph
Here's how to get a lot more than 130:
https://bitcointalk.org/index.php?topic=1679855.msg16957668#msg16957668

This is basically what I could understand from your writings.
wrong paste :)
Quote
Using LDS or L1 Cache

There are a number of considerations when deciding between LDS and L1 cache for a given algorithm.

LDS supports read/modify/write operations, as well as atomics. It is well-suited for code that requires fast read/write, read/modify/write, or scatter operations that otherwise are directed to global memory. On current AMD hardware, L1 is part of the read path; hence, it is suited to cache-read-sensitive algorithms, such as matrix multiplication or convolution.

LDS is typically larger than L1 (for example: 64 kB vs 16 kB on Southern Islands devices). If it is not possible to obtain a high L1 cache hit rate for an algorithm, the larger LDS size can help. On the AMD Radeon HD 7970 device, the theoretical LDS peak bandwidth is 3.8 TB/s, compared to L1 at 1.9 TB/s.

The native data type for L1 is a four-vector of 32-bit words. On L1, fill and read addressing are linked. It is important that L1 is initially filled from global memory with a coalesced access pattern; once filled, random accesses come at no extra processing cost.

Currently, the native format of LDS is a 32-bit word. The theoretical LDS peak bandwidth is achieved when each thread operates on a two-vector of 32-bit words (16 threads per clock operate on 32 banks). If an algorithm requires coalesced 32-bit quantities, it maps well to LDS. The use of four-vectors or larger can lead to bank conflicts, although the compiler can mitigate some of these.

From an application point of view, filling LDS from global memory, and reading from it, are independent operations that can use independent addressing. Thus, LDS can be used to explicitly convert a scattered access pattern to a coalesced pattern for read and write to global memory. Or, by taking advantage of the LDS read broadcast feature, LDS can be filled with a coalesced pattern from global memory, followed by all threads iterating through the same LDS words simultaneously.

LDS reuses the data already pulled into cache by other wavefronts. Sharing across work-groups is not possible because OpenCL does not guarantee that LDS is in a particular state at the beginning of work-group execution. L1 content, on the other hand, is independent of work-group execution, so that successive work-groups can share the content in the L1 cache of a given Vector ALU. However, it currently is not possible to explicitly control L1 sharing across work-groups.

The use of LDS is linked to GPR usage and wavefront-per-Vector ALU count. Better sharing efficiency requires a larger work-group, so that more work-items share the same LDS. Compiling kernels for larger work-groups typically results in increased register use, so that fewer wavefronts can be scheduled simultaneously per Vector ALU. This, in turn, reduces memory latency hiding. Requesting larger amounts of LDS per work-group results in fewer wavefronts per Vector ALU, with the same effect.

LDS typically involves the use of barriers, with a potential performance impact. This is true even for read-only use cases, as LDS must be explicitly filled in from global memory (after which a barrier is required before reads can commence).
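
The remark above about two-vectors versus four-vectors can be illustrated with a toy bank-conflict model. This is a simplified sketch, not a description of real GCN hardware: it assumes a word maps to bank (word index mod 32), that distinct words in the same bank are served one after another, and that identical words are served once (broadcast). The helper names `conflict_passes` and `vector_reads` are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32   # bank count taken from the quoted text
WORD_BYTES = 4   # native LDS format is a 32-bit word


def conflict_passes(byte_addresses):
    """Return how many serialized passes one group of accesses needs.

    Assumed model: bank = word index modulo NUM_BANKS; distinct words in the
    same bank serialize, identical words are served once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


def vector_reads(threads, words_per_thread):
    """Byte addresses when each thread reads a contiguous vector of words."""
    return [(t * words_per_thread + k) * WORD_BYTES
            for t in range(threads) for k in range(words_per_thread)]


two_vec = conflict_passes(vector_reads(16, 2))    # 32 words over 32 banks
four_vec = conflict_passes(vector_reads(16, 4))   # 64 words over 32 banks
strided = conflict_passes([t * NUM_BANKS * WORD_BYTES for t in range(16)])
broadcast = conflict_passes([0] * 16)             # every thread reads one word

print(two_vec, four_vec, strided, broadcast)
```

Under these assumptions, 16 threads reading two-vectors touch each of the 32 banks exactly once, four-vectors need two passes, and a stride of 32 words sends every thread to the same bank.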

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
zawawa
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
November 22, 2016, 08:48:20 PM
 #1379

Quote from: zawawa
Oh, I forgot to mention that my Windows port always shows a 5 min average for total hashrate.
You have to wait a little, but you get a more accurate number that way.
Quote from: laik2
I wish I could do 130S/s with my AMDs.
Quote from: nerdralph
Here's how to get a lot more than 130:
https://bitcointalk.org/index.php?topic=1679855.msg16957668#msg16957668

Marvelous! An excellent analysis! I will take up the challenge with the GCN assembly.
This is so much fun!!

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
antantti
Legendary
*
Offline Offline

Activity: 1176
Merit: 1015


View Profile
November 22, 2016, 08:54:45 PM
 #1380

4x 970 with zawawawa-r12-nv doing ~400 S/s on Win7 with some OC. Power consumption is down a bit from the sp_ version.

r6 was hashing ~340 and sp_1 ~380 with the same clocks.

As for competition against AMD in Equihash, I am afraid we haven't seen anything yet from the high-end older AMD cards.