Bitcoin Forum
Poll
Question: Do you want to see improvements in Ethash dual-mining with GGS?
I desperately need it. - 8 (15.1%)
It would be nice. - 12 (22.6%)
It's not worth it anymore. - 33 (62.3%)
Total Voters: 53

Author Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480!  (Read 214342 times)
zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 17, 2016, 01:57:58 AM
 #61

Not so fast ;) Easy, easy...

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
December 17, 2016, 02:19:29 PM
 #62

Quote from: zawawa
Yeah, that would be great. I just pushed an improved version of parallel writes.
It is much faster now, but it's still slower than the single thread version.

If you have access to a Tonga or Hawaii card I'd suggest testing with one of those as well.  Ellesmere's sequential copy performance is much worse than Tonga and Hawaii in testing with my cl-mem utility.
https://github.com/nerdralph/cl-mem

Some of your changes could be faster on other GPUs even if they are slower on your RX 480. The slow copy speed on Ellesmere suggests the memory controller is not batching reads and writes as well as the older parts do, so performance suffers from the bus turnaround time. If that is the issue, then it could be solved by synchronizing the kernel so all CUs are reading at the same time (copying slots to the LDS), then they all write at the same time.
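To make the turnaround argument concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: the request ordering, the number of CUs and the function names are all made up for illustration, and it does not describe how Ellesmere's memory controller actually schedules requests. It just counts how often the request stream flips between reading and writing when CUs run out of phase versus when they all read together and then all write together.

```python
# Toy model of read/write direction switches ("bus turnarounds").
# Everything here is illustrative; it is not a model of real GCN hardware.

def request_stream(num_cus, steps, synchronized):
    """Build the sequence of 'R'/'W' requests the memory controller sees.

    In each step every CU issues one request, in CU order.
    synchronized=True : all CUs read in even steps and write in odd steps.
    synchronized=False: neighbouring CUs are one phase apart, so reads and
                        writes are interleaved within every step.
    """
    ops = []
    for step in range(steps):
        for cu in range(num_cus):
            phase = step if synchronized else step + cu
            ops.append("R" if phase % 2 == 0 else "W")
    return ops

def turnarounds(ops):
    """Count how many times consecutive requests change direction."""
    return sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)

NUM_CUS = 4   # hypothetical CU count, kept small for readability
STEPS = 6     # hypothetical number of read/write steps

mixed = turnarounds(request_stream(NUM_CUS, STEPS, synchronized=False))
phased = turnarounds(request_stream(NUM_CUS, STEPS, synchronized=True))
print("turnarounds, CUs out of phase:", mixed)
print("turnarounds, CUs in lockstep: ", phased)
```

In this model the lockstep schedule only switches direction once per step boundary, while the out-of-phase schedule switches on almost every request. Whether a real controller behaves anything like this is exactly what would need testing.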

As a general comment, your code has been getting more complicated and therefore takes more work to follow.  I know sometimes you can't avoid adding complexity when you are tuning performance, but don't forget the best optimizations are the simple ones that reduce code size/complexity.
zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 17, 2016, 10:07:10 PM
 #63

I know, I know... The newly added complexity actually bothered me quite a bit and I feel bad about making you go through it, but it was necessary to ensure the correctness of the code and maximize LDS usage and thus occupancy. I feel like I have exhausted all the means of optimization at the OpenCL level except for an automatic optimizer as far as RX 480 is concerned. Once I'm done with an on-the-fly optimizer, I will delve into the GCN assembly. I have been experimenting with global synchronization with some pretty interesting results.

As for Tonga and Hawaii, I used to own a whole bunch of them, but I sold them all... I'm thinking about getting a used Nano for testing purposes.

By the way, a new GTX 1060 finally arrived, so I can optimize the miner for NVIDIA cards as well. Good stuff.

laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
December 17, 2016, 10:24:12 PM
 #64

Commit 10 and 11 reporting 0S/s on R9 390 under 14.04 fglrx, although GPU usage is 100%

Quote
Gateless Gate, a Zcash miner
Copyright 2016 zawawa @ bitcointalk.org
Connecting to eu1-zcash.flypool.org:3333
Solver 0.0: launching
Successfully connected to eu1-zcash.flypool.org:3333
Received target 0020c49ba5e353f7ced916872b020c49ba5e353f7ced916872b020c49ba5e353
Received job "a50e8e46b67264ee610b"
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 17, 2016, 10:35:13 PM
 #65

Quote from: laik2
Commit 10 and 11 reporting 0S/s on R9 390 under 14.04 fglrx, although GPU usage is 100%

Well, it's fglrx... I don't think the kernel even successfully builds with it. I will implement a workaround.

reflexmk
Sr. Member
****
Offline Offline

Activity: 289
Merit: 250



View Profile
December 17, 2016, 11:49:48 PM
 #66

Please do include an executable of the miner, for those of us who don't know how to compile from source. Thanks.

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 12:00:46 AM
 #67

Quote from: reflexmk
Please do include an executable of the miner, for those of us who don't know how to compile from source. Thanks.

I will do that with each point release. Both for AMD and NVIDIA now. No worries.

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 12:03:52 AM
 #68

I'm getting 90 sol/s with GTX 1060... So, I can catch up with Eqminer if I reach 180 sol/s. I see.

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 12:24:06 AM
 #69

I'm getting 128 sol/s after 10 min of tweaking.
This whole thing doesn't look hard at all...

ioglnx
Sr. Member
****
Offline Offline

Activity: 574
Merit: 250

Fighting mob law and inquisition in this forum


View Profile
December 18, 2016, 12:31:09 AM
 #70

Good luck bro^^
But you should overrun them..and beat :-D

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
m0niker
Newbie
*
Offline Offline

Activity: 39
Merit: 0


View Profile
December 18, 2016, 12:52:41 AM
 #71

Let me know if you need anyone with windows boxes to test, have a few 480s and 7970s and windows 7/10 around, and would be glad to help with testing. Thanks for doing this open source!
Kompik
Sr. Member
****
Offline Offline

Activity: 463
Merit: 250


View Profile
December 18, 2016, 12:55:00 AM
 #72

Quote from: zawawa
I'm getting 90 sol/s with GTX 1060... So, I can catch up with Eqminer if I reach 180 sol/s. I see.

Great!! :) Looking forward to 200 sol/s on the 1060 with your miner!! :)

Bitrated user: Kompik.
reb0rn21
Legendary
*
Offline Offline

Activity: 1898
Merit: 1024


View Profile
December 18, 2016, 01:24:25 AM
 #73

For Nvidia you must solve a specific memory access problem. I don't know if anyone has managed to get fully coalesced memory transactions, which should provide maximum performance...

At the moment the NiceHash miner is doing 215-250 sol/s on the 1060 6GB (the 3GB version does less because its GPU is crippled).

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 06:51:23 AM
 #74

Quote from: reb0rn21
For Nvidia you must solve a specific memory access problem. I don't know if anyone has managed to get fully coalesced memory transactions, which should provide maximum performance...

At the moment the NiceHash miner is doing 215-250 sol/s on the 1060 6GB (the 3GB version does less because its GPU is crippled).

Thanks a lot for the heads up. I figured out a way to rearrange elements in the hash tables efficiently, so I can run a ton of experiments to see which access pattern results in the best performance.

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 07:50:38 AM
 #75

Quote from: nerdralph
If that is the issue, then it could be solved by synchronizing the kernel so all CUs are reading at the same time (copying slots to the LDS), then they all write at the same time.


One more reason to use a GCN assembler, then. How exciting!

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 07:54:16 AM
 #76

I just pushed a workaround for fglrx to the repo.
Let me know if that works.

laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
December 18, 2016, 10:17:30 AM
 #77

Quote from: zawawa
I just pushed a workaround for fglrx to the repo.
Let me know if that works.

Still the same... The login and access credentials are the same as for the 14.04 machine, so you can check it out whenever you want.

EDIT: There is also something that keeps bothering me. On 16.04, clinfo recognized 14 CUs out of 40 and the card was doing 180S/s; with fglrx all CUs were recognized correctly and the hash speed was the same. Do you think that this could be due to not fully utilizing the CUs on the chip?

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 05:33:51 PM
 #78

Quote from: laik2
Still the same... The login and access credentials are the same as for the 14.04 machine, so you can check it out whenever you want.

EDIT: There is also something that keeps bothering me. On 16.04, clinfo recognized 14 CUs out of 40 and the card was doing 180S/s; with fglrx all CUs were recognized correctly and the hash speed was the same. Do you think that this could be due to not fully utilizing the CUs on the chip?

Got it, thanks! Let me get to that when I'm done with GTX 1060.

zawawa (OP)
Sr. Member
****
Offline Offline

Activity: 728
Merit: 304


Miner Developer


View Profile
December 18, 2016, 05:41:37 PM
 #79

I think I figured out how to coalesce global memory reads.
(Memory writes cannot be coalesced because the destination of each slot is not predictable.)
It would have been impossible with the original design of SA, but it should be possible with GG because it loads slots differently.
If everything works out, there should be a massive speedup, hehehe...
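For anyone wondering why reads can be coalesced but writes cannot, here is a small Python sketch of the idea. It is a simplified illustration, not the miner's actual kernel: the 128-byte segment size, the 4-byte element size, the group of 64 work-items and the random destinations standing in for hash-dependent slots are all assumptions picked for the example.

```python
import random

# Illustrative assumptions only; real hardware granularity differs per GPU.
SEGMENT_BYTES = 128   # assumed size of one memory transaction
WORD_BYTES = 4        # assumed size of each element a work-item touches
GROUP_SIZE = 64       # assumed number of work-items accessing memory together

def transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Count the distinct aligned segments touched by one group's addresses."""
    return len({addr // segment_bytes for addr in addresses})

# Reads: consecutive work-items load consecutive words from an aligned base,
# so the whole group falls into a handful of segments.
base = 4096
coalesced_reads = [base + i * WORD_BYTES for i in range(GROUP_SIZE)]

# Writes: each destination depends on the hash value of the slot, which is
# modelled here as a seeded pseudo-random address in a 1 GiB buffer.
rng = random.Random(0)
scattered_writes = [rng.randrange(0, 1 << 30, WORD_BYTES)
                    for _ in range(GROUP_SIZE)]

print("read transactions: ", transactions(coalesced_reads))
print("write transactions:", transactions(scattered_writes))
```

With these numbers the 64 contiguous reads fit in 2 segments, while the scattered writes need close to one transaction per work-item. That gap is the speedup being chased on the read side.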

jiggytom
Legendary
*
Offline Offline

Activity: 1068
Merit: 1020


View Profile
December 18, 2016, 07:29:13 PM
 #80

Quote from: zawawa
I think I figured out how to coalesce global memory reads.
(Memory writes cannot be coalesced because the destination of each slot is not predictable.)
It would have been impossible with the original design of SA, but it should be possible with GG because it loads slots differently.
If everything works out, there should be a massive speedup, hehehe...

Great news!  Does that include CUDA also?
