Bitcoin Forum
December 12, 2017, 12:54:03 AM *
Author Topic: Gateless Gate Sharp 1.1.3: zawawa's open-source dual ETH/XMR/PASC/LBC miner  (Read 162147 times)
laik2
Sr. Member
Activity: 392
December 16, 2016, 10:05:43 PM
 #61

Yeah, that would be great. I just pushed an improved version of parallel writes.
It is much faster now, but it's still slower than the single thread version.

In the meantime, I will work on other optimizations.
I think I'm getting the hang of this whole thing.
I am expecting another 10-20% speedup today.
We will see.
I've turned on the Linux server if you want to test.

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
zawawa
Sr. Member
Activity: 420
Miner Developer
December 17, 2016, 01:57:58 AM
 #62

Not so fast ;) Easy, easy...

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
nerdralph
Sr. Member
Activity: 406
December 17, 2016, 02:19:29 PM
 #63

Yeah, that would be great. I just pushed an improved version of parallel writes.
It is much faster now, but it's still slower than the single thread version.

If you have access to a Tonga or Hawaii card I'd suggest testing with one of those as well.  Ellesmere's sequential copy performance is much worse than Tonga and Hawaii in testing with my cl-mem utility.
https://github.com/nerdralph/cl-mem

Some of your changes could be faster on other GPUs even if they are slower on your RX 480. The slow copy speed on Ellesmere suggests the memory controller is not batching reads and writes as well as the older parts, causing the performance to be impacted by the bus turnaround time. If that is the issue, then it could be solved by synchronizing the kernel so all CUs are reading at the same time (copying slots to the LDS), then they all write at the same time.
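
To make the turnaround argument concrete, here is a toy Python model. It is not code from the miner or from cl-mem, and it assumes, purely for illustration, that every switch between a read and a write on the memory bus costs one turnaround. It compares a stream where each unit reads and writes immediately with one where all reads are issued before all writes. All function names are hypothetical.

```python
from itertools import groupby


def count_turnarounds(ops):
    """Count read<->write direction switches in a sequence of 'R'/'W' ops."""
    runs = [kind for kind, _ in groupby(ops)]
    return max(len(runs) - 1, 0)


def interleaved(n):
    """Each unit reads a slot and writes it straight away: R W R W ..."""
    return ["R", "W"] * n


def phased(n):
    """All units read first (e.g. into local memory), then all write."""
    return ["R"] * n + ["W"] * n


if __name__ == "__main__":
    n = 8
    print("interleaved:", count_turnarounds(interleaved(n)))
    print("phased:     ", count_turnarounds(phased(n)))
```

In this model the interleaved stream switches direction after almost every operation, while the phased stream switches once. Real memory controllers reorder requests, so this only shows the direction of the effect, not its size.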

As a general comment, your code has been getting more complicated and therefore takes more work to follow.  I know sometimes you can't avoid adding complexity when you are tuning performance, but don't forget the best optimizations are the simple ones that reduce code size/complexity.
zawawa
Sr. Member
Activity: 420
Miner Developer
December 17, 2016, 10:07:10 PM
 #64

I know, I know... The newly added complexity actually bothered me quite a bit and I feel bad about making you go through it, but it was necessary to ensure the correctness of the code and maximize LDS usage and thus occupancy. I feel like I have exhausted all the means of optimization at the OpenCL level except for an automatic optimizer as far as RX 480 is concerned. Once I'm done with an on-the-fly optimizer, I will delve into the GCN assembly. I have been experimenting with global synchronization with some pretty interesting results.
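
As a rough illustration of how LDS usage and occupancy interact, the sketch below estimates how many wavefronts can be resident on one compute unit. It is a simplified model, not the miner's logic: it assumes 64 KiB of LDS per CU, a cap of 40 wavefronts per CU (4 SIMDs with 10 wavefronts each), and 64 work-items per wavefront, and it ignores register pressure and every other limit. The function name is hypothetical.

```python
LDS_BYTES_PER_CU = 64 * 1024  # assumed: 64 KiB of LDS per GCN compute unit
MAX_WAVES_PER_CU = 40         # assumed: 4 SIMDs x 10 wavefronts each
WAVEFRONT_SIZE = 64           # work-items per wavefront on GCN


def waves_per_cu(lds_bytes_per_group, work_items_per_group):
    """Estimate resident wavefronts per CU, limited only by LDS and the wave cap."""
    # Ceiling division: a partially filled wavefront still occupies a slot.
    waves_per_group = -(-work_items_per_group // WAVEFRONT_SIZE)
    groups_by_waves = MAX_WAVES_PER_CU // waves_per_group
    if lds_bytes_per_group == 0:
        groups_by_lds = groups_by_waves  # LDS is not a constraint
    else:
        groups_by_lds = LDS_BYTES_PER_CU // lds_bytes_per_group
    return min(groups_by_lds, groups_by_waves) * waves_per_group


if __name__ == "__main__":
    for lds_kib in (32, 16, 8, 4):
        waves = waves_per_cu(lds_kib * 1024, 256)
        print(f"{lds_kib:2d} KiB LDS per work-group -> {waves} wavefronts per CU")
```

With 256 work-items per work-group, halving the LDS footprint doubles the number of resident wavefronts in this model until the 40-wavefront cap is reached.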

As for Tonga and Hawaii, I used to own a whole bunch of them, but I sold them all... I'm thinking about getting a used Nano for testing purposes.

By the way, a new GTX 1060 finally arrived, so I can optimize the miner for NVIDIA cards as well. Good stuff.

laik2
Sr. Member
Activity: 392
December 17, 2016, 10:24:12 PM
 #65

Commits 10 and 11 report 0 sol/s on an R9 390 under 14.04 with fglrx, although GPU usage is 100%.

Quote
Gateless Gate, a Zcash miner
Copyright 2016 zawawa @ bitcointalk.org
Connecting to eu1-zcash.flypool.org:3333
Solver 0.0: launching
Successfully connected to eu1-zcash.flypool.org:3333
Received target 0020c49ba5e353f7ced916872b020c49ba5e353f7ced916872b020c49ba5e353
Received job "a50e8e46b67264ee610b"
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 0.0 sol/s [dev0 0.0] 0 shares

zawawa
Sr. Member
Activity: 420
Miner Developer
December 17, 2016, 10:35:13 PM
 #66

Commits 10 and 11 report 0 sol/s on an R9 390 under 14.04 with fglrx, although GPU usage is 100%.


Well, it's fglrx... I don't think the kernel even successfully builds with it. I will implement a workaround.

reflexmk
Sr. Member
Activity: 282
December 17, 2016, 11:49:48 PM
 #67

Please do include an executable of the miner for those of us who don't know how to compile from source. Thanks.

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 12:00:46 AM
 #68

Please do include an executable of the miner for those of us who don't know how to compile from source. Thanks.

I will do that with each point release. Both for AMD and NVIDIA now. No worries.

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 12:03:52 AM
 #69

I'm getting 90 sol/s with GTX 1060... So, I can catch up with Eqminer if I reach 180 sol/s. I see.

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 12:24:06 AM
 #70

I'm getting 128 sol/s after 10 min of tweaking.
This whole thing doesn't look hard at all...
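
For reference, the figures quoted in this thread for the GTX 1060 (90 sol/s at first, 128 sol/s after tweaking, and 180 sol/s as the stated catch-up target) work out as follows. This is plain arithmetic on those numbers, not a benchmark; the helper name is hypothetical.

```python
def speedup(new_rate, old_rate):
    """Ratio of two solution rates given in sol/s."""
    return new_rate / old_rate


if __name__ == "__main__":
    baseline, tuned, target = 90.0, 128.0, 180.0  # sol/s, as quoted in the thread
    print(f"gain from tweaking:    {speedup(tuned, baseline):.2f}x")
    print(f"remaining to target:   {speedup(target, tuned):.2f}x")
    print(f"baseline to target:    {speedup(target, baseline):.2f}x")
```

So the ten minutes of tweaking covered roughly half of the gap in multiplicative terms: about 1.42x gained, about 1.41x still to go.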

ioglnx
Sr. Member
Activity: 434
Fighting mob law and inquisition in this forum
December 18, 2016, 12:31:09 AM
 #71

Good luck bro ^^
But you should overtake them... and beat them :-D

GTX 1080Ti rocks da house... seriously... this card is a beast³
Owning by now 18x GTX1080Ti :-D @serious love of efficiency
m0niker
Jr. Member
Activity: 39
December 18, 2016, 12:52:41 AM
 #72

Let me know if you need anyone with Windows boxes to test. I have a few 480s and 7970s and Windows 7/10 around, and would be glad to help with testing. Thanks for doing this open source!
Kompik
Sr. Member
Activity: 383
December 18, 2016, 12:55:00 AM
 #73

I'm getting 90 sol/s with GTX 1060... So, I can catch up with Eqminer if I reach 180 sol/s. I see.
Great!! :) Looking forward to 200 sol/s on the 1060 with your miner!! :)
reb0rn21
Legendary
Activity: 1246
December 18, 2016, 01:24:25 AM
 #74

For Nvidia you must solve specific memory access issues; dunno if anyone has managed to get fully coalesced memory transactions, which should provide max performance...

ATM the NiceHash miner is doing 215-250 sol/s on the 1060 6GB (the 3GB version does less because the GPU is crippled)
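
For readers unfamiliar with coalescing, the following Python sketch models it in the simplest possible way: it counts how many aligned memory segments the threads of one warp touch in a single load. The 128-byte segment size and the 32-thread warp are assumptions of this model, the function names are hypothetical, and real hardware behaviour depends on the GPU generation and cache configuration.

```python
SEGMENT_BYTES = 128  # assumed transaction granularity for this model
WARP_SIZE = 32       # threads per warp


def transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Number of distinct aligned segments touched by one warp's byte addresses."""
    return len({addr // segment_bytes for addr in addresses})


def warp_addresses(base, stride_bytes, warp_size=WARP_SIZE):
    """Byte address accessed by each lane for a given base and per-lane stride."""
    return [base + lane * stride_bytes for lane in range(warp_size)]


if __name__ == "__main__":
    print("4-byte stride, aligned:   ", transactions(warp_addresses(0, 4)))
    print("4-byte stride, misaligned:", transactions(warp_addresses(64, 4)))
    print("32-byte stride:           ", transactions(warp_addresses(0, 32)))
    print("128-byte stride:          ", transactions(warp_addresses(0, 128)))
```

In this model, consecutive 4-byte accesses from an aligned base need a single segment, while a 128-byte stride needs one segment per thread, which is the gap a fully coalesced access pattern is meant to close.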

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 06:51:23 AM
 #75

For Nvidia you must solve specific memory access issues; dunno if anyone has managed to get fully coalesced memory transactions, which should provide max performance...

ATM the NiceHash miner is doing 215-250 sol/s on the 1060 6GB (the 3GB version does less because the GPU is crippled)

Thanks a lot for the heads up. I figured out a way to rearrange elements in the hash tables efficiently, so I can run a ton of experiments to see which access pattern would result in the best performance.

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 07:50:38 AM
 #76

If that is the issue, then it could be solved by synchronizing the kernel so all CUs are reading at the same time (copying slots to the LDS), then they all write at the same time.


One more reason to use a GCN assembler, then. How exciting!

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 07:54:16 AM
 #77

I just pushed a workaround for fglrx to the repo.
Let me know if that works.

laik2
Sr. Member
Activity: 392
December 18, 2016, 10:17:30 AM
 #78

I just pushed a workaround for fglrx to the repo.
Let me know if that works.
Still the same... Login and access credentials are the same as for the 14.04 machine; you can check it out whenever you want.

EDIT: There is also something that keeps bothering me. On 16.04, clinfo recognized only 14 CUs out of 40 and the card was doing 180 sol/s; with fglrx, all CUs were recognized correctly and the hash speed was the same. Do you think that this could be due to not fully utilizing the CUs on the chip?

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 05:33:51 PM
 #79

I just pushed a workaround for fglrx to the repo.
Let me know if that works.
Still the same... Login and access credentials are the same as for the 14.04 machine; you can check it out whenever you want.

EDIT: There is also something that keeps bothering me. On 16.04, clinfo recognized only 14 CUs out of 40 and the card was doing 180 sol/s; with fglrx, all CUs were recognized correctly and the hash speed was the same. Do you think that this could be due to not fully utilizing the CUs on the chip?

Got it, thanks! Let me get to that when I'm done with GTX 1060.

zawawa
Sr. Member
Activity: 420
Miner Developer
December 18, 2016, 05:41:37 PM
 #80

I think I figured out how to coalesce global memory reads.
(Memory writes cannot be coalesced because the destination of each slot is not predictable.)
It would have been impossible with the original design of SA, but it should be possible with GG because it loads slots differently.
If everything works out, there should be a massive speedup, hehehe...
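
The asymmetry between reads and writes can be illustrated with a small Python sketch of a hash-bucketed table. This is a hypothetical model, not the data layout of SA or GG: it assumes each item's destination row is taken from the top bits of a hash of the item, with BLAKE2b used only as a convenient stand-in, and the table dimensions are made up. Because the row depends on the hash, consecutive inputs land in scattered rows, whereas reading a row back walks adjacent slots.

```python
import hashlib

NUM_ROWS = 16      # hypothetical number of rows in the table
SLOTS_PER_ROW = 8  # hypothetical row capacity


def row_of(item):
    """Pick a row from the top 4 bits of the item's hash (assumed scheme)."""
    digest = hashlib.blake2b(item.to_bytes(4, "little"), digest_size=8).digest()
    return digest[0] >> 4


def insert_all(items):
    """Insert items and record the (row, slot) destination of every write."""
    table = [[] for _ in range(NUM_ROWS)]
    write_order = []
    for item in items:
        row = row_of(item)
        slot = len(table[row])
        if slot < SLOTS_PER_ROW:  # items that overflow a row are dropped
            table[row].append(item)
            write_order.append((row, slot))
    return table, write_order


def read_row(table, row):
    """Reads walk one row's slots in order, so neighbours touch adjacent slots."""
    return list(table[row])


if __name__ == "__main__":
    table, write_order = insert_all(range(32))
    print("write destinations (row, slot):", write_order[:8])
    print("contents of row 0 in read order:", read_row(table, 0))
```

The first list shows write destinations jumping between rows from one item to the next, which is why they cannot be arranged ahead of time; the second shows that a read of one row is a simple linear scan.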
