Bitcoin Forum
December 13, 2017, 10:59:25 PM *
Author Topic: Gateless Gate Sharp 1.1.4: zawawa's open-source dual ETH/XMR/PASC/LBC miner  (Read 163948 times)
SunStruck
Sr. Member
****
Online Online

Activity: 308



View Profile
March 31, 2017, 12:52:05 PM
 #981


The .bat file:

Code:
@echo off
rem cmd.exe only assigns a variable when the name and value are joined by "="
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
rem keep the console window open after the miner exits
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2, as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...
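
For anyone who launches the miner from a script rather than a .bat file, the same settings can be passed as a per-process environment. This is only a rough sketch: the variable names, values and command-line flags are copied from the .bat file above, while `build_env` and `build_command` are hypothetical helper names, and nothing here shows which of the variables the driver actually honors.

```python
import os

# Variable names and values are copied from the .bat file above.
GPU_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
}


def build_env(base=None):
    """Return a copy of the base environment with the GPU settings applied."""
    env = dict(os.environ) if base is None else dict(base)
    env.update(GPU_ENV)
    return env


def build_command(pool_user, exe="gatelessgate.exe"):
    """Build the argument list from the .bat file for a given wallet.worker."""
    return [
        exe,
        "--gpu-platform", "1",
        "-k", "ethash-new",
        "-o", "stratum+tcp://eu1.ethermine.org:4444",
        "-u", pool_user,
        "-p", "x",
        "--xintensity", "4620",
        "--worksize", "192",
        "--gpu-threads", "2",
        "--no-extranonce",
    ]


if __name__ == "__main__":
    # Placeholder pool user; substitute your own wallet and worker name.
    print(" ".join(build_command("<wallet>.<worker>")))
```

To actually start the miner, the two helpers can be handed to `subprocess.run(build_command(...), env=build_env())`, which keeps the settings local to that one process.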

Shouldn't that be --gpu-platform 0?

Still can't connect to suprnova ETH, though. Very odd.
nerdralph
Sr. Member
****
Offline Offline

Activity: 406


View Profile
March 31, 2017, 01:31:20 PM
 #982

Quote from: zawawa
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1 MH/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping.  I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.

p.s.  There are also some easy optimizations to be had from instruction reordering (though they might not make much difference in performance).  For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the two flat_load_dwordx4 instructions: neither load touches v63 or v49, so the loads can issue one instruction sooner.
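
To double-check that this move is safe, one can compare the registers each instruction reads and writes. A minimal Python sketch follows; the register sets are transcribed by hand from the listing above, and `conflicts` is a hypothetical helper, not part of any assembler tooling. It only looks at register hazards and ignores s_waitcnt accounting.

```python
# Registers written and read by each instruction, transcribed by hand
# from the listing above.
XOR = {"writes": {"v63"}, "reads": {"v49"}}
LOAD_A = {"writes": {"v64", "v65", "v66", "v67"}, "reads": {"v61", "v62"}}
LOAD_B = {"writes": {"v59", "v60", "v61", "v62"}, "reads": {"v59", "v60"}}


def conflicts(first, second):
    """Return True if swapping two adjacent instructions could change results.

    Checks the three classic register hazards: read-after-write,
    write-after-read, and write-after-write.
    """
    raw = first["writes"] & second["reads"]
    war = first["reads"] & second["writes"]
    waw = first["writes"] & second["writes"]
    return bool(raw or war or waw)


if __name__ == "__main__":
    # The XOR touches only v49 and v63, so it commutes with both loads.
    print(conflicts(XOR, LOAD_A))
    print(conflicts(XOR, LOAD_B))
    # The two loads must keep their order: LOAD_B overwrites v61/v62,
    # which LOAD_A reads as its address.
    print(conflicts(LOAD_A, LOAD_B))
```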
nerdralph
Sr. Member
****
Offline Offline

Activity: 406


View Profile
March 31, 2017, 01:51:41 PM
 #983

Quote from: zawawa
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1 MH/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

I probably wouldn't even bother with Southern Islands; it has no flat_ instructions.  It should be reasonably easy to write a single kernel for Sea Islands and later, with the main differences being the ABI changes in kernel parameter passing.
nerdralph
Sr. Member
****
Offline Offline

Activity: 406


View Profile
March 31, 2017, 02:08:58 PM
 #984

Just cloned the repo to do a Linux build, then noticed the autotools requirement.  Ugh.  I much prefer it when developers run autoreconf and check the configure script into the repo.  Then building is just the usual ./configure;make
zawawa
Sr. Member
****
Offline Offline

Activity: 420


Miner Developer


View Profile
March 31, 2017, 02:14:45 PM
 #985

Quote from: nerdralph
Just cloned the repo to do a Linux build, then noticed the autotools requirement.  Ugh.  I much prefer it when developers run autoreconf and check the configure script into the repo.  Then building is just the usual ./configure;make

I will probably switch to CMake + ninja sooner rather than later. This whole autotools thing is too archaic for my taste.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
Sr. Member
****
Offline Offline

Activity: 420


Miner Developer


View Profile
March 31, 2017, 03:29:26 PM
 #986

I ran the same experiment on Ubuntu 16.04.1 LTS + Linux kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version:



The original ethash-new.cl:



I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
WBF1
Sr. Member
****
Offline Offline

Activity: 365


View Profile
March 31, 2017, 04:20:08 PM
 #987

I dropped your new files in over the top of version pre4 and am also seeing a 1-1.5 MH/s boost on my RX 470s and 480. I did not try it on the R9 390, but that was already performing as well as Claymore ever had.
laik2
Sr. Member
****
Offline Offline

Activity: 392


View Profile
March 31, 2017, 04:40:03 PM
 #988

Quote from: zawawa
I ran the same experiment on Ubuntu 16.04.1 LTS + Linux kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version:



The original ethash-new.cl:



I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.

Please do share your Linux xorg.conf. Also, I see a lot of HW errors for a stock GPU... Do you run gatelessgate as root or as a regular user?
I see that GPU temp/fan monitoring is working for you...

EDIT: Here are the latest tests...
GG ethash-new.cl copied from binary-kernel dir

GG ethash-new.cl copied from kernel dir

GG ethash-new.cl + compiled binaries from binary-kernel dir

Claymore 8.0 Linux default settings



ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
jstefanop
Hero Member
*****
Offline Offline

Activity: 848


View Profile
March 31, 2017, 05:31:04 PM
 #989

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
zawawa
Sr. Member
****
Offline Offline

Activity: 420


Miner Developer


View Profile
March 31, 2017, 05:38:49 PM
 #990

Quote from: zawawa
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1 MH/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

Quote from: nerdralph
That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping.  I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.

p.s.  There are also some easy optimizations to be had from instruction reordering (though they might not make much difference in performance).  For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the flat_load_dwordx4.


That's a good catch. I was actually thinking about automating this kind of instruction reordering.
My compiler driver rewrites the output of LLVM/Clang, so it shouldn't be that difficult.
I really want to combine this feature with register usage analysis.
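
As a rough illustration of what such an automated pass could look like (this is not the actual compiler driver, whose internals are not shown in this thread), here is a small Python sketch. It assumes each instruction has already been annotated with the registers it reads and writes; `Instr`, `independent` and `hoist_loads` are hypothetical names. Loads are bubbled upward past independent ALU instructions, and anything marked as a barrier (s_waitcnt here, and stores would need the same treatment) is never crossed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()
    is_load: bool = False
    is_barrier: bool = False  # e.g. s_waitcnt; nothing moves across it


def independent(a, b):
    """True if a and b share no RAW, WAR, or WAW register hazard."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def hoist_loads(instrs):
    """Bubble each load upward past independent non-load instructions.

    Loads never pass other loads or barriers, so their relative order
    is left exactly as the compiler emitted it.
    """
    out = list(instrs)
    for i in range(1, len(out)):
        j = i
        while (j > 0 and out[j].is_load
               and not out[j - 1].is_load
               and not out[j - 1].is_barrier
               and independent(out[j - 1], out[j])):
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out


def regs(*names):
    return frozenset(names)


# The five instructions from nerdralph's listing, annotated by hand.
LISTING = [
    Instr("v_addc_u32 v62, vcc, v60, 0, vcc",
          regs("v62", "vcc"), regs("v60", "vcc")),
    Instr("v_xor_b32 v63, 50, v49", regs("v63"), regs("v49")),
    Instr("flat_load_dwordx4 v[64:67], v[61:62] slc glc",
          regs("v64", "v65", "v66", "v67"), regs("v61", "v62"), is_load=True),
    Instr("flat_load_dwordx4 v[59:62], v[59:60] slc glc",
          regs("v59", "v60", "v61", "v62"), regs("v59", "v60"), is_load=True),
    Instr("s_waitcnt vmcnt(1) & lgkmcnt(1)", is_barrier=True),
]

if __name__ == "__main__":
    for ins in hoist_loads(LISTING):
        print(ins.text)
```

On this listing the first load stops below v_addc_u32, because that instruction writes v62, which the load uses as part of its address. Combining this with register usage analysis would mostly be a matter of filling in the read/write sets automatically instead of by hand.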

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
Sr. Member
****
Offline Offline

Activity: 420


Miner Developer


View Profile
March 31, 2017, 05:41:22 PM
 #991

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
laik2
Sr. Member
****
Offline Offline

Activity: 392


View Profile
March 31, 2017, 06:00:27 PM
 #992

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Can you confirm that I'm doing it right?
There are 2 folders:
kernels (default kernels of sgminer and equihash.cl)
binary-kernels (your OpenCL binary kernels + ethash-new.cl using asm)
I think copying the kernels from the binary-kernel folder to the main gatelessgate install directory makes it use your latest kernel...
I'm used to the old sgminer style, with all kernels in the same folder as the executable, but I might be wrong...
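
The copy step itself could be scripted along these lines. This is only a sketch: `copy_kernels` is a hypothetical helper, the directory names are taken from this post, and whether the miner actually picks the kernels up from the destination is exactly the open question here.

```python
import shutil
from pathlib import Path


def copy_kernels(src_dir, dest_dir):
    """Copy every regular file from src_dir into dest_dir, overwriting.

    Returns the names of the copied files in sorted order.
    Subdirectories are skipped.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Paths are assumptions; adjust them to your own checkout and install dir.
    source = Path("binary-kernel")
    if source.is_dir():
        print(copy_kernels(source, Path(".")))
```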

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
jstefanop
Hero Member
*****
Offline Offline

Activity: 848


View Profile
March 31, 2017, 06:07:20 PM
 #993

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
laik2
Sr. Member
****
Offline Offline

Activity: 392


View Profile
March 31, 2017, 06:33:11 PM
 #994

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

For me it's quite the opposite... using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG with 2 threads).
EDIT: Did some timing mods and now HW errors are acceptable, ~1/2 every minute.

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
zawawa
Sr. Member
****
Offline Offline

Activity: 420


Miner Developer


View Profile
March 31, 2017, 06:38:04 PM
 #995

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
laik2
Sr. Member
****
Offline Offline

Activity: 392


View Profile
March 31, 2017, 06:58:29 PM
 #996

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

Quote from: zawawa
The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.

It is :)
It's currently hashing at a constant 30 MH/s vs. 29.7 with Claymore. But I still have to copy binary-kernel/* to $gatelessgate_install_dir/bin

I am still wondering why your temp/fan monitoring works but mine does not... Did you start GG with root privileges and with the mining GPU as the primary one?

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
sp_
Legendary
*
Offline Offline

Activity: 1246

Ccminer developer


View Profile
March 31, 2017, 07:02:41 PM
 #997

If you manage to beat Claymore, zawawa, that's impressive work. :)
jstefanop
Hero Member
*****
Offline Offline

Activity: 848


View Profile
March 31, 2017, 07:15:44 PM
 #998

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

Quote from: zawawa
The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.

It does load the .bins in binary-kernel, but how do you use the asm file? (ethash-new-gcn3-ocl20.asm)

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
jstefanop
Hero Member
*****
Offline Offline

Activity: 848


View Profile
March 31, 2017, 07:19:41 PM
 #999

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

Quote from: laik2
For me it's quite the opposite... using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG with 2 threads).
EDIT: Did some timing mods and now HW errors are acceptable, ~1/2 every minute.

You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
laik2
Sr. Member
****
Offline Offline

Activity: 392


View Profile
March 31, 2017, 08:00:32 PM
 #1000

Quote from: jstefanop
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

Quote from: zawawa
The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Quote from: jstefanop
Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin files and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel. That bumped the speed up to 29.2, but it's still a bit short of Claymore.

Quote from: laik2
For me it's quite the opposite... using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG with 2 threads).
EDIT: Did some timing mods and now HW errors are acceptable, ~1/2 every minute.

Quote from: jstefanop
You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.

https://drive.google.com/drive/folders/0B72yKpOokCMcVnV5LWNMS2ltYmM
I've uploaded some of my kernels. 4.10/4.11 are tested and working fine.
Just remember to update only the OpenCL packages from amdgpu-pro 16.60.
Ditto about the asm...

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW