Bitcoin Forum
April 25, 2018, 07:13:06 PM *
Poll
Question: Which algorithm would you like to see in GGS next?
Equihash - 8 (50%)
X17 - 6 (37.5%)
NIST5 - 1 (6.3%)
Groestl - 0 (0%)
Allium - 1 (6.3%)
Total Voters: 16

Author Topic: Gateless Gate Sharp 1.3.5: Now with memory timing mods and X16R/X16S!  (Read 189721 times)
SunStruck
Sr. Member
****
Offline Offline

Activity: 420
Merit: 250



View Profile
March 31, 2017, 12:52:05 PM
 #981


The .bat file:

Code:
@echo off
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2, as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...

--gpu-platform 0, no?

Still can't connect to suprnova ETH, though... very odd.
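For anyone launching from a script instead of a .bat file, here is a minimal Python sketch of the same setup. The GPU_* values and the command-line flags are copied from the batch file above; the helper name `build_launch` and the wallet placeholder are hypothetical, and the platform index is a parameter because the right value (0 or 1) depends on the machine.

```python
import os

# GPU_* settings copied from the .bat file above; they are meant to be
# visible in the miner's environment when it starts.
GPU_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
}

def build_launch(wallet, worker="gatelessgate", platform=1):
    """Return (env, argv) for the miner without starting it (hypothetical helper)."""
    env = dict(os.environ)
    env.update(GPU_ENV)
    argv = [
        "gatelessgate.exe",
        "--gpu-platform", str(platform),
        "-k", "ethash-new",
        "-o", "stratum+tcp://eu1.ethermine.org:4444",
        "-u", f"{wallet}.{worker}",
        "-p", "x",
        "--xintensity", "4620",
        "--worksize", "192",
        "--gpu-threads", "2",
        "--no-extranonce",
    ]
    return env, argv

if __name__ == "__main__":
    env, argv = build_launch("0xYOUR_WALLET")  # placeholder wallet address
    print(" ".join(argv))
    # To actually start the miner: subprocess.run(argv, env=env)
```

Trying the other OpenCL platform is then just `build_launch(wallet, platform=0)`.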
nerdralph
Sr. Member
****
Offline Offline

Activity: 546
Merit: 251


View Profile
March 31, 2017, 01:31:20 PM
 #982

I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping. I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.

P.S. There are also some easy optimizations to be had from instruction reordering (though they might not make much difference in performance). For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the flat_load_dwordx4.
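
A quick way to sanity-check a move like this is to compare the vector registers each instruction reads and writes. The Python below is only a sketch under loud assumptions: it looks at VGPRs only (VCC, scalar registers, and memory ordering are ignored), it treats the first operand as the only destination, and the helper names `vregs`, `parse`, and `can_swap` are made up for illustration.

```python
import re

def vregs(operand):
    """Expand 'v63' or 'v[61:62]' into a set of VGPR numbers; anything else gives an empty set."""
    m = re.fullmatch(r"v\[(\d+):(\d+)\]", operand)
    if m:
        return set(range(int(m.group(1)), int(m.group(2)) + 1))
    m = re.fullmatch(r"v(\d+)", operand)
    return {int(m.group(1))} if m else set()

def parse(line):
    """Return (writes, reads) as VGPR sets. Assumes the first operand is the only VGPR destination."""
    _, _, rest = line.strip().partition(" ")
    # Keep only the first token of each comma-separated field, so 'slc glc' modifiers are dropped.
    operands = [tok.split()[0] for tok in rest.split(",") if tok.strip()]
    writes = vregs(operands[0]) if operands else set()
    reads = set()
    for operand in operands[1:]:
        reads |= vregs(operand)
    return writes, reads

def can_swap(a, b):
    """True if a and b have no read-after-write, write-after-read, or write-after-write overlap."""
    (a_w, a_r), (b_w, b_r) = a, b
    return not (a_w & b_r) and not (a_r & b_w) and not (a_w & b_w)

xor = parse("v_xor_b32       v63, 50, v49")
loads = [
    parse("flat_load_dwordx4 v[64:67], v[61:62] slc glc"),
    parse("flat_load_dwordx4 v[59:62], v[59:60] slc glc"),
]
addc = parse("v_addc_u32      v62, vcc, v60, 0, vcc")

print(all(can_swap(xor, ld) for ld in loads))  # True: the xor touches only v63 and v49
print(can_swap(addc, loads[0]))                # False: the first load reads v62, which addc writes
```

So the v_xor_b32 is independent of both loads, while the v_addc_u32 has to stay ahead of the first load because it produces the upper half of that load's address.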
nerdralph
Sr. Member
****
Offline Offline

Activity: 546
Merit: 251


View Profile
March 31, 2017, 01:51:41 PM
 #983

I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

I probably wouldn't even bother with Southern Islands; no flat_ instructions.  It should be reasonably easy to write a single kernel for Sea Islands and later, with the main differences being for the ABI changes for kernel parameter passing.
nerdralph
Sr. Member
****
Offline Offline

Activity: 546
Merit: 251


View Profile
March 31, 2017, 02:08:58 PM
 #984

Just cloned the repo to do a Linux build, then noticed the autotools requirement. Ugh. I much prefer it when developers run autoreconf and check the configure script into the repo. Then building is just the usual ./configure; make
zawawa
Sr. Member
****
Offline Offline

Activity: 546
Merit: 276


Miner Developer


View Profile
March 31, 2017, 02:14:45 PM
 #985

Just cloned the repo to do a Linux build, then noticed the autotools requirement. Ugh. I much prefer it when developers run autoreconf and check the configure script into the repo. Then building is just the usual ./configure; make


I will probably switch to CMake + ninja sooner rather than later. This whole autotools thing is too archaic for my taste.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
Sr. Member
****
Offline Offline

Activity: 546
Merit: 276


Miner Developer


View Profile
March 31, 2017, 03:29:26 PM
 #986

I ran the same experiment on Ubuntu 16.04.1 LTS + Linux Kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version: [screenshot not preserved]

The original ethash-new.cl: [screenshot not preserved]


I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.

WBF1
Sr. Member
****
Offline Offline

Activity: 417
Merit: 250


View Profile
March 31, 2017, 04:20:08 PM
 #987

I dropped your new files in over the top of version pre4 and am also seeing a 1-1.5 MH/s boost on my RX 470s and 480. I did not try it on the R9 390, but that was already performing as well as Claymore ever had.

Anonymous, no-registration, no-frills BCH Mining - http://luckypool.co - Get Lucky Today!
laik2
Sr. Member
****
Offline Offline

Activity: 451
Merit: 258



View Profile
March 31, 2017, 04:40:03 PM
 #988

I ran the same experiment on Ubuntu 16.04.1 LTS + Linux Kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version: [screenshot not preserved]

The original ethash-new.cl: [screenshot not preserved]


I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.
Please do share your Linux xorg.conf. Also, I see a lot of HW errors for a stock GPU... Do you run gatelessgate as root or as a regular user?
I see gpu monitoring temp/fan is working for you...

EDIT: Here are the latest tests (screenshots not preserved):
GG ethash-new.cl copied from binary-kernel dir
GG ethash-new.cl copied from kernel dir
GG ethash-new.cl + compiled binaries from binary-kernel dir
Claymore 8.0 Linux default settings



ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
jstefanop
Legendary
*
Offline Offline

Activity: 1063
Merit: 1048


View Profile
March 31, 2017, 05:31:04 PM
 #989

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
zawawa
Sr. Member
****
Offline Offline

Activity: 546
Merit: 276


Miner Developer


View Profile
March 31, 2017, 05:38:49 PM
 #990

I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping. I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.

P.S. There are also some easy optimizations to be had from instruction reordering (though they might not make much difference in performance). For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the flat_load_dwordx4.


That's a good catch. I was actually thinking about automating this kind of instruction reordering.
My compiler driver rewrites the output of LLVM/Clang, so it shouldn't be that difficult.
I really want to combine this feature with register usage analysis.
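
As a rough illustration of the register-usage side of this, the Python sketch below scans a disassembly listing and collects the VGPRs it mentions. It is not the compiler driver described here, only a toy: the listing is the snippet quoted above, the function name `vgprs_used` is hypothetical, and the result is a simple count of referenced registers, not a liveness analysis.

```python
import re

# The five instructions quoted above, including their hex-encoding comments.
LISTING = """\
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)
"""

def vgprs_used(listing):
    """Collect every VGPR number referenced in the listing (hypothetical helper)."""
    used = set()
    for line in listing.splitlines():
        code = re.sub(r"/\*.*?\*/", "", line)  # drop the hex-encoding comment
        # Ranges such as v[64:67] cover every register from 64 to 67 inclusive.
        for lo, hi in re.findall(r"\bv\[(\d+):(\d+)\]", code):
            used.update(range(int(lo), int(hi) + 1))
        # Single registers such as v63; mnemonics like v_xor_b32 and vcc do not match.
        for n in re.findall(r"\bv(\d+)\b", code):
            used.add(int(n))
    return used

used = vgprs_used(LISTING)
print(sorted(used))
print(f"highest VGPR referenced: v{max(used)}")
```

For this snippet the highest register is v67, so the kernel needs at least 68 VGPRs if allocation starts at v0. A real analysis would track live ranges across the whole kernel before deciding whether a reordering frees anything up.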

zawawa
Sr. Member
****
Offline Offline

Activity: 546
Merit: 276


Miner Developer


View Profile
March 31, 2017, 05:41:22 PM
 #991

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

laik2
Sr. Member
****
Offline Offline

Activity: 451
Merit: 258



View Profile
March 31, 2017, 06:00:27 PM
 #992

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.
Can you confirm that I'm doing it right?
There are 2 folders:
kernels (the default sgminer kernels and equihash.cl)
binary-kernels (your OpenCL binary kernels + the ethash-new.cl that uses asm)
I think copying the kernels from the binary-kernel folder to the main gatelessgate install directory makes it use your latest kernel...
I'm used to the old sgminer style, with all kernels in the same folder as the executable, but I might be wrong...

jstefanop
Legendary
*
Offline Offline

Activity: 1063
Merit: 1048


View Profile
March 31, 2017, 06:07:20 PM
 #993

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.

laik2
Sr. Member
****
Offline Offline

Activity: 451
Merit: 258



View Profile
March 31, 2017, 06:33:11 PM
 #994

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.
For me it's quite the opposite... Using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG, 2 threads).
EDIT: Did some timing mods and now the HW errors are acceptable, ~1/2 every minute.
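
To put those numbers in perspective, here is a small Python calculation of the relative difference against Claymore. The three rates are the ones quoted in this post; the function name `relative_gain` is just for illustration, and HW errors or rejected shares are not accounted for.

```python
def relative_gain(candidate_mhs, baseline_mhs):
    """Percentage by which candidate exceeds baseline (negative if it is slower)."""
    return (candidate_mhs - baseline_mhs) / baseline_mhs * 100.0

CLAYMORE = 29.8  # Claymore rate reported above

for label, rate in [("GG, 1 thread", 29.42), ("GG, 2 threads", 30.0)]:
    print(f"{label}: {relative_gain(rate, CLAYMORE):+.2f}% vs Claymore")
```

That works out to roughly 1.3% behind with one thread and about 0.7% ahead with two, so the gap either way is small enough that HW errors could easily decide it.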

zawawa
Sr. Member
****
Offline Offline

Activity: 546
Merit: 276


Miner Developer


View Profile
March 31, 2017, 06:38:04 PM
 #995

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.

The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.

laik2
Sr. Member
****
Offline Offline

Activity: 451
Merit: 258



View Profile
March 31, 2017, 06:58:29 PM
 #996

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.

The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.
It is :)
It's currently hashing at a constant 30 MH/s vs. 29.7 with Claymore. But I still have to copy binary-kernel/* to $gatelessgate_install_dir/bin

I am still wondering how your temp/fan monitoring was working when mine is not... Did you start GG with root privileges and with the mining GPU as the main one?
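
The manual copy step mentioned above can be scripted. This is a minimal Python sketch, assuming the layout described in this post (a binary-kernel directory whose files go into the bin directory of the install); the function name `install_kernels` and the example paths are hypothetical.

```python
import shutil
from pathlib import Path

def install_kernels(src_dir, dest_dir):
    """Copy every regular file from src_dir into dest_dir and return the copied names."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():  # skip subdirectories
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
    return copied

# Example call (paths are placeholders for your own checkout and install dir):
# install_kernels("gatelessgate/binary-kernel", "/opt/gatelessgate/bin")
```

Existing files with the same name in the destination are overwritten, which is what you want when replacing stale kernels.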

sp_
Legendary
*
Offline Offline

Activity: 1386
Merit: 1009

Ccminer developer


View Profile
March 31, 2017, 07:02:41 PM
 #997

If you manage to beat Claymore, zawawa, that's impressive work. :)
jstefanop
Legendary
*
Offline Offline

Activity: 1063
Merit: 1048


View Profile
March 31, 2017, 07:15:44 PM
 #998

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.

The current version was working fine with Ubuntu, though. Strange, strange...
I just pushed a fix to the repo anyway.

It does load the .bins in binary-kernel, but how do you use the asm file (ethash-new-gcn3-ocl20.asm)?

jstefanop
Legendary
*
Offline Offline

Activity: 1063
Merit: 1048


View Profile
March 31, 2017, 07:19:41 PM
 #999

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.
For me it's quite the opposite... Using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG, 2 threads).
EDIT: Did some timing mods and now the HW errors are acceptable, ~1/2 every minute.


You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.

laik2
Sr. Member
****
Offline Offline

Activity: 451
Merit: 258



View Profile
March 31, 2017, 08:00:32 PM
 #1000

Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yeah, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries from the ethash-new.cl in /binary-kernel by moving it to /kernel, and that bumped the speed up to 29.2, but that's still a bit short of Claymore.
For me it's quite the opposite... Using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG, 2 threads).
EDIT: Did some timing mods and now the HW errors are acceptable, ~1/2 every minute.


You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.
https://drive.google.com/drive/folders/0B72yKpOokCMcVnV5LWNMS2ltYmM
I've uploaded some of my kernels. 4.10 and 4.11 are tested and working fine.
Just remember to update only the OpenCL packages from AMDGPU-Pro 16.60.
Ditto about the asm...
