Bitcoin Forum
Poll
Question: Do you want to see improvements in Ethash dual-mining with GGS?
I desperately need it. - 8 (15.1%)
It would be nice. - 12 (22.6%)
It's not worth it anymore. - 33 (62.3%)
Total Voters: 53

Author Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480!  (Read 214337 times)
zawawa (OP)
January 23, 2017, 04:16:51 PM
 #401

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of the SGPRs, just as in the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa (OP)
January 23, 2017, 04:32:37 PM
 #402

Well, in the worst case, I can still continue to develop the assembly version on the 7990 while trying to figure out how to access GDS on newer cards.

nerdralph
January 23, 2017, 05:16:58 PM
 #403

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps it's some driver quirk where GDS only works in 64-bit mode?
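
For anyone who wants to try the same experiment, here is a minimal Python sketch of launching a miner with that variable set. Only the variable name GPU_FORCE_64BIT_PTR comes from this thread; the helper names and the `./miner` command in the usage note are hypothetical placeholders.

```python
import os
import subprocess


def env_with_64bit_ptr(base=None):
    """Return a copy of the environment with GPU_FORCE_64BIT_PTR=1 added.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env


def launch_miner(cmd):
    """Start `cmd` (a list of arguments) with the variable set.

    The command itself is an assumption; pass whatever miner binary
    and arguments you actually use.
    """
    return subprocess.Popen(cmd, env=env_with_64bit_ptr())
```

Usage would look like `launch_miner(["./miner", "--your-args"])`. Setting the variable per process this way keeps it out of the global environment, which makes it easy to compare runs with and without it.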
zawawa (OP)
January 23, 2017, 06:02:37 PM
 #404

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.

laik2
January 23, 2017, 06:15:48 PM
 #405

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
zawawa (OP)
January 23, 2017, 06:26:04 PM
Last edit: January 23, 2017, 10:15:06 PM by zawawa
 #406

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?


Yeah, I read that page like 100 times...
I had to ask the author of that page to add support for GDS to his GCN assembler, so I don't think he has the answer.
I will ask him, though.

Edit: matszpk didn't know how to enable GDS on GCN2/3/4 devices, either. Such a nice guy, though.
https://github.com/CLRX/CLRX-mirror/issues/12

joaocha
January 23, 2017, 10:28:17 PM
 #407

Claymore's and optiminer are having the same problem with GCN and newer drivers.
zawawa (OP)
January 24, 2017, 05:48:09 AM
 #408

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS is accessible on the 7990, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

jstefanop
January 24, 2017, 06:23:16 AM
 #409

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

Yes, optiminer is using ASM on the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.

Project Apollo: A Pod Miner Designed for the Home https://bitcointalk.org/index.php?topic=4974036
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
zzzzzzzzzz
January 24, 2017, 06:23:55 AM
 #410

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his Linux miner to get the max hash rate from RX 4xx cards, because on Windows he can't access some of the driver features he needs. On the R9 Fury and earlier, he's not having any problems.
zawawa (OP)
January 24, 2017, 06:35:03 AM
 #411

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his linux miner to get max hash rate from RX4xx, because on Windows he can't access some of the driver features he needs. On R9 Fury and earlier, he's not having any problems.

Thank you so much for the clarification. That means I have to switch my Ellesmere farm to Linux, though. Oh well.

zawawa (OP)
January 24, 2017, 06:48:39 AM
 #412

Yes, optiminer is using ASM on polaris driver under linux, so it works there. His Polaris speedup does NOT work under windows, so it must be a windows restriction.

That's really good to know. Gotta love AMD for its completely arbitrary decisions...

zawawa (OP)
January 24, 2017, 06:58:07 AM
 #413

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
The dummy OpenCL code for the GDS optimizations is ready, so it's a matter of changing several lines.
Hopefully I will bring you guys some good news then.

jstefanop
January 24, 2017, 07:44:50 AM
 #414

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
OpenCL dummy codes for GDS optimizations are ready, so it's a matter of changing several lines.
Hopefully I will bring you guys a good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

It's the ASM bible, specced out for GCN3 (a.k.a. GCN 1.2). Looking at the DS instruction encoding there, bit 16 sets GDS, so it looks like the GDS bit for GCN 1.0/1.1 is bit 17 with bits 18-25 for the opcode, while for GCN 1.2+ the GDS bit is bit 16 with bits 17-24 for the opcode.
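
To make the two layouts concrete, here is a minimal Python sketch that packs and unpacks the GDS bit and the 8-bit opcode in the first dword of a DS instruction, using only the bit positions quoted in this post. The layout names, the function names, and the sample opcode value are made up for illustration, and the other fields of the dword are ignored; check the positions against the ISA manual before relying on them.

```python
from typing import NamedTuple


class DsFields(NamedTuple):
    gds: bool     # True if the instruction targets GDS instead of LDS
    opcode: int   # 8-bit DS opcode


# (GDS bit position, lowest opcode bit) per layout, as described in the post.
# The dictionary keys are hypothetical labels, not official names.
LAYOUTS = {
    "gcn1.0/1.1": (17, 18),  # GDS = bit 17, opcode = bits 18-25
    "gcn1.2+": (16, 17),     # GDS = bit 16, opcode = bits 17-24
}


def decode_ds_word(word: int, layout: str) -> DsFields:
    """Extract the GDS flag and opcode from the first dword."""
    gds_bit, op_lo = LAYOUTS[layout]
    return DsFields(bool((word >> gds_bit) & 1), (word >> op_lo) & 0xFF)


def encode_ds_word(opcode: int, gds: bool, layout: str) -> int:
    """Build only the GDS and opcode bits of the first dword."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    gds_bit, op_lo = LAYOUTS[layout]
    return (opcode << op_lo) | (int(gds) << gds_bit)


if __name__ == "__main__":
    sample_opcode = 0x3D  # arbitrary value, for illustration only
    word = encode_ds_word(sample_opcode, gds=True, layout="gcn1.0/1.1")
    print(decode_ds_word(word, "gcn1.0/1.1"))  # fields come back intact
    print(decode_ds_word(word, "gcn1.2+"))     # same bits, misread
```

Decoding a GCN 1.0/1.1 word with the GCN 1.2+ layout yields a cleared GDS flag and a different opcode, which is why the encoding has to match the target generation exactly.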

zawawa (OP)
January 24, 2017, 12:19:20 PM
 #415

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
OpenCL dummy codes for GDS optimizations are ready, so it's a matter of changing several lines.
Hopefully I will bring you guys a good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

Its the ASM bible speced out for GCN 1.3. Looking at the GDS spec bit 16 sets GDS, so looks like GDS bit for GCN 1.0/1.1 is 17 with 18-25 for the OPCODE, and GCN 1.2+ GDS is bit 16 with 17-24 for OPCODE.

Oh, I downloaded that manual the day it came out, and it has been my bedtime reading for quite some time now. :) Actually, I was one of those who demanded the manual on the AMD forum when GCN3 first came out.

I triple-checked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

zawawa (OP)
January 24, 2017, 12:56:49 PM
 #416

So GG is running at 413 sol/s on my old stock 7990 with Crimson 16.9.2.
With the same setup, the assembly version of Claymore's 11.1 Beta yields 522 sol/s.
Let's see how much I can improve GG's performance with the GCN assembly.
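
For reference, the gap between the two figures above works out as follows. This is just a small Python sketch of the arithmetic; the function name is hypothetical, and the only inputs are the 413 and 522 sol/s numbers from this post.

```python
def relative_gain(baseline: float, other: float) -> float:
    """Percentage by which `other` exceeds `baseline`."""
    return (other / baseline - 1.0) * 100.0


gg_sols = 413.0        # Gateless Gate on the stock 7990
claymore_sols = 522.0  # Claymore's 11.1 Beta (assembly), same setup

# Claymore's lead over GG: about 26.4%
print(f"Claymore's is {relative_gain(gg_sols, claymore_sols):.1f}% faster")
# GG as a fraction of Claymore's rate: about 79.1%
print(f"GG reaches {gg_sols / claymore_sols * 100:.1f}% of Claymore's rate")
```

So the assembly work needs to recover roughly a quarter more throughput just to reach parity on this card.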

zawawa (OP)
January 24, 2017, 02:54:50 PM
 #417

GDS counters are finally working!
They still need optimizations, but this is definitely a step forward.

nerdralph
January 24, 2017, 04:36:29 PM
 #418

I triplechecked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

I think you are right.  I looked at the ROC-K source and there is no PM4 packet type defined for ALLOC_GDS there either.
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/1853014d96c6af2d81d424f98d320810f40391d8/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h
And it looks like the same code is used in the Linux kernel:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h

Since there is no easy way to use GDS in OpenCL, and probably not in DX12 either, AMD likely doesn't have a test case for their Windows drivers that checks GDS initialization.
zawawa (OP)
January 24, 2017, 05:34:18 PM
 #419

I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

jstefanop
January 24, 2017, 06:54:37 PM
 #420

I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

Not at all... your implementation will work just fine under Linux, and I'm pretty sure the majority of miners (at least people with more than 1 or 2 cards) are on Linux anyway.
