zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
January 23, 2017, 04:16:51 PM
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like the OpenCL 1.2 ABI, but you may be right. I will check it right now. Nope, no luck. O GDS, where art thou?
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ

zawawa (OP)
January 23, 2017, 04:32:37 PM
Well, in the worst case, I can still keep developing the assembly version on the 7990 while trying to figure out how to access GDS on newer cards.

nerdralph
January 23, 2017, 05:16:58 PM
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like the OpenCL 1.2 ABI, but you may be right. I will check it right now. Nope, no luck. O GDS, where art thou?
I've been meaning to look into why Optiminer requires GPU_FORCE_64BIT_PTR=1. Perhaps it's a driver quirk, and GDS only works in 64-bit mode?
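For anyone repeating this test, here is a minimal Python sketch of launching a process with GPU_FORCE_64BIT_PTR=1 in its environment. The AMD OpenCL runtime presumably reads the variable when it starts up, so it has to be set before the miner launches rather than from inside the running program. The child command is a stand-in (a Python one-liner that echoes the variable back); the real miner command line is not given in this thread and has to be substituted.

```python
import os
import subprocess
import sys


def env_with_64bit_ptr(base=None):
    """Return a copy of `base` (default: the current environment) with
    GPU_FORCE_64BIT_PTR=1 added. The original mapping is not modified."""
    env = dict(os.environ if base is None else base)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env


def run_with_64bit_ptr(cmd):
    """Run `cmd` with the variable set and return its captured stdout."""
    result = subprocess.run(
        cmd, env=env_with_64bit_ptr(), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the miner: a child Python process that echoes the variable.
    echo = [sys.executable, "-c",
            "import os; print(os.environ.get('GPU_FORCE_64BIT_PTR'))"]
    print(run_with_64bit_ptr(echo).strip())  # prints 1
```

Copying the parent environment (instead of passing only the one variable) keeps PATH, SYSTEMROOT and the driver's own variables intact for the child.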

zawawa (OP)
January 23, 2017, 06:02:37 PM
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like the OpenCL 1.2 ABI, but you may be right. I will check it right now. Nope, no luck. O GDS, where art thou?
I've been meaning to look into why Optiminer requires GPU_FORCE_64BIT_PTR=1. Perhaps it's a driver quirk, and GDS only works in 64-bit mode?
That wasn't it either. If you come up with any ideas, please let me know.

laik2
January 23, 2017, 06:15:48 PM
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like the OpenCL 1.2 ABI, but you may be right. I will check it right now. Nope, no luck. O GDS, where art thou?
I've been meaning to look into why Optiminer requires GPU_FORCE_64BIT_PTR=1. Perhaps it's a driver quirk, and GDS only works in 64-bit mode?
That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?

zawawa (OP)
January 23, 2017, 06:26:04 PM (Last edit: January 23, 2017, 10:15:06 PM by zawawa)
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like the OpenCL 1.2 ABI, but you may be right. I will check it right now. Nope, no luck. O GDS, where art thou?
I've been meaning to look into why Optiminer requires GPU_FORCE_64BIT_PTR=1. Perhaps it's a driver quirk, and GDS only works in 64-bit mode?
That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?
Yeah, I've read that page like 100 times... I had to ask the author of that page to add GDS support to his GCN assembler, so I don't think he has the answer. I will ask him, though.
Edit: matszpk didn't know how to enable GDS on GCN2/3/4 devices either. Such a nice guy, though. https://github.com/CLRX/CLRX-mirror/issues/12

joaocha
January 23, 2017, 10:28:17 PM
Claymore and Optiminer have the same problem with GCN and newer drivers.

zawawa (OP)
January 24, 2017, 05:48:09 AM
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
I just confirmed that GDS is accessible on the 7990, so the restriction mentioned above must be specific to GCN2+ devices, at least on Windows.
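The Claymore log above is easier to read once each GPU's chip name is paired with the algorithm it was given. A small Python sketch that does that, assuming the log lines keep exactly the format quoted here ("GPU #N: <chip>, ..." and "GPU #N algorithm <name>, intensity <n>"); other Claymore versions may print differently.

```python
import re

# Log lines as quoted above.
LOG = """\
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
"""

# "GPU #N: <chip>," lines name the chip; "GPU #N algorithm X," lines name the kernel.
CHIP_RE = re.compile(r"^GPU #(\d+): (\w+),", re.MULTILINE)
ALGO_RE = re.compile(r"^GPU #(\d+) algorithm (\w+), intensity \d+", re.MULTILINE)


def algorithm_by_gpu(log):
    """Map GPU index -> (chip name, algorithm, uses the ASM kernel?)."""
    chips = {int(m.group(1)): m.group(2) for m in CHIP_RE.finditer(log)}
    table = {}
    for m in ALGO_RE.finditer(log):
        gpu, algo = int(m.group(1)), m.group(2)
        table[gpu] = (chips.get(gpu, "unknown"), algo, algo == "ASM")
    return table


if __name__ == "__main__":
    for gpu, (chip, algo, asm) in sorted(algorithm_by_gpu(LOG).items()):
        print(f"GPU #{gpu} ({chip}): algorithm {algo}, ASM={asm}")
```

On this log it reports ASM for both Tahiti cards and algorithm 2 (no ASM) for the Ellesmere card.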

jstefanop
Legendary
Offline
Activity: 2174
Merit: 1401
January 24, 2017, 06:23:16 AM
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
I just confirmed that GDS is accessible on the 7990, so the restriction mentioned above must be specific to GCN2+ devices, at least on Windows.
Yes, Optiminer is using ASM with the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.

zzzzzzzzzz
January 24, 2017, 06:23:55 AM
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
I just confirmed that GDS is accessible on the 7990, so the restriction mentioned above must be specific to GCN2+ devices, at least on Windows.
I know Optiminer said you have to use his Linux miner to get the maximum hash rate from RX 4xx cards, because on Windows he can't access some of the driver features he needs. On the R9 Fury and earlier, he isn't having any problems.

zawawa (OP)
January 24, 2017, 06:35:03 AM
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
I just confirmed that GDS is accessible on the 7990, so the restriction mentioned above must be specific to GCN2+ devices, at least on Windows.
I know Optiminer said you have to use his Linux miner to get the maximum hash rate from RX 4xx cards, because on Windows he can't access some of the driver features he needs. On the R9 Fury and earlier, he isn't having any problems.
Thank you so much for the clarification. That means I have to switch my Ellesmere farm to Linux, though. Oh well.

zawawa (OP)
January 24, 2017, 06:48:39 AM
Yes, Optiminer is using ASM with the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.
That's really good to know. Gotta love AMD for its completely arbitrary decisions...

zawawa (OP)
January 24, 2017, 06:58:07 AM
I am so tired now... I will work on the assembly version for the 7990 tomorrow. The OpenCL dummy code for the GDS optimizations is ready, so it's a matter of changing a few lines. Hopefully I will bring you guys good news then.

jstefanop
January 24, 2017, 07:44:50 AM
I am so tired now... I will work on the assembly version for the 7990 tomorrow. The OpenCL dummy code for the GDS optimizations is ready, so it's a matter of changing a few lines. Hopefully I will bring you guys good news then.
Hmm, have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf It's the ASM bible, spec'd out for GCN3. Looking at the spec, bit 16 sets GDS, so it looks like the GDS bit for GCN 1.0/1.1 is bit 17, with bits 18-25 for the OPCODE, while for GCN 1.2+ the GDS bit is bit 16, with bits 17-24 for the OPCODE.
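To make the two layouts concrete, here is a small Python sketch that packs and unpacks the first dword of a DS instruction using the bit positions given above, with OFFSET0 in bits 7:0 and OFFSET1 in bits 15:8 in both generations. The ENCODING value in bits 31:26 is assumed here to be 0b110110, as listed for DS in AMD's ISA manuals; verify it against the PDF. The opcode values used are arbitrary placeholders, not real instruction numbers.

```python
# First dword of a GCN DS (data share) instruction:
#   bits  7:0   OFFSET0
#   bits 15:8   OFFSET1
#   GDS bit and 8-bit OP field: position depends on the generation (below)
#   bits 31:26  ENCODING (assumed 0b110110 for DS)
DS_ENCODING = 0b110110

# generation -> (GDS bit position, lowest bit of the 8-bit OP field)
LAYOUTS = {
    "gcn1.0/1.1": (17, 18),  # GDS = bit 17, OP = bits 25:18
    "gcn1.2+": (16, 17),     # GDS = bit 16, OP = bits 24:17
}


def encode_ds_word0(gen, op, gds, offset0=0, offset1=0):
    """Pack the first 32 bits of a DS instruction for generation `gen`."""
    for name, value in (("op", op), ("offset0", offset0), ("offset1", offset1)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value}")
    gds_bit, op_lo = LAYOUTS[gen]
    return (offset0
            | (offset1 << 8)
            | (int(gds) << gds_bit)
            | (op << op_lo)
            | (DS_ENCODING << 26))


def decode_ds_word0(gen, word):
    """Unpack the first 32 bits of a DS instruction for generation `gen`."""
    gds_bit, op_lo = LAYOUTS[gen]
    return {
        "offset0": word & 0xFF,
        "offset1": (word >> 8) & 0xFF,
        "gds": bool((word >> gds_bit) & 1),
        "op": (word >> op_lo) & 0xFF,
        "encoding": word >> 26,
    }


if __name__ == "__main__":
    # Placeholder opcode 0 with GDS set, encoded for each generation:
    # gcn1.0/1.1 0xd8020000, gcn1.2+ 0xd8010000
    for gen in LAYOUTS:
        print(gen, hex(encode_ds_word0(gen, op=0, gds=True)))
    # A GCN 1.0/1.1 GDS word misread with the GCN 1.2+ layout:
    print(decode_ds_word0("gcn1.2+", encode_ds_word0("gcn1.0/1.1", op=0, gds=True)))
```

The last line shows why assembling with the wrong layout is dangerous: the GCN 1.0/1.1 word for opcode 0 with GDS set (0xD8020000) decodes under the GCN 1.2+ layout as opcode 1 with GDS clear, so the instruction silently turns into a different LDS operation instead of failing.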

zawawa (OP)
January 24, 2017, 12:19:20 PM
I am so tired now... I will work on the assembly version for the 7990 tomorrow. The OpenCL dummy code for the GDS optimizations is ready, so it's a matter of changing a few lines. Hopefully I will bring you guys good news then.
Hmm, have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf It's the ASM bible, spec'd out for GCN3. Looking at the spec, bit 16 sets GDS, so it looks like the GDS bit for GCN 1.0/1.1 is bit 17, with bits 18-25 for the OPCODE, while for GCN 1.2+ the GDS bit is bit 16, with bits 17-24 for the OPCODE.
Oh, I downloaded that manual the day it came out, and it has been my bedtime reading for quite some time now. Actually, I was one of those who demanded the manual on the AMD forum when GCN3 first came out. I triple-checked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

zawawa (OP)
January 24, 2017, 12:56:49 PM
So GG is running at 413 sol/s on my old stock 7990 with Crimson 16.9.2. With the same setup, the assembly version of Claymore's 11.1 Beta yields 522 sol/s. Let's see how much I can improve GG's performance with the GCN assembly.
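For scale, the gap between those two numbers works out as follows (plain arithmetic on the two reported figures, nothing else assumed):

```python
gg_sols = 413.0        # GG on a stock 7990, Crimson 16.9.2 (sol/s)
claymore_sols = 522.0  # Claymore's 11.1 Beta, assembly kernel, same setup (sol/s)

gap = claymore_sols - gg_sols             # absolute difference in sol/s
speedup_needed = claymore_sols / gg_sols  # factor GG must gain to match Claymore's
gg_share = gg_sols / claymore_sols        # GG's rate as a fraction of Claymore's

print(f"gap: {gap:.0f} sol/s")                                # gap: 109 sol/s
print(f"GG needs a {speedup_needed - 1:.1%} speedup to match")  # 26.4%
print(f"GG currently reaches {gg_share:.1%} of Claymore's rate")  # 79.1%
```

Note the asymmetry: GG is about 21% slower than Claymore's, but it needs about a 26% speedup to catch up, because the percentage is taken against the smaller number.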

zawawa (OP)
January 24, 2017, 02:54:50 PM
GDS counters are finally working! They still need optimizations, but this is definitely a step forward.

zawawa (OP)
January 24, 2017, 05:34:18 PM
I knew it! Could you tell the engineers at AMD that we need GDS on both Windows and Linux? This is such a waste of time and energy...

jstefanop
January 24, 2017, 06:54:37 PM
I knew it! Could you tell the engineers at AMD that we need GDS on both Windows and Linux? This is such a waste of time and energy...
Not at all... your implementation will work just fine under Linux, and I'm pretty sure the majority of miners (at least people with more than one or two cards) are on Linux anyway.