|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 21, 2017, 09:55:20 PM |
|
Ah, the joy... I'm pretty much over the hump. I didn't need the OpenCL 1.2 ABI or HSAIL after all. This most likely means I should be able to catch up with Optiminer. Good stuff, good stuff.
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
nerdralph
|
|
January 22, 2017, 01:36:58 AM |
|
I didn't need the OpenCL 1.2 ABI or HSAIL after all. This most likely means I should be able to catch up with Optiminer. Good stuff, good stuff.
You wouldn't have had much luck with HSAIL anyway; I'm pretty sure I already mentioned there's no GDS instructions in HSAIL.
|
|
|
|
zzzzzzzzzz
|
|
January 22, 2017, 02:03:00 AM |
|
@zawawa Just in case, I'll ask: Are you working on GPUs other than RX4xx? I ask because that's the only GPU that anyone has even mentioned in this thread. How about R9 Fury/Nano, for instance? 290x? Etc..? In any case, thank you for all the effort you've given to this!
|
|
|
|
manotroll
|
|
January 22, 2017, 02:06:39 AM |
|
@zawawa Just in case, I'll ask: Are you working on GPUs other than RX4xx? I ask because that's the only GPU that anyone has even mentioned in this thread. How about R9 Fury/Nano, for instance? 290x? Etc..? In any case, thank you for all the effort you've given to this!
The 390X is used for ETH.
|
|
|
|
zawawa (OP)
|
|
January 22, 2017, 02:24:07 AM |
|
@zawawa Just in case, I'll ask: Are you working on GPUs other than RX4xx? I ask because that's the only GPU that anyone has even mentioned in this thread. How about R9 Fury/Nano, for instance? 290x? Etc..? In any case, thank you for all the effort you've given to this!
I am currently focusing on RX 480, but I am planning to work on other cards once I'm done with it.
|
|
|
|
zawawa (OP)
|
|
January 22, 2017, 03:55:32 AM |
|
The miner is running stably with 2 threads with a 32KB GDS segment each. Very cool...
|
|
|
|
zawawa (OP)
|
|
January 22, 2017, 04:14:43 AM |
|
I added a new pseudo-op for Global Data Share (GDS) to CLRadeonExtender: https://github.com/CLRX/CLRX-mirror/pull/11
It will be so much fun if we can freely exploit this killer feature at last...
|
|
|
|
zawawa (OP)
|
|
January 22, 2017, 04:19:20 AM |
|
I didn't need the OpenCL 1.2 ABI or HSAIL after all. This most likely means I should be able to catch up with Optiminer. Good stuff, good stuff.
You wouldn't have had much luck with HSAIL anyway; I'm pretty sure I already mentioned there's no GDS instructions in HSAIL.
Really? I don't recall that... The ROCm ABI does expose GDS, though. I will doublecheck.
|
|
|
|
cryptominer420
|
|
January 22, 2017, 05:50:38 PM |
|
Sounds interesting, I'm anxious to see what you find out.
|
|
|
|
nerdralph
|
|
January 22, 2017, 06:39:32 PM |
|
Nice. With this change there should be no more need to explicitly initialize M0 (except maybe for GCN1 devices, since they only have OpenCL 1.2 driver support).
|
|
|
|
nerdralph
|
|
January 22, 2017, 06:48:27 PM |
|
I didn't need the OpenCL 1.2 ABI or HSAIL after all. This most likely means I should be able to catch up with Optiminer. Good stuff, good stuff.
You wouldn't have had much luck with HSAIL anyway; I'm pretty sure I already mentioned there's no GDS instructions in HSAIL.
Really? I don't recall that... The ROCm ABI does expose GDS, though. I will doublecheck.
I confirmed it with one of the AMD devs working on LLVM. He said there were plans for a GCN extension that never got implemented in the HSAIL LLVM backend, since they are now focused on the AMDGPU backend. ROCm also now supports OpenCL kernels: https://www.khronos.org/news/permalink/rocm-1.4-has-support-for-opencl-1.2-host-code-and-2.0-kernels
The possibility of using inline asm for GDS access with the rest of the kernel in straight OpenCL looks promising to me...
|
|
|
|
zawawa (OP)
|
|
January 23, 2017, 03:00:12 AM |
|
The possibility of using inline asm for GDS access with the rest of the kernel in straight OpenCL looks promising to me...
That would be really nice, but I need a solution that works right now. I had to jump through another hoop and turn on the "enable_ordered_append_gds" bit, but I finally located where the GDS base is stored. I am getting really close!
|
|
|
|
zawawa (OP)
|
|
January 23, 2017, 05:33:28 AM |
|
Do I need to initialize GDS before actually using it? These instructions are documented nowhere: DS_CONSUME, DS_APPEND, DS_ORDERED_COUNT.
nerdralph, do you have any ideas?
|
|
|
|
zawawa (OP)
|
|
January 23, 2017, 08:54:04 AM |
|
Hmm... It seems that GDS is not activated for some reason. What to do, what to do...
|
|
|
|
chronek
Sr. Member
Offline
Activity: 273
Merit: 250
BD People Are Legend
|
|
January 23, 2017, 12:13:14 PM |
|
I heard that the RX 480 supports OpenCL 2.0. Would there be any benefit to using the 2.0 ABI?
|
|
|
|
|
zawawa (OP)
|
|
January 23, 2017, 02:07:57 PM |
|
There you go!
2.9 Misc/Data Transfer Packets
2.9.1 ALLOC_GDS
The packet will allocate a new segment within its corresponding GDS partition. The corresponding partition is determined from the Ring to which the packet is submitted. The microcode will first wait until the active partition count equals zero before continuing. This guarantees that the entire contents of the previous allocated segment have been dumped to memory before allocating the new segment within the current partition. It will also check if the segment size is less than partition size and interrupt if the current segment does not fit into its specified partition.
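To make the quoted ALLOC_GDS behavior easier to follow, here is a rough Python model of the two checks it describes: wait until the active partition count is zero, then verify that the segment fits. This is only a sketch. `GdsPartition` and `try_alloc_gds` are hypothetical names, not part of any driver or PM4 API, and treating a segment exactly as large as the partition as "fitting" is an assumption, since the quoted text is ambiguous on that point.

```python
from dataclasses import dataclass


@dataclass
class GdsPartition:
    """Hypothetical model of one GDS partition (not a real driver structure)."""
    size_bytes: int        # capacity of the partition
    active_count: int = 0  # work still using the previously allocated segment


def try_alloc_gds(partition: GdsPartition, segment_bytes: int) -> bool:
    """Return True if the segment can be allocated now, False if we must keep waiting.

    Raises ValueError when the segment can never fit, standing in for the
    interrupt mentioned in the packet description.
    """
    # Step 1: the microcode waits until the active partition count is zero.
    if partition.active_count != 0:
        return False
    # Step 2: the segment must fit into the partition (equality assumed to be OK).
    if segment_bytes > partition.size_bytes:
        raise ValueError("segment does not fit into its partition")
    return True
```

A caller would retry while the function returns False and treat the exception as a configuration error.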
|
|
|
|
nerdralph
|
|
January 23, 2017, 02:11:57 PM Last edit: January 23, 2017, 02:31:36 PM by nerdralph |
|
Do I need to initialize GDS before actually using it? These instructions are documented nowhere: DS_CONSUME, DS_APPEND, DS_ORDERED_COUNT.
nerdralph, do you have any ideas?
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration. If you look in the GCN ISA docs, it says M0 has 16 bits for offset and 16 bits for size. M0 is also used for LDS, so when you use both in your code you'll need to save it to another register.
I hadn't looked at the DS_ instructions you refer to, and a quick look at the ISA confirms your observation about them having no documentation. The LLVM source would at least have the instruction encoding. I'm not sure why you want to use those instructions, though. For the global row counters I'd use ds_add_u32 with the GDS bit set.
P.S. The M0 description is in section 3.7 of the GCN ISA docs.
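As a rough illustration of the 16-bit offset plus 16-bit size split in M0, the packing could be modeled like this in Python. The field order (offset in the upper half, size in the lower half) is an assumption here and should be checked against the ISA docs for the target chip. `pack_m0` and `unpack_m0` are hypothetical helper names, not part of any toolchain.

```python
MASK16 = 0xFFFF  # each M0 field is 16 bits wide


def pack_m0(gds_offset: int, gds_size: int) -> int:
    """Pack a GDS offset and size into one 32-bit value.

    ASSUMPTION: offset occupies bits 31:16 and size occupies bits 15:0.
    """
    if not 0 <= gds_offset <= MASK16:
        raise ValueError("offset does not fit in 16 bits")
    if not 0 <= gds_size <= MASK16:
        raise ValueError("size does not fit in 16 bits")
    return (gds_offset << 16) | gds_size


def unpack_m0(m0: int) -> tuple[int, int]:
    """Split a 32-bit M0 value back into (offset, size) under the same assumption."""
    return (m0 >> 16) & MASK16, m0 & MASK16


# Example: the second of two 32KB segments starts at offset 0x8000.
m0_second_thread = pack_m0(0x8000, 0x8000)
```

One consequence of the 16-bit size field in this model: a 32KB segment (0x8000) fits, but a single 64KB segment (0x10000) would not be representable.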
|
|
|
|
zawawa (OP)
|
|
January 23, 2017, 02:38:57 PM |
|
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.
I assumed that the GDS base/size combination would be stored in one of the SGPRs, just like in the OpenCL 1.2 ABI, but you may be right. I will check it right now.
|
|
|
|
|