m1n1ngP4d4w4n
Full Member
Offline
Activity: 224
Merit: 100
CryptoLearner
|
|
January 05, 2017, 12:11:32 PM |
|
xD, well, to be honest there are no pros to keeping old drivers when there is plenty of proof that newer ones work as well or even better. It's not THAT hard to update even if you've got a lot of rigs, as long as you know a little about scripting/dev (if you've got that many rigs you must have some knowledge, to have proper monitoring at least). I'd prefer the dev to be able to focus on one driver (that's what Claymore does, for example, if I recall); otherwise you spend all your time on compatibility instead of improving performance.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 05, 2017, 12:12:14 PM |
|
Quote: It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...
Quote: or people could update
Well, the thing is, with older cards, even the latest drivers always switch back to the "legacy" mode. Weird, weird...
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
m1n1ngP4d4w4n
Full Member
Offline
Activity: 224
Merit: 100
CryptoLearner
|
|
January 05, 2017, 12:14:21 PM |
|
Quote: It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...
Quote: or people could update
Quote: Well, the thing is, with older cards, even the latest drivers always switch back to the "legacy" mode. Weird, weird...
Ah, I see, so it's not the drivers that are updated but just the package then... lazy AMD lol. They don't make the newest drivers compatible with old hardware, they just pack different driver versions into one package. No wonder they're so big; NVIDIA prob does the same, when you see that the package is 350+ MB.
|
|
|
|
nerdralph
|
|
January 05, 2017, 02:34:19 PM Last edit: January 05, 2017, 02:51:00 PM by nerdralph |
|
Quote: It looks like AMDIL is a dead-end anyway. http://lists.llvm.org/pipermail/llvm-dev/2015-May/085684.html HSAIL will probably be short-lived since most of the work is now focused on the llvm amdgpu back-end. It even supports inline asm, but I'm not sure if it will generate a kernel binary that conforms to AMD's CL2.0 ABI. With clang/llvm-3.9, I've only got as far as getting it to output gcn assembler from the OpenCL + inline asm code.
Quote: Like Wolf said, CLRX is the way to go if you haven't looked into it. I used it in my previous project with great success. I am trying to figure out how to enable GDS on Ellesmere, which turned out to be rather tricky. It seems that there is no way to enable GDS with the CL2.0 ABI and you have to fall back to the CL1.2 ABI with the "-legacy" build option. This totally sucks as I need to redo the optimizations all over again. I have no idea what the engineers at AMD had in mind when they decided to make this design change.
I'm not so sure. While Mateusz has done some great work with CLRX, he hasn't fully reverse-engineered how the AMD drivers load kernel binaries. For example, he thought the amdcl2 binaries were 64-bit only until I sent him a Tonga CL2.0 32-bit kernel binary. The fglrx Linux OCL driver contains the string "gds_segment_byte_size", which the CLRX docs only list in the ROCm ABI. Despite being "closed source", much of the llvm code contained in the drivers is open source. I think it may be possible to compile a mixed OpenCL/asm kernel offline using llvm that can be loaded (CreateProgramWithBinary) by the 16.x Crimson drivers on Windoze, the Linux fglrx drivers, and the AMDGPU-PRO drivers.
p.s. According to a source at AMD, "The ROCm ABI is the same as the HSAIL ABI and will be common". With fglrx, building with CL2.0 and --save-temps generates hsail along with asm, meaning it supports the HSAIL ABI. Ergo the ROCm ABI should be supported.
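For anyone who wants to repeat the check for "gds_segment_byte_size" on their own driver files, here is a minimal Python sketch. It only looks for the raw ASCII bytes inside a file, much like running `strings` and `grep` by hand. The function names are made up for illustration, and which driver library to scan is left to you, since the post does not name the file.

```python
import sys
from pathlib import Path
from typing import List


def find_marker(blob: bytes, marker: bytes) -> List[int]:
    """Return every byte offset at which marker occurs in blob."""
    offsets = []
    start = blob.find(marker)
    while start != -1:
        offsets.append(start)
        # Continue one byte later so overlapping matches are also reported.
        start = blob.find(marker, start + 1)
    return offsets


def scan_file(path: str, marker: bytes = b"gds_segment_byte_size") -> List[int]:
    """Read a file fully into memory and report where the marker appears."""
    return find_marker(Path(path).read_bytes(), marker)


if __name__ == "__main__":
    # Usage: python scan_marker.py <driver library> [<another file> ...]
    for name in sys.argv[1:]:
        hits = scan_file(name)
        print(f"{name}: {len(hits)} hit(s), first offsets {hits[:5]}")
```

A hit only shows that the string is present in the binary; it says nothing about whether the loader actually honours the field.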
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 05, 2017, 03:18:32 PM |
|
Quote: the string "gds_segment_byte_size"
This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
|
|
|
|
nerdralph
|
|
January 05, 2017, 03:38:23 PM |
|
Quote: the string "gds_segment_byte_size"
Quote: This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
It'll be a while. I'm not a "just make it work" kind of guy. I need to understand how things work at a low level first.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 05, 2017, 08:51:52 PM |
|
Quote: the string "gds_segment_byte_size"
Quote: This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
I just noticed this string is not present in the output of CodeXL, which means that the AMD OpenCL 2.0 ABI is not capable of handling GDS. Oh well. I was able to achieve reasonable performance in the legacy mode, so this should be a good foundation for the upcoming GCN assembly version. If I can get GDS and GWS working, the new version should be able to surpass Claymore's performance. We will see.
|
|
|
|
QuintLeo
Legendary
Offline
Activity: 1498
Merit: 1030
|
|
January 05, 2017, 11:06:31 PM |
|
Quote: Ah, I see, so it's not the drivers that are updated but just the package then... lazy AMD lol. They don't make the newest drivers compatible with old hardware, they just pack different driver versions into one package. No wonder they're so big; NVIDIA prob does the same, when you see that the package is 350+ MB.
More like 450 for the ReLive bloatware, as opposed to 250ish for most Crimson versions. Seems like AMD has caught Mickey$loth Bloatware disease BAD the last few months.
|
I'm no longer legendary just in my own mind! Like something I said? Donations gratefully accepted. LYLnTKvLefz9izJFUvEGQEZzSkz34b3N6U (Litecoin) 1GYbjMTPdCuV7dci3iCUiaRrcNuaiQrVYY (Bitcoin)
|
|
|
m1n1ngP4d4w4n
Full Member
Offline
Activity: 224
Merit: 100
CryptoLearner
|
|
January 05, 2017, 11:40:01 PM |
|
Quote: Ah, I see, so it's not the drivers that are updated but just the package then... lazy AMD lol. They don't make the newest drivers compatible with old hardware, they just pack different driver versions into one package. No wonder they're so big; NVIDIA prob does the same, when you see that the package is 350+ MB.
Quote: More like 450 for the ReLive bloatware, as opposed to 250ish for most Crimson versions. Seems like AMD has caught Mickey$loth Bloatware disease BAD the last few months.
Ahah, isn't that true... damn, what a waste of space and bandwidth. Same with NVIDIA: they should pack the VR / 3D stuff apart from the rest, there aren't many people that actually use this...
|
|
|
|
nerdralph
|
|
January 06, 2017, 12:16:03 AM |
|
Quote: the string "gds_segment_byte_size"
Quote: This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
Quote: I just noticed this string is not present in the output of CodeXL, which means that the AMD OpenCL 2.0 ABI is not capable of handling GDS. Oh well. I was able to achieve reasonable performance in the legacy mode, so this should be a good foundation for the upcoming GCN assembly version. If I can get GDS and GWS working, the new version should be able to surpass Claymore's performance. We will see.
A CL2.0 kernel won't touch the GDS unless you use pipes or atomic counters (counter32_t).
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 06, 2017, 04:16:28 AM |
|
Quote: the string "gds_segment_byte_size"
Quote: This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
Quote: I just noticed this string is not present in the output of CodeXL, which means that the AMD OpenCL 2.0 ABI is not capable of handling GDS. Oh well. I was able to achieve reasonable performance in the legacy mode, so this should be a good foundation for the upcoming GCN assembly version. If I can get GDS and GWS working, the new version should be able to surpass Claymore's performance. We will see.
Quote: A CL2.0 kernel won't touch the GDS unless you use pipes or atomic counters (counter32_t).
I don't know about pipes, but counter32_t is officially deprecated already, and I confirmed that counter32_t is not available with OpenCL 2.0 binaries. I suppose I am really lucky that I was able to touch GDS with the "-legacy" build option, as even assembler programmers struggled to find a way to access GDS on newer devices:
GDS memory revisited, moving from Tahiti to Hawaii
https://community.amd.com/thread/170216
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 06, 2017, 07:21:40 AM |
|
Quote: the string "gds_segment_byte_size"
Quote: This is exactly what I needed! Thank you! I will stick with CLRX for now as I am already used to it. I will add the setting for GDS to CLRX and see if that would work. In the meantime, please let me know if you manage to get the ROCm stuff working.
Quote: I just noticed this string is not present in the output of CodeXL, which means that the AMD OpenCL 2.0 ABI is not capable of handling GDS. Oh well. I was able to achieve reasonable performance in the legacy mode, so this should be a good foundation for the upcoming GCN assembly version. If I can get GDS and GWS working, the new version should be able to surpass Claymore's performance. We will see.
Quote: A CL2.0 kernel won't touch the GDS unless you use pipes or atomic counters (counter32_t).
I also tried using pipes to see if they result in GDS access, but CodeXL did not show any GDS instructions. What a bummer...
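Checking a disassembly for GDS access by eye gets tedious, so here is a rough Python sketch of how that check could be automated. It assumes the ISA dump is plain text with one instruction per line, that a GDS access shows up as a `ds_*` instruction carrying a standalone `gds` modifier, and that GWS operations use `ds_gws_*` mnemonics. The comment markers (`//` and `;`) and the sample listing are also assumptions for illustration, not actual CodeXL output.

```python
import re
from typing import List, Tuple

# Assumption: a DS instruction that targets GDS carries a separate "gds" token.
GDS_MODIFIER = re.compile(r"\bds_\w+\b.*\bgds\b")
# Assumption: global wave sync operations use mnemonics that start with "ds_gws_".
GWS_MNEMONIC = re.compile(r"\bds_gws_\w+")


def find_gds_lines(isa_text: str) -> List[Tuple[int, str]]:
    """Return (line number, instruction) pairs that appear to use GDS or GWS."""
    hits = []
    for number, line in enumerate(isa_text.splitlines(), start=1):
        # Drop trailing comments so a "gds" mentioned in a comment is ignored.
        code = re.split(r"//|;", line, maxsplit=1)[0].strip()
        if GDS_MODIFIER.search(code) or GWS_MNEMONIC.search(code):
            hits.append((number, code))
    return hits


if __name__ == "__main__":
    # Hypothetical listing, written by hand for this example.
    sample = "\n".join([
        "s_load_dwordx2 s[0:1], s[4:5], 0x0",
        "ds_add_u32 v0, v1 offset:4 gds   // global data share",
        "ds_write_b32 v2, v3              // LDS only",
        "v_add_u32 v4, vcc, v5, v6",
    ])
    for number, code in find_gds_lines(sample):
        print(f"line {number}: {code}")
```

An empty result would match what is described above: the compiler emitted no GDS instructions for that kernel.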
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 06, 2017, 08:21:29 AM |
|
I was finally able to assemble GCN ISA code with the OpenCL 1.2 ABI for Ellesmere by modifying CLRX. This means that I can now access GDS on the RX 480 without restrictions. I will be out of town for a week, but we should see pretty interesting things after I come back home.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 07, 2017, 01:32:25 PM |
|
Quote: I appreciate what you are doing and look forward to switching my farm to your miner when it is a bit faster. A moderate difference in hashrate is too costly with a bunch of miners running, but I will accept a small loss in hashrate just to stop using the closed source stuff.
There are tons of optimizations that can be done with the GCN assembly, so that should happen sooner rather than later. Now I am thinking about taking a completely different strategy. My current game plan is (1) to reduce the number of kernels by using global synchronizations across work-items and (2) to move the pathetically slow row counters in global memory to the fast 64KB GDS. (2) is GCN-specific, but (1) should also be possible with NVIDIA. Too bad I cannot try these ideas until next Wednesday...
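To make the two ideas a bit more concrete, here is a CPU-side analogy in Python. It is not GPU code and not how the miner is implemented. Threads stand in for work-groups, a lock-protected increment stands in for an atomic add on a row counter (assumed here to count the slots claimed in each row), and a `threading.Barrier` stands in for the global synchronization that lets two phases share one launch instead of two kernels. All names and sizes are invented for the example.

```python
import threading
from typing import List, Tuple


def run_fused_phases(num_rows: int = 8,
                     num_workers: int = 4,
                     items_per_worker: int = 100) -> Tuple[List[int], List[int]]:
    """Run two dependent phases in one launch, separated by a global barrier."""
    row_counters = [0] * num_rows                    # analogue of counters in fast shared storage
    counter_lock = threading.Lock()                  # stands in for a hardware atomic add
    phase_barrier = threading.Barrier(num_workers)   # analogue of a global barrier
    totals_seen = [0] * num_workers

    def claim_slot(row: int) -> int:
        """Increment one row counter and return the slot index that was claimed."""
        with counter_lock:
            slot = row_counters[row]
            row_counters[row] = slot + 1
        return slot

    def worker(worker_id: int) -> None:
        # Phase 1: what would otherwise be the first kernel.
        for i in range(items_per_worker):
            value = worker_id * items_per_worker + i
            claim_slot(value % num_rows)
        # Synchronize here instead of returning to the host between kernels.
        phase_barrier.wait()
        # Phase 2: every worker can now rely on all counters being final.
        totals_seen[worker_id] = sum(row_counters)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return row_counters, totals_seen


if __name__ == "__main__":
    counters, totals = run_fused_phases()
    print(counters, totals)
```

One caveat that carries over to real hardware: a barrier like this only completes if every participant is running at the same time, so on a GPU all participating work-groups would have to be resident at once.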
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 07, 2017, 10:51:39 PM |
|
Since I am away from home until Wednesday and do not have access to dedicated graphics cards, I just decided to try potential replacements for gatelessgate.py, as I feel more comfortable with C++ than Python and I think the Python component of SA v5 is rather lacking as far as functionality is concerned. I am planning to evaluate sgminer-gm and nheqminer. I hope they will bring GG on par with Claymore's in terms of usability.
|
|
|
|
laik2
|
|
January 07, 2017, 11:39:00 PM |
|
Quote: Since I am away from home until Wednesday and do not have access to dedicated graphics cards, I just decided to try potential replacements for gatelessgate.py, as I feel more comfortable with C++ than Python and I think the Python component of SA v5 is rather lacking as far as functionality is concerned. I am planning to evaluate sgminer-gm and nheqminer. I hope they will bring GG on par with Claymore's in terms of usability.
If you feel the need to test on real graphics cards, I have Windows 10 + NV 1070, Linux 14.04 + R9 390, and Linux 16.04 + RX 480.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 07, 2017, 11:43:33 PM |
|
Quote: Since I am away from home until Wednesday and do not have access to dedicated graphics cards, I just decided to try potential replacements for gatelessgate.py, as I feel more comfortable with C++ than Python and I think the Python component of SA v5 is rather lacking as far as functionality is concerned. I am planning to evaluate sgminer-gm and nheqminer. I hope they will bring GG on par with Claymore's in terms of usability.
Quote: If you feel the need to test on real graphics cards, I have Windows 10 + NV 1070, Linux 14.04 + R9 390, and Linux 16.04 + RX 480.
That would be great! Could you set up the Linux box with the RX 480?
|
|
|
|
laik2
|
|
January 07, 2017, 11:51:38 PM |
|
Quote: Since I am away from home until Wednesday and do not have access to dedicated graphics cards, I just decided to try potential replacements for gatelessgate.py, as I feel more comfortable with C++ than Python and I think the Python component of SA v5 is rather lacking as far as functionality is concerned. I am planning to evaluate sgminer-gm and nheqminer. I hope they will bring GG on par with Claymore's in terms of usability.
Quote: If you feel the need to test on real graphics cards, I have Windows 10 + NV 1070, Linux 14.04 + R9 390, and Linux 16.04 + RX 480.
Quote: That would be great! Could you set up the Linux box with the RX 480?
I will PM you with details.
|
|
|
|
Subw
|
|
January 08, 2017, 11:41:27 AM |
|
Quote: I am planning to evaluate sgminer-gm
Moving to sgminer would be awesome, as it has great pool management support and a popular API for monitoring and control.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 08, 2017, 12:47:42 PM |
|
Quote: Well, I guess I did something wrong: with the latest amdgpu-pro drivers I built with make, then ran gatelessgate.py, and I am getting 10/sec on each 480. Anyone know where I messed up? Thanks!
There was a compatibility issue, but I already fixed it and pushed a patch to the repository.
|
|
|
|
|