Bitcoin Forum
December 17, 2017, 10:54:04 AM *
Author Topic: Gateless Gate Sharp 1.1.5: zawawa's open-source dual ETH/XMR/PASC/LBC/FTC miner  (Read 164394 times)
zawawa
January 23, 2017, 02:38:57 PM
 #401

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
zawawa
January 23, 2017, 04:16:51 PM
 #402

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

zawawa
January 23, 2017, 04:32:37 PM
 #403

Well, in the worst case, I can still continue developing the assembly version on the 7990 while trying to figure out how to access GDS on newer cards.

nerdralph
January 23, 2017, 05:16:58 PM
 #404

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?
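
For anyone who wants to repeat that test, here is a minimal sketch of starting a miner with GPU_FORCE_64BIT_PTR=1 set only for the child process. The command line passed to `launch` is a placeholder, not a real miner invocation, and both helper names are made up for illustration.

```python
import os
import subprocess


def env_with_64bit_ptr(base_env=None):
    """Return a copy of the environment with GPU_FORCE_64BIT_PTR=1 added."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env


def launch(cmd):
    """Start `cmd` (a placeholder command line, e.g. a list of strings)
    with the variable set for that process only."""
    return subprocess.Popen(cmd, env=env_with_64bit_ptr())
```

Setting the variable per process keeps the rest of the session unchanged, which makes it easier to compare runs with and without it.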
zawawa
January 23, 2017, 06:02:37 PM
 #405

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.

laik2
January 23, 2017, 06:15:48 PM
 #406

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?

ZEC: t1KbbHtXqzSS6qHBaPZDKyWnzxhRjr9oCtW
zawawa
January 23, 2017, 06:26:04 PM
 #407

I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of SGPR's just like the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?


Yeah, I read that page like 100 times...
I had to ask the author of that page to add support for GDS to his GCN assembler, so I don't think he has the answer.
I will ask him, though.

Edit: matszpk didn't know how to enable GDS on GCN2/3/4 devices, either. Such a nice guy, though.
https://github.com/CLRX/CLRX-mirror/issues/12

joaocha
January 23, 2017, 10:28:17 PM
 #408

Claymore and Optiminer are having the same problem with GCN and newer drivers.
zawawa
January 24, 2017, 05:48:09 AM
 #409

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
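
A startup log like the one quoted above can be checked mechanically to see which cards got the ASM kernel. The following is only a sketch: the regular expressions are assumptions based on the exact line shapes shown in this quote, and other miner versions may print something different.

```python
import re

# Patterns assumed from the quoted log lines only.
GPU_NAME = re.compile(r"GPU #(\d+): (\w+),")
GPU_ALGO = re.compile(r"GPU #(\d+) algorithm (\w+),")


def kernels_by_gpu(log_text):
    """Map GPU index -> (chip name, algorithm label) from a startup log."""
    names, algos = {}, {}
    for line in log_text.splitlines():
        line = line.strip()
        m = GPU_NAME.match(line)
        if m:
            names[int(m.group(1))] = m.group(2)
        m = GPU_ALGO.match(line)
        if m:
            algos[int(m.group(1))] = m.group(2)
    return {i: (names.get(i), algos[i]) for i in sorted(algos)}


LOG = """\
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
"""

for index, (chip, algo) in kernels_by_gpu(LOG).items():
    print(index, chip, algo)
```

On this log, the two Tahiti cards report ASM and the Ellesmere card reports algorithm 2.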

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

jstefanop
January 24, 2017, 06:23:16 AM
 #410

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

Yes, Optiminer is using ASM on the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.

www.AriseChickun.com 0% fee segwit signaling Litecoin Pool!
FutureBit Moonlander 2 USB Scrypt Stick Miner: https://bitcointalk.org/index.php?topic=2125643.0
LTC:LX5vpxrQE4eLRLPobKwZhw2comkKFCh3p4 - BTC:14w9Lea6kdVzspJk8TQRe7qSYu9LhzJJsh
zzzzzzzzzz
January 24, 2017, 06:23:55 AM
 #411

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his Linux miner to get max hash rate from RX4xx, because on Windows he can't access some of the driver features he needs. On R9 Fury and earlier, he's not having any problems.
zawawa
January 24, 2017, 06:35:03 AM
 #412

Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his linux miner to get max hash rate from RX4xx, because on Windows he can't access some of the driver features he needs. On R9 Fury and earlier, he's not having any problems.

Thank you so much for the clarification. That means I have to switch my Ellesmere farm to Linux, though. Oh well.

zawawa
January 24, 2017, 06:48:39 AM
 #413

Yes, optiminer is using ASM on polaris driver under linux, so it works there. His Polaris speedup does NOT work under windows, so it must be a windows restriction.

That's really good to know. Gotta love AMD for its completely arbitrary decisions...

zawawa
January 24, 2017, 06:58:07 AM
 #414

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
The OpenCL dummy code for the GDS optimizations is ready, so it's a matter of changing several lines.
Hopefully I will bring you guys some good news then.

jstefanop
January 24, 2017, 07:44:50 AM
 #415

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
OpenCL dummy codes for GDS optimizations are ready, so it's a matter of changing several lines.
Hopefully I will bring you guys a good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

It's the ASM bible specced out for GCN3 (GCN 1.2). Looking at the DS instruction format, bit 16 sets GDS, so it looks like the GDS bit for GCN 1.0/1.1 is 17 with 18-25 for the OPCODE, and for GCN 1.2+ the GDS bit is 16 with 17-24 for the OPCODE.
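
To make the bit positions concrete, here is a rough sketch that packs and unpacks the first dword of a DS instruction for both layouts. The GDS and OPCODE positions are the ones given in this post; the encoding marker in bits 26-31 and the two 8-bit offset fields are my reading of the ISA manuals, and the function names are made up for illustration.

```python
DS_ENCODING = 0b110110  # bits 26-31 of the first dword (assumed from the ISA manuals)


def gds_bit_position(gcn12_or_later):
    """Bit 16 on GCN 1.2+, bit 17 on GCN 1.0/1.1."""
    return 16 if gcn12_or_later else 17


def encode_ds_dword0(opcode, gds, gcn12_or_later, offset0=0, offset1=0):
    """Pack the first dword; the 8-bit opcode sits directly above the GDS bit."""
    gds_bit = gds_bit_position(gcn12_or_later)
    return (
        (DS_ENCODING << 26)
        | ((opcode & 0xFF) << (gds_bit + 1))
        | (int(bool(gds)) << gds_bit)
        | ((offset1 & 0xFF) << 8)
        | (offset0 & 0xFF)
    )


def decode_ds_dword0(dword, gcn12_or_later):
    """Unpack the first dword of a DS instruction into its fields."""
    if (dword >> 26) & 0x3F != DS_ENCODING:
        raise ValueError("not a DS instruction")
    gds_bit = gds_bit_position(gcn12_or_later)
    return {
        "offset0": dword & 0xFF,
        "offset1": (dword >> 8) & 0xFF,
        "gds": bool((dword >> gds_bit) & 1),
        "opcode": (dword >> (gds_bit + 1)) & 0xFF,
    }
```

Decoding a GCN 1.2 word with the GCN 1.0/1.1 layout shifts every field by one bit, so a disassembler set to the wrong generation will show a different opcode and possibly the wrong GDS flag.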

zawawa
January 24, 2017, 12:19:20 PM
 #416

I am so tired now...
I will work on the assembly version for 7990 tomorrow.
OpenCL dummy codes for GDS optimizations are ready, so it's a matter of changing several lines.
Hopefully I will bring you guys a good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

Its the ASM bible speced out for GCN 1.3. Looking at the GDS spec bit 16 sets GDS, so looks like GDS bit for GCN 1.0/1.1 is 17 with 18-25 for the OPCODE, and GCN 1.2+ GDS is bit 16 with 17-24 for OPCODE.

Oh, I downloaded that manual the day it came out, and it has been my bed-time read for quite some time now. :) Actually, I was one of those who demanded the manual on the AMD forum when GCN3 first came out.

I triple-checked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

zawawa
January 24, 2017, 12:56:49 PM
 #417

So GG is running at 413 sol/s on my old stock 7990 with Crimson 16.9.2.
With the same setup, the assembly version of Claymore's 11.1 Beta yields 522 sol/s.
Let's see how much I can improve GG's performance with the GCN assembly.
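
As a quick worked figure for the gap to close, using only the two numbers above (the function name is just for illustration):

```python
def relative_gap(baseline, target):
    """Fraction by which `target` exceeds `baseline`."""
    return (target - baseline) / baseline


gg_sols = 413.0        # GG on the stock 7990, Crimson 16.9.2
claymore_sols = 522.0  # Claymore's 11.1 Beta, assembly version, same setup

gap = relative_gap(gg_sols, claymore_sols)
print(f"Claymore's is {gap * 100:.1f}% faster")
```

That is 109 sol/s, or roughly a quarter more throughput, that the GCN assembly version would need to recover to reach parity.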

zawawa
January 24, 2017, 02:54:50 PM
 #418

GDS counters are finally working!
They still need optimizations, but this is definitely a step forward.

nerdralph
January 24, 2017, 04:36:29 PM
 #419

I triplechecked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

I think you are right.  I looked at the ROC-K source and there is no PM4 packet type defined for ALLOC_GDS there either.
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/1853014d96c6af2d81d424f98d320810f40391d8/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h
And it looks like the same code that is used in the Linux kernel:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h

Since there is no easy way to use GDS in OpenCL, and probably not in DX12 either, AMD likely doesn't have a test case for their Windows drivers that checks GDS initialization.
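
The same check can be repeated on any copy of a PM4 opcode header with a few lines of script. This is only a sketch: it assumes opcode identifiers start with `IT_`, and the sample text below is a made-up fragment for demonstration, not copied from the real kfd_pm4_opcodes.h.

```python
import re

# Assumption: PM4 opcode identifiers are upper-case names prefixed with IT_.
IDENT = re.compile(r"\b(IT_[A-Z0-9_]+)\b")


def pm4_opcode_names(header_text):
    """Return the sorted set of IT_* identifiers found in the text."""
    return sorted(set(IDENT.findall(header_text)))


def has_gds_opcode(header_text):
    """True if any IT_* identifier mentions GDS."""
    return any("GDS" in name for name in pm4_opcode_names(header_text))


# Hypothetical fragment, for demonstration only.
SAMPLE = """
enum it_opcode_type {
    IT_NOP = 0x10,
    IT_EXAMPLE_PACKET = 0x11,  /* made-up name */
};
"""

print(pm4_opcode_names(SAMPLE))
print(has_gds_opcode(SAMPLE))
```

To run it against the real file, read the header from a local checkout and pass its contents to `has_gds_opcode`.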
zawawa
January 24, 2017, 05:34:18 PM
 #420

I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...
