nerdralph
|
|
March 20, 2017, 08:22:53 PM |
|
Fucking little-endian byte order had me confused for over an hour, but now I'm almost done:
SEQ_PMG_TIMING tCKSRE:2 tCKSRX:2 tCKE_PULSE:8 tCKE:24 SEQ_IDLE:7 tCKE_PULSE_MSB:1 SEQ_IDLE_SS:0
SEQ_RAS_TIMING tRCDW:19 tRCDWA:19 tRCDR:27 tRCDRA:27 tRRD:8 tRC:83
SEQ_CAS_TIMING tNOPW:0 tNOPR:0 tR2W:24 tCCDL:2 tR2R:5 tW2R:21 tCL:19
SEQ_MISC_TIMING tRP_WRA:62 tRP_RDA:79 tRP:13 tRFC:197
SEQ_MISC_TIMING2 PA2RDATA:2 PA2WDATA:0 FAW:0 tREDC:0 tWEDC:17 t32AW:1 tWDATATR:2
ARB_DRAM_TIMING2 RAS2RAS:197 RP:48 WRPLUSRP:63 BUS_TURN:23
I'm ignoring some of the mask values since I think all 32 bits are being written to the memory controller registers.
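A minimal Python sketch of the byte-order trap, for anyone following along. It packs the SEQ_RAS_TIMING values listed above into a 32-bit word, stores it little-endian the way the strap bytes are laid out, and then reads it back both ways. The bit layout in `RAS_FIELDS` is an assumption for illustration (check it against the mask definitions in the Linux drm headers), and the helper names are made up for this example.

```python
import struct

# Assumed bit layout of MC_SEQ_RAS_TIMING as (name, shift, width).
# This is an illustration only; verify it against the drm mask headers.
RAS_FIELDS = [
    ("tRCDW", 0, 5), ("tRCDWA", 5, 5), ("tRCDR", 10, 5),
    ("tRCDRA", 15, 5), ("tRRD", 20, 4), ("tRC", 24, 7),
]

def pack_word(values, fields):
    """Pack named field values into one 32-bit register word."""
    word = 0
    for name, shift, width in fields:
        word |= (values[name] & ((1 << width) - 1)) << shift
    return word

def unpack_word(word, fields):
    """Extract every named field from a 32-bit register word."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in fields}

# RAS values taken from the decode listed above.
ras = {"tRCDW": 19, "tRCDWA": 19, "tRCDR": 27,
       "tRCDRA": 27, "tRRD": 8, "tRC": 83}

# Four bytes in the order they would sit in the strap (little-endian).
raw = struct.pack("<I", pack_word(ras, RAS_FIELDS))

little = unpack_word(struct.unpack("<I", raw)[0], RAS_FIELDS)  # correct read
big = unpack_word(struct.unpack(">I", raw)[0], RAS_FIELDS)     # byte-swapped read

print(raw.hex().upper())
print("little-endian read:", little)
print("big-endian read:   ", big)
```

Reading the same four bytes with the wrong byte order still produces numbers that look like timings, which is why the mistake is easy to miss.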
|
|
|
|
nerdralph
|
|
March 20, 2017, 10:27:20 PM Last edit: March 20, 2017, 10:58:57 PM by nerdralph |
|
The first working beta of my strap decoding software is ready. Brief instructions:
1. Read the python and C source.
2. If you can't understand, goto step 1.
https://github.com/nerdralph/strapread
p.s. As others have mentioned, it seems the strap layout for Polaris is different from the previous-generation cards, so trying to decode the straps from a Tonga BIOS gives wrong values.
|
|
|
|
azgal0r
Newbie
Offline
Activity: 25
Merit: 0
|
|
March 20, 2017, 11:49:16 PM |
|
You are a beast man, now I need to buy a dev rig...
|
|
|
|
nerdralph
|
|
March 21, 2017, 12:17:01 AM |
|
Quote: I see at least a couple people have written strap decoding programs, but I can't find any publicly released. I was going to write one and release it publicly, but I figured if someone else has already written one... So it's not as simple as using atombios.h to dump the fields in ATOM_MEMORY_TIMING_FORMAT_V2. https://raw.githubusercontent.com/torvalds/linux/master/drivers/gpu/drm/radeon/atombios.h Straps for GCN cards are 52 bytes long (3 bytes for memory clock, 1 byte for memory type, 48 bytes for strap), but sizeof(ATOM_MEMORY_TIMING_FORMAT_V2) = 40 bytes. So is it just a matter of old-fashioned reverse engineering? i.e. looking at different straps and reading through GDDR5 data sheets to figure out the strap offsets for different values?
Quote: Hah, you don't know the format and you're going to make a public tool? Your threats are like skate park swimming pools - empty
Looks like you haven't read the rest of the thread. It took less than an hour to figure it out from the Linux drm code.
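The 52-byte layout quoted above (3 bytes memory clock, 1 byte memory type, 48 bytes strap) can be sketched in Python roughly like this. It assumes the clock is stored little-endian and leaves its unit undecoded. The function name and the sample entry are hypothetical, not taken from a real BIOS.

```python
ENTRY_LEN = 52  # 3 bytes memory clock + 1 byte memory type + 48 bytes strap

def split_entry(entry: bytes):
    """Split one timing-table entry into (clock, mem_type, strap)."""
    if len(entry) != ENTRY_LEN:
        raise ValueError(f"expected {ENTRY_LEN} bytes, got {len(entry)}")
    # Assumption: the 3-byte clock is little-endian; the unit is not decoded here.
    clock = int.from_bytes(entry[0:3], "little")
    mem_type = entry[3]
    strap = entry[4:]
    return clock, mem_type, strap

# Hypothetical entry built for illustration only.
example = (200000).to_bytes(3, "little") + bytes([2]) + bytes(range(48))
clock, mem_type, strap = split_entry(example)
print(clock, mem_type, strap.hex().upper())
```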
|
|
|
|
kilo17 (OP)
Legendary
Offline
Activity: 980
Merit: 1001
aka "whocares"
|
|
March 21, 2017, 01:13:42 AM |
|
Quote: Not quite - they tell you part of the story - but look at MISC1, for example :3
lol - ssshhh... "fools rush in where angels fear to tread"
|
Bitcoin Will Only Succeed If The Community That Supports It Gets Support - Support Home Miners & Mining
|
|
|
OhGodAGirl
Full Member
Offline
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
|
|
March 21, 2017, 01:18:55 AM |
|
Quote: Not quite - they tell you part of the story - but look at MISC1, for example :3
Can you stop being a dick? You should be kind and considerate and thankful that people are working hard to document and unlock how all this works - knowledge should be shared, and distributed freely. This kind of optimization isn't just valuable to mining; it's valuable to a lot of operations (including scientific research, which requires a lot of compute power, and benefits quite heavily from this kind of optimization). I've been understanding of how much you want to brag, because finding this information is hard work, but not everyone has the resources you do, through me.
nerdralph - you're doing really well. Keep it up. I'm really proud of you. I'm going to give you some help via PM.
|
|
|
|
OhGodAGirl
|
|
March 21, 2017, 01:21:38 AM |
|
Quote: I didn't need to use all those resources - I decoded it a lot through trial and error, and public knowledge. But sure, I'll stop.
Except you did. And yes, you did a lot of the hard work, but so did many before you, including me. There's no need to act like you're a god; and there's no need for this pissing contest. Thankfully nerdralph will probably be able to take all of my work and do something useful with it, because I sure as hell don't have the time right now.
EDIT: nerdralph, I want to make this public. I don't have the time to compile all of this information in a nice, easy to use way, though. Can you please help me, if I provide you with the assistance? I'm really short on time, and this is something I am passionate about.
|
|
|
|
nerdralph
|
|
March 21, 2017, 01:30:51 AM |
|
Quote: Not quite - they tell you part of the story - but look at MISC1, for example :3
Quote: I've been understanding of how much you want to brag, because finding this information is hard work, but not everyone has the resources you do, through me.
Hmmm... sounds like someone with contacts inside AMD.
|
|
|
|
nerdralph
|
|
March 21, 2017, 01:44:35 AM |
|
So my first try at a custom strap didn't work (GPU crashed almost immediately when mining ETH).
custom 1900: 1500RAS, 1625CAS, MISC2, & ARB
777000000000000022CC1C00AD515A3ED0570F15B98CA50A004AE7001C0714207A8900A0030000001B11353F922A3217
A straight copy of the 1625 strap to 2000 works fine, while the 1500 strap gave errors even at 1900. I tried taking the 1900 strap, RAS from the 1500, and CAS, MISC2 & ARB2 from the 1625 strap and using it for the 2000 strap.
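The mix-and-match described above (base strap from one clock, individual registers from others) can be sketched in Python as a splice on 4-byte boundaries. The register order in `REGS` is an assumption for illustration, so check it against your decoder before trusting any offset. The three input straps are dummy fill patterns, not real straps.

```python
# Assumed order of the twelve 32-bit registers inside a 48-byte strap.
# This order is an assumption for illustration, not a documented layout.
REGS = ["WR_CTL_D1", "WR_CTL_2", "PMG", "RAS", "CAS", "MISC",
        "MISC2", "MISC1", "MISC3", "MISC8", "ARB", "ARB2"]

def splice(base_hex, donors):
    """Return the base strap with selected registers copied from donor straps.

    donors maps a register name to the hex strap that register is taken from.
    """
    out = bytearray.fromhex(base_hex)
    if len(out) != 4 * len(REGS):
        raise ValueError("expected a 48-byte strap")
    for reg, donor_hex in donors.items():
        start = REGS.index(reg) * 4
        out[start:start + 4] = bytes.fromhex(donor_hex)[start:start + 4]
    return out.hex().upper()

# Dummy straps (fill patterns), so each register's origin is visible in the output.
s1500, s1625, s1900 = "15" * 48, "16" * 48, "19" * 48

custom = splice(s1900, {"RAS": s1500, "CAS": s1625,
                        "MISC2": s1625, "ARB2": s1625})
print(custom)
```

A splice like this only guarantees a well-formed strap. It says nothing about whether the combined timings are consistent with each other.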
|
|
|
|
nerdralph
|
|
March 21, 2017, 02:16:44 AM |
|
Quote: Not quite - they tell you part of the story - but look at MISC1, for example :3
Yes, I'm intentionally not using the masks for tRP_WRA and tRP_RDA, since there is data in the straps outside those masks. I suppose I could add unknown fields (e.g. uk1, uk2).
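One way to handle bits outside the masks, sketched in Python: fill every gap in the field layout with a numbered unknown field, so that all 32 bits survive a decode/encode round trip. The masked layout in `MISC_MASKED` is an assumption for illustration, the sample word is arbitrary, and the helper names are hypothetical.

```python
def with_unknowns(fields, bits=32):
    """Fill every bit gap in a (name, shift, width) layout with ukN fields."""
    out, pos, n = [], 0, 0
    for name, shift, width in sorted(fields, key=lambda f: f[1]):
        if shift > pos:  # gap before this field
            n += 1
            out.append((f"uk{n}", pos, shift - pos))
        out.append((name, shift, width))
        pos = shift + width
    if pos < bits:  # gap at the top of the word
        n += 1
        out.append((f"uk{n}", pos, bits - pos))
    return out

def decode(word, layout):
    """Extract each field of the layout from a 32-bit word."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in layout}

def encode(values, layout):
    """Pack field values back into a 32-bit word."""
    word = 0
    for name, shift, width in layout:
        word |= (values[name] & ((1 << width) - 1)) << shift
    return word

# Assumed masked layout for MC_SEQ_MISC_TIMING (illustration only).
MISC_MASKED = [("tRP_WRA", 0, 6), ("tRP_RDA", 8, 6),
               ("tRP", 15, 5), ("tRFC", 20, 9)]
layout = with_unknowns(MISC_MASKED)

word = 0x0AA58CB9  # arbitrary sample word
fields = decode(word, layout)
print(fields)
print(hex(encode(fields, layout)))
```

With the unknown fields in place, editing one named timing and re-encoding leaves the undocumented bits untouched instead of zeroing them.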
|
|
|
|
kilo17 (OP)
|
|
March 21, 2017, 02:34:02 AM |
|
Quote: So my first try at a custom strap didn't work (GPU crashed almost immediately when mining ETH).
Most of the time those types of adjustments will crash or yield no benefit. The timings that are affected by a change must be changed as well to compensate. Secondly, tRAS has little effect on anything and is mostly changed to compensate for changes in other timings.
|
|
|
|
OhGodAGirl
|
|
March 21, 2017, 02:37:11 AM |
|
Hello all,
Wolf0 and I have, today, released OhGodATool, OhGodADecode and OhGodACsumFixer. Currently, they are without barebones documentation - I don't have the time right now with work, but once I do have a spare moment, I will update it.
You can download OhGodATool here: https://github.com/OhGodACompany/OhGodATool/releases/
You can download OhGodADecode here: https://github.com/OhGodACompany/OhGodADecode/releases/
You can download OhGodACsumFixer here: https://github.com/OhGodACompany/OhGodACsumFixer/releases
Enjoy.
|
|
|
|
kilo17 (OP)
|
|
March 21, 2017, 02:41:00 AM |
|
Thanks for the links. My old-school spreadsheet works, but that will be a lot easier. BTW, does the CheckSum fixer work on Linux? I do not have any windoze machines.
|
|
|
|
OhGodAGirl
|
|
March 21, 2017, 02:47:08 AM |
|
Quote: does the CheckSum work on linux
Yes, it does. The releases page has it precompiled for you.
|
|
|
|
nerdralph
|
|
March 21, 2017, 03:12:03 AM |
|
Thanks. Where did you find the updated MC_SEQ_MISC_TIMING?
|
|
|
|
niko2004x
Member
Offline
Activity: 126
Merit: 10
|
|
March 21, 2017, 03:40:08 AM Last edit: March 21, 2017, 04:02:19 AM by niko2004x |
|
Aah. And the profits go down the drain. Here is my version: https://github.com/niko2004x/atom_timing_editor
EDIT: There are different decoders for pre-RX (starting from HD7xxx) and RX series. Not sure if they are right, but they give consistent values for Elpida EDW4032BABG and Hynix H5GC4H24AJR in cards of different generations.
|
|
|
|
OhGodAGirl
|
|
March 21, 2017, 03:41:59 AM |
|
Well, to be fair, you're all swinging your dicks around like helicopters and acting like you're gods; it's about time someone levels the playing field. Thank you for sharing. Your code is very clean! Very nice job. People will still pay you to do this for them - there are many people out there who won't be able to use these tools. All this does is provide the ones who have the knowledge, the information. There are many people who will pay for the convenience of a tighter timing.
|
|
|
|
nerdralph
|
|
March 21, 2017, 04:09:45 AM |
|
Quote: tFAW = 0 Really, do I read this right??
Quote: Elpida... you can change some random things and it will still work... In comparison with Samsung, Elpida runs on 3 cycles, which is interesting. Anyhow, tFAW 0, I'd suggest to try it, you'll see it works fine. Plus they clock like beasts. Greetings
I finally got a chance to try tFAW = 0 on my Sapphire Rx470 with Samsung RAM. It works fine at 2000 with the 1625 strap (normally tFAW = 10). No change in mining performance though. Maybe if I try tRRD = 5 (instead of 6)...
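A possible reason for the unchanged performance, sketched in Python under loud assumptions: that tFAW follows the textbook DRAM definition (at most four ACT commands in any tFAW window), that tFAW and tRRD in the strap are in the same clock units, and that bank-group variants and t32AW are ignored. None of this is confirmed for these registers, and the function names are made up for the example.

```python
def fifth_activate_gap(trrd: int, tfaw: int) -> int:
    """Minimum gap between the 1st and 5th ACT for back-to-back activates.

    Successive ACTs are at least tRRD apart, so the 5th lands no earlier than
    4 * tRRD after the 1st; tFAW only matters if it is larger than that.
    """
    return max(4 * trrd, tfaw)

def tfaw_is_binding(trrd: int, tfaw: int) -> bool:
    """True when tFAW, not tRRD, limits the activate rate."""
    return tfaw > 4 * trrd

# (tRRD, tFAW) pairs mentioned in the post: stock, tFAW = 0, and the tRRD = 5 idea.
for trrd, tfaw in [(6, 10), (6, 0), (5, 0)]:
    print(f"tRRD={trrd} tFAW={tfaw} "
          f"gap={fifth_activate_gap(trrd, tfaw)} "
          f"binding={tfaw_is_binding(trrd, tfaw)}")
```

If those assumptions hold, tFAW = 10 was already below 4 * tRRD with tRRD = 6, so dropping it to 0 would not change the activate rate, while lowering tRRD would.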
|
|
|
|
kilo17 (OP)
|
|
March 21, 2017, 04:15:51 AM |
|
It is even more fun playing with HBM. Just look at the 100 strap and compare it to the 400,500,600 straps and it looks like it would be easy. Well it is not but it is fun to play with regardless.
|
|
|
|
niko2004x
|
|
March 21, 2017, 04:20:52 AM |
|
Quote: It is even more fun playing with HBM.
Well, there is only one variant of the timings table for HBM, compared to more than 100 variants for GDDR5. Not much data to do science with here.
|
|
|
|
|