niko2004x
Member
Offline
Activity: 126
Merit: 10
|
|
March 21, 2017, 12:53:33 PM Last edit: March 21, 2017, 01:09:40 PM by niko2004x |
|
Guess what: I know almost nothing about CAS, ARB, or MISC, but by trial and error I managed to get a pretty good strap. Looks like it's time to learn some shit and understand what I'm really doing when changing hex values LOL. Thank you, ohgod people, for sharing the knowledge. Since I am not Stilt and do not have access to internal AMD stuff, I did mine 'mad science style' (shameless dick swinging) with a search routine based on genetic algorithms guided by regression forests on PCA-transformed register field values (machine learning for fun and profit).
|
|
|
|
nerdralph
|
|
March 21, 2017, 01:15:26 PM |
|
...finally getting interesting... somebody might share Hynix AJR and Sam datasheet perhaps..? I wonder how the optimized custom timings are related to the given minimal values in the official datasheet? I mean, how loose are the official values, is there even room for improvement below them, or are custom straps simply adjusted to the recommended official values...?
Here's an older Hynix GDDR5 datasheet: https://drive.google.com/file/d/0BwLnDyLLT3WkeTBtekxTTloxMW8/view?usp=sharing
I don't have one for Samsung. You can also find Elpida/Micron product briefs that have basic stuff like the number of banks (usually 16) and bank groups (4). Often you can push the official timings by ~30%. Sometimes, particularly when the RAM is not thermally connected to the heatsink, you may have a hard time pushing the timing by even 10%. http://nerdralph.blogspot.ca/2017/01/hot-video-cards.html
As has been mentioned in these (and other) forums many times, the best straps depend on the mining algorithm. ZEC is harder to optimize for than ETH since it has lots of writes and more variation in memory access pattern. https://bitcointalk.org/index.php?topic=1679855.0
|
|
|
|
nerdralph
|
|
March 21, 2017, 01:19:22 PM |
|
So my first try at a custom strap didn't work (GPU crashed almost immediately when mining ETH). custom 1900: 1500RAS, 1625CAS, MISC2, & ARB 777000000000000022CC1C00AD515A3ED0570F15B98CA50A004AE7001C0714207A8900A0030000001B11353F922A3217
A straight copy of the 1625 strap to 2000 works fine, while the 1500 strap gave errors even at 1900. I tried taking the 1900 strap, RAS from the 1500, and CAS, MISC2 & ARB2 from the 1625 strap and using it for the 2000 strap.
My friend, you have a lot to learn... I was like you a few weeks ago; then I read all the documentation regarding GDDR5, and with a little help (well... not so little) I managed to understand what those timings actually do. Keep up the good work, by the way! EDIT: I am also very keen to understand HBM/2 timings; if anyone has some knowledge of those (I already know the mode registers), any help via PM is highly appreciated!
I know how to make an optimized strap, just like I know how to re-shingle a shed. But tying a tarp over the roof is a lot easier...
|
|
|
|
NisamRobot
Newbie
Offline
Activity: 31
Merit: 0
|
|
March 21, 2017, 02:08:20 PM |
|
A kitten tamed the wolf
|
|
|
|
nerdralph
|
|
March 21, 2017, 02:27:35 PM Last edit: March 21, 2017, 05:03:47 PM by nerdralph |
|
Now that the strap tools are out, let's talk about how to optimize the timings. I want to start with ETH since it is the simplest (and coincidentally the most profitable ATM). Ethash does many random 128-byte DAG reads, 8KB of them per hash, so 20MH/s requires 160GB/s of random read bandwidth. For AMD cards, 128 bytes is 2 cache lines of 64 bytes each, and each cache line fill reads 32 bytes from each of 2 GDDR5 memory chips. Each 32-byte GDDR5 read burst takes 2 clocks, so when the RAM is clocked at 2GHz, the data will be transferred in 1ns (each bit takes just 125ps!).
Here are a couple of references to help the noobs get started:
https://www.micron.com/~/media/documents/products/technical-note/dram/tned01_gddr5_sgram_introduction.pdf
https://www.micron.com/~/media/documents/products/data-sheet/dram/gddr5/4gb_gddr5_sgram_brief.pdf
I'm not going to do one long post, so as to make this more readable. For the more experienced folks, here's a taste of ideas to come: set tFAW and t32AW to 0. Even Hynix's old H5GQ1H24AFR has FAW (23ns) =~ 4 * RRD (5.5ns), so virtually all modern GDDR5 should be able to work fine without FAW and 32AW limits. I get 27.0Mh with sgminer on my Rx470/K4G4 clocked at 2Ghz, tRRD=5, tFAW=0. Zeroing t32AW gives a bump to 27.35Mh.
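The bandwidth and burst-time figures in this post can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes that 8KB means 8192 bytes (64 reads of 128 bytes) and that a 32-byte burst is 8 bits on each of a chip's 32 data pins. The 2 clocks per burst figure is taken from the post.

```python
# Rough check of the Ethash bandwidth and GDDR5 burst numbers.
# Assumptions (not from a datasheet): 8 KB = 8192 bytes per hash,
# and a 32-byte burst is 8 bits on each of 32 data pins.

BYTES_PER_HASH = 8 * 1024       # 64 DAG reads x 128 bytes
CLOCKS_PER_BURST = 2            # per the post, counted in command clocks
BITS_PER_PIN_PER_BURST = 8      # assumed burst length


def required_bandwidth_gb_s(hashrate_mh_s: float) -> float:
    """Random-read bandwidth in GB/s needed for a hashrate given in MH/s."""
    return hashrate_mh_s * 1e6 * BYTES_PER_HASH / 1e9


def burst_time_ns(mem_clock_ghz: float) -> float:
    """Time to transfer one 32-byte burst, in nanoseconds."""
    return CLOCKS_PER_BURST / mem_clock_ghz


def bit_time_ps(mem_clock_ghz: float) -> float:
    """Time each bit spends on a data pin, in picoseconds."""
    return burst_time_ns(mem_clock_ghz) * 1000 / BITS_PER_PIN_PER_BURST


if __name__ == "__main__":
    print(required_bandwidth_gb_s(20))   # roughly 164, i.e. the ~160GB/s quoted
    print(burst_time_ns(2.0))
    print(bit_time_ps(2.0))
```

Plugging in other hashrates or memory clocks gives a quick feel for how much random-read bandwidth a given target needs.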
|
|
|
|
tharp
Newbie
Offline
Activity: 19
Merit: 0
|
|
March 21, 2017, 02:53:56 PM |
|
Since everyone is sharing now, I suppose I'll put what I've come up with out here. Running a -125mV 470 Nitro Sapphire 8GB with Samsung memory, with ETH hitting between 28.5MH/s and 29.2MH/s @1140 core and @2100 mem, pulling around 920 watts at the wall with 6 GPUs per rig. On XMR, hitting 785h/s to 795h/s @1170 core and @2100 mem, pulling around 660 watts at the wall with 6 GPUs per rig. Also running ethOS 1.2.0. I'm here to learn more about the mistakes I made on the mod and see what others in the community have come up with.
Here is the strap I've put together: 777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17
The timings from wolf and ohgodagirl's vbios decode tools release:
TRCDW = 16, TRCDWA = 16, TRCDR = 26, TRCDRA = 22, TRRD = 5, TRC = 71, Pad0 = 0
TRP_WRA = 48, Pad0 = 2, TRP_RDA = 12, TRP = 22, TRFC = 144
PA2RDATA = 0, Pad0 = 0, PA2WDATA = 0, Pad1 = 0, TFAW = 8, TCRCRL = 3, TCRCWL = 7, TFAW32 = 6
MC_SEQ_MISC1: 0x20140514
MC_SEQ_MISC3: 0xA00089FA
MC_SEQ_MISC8: 0x00000003
ACTRD = 25, ACTWR = 13, RASMACTRD = 47, RASMACTWR = 57
RAS2RAS = 157, RP = 45, WRPLUSRP = 46, BUS_TURN = 23
Looking forward to others' input!
|
|
|
|
Eliovp
Legendary
Offline
Activity: 1050
Merit: 1293
Huh?
|
|
March 21, 2017, 03:32:56 PM |
|
Since everyone is sharing now I suppose i'll put what I've come up with out here. Running -125mv 470 Nitro Sapphire 8GB with Samsung memory with ETH hitting between 28.5MH/s to 29.2MH/s @1140 cor and @2100 mem pulling around 920watts at the wall with 6 GPU per rig. On XMR hitting 785h/s to 795h/s @1170 cor and @2100 mem pulling around 660 watts at the wall with 6 GPU per rig. Also running ethOS 1.2.0. Im here to learn more about the mistakes I made on the mod and see what others in the community have come up with. Here is the strap I've put together: 777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17 .... Looking forward to others input!
Cleaned it up for you.
--> HEX strap: 777000000000000022CC1C00AD695D47C0570E16B08C05090048C70014051420FA8900A003000000190D2F399D2D2E17
--> MC_SEQ_WR_CTL_D0 DAT_DLY = 7, DQS_DLY = 7, DQS_XTR = 0, DAT_2Y_DLY = 0, ADR_2Y_DLY = 0, CMD_2Y_DLY = 0, OEN_DLY = 7, OEN_EXT = 0
--> MC_SEQ_WR_CTL_D1 DAT_DLY = 0, DQS_DLY = 0, DQS_XTR = 0, DAT_2Y_DLY = 0, ADR_2Y_DLY = 0, CMD_2Y_DLY = 0, OEN_DLY = 0, OEN_EXT = 0
--> MC_SEQ_PMG_TIMING TCKSRE = 2, Pad0 = 0, TCKSRX = 2, Pad1 = 0, TCKE_PULSE = 12, TCKE = 12, SEQ_IDLE = 7, Pad2 = 0, TCKE_PULSE_MSB = 0, SEQ_IDLE_SS = 0
--> MC_SEQ_RAS_TIMING TRCDW = 13, TRCDWA = 13, TRCDR = 26, TRCDRA = 26, TRRD = 5, TRC = 71, Pad0 = 0
--> MC_SEQ_CAS_TIMING TNOPW = 0, TNOPR = 0, TR2W = 28, TCCLD = 3, TR2R = 5, Pad0 = 0, TW2R = 14, TCL = 22, Pad1 = 0
--> MC_SEQ_MISC_TIMING TRP_WRA = 48, Pad0 = 2, TRP_RDA = 12, TRP = 22, TRFC = 144
--> MC_SEQ_MISC_TIMING2 PA2RDATA = 0, Pad0 = 0, PA2WDATA = 0, Pad1 = 0, FAW = 8, TREDC = 2, TWEDC = 7, T32AW = 6, Pad2 = 0, TWDATATR = 0
--> MC_SEQ_MISC1 -- MR0 WL = 4, CL = 23, TM = 0, WR = 25, BA0 = 0, BA1 = 0, BA2 = 0, BA3 = 0 -- MR1 DS = 0, DT = 1, ADR = 1, CAL = 0, PLL = 0, RDBI = 0, WDBI = 0, ABI = 0, RES = 0, BA0 = 0, BA1 = 1, BA2 = 0, BA3 = 0
--> MC_SEQ_MISC3 -- MR4 EDCHP = 10, CRC WL = 7, CRC RL = 3, RD CRC = 0, WR CRC = 0, EDCHPi = 1, BA0 = 0, BA1 = 0, BA2 = 0, BA3 = 1 -- MR5 LP1 = 0, LP2 = 0, LP3 = 0, PLL/DLL BW = 0, RAS = 0, BA0 = 0, BA1 = 1, BA2 = 0, BA3 = 1
--> MC_SEQ_MISC8 -- MR8 CLEHF = 1, WREHF = 1, RFU = 0, BA0 = 0, BA1 = 0, BA2 = 0, BA3 = 0 -- MR7 PLL Stby = 0, PLL Fclk = 0, PLL DelC = 0, LF Mode = 0, Auto Sync = 0, DQ PreA = 0, Temp Sensor = 0, HVFRED = 0, VDD Range = 0, RFU = 0, BA0 = 0, BA1 = 0, BA2 = 0, BA3 = 0
--> MC_ARB_DRAM_TIMING ACTRD = 25, ACTWR = 13, RASMACTRD = 47, RASMACTWR = 57
--> MC_ARB_DRAM_TIMING2 RAS2RAS = 157, RP = 45, WRPLUSRP = 46, BUS_TURN = 23
Lots of options.. lots of things to fine tune..
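The last two lines of the decode are easy to reproduce by hand, which makes them a handy sanity check for any decoder. The sketch below assumes, based only on the values posted in this thread, that MC_ARB_DRAM_TIMING and MC_ARB_DRAM_TIMING2 are the final 8 bytes of the 48-byte strap and that each field is a plain 8-bit value. `decode_arb` is a made-up helper name, not part of any released tool.

```python
# The cleaned-up strap from the post, split in two only for line length.
STRAP = ("777000000000000022CC1C00AD695D47C0570E16B08C0509"
         "0048C70014051420FA8900A003000000190D2F399D2D2E17")

ARB_FIELDS = ("ACTRD", "ACTWR", "RASMACTRD", "RASMACTWR")
ARB2_FIELDS = ("RAS2RAS", "RP", "WRPLUSRP", "BUS_TURN")


def decode_arb(strap_hex: str):
    """Decode the two ARB timing dwords, assuming one byte per field."""
    raw = bytes.fromhex(strap_hex)
    if len(raw) != 48:
        raise ValueError("expected a 48-byte (96 hex character) strap")
    # Assumed layout: ARB_DRAM_TIMING at bytes 40..43, ARB_DRAM_TIMING2 at 44..47.
    arb = dict(zip(ARB_FIELDS, raw[40:44]))
    arb2 = dict(zip(ARB2_FIELDS, raw[44:48]))
    return arb, arb2


if __name__ == "__main__":
    arb, arb2 = decode_arb(STRAP)
    print(arb)
    print(arb2)
```

If the output matches the MC_ARB_DRAM_TIMING lines in the post, the hex string was copied without a stray space or a dropped digit.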
|
|
|
|
nerdralph
|
|
March 21, 2017, 03:41:57 PM Last edit: March 21, 2017, 04:12:17 PM by nerdralph |
|
Here is the strap I've put together: 777000000000000022CC1C00106A5B47C0570E16B08C05090068C70014051420FA8900A003000000190D2F399D2D2E17
The timings from wolf and ohgodagirls vbios decode tools release:
You should update to the version with my changes that show CAS timing. I see you're using CL=22. With Samsung CL=21 I was getting errors at 2100 (OK at 2000). I'll give 22 a try. Here's what I was using @2000: 555000000000000022CC1C00CE595B3ED0570F1531CB2409004007000B0314207A8900A003000000170F2E36922A3217
update: no luck with CL=22@2100; too many HW errors. I'm pretty sure the Samsung RAM on these cards is rated for 1750, so getting stable results at 2000 is still pretty good.
|
|
|
|
niko2004x
|
|
March 21, 2017, 04:09:20 PM |
|
Since everyone is sharing now I suppose i'll put what I've come up with out here. .... Looking forward to others input!
TRCDR & TRCDRA should be the same. And I don't think that MISC decoded properly. Or at least it is reasonable to expect Pad0 in TRP_WRA = 48 Pad0 = 2 TRP_RDA = 12 TRP = 22 TRFC = 144 to be zero.
|
|
|
|
tharp
|
|
March 21, 2017, 04:10:23 PM |
|
Cleaned it up for you. --> HEX strap: 777000000000000022CC1C00AD695D47C0570E16B08C05090048C70014051420FA8900A003000000190D2F399D2D2E17 .... Lots of options.. lots of things to fine tune..
Thanks will give it a shot when I get back on a computer.
You should update to the version with my changes that show CAS timing. I see you're using CL=22. With Samsung CL=21 I was getting errors at 2100 (OK at 2000). I'll give 22 a try. ....
I am seeing HW errors with the current mod I'm running but not an exponential amount that affects performance on the pool hash rate. I will be able to test more once I get back to my computer.
|
|
|
|
laik2
|
|
March 21, 2017, 04:44:00 PM |
|
.... I am seeing HW errors with the current mod I'm running but not an exponential amount that affects performance on the pool hash rate. I will be able to test more once I get back to my computer.
It was a cleaned ROM, with no modding other than the SEQ_RAS params; there is a deliberate error for you to figure out. Hint: MC_SEQ_MISC_TIMING
EDIT: OhGodAGirl format:
typedef struct _SEQ_MISC_TIMING_FORMAT {
    uint32_t TRP_WRA : 6;
    uint32_t Pad0 : 2;
    uint32_t TRP_RDA : 6;
    uint32_t TRP : 6;
    uint32_t TRFC : 11;
} SEQ_MISC_TIMING_FORMAT;
Expected format:
5:0 TRP_WRA = 0x0
13:8 TRP_RDA = 0x0
19:15 TRP = 0x0
28:20 TRFC = 0x0
I must admit that I think OhGodAGirl's format looks more like it.
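To see what the OhGodAGirl struct actually yields, here is a small Python sketch that unpacks MC_SEQ_MISC_TIMING from tharp's strap using those field widths. Two things are assumptions rather than anything stated in the thread: that the bitfields are packed lowest bit first in a little-endian 32-bit word, and that MISC_TIMING sits at byte offset 20 of an RX strap (the offset at which the result matches the values tharp posted). `decode_misc_timing` is a hypothetical helper name.

```python
import struct

# (name, bit width), lowest bits first, copied from the struct in the post.
MISC_TIMING_FIELDS = (
    ("TRP_WRA", 6),
    ("Pad0", 2),
    ("TRP_RDA", 6),
    ("TRP", 6),
    ("TRFC", 11),
)

# Assumed byte offset of MC_SEQ_MISC_TIMING inside a 48-byte RX strap.
MISC_TIMING_OFFSET = 20

THARP_STRAP = ("777000000000000022CC1C00106A5B47C0570E16B08C0509"
               "0068C70014051420FA8900A003000000190D2F399D2D2E17")


def decode_misc_timing(strap_hex: str) -> dict:
    """Split the MISC_TIMING dword into fields, lowest bits first."""
    raw = bytes.fromhex(strap_hex)
    (word,) = struct.unpack_from("<I", raw, MISC_TIMING_OFFSET)
    fields = {}
    shift = 0
    for name, width in MISC_TIMING_FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


if __name__ == "__main__":
    print(decode_misc_timing(THARP_STRAP))
```

With this layout the decode reproduces the Pad0 = 2 that niko2004x found suspicious, so the open question is the field boundaries, not the arithmetic.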
|
|
|
|
lexele
|
|
March 21, 2017, 04:45:55 PM |
|
Gosh, this thread is becoming hotter and hotter; If only I weren't tied to work lately. Good luck folks.
|
|
|
|
niko2004x
|
|
March 21, 2017, 05:23:23 PM Last edit: March 21, 2017, 06:15:07 PM by niko2004x |
|
Expected format: 5:0 TRP_WRA = 0x0 13:8 TRP_RDA = 0x0 19:15 TRP = 0x0 28:20 TRFC = 0x0
I must admit that I think OhGodAGirl's format looks more like it.
I think this one is correct, but only for the pre-RX series. At least, the corresponding header in the Linux kernel is dated reasonably before RX. Here is an additional hint (produced by my tool).
RX480(Elpida EDW4032BABG)
20000 0 999000000000000022aa1c0060881107c0540b078f82c00000204100150014209a8840a1000004c0030105070c0a100c [TRP_WRA=015,TRP_RDA=005,unused1=000,TRP=002,unused2=000,TRFC=012,unused3=000]
40000 0 999000000000000022aa1c006094120fd0540c0815449101002041001d0314209a8880a2000004c006010a0f190e160d [TRP_WRA=021,TRP_RDA=008,unused1=000,TRP=005,unused2=000,TRFC=025,unused3=000]
80000 0 999000000000000022aa1c00a5ac351f10550e0c21c73203004482003d0914202a8900a5000004c00c06141a33182210 [TRP_WRA=033,TRP_RDA=014,unused1=000,TRP=011,unused2=000,TRFC=051,unused3=000]
100000 0 777000000000000022aa1c002939572750550d0fa68803040068c200540c1420aa8900a6000004c00f0a191e401e2712 [TRP_WRA=038,TRP_RDA=017,unused1=000,TRP=014,unused2=000,TRFC=064,unused3=000]
125000 0 777000000000000022aa1c00ad49593270550e12ad8a14050068c300640f1420ba8980a7000004c0130e202551242e13 [TRP_WRA=045,TRP_RDA=021,unused1=000,TRP=018,unused2=000,TRFC=081,unused3=000]
137500 0 777000000000000022aa1c00ef516a3790550f14b20b9505006ae40074021420ca89c0a8020004c01510232859283315 [TRP_WRA=050,TRP_RDA=023,unused1=000,TRP=020,unused2=000,TRFC=089,unused3=000]
142500 0 777000000000000022aa1c0010d66a3990550f14344cc505006ae40074031420ca8900a9020004c0161124295c293515 [TRP_WRA=052,TRP_RDA=024,unused1=000,TRP=021,unused2=000,TRFC=092,unused3=000]
150000 0 777000000000000022aa1c00315a6b3ca0550f15b68c1506006ae4007c041420ca8980a9020004c01712262b612b3715 [TRP_WRA=054,TRP_RDA=025,unused1=000,TRP=022,unused2=000,TRFC=097,unused3=000]
162500 0 777000000000000022aa1c0073627c41b0551016ba0d9606006c060104061420ea8940aa030004c01914292e692e3b16 [TRP_WRA=058,TRP_RDA=027,unused1=000,TRP=024,unused2=000,TRFC=105,unused3=000]
175000 0 777000000000000022aa1c00b56a7d46c0551017be8e1607006c07010c081420fa8900ab030004c01b162c3171313f17 [TRP_WRA=062,TRP_RDA=029,unused1=000,TRP=026,unused2=000,TRFC=113,unused3=000]
200000 0 999000000000000022aa1c0018f77e4fd055121946501708006c07011d0c1420fa8980ac030004c01e19323781364718 [TRP_WRA=070,TRP_RDA=032,unused1=000,TRP=029,unused2=000,TRFC=129,unused3=000]
R9390(Elpida EDW4032BABG)
20000 0 999133200000000060881107c0540b060f05c1000020410022aa1c08150014209a8840a1000000c0030105070c0a100c [TRP_WRA=015,unused1=000,TRP_RDA=005,unused2=000,TRP=002,TRFC=012,unused3=000]
40000 0 99913320000000006094120fd0540c07158892010020410022aa1c081d0314209a8880a2000000c006010a0f190e160d [TRP_WRA=021,unused1=000,TRP_RDA=008,unused2=000,TRP=005,TRFC=025,unused3=000]
80000 0 9991332000000000a5ac351f10550e0b218e35030044820022aa1c083d0914202a8900a5000000c00c06141a33182210 [TRP_WRA=033,unused1=000,TRP_RDA=014,unused2=000,TRP=011,TRFC=051,unused3=000]
100000 0 77713320000000002939572750550d0e261107040068c20022aa1c08540c1420aa8900a6000000c00f0a191e401e2712 [TRP_WRA=038,unused1=000,TRP_RDA=017,unused2=000,TRP=014,TRFC=064,unused3=000]
125000 0 7771332000000000ad49593270550e102d1519050068c30022aa1c08640f1420ba8980a7000000c0130e202551242e13 [TRP_WRA=045,unused1=000,TRP_RDA=021,unused2=000,TRP=018,TRFC=081,unused3=000]
137500 0 7771332000000000ef516a3790550f1232179a05006ae40022aa1c0874021420ca89c0a8020000c01510232859283315 [TRP_WRA=050,unused1=000,TRP_RDA=023,unused2=000,TRP=020,TRFC=089,unused3=000]
142500 0 777133200000000010d66a3990550f123498ca05006ae40022aa1c0874031420ca8900a9020000c0161124295c293515 [TRP_WRA=052,unused1=000,TRP_RDA=024,unused2=000,TRP=021,TRFC=092,unused3=000]
150000 0 7771332000000000315a6b3ca0550f1336191b06006ae40022aa1c087c041420ca8980a9020000c01712262b612b3715 [TRP_WRA=054,unused1=000,TRP_RDA=025,unused2=000,TRP=022,TRFC=097,unused3=000]
162500 0 777133200000000073627c41b05510143a1b9c06006c060122aa1c0804061420ea8940aa030000c01914292e692e3b16 [TRP_WRA=058,unused1=000,TRP_RDA=027,unused2=000,TRP=024,TRFC=105,unused3=000]
175000 0 7771332000000000b56a7d46c05510153e1d1d07006c070122aa1c080c081420fa8900ab030000c01b162c3171313f17 [TRP_WRA=062,unused1=000,TRP_RDA=029,unused2=000,TRP=026,TRFC=113,unused3=000]
200000 0 999133200000000018f77e4f0054121a06a01e08006c070122aa1c08350c1420fa8980ac030000c01e1932378139471a [TRP_WRA=006,unused1=000,TRP_RDA=032,unused2=000,TRP=029,TRFC=129,unused3=000]
RX480(Hynix H5GC4H24AJR)
40000 0 555000000000000022dd1c0084941212f0540b0795847102002041001b0414209a8800a00000312006050d0e270f160e [TRP_WRA=021,TRP_RDA=009,unused1=000,TRP=006,unused2=000,TRFC=039,unused3=000]
80000 0 777000000000000022dd1c00e7ac352210550d0a20c7f20400248100340914209a8800a0000031200c08171b4f172110 [TRP_WRA=032,TRP_RDA=014,unused1=000,TRP=011,unused2=000,TRFC=079,unused3=000]
90000 0 777000000000000022dd1c002931462620550e0ba20793050026a2003c0a1420aa8800a0000031200d0a1a1d59192311 [TRP_WRA=034,TRP_RDA=015,unused1=000,TRP=012,unused2=000,TRFC=089,unused3=000]
100000 0 777000000000000022dd1c0029b5462930550e0c244823060026a200440b1420aa8800a0000031200e0a1c20621b2511 [TRP_WRA=036,TRP_RDA=016,unused1=000,TRP=013,unused2=000,TRFC=098,unused3=000]
112500 0 777000000000000022ff1c006bbd572f40550f0d28c9f3060048c5004c0d14205a8900a000003120100c20246f1e2912 [TRP_WRA=040,TRP_RDA=018,unused1=000,TRP=015,unused2=000,TRFC=111,unused3=000]
125000 0 777000000000000022ff1c008cc5583460550f0f2c4ab4070048c5005c0f14205a8900a000003120120d23287b222d13 [TRP_WRA=044,TRP_RDA=020,unused1=000,TRP=017,unused2=000,TRFC=123,unused3=000]
137500 0 777000000000000022339d00cecd593980551111ae8a84080048c6006c0014206a8900a002003120140f262b88252f15 [TRP_WRA=046,TRP_RDA=021,unused1=000,TRP=018,unused2=000,TRFC=136,unused3=000]
142500 0 777000000000000022339d00ce516a3b805511112fcbd408004ae6006c0014206a8900a002003120150f272d8d263015 [TRP_WRA=047,TRP_RDA=022,unused1=000,TRP=019,unused2=000,TRFC=141,unused3=000]
150000 0 777000000000000022339d00ce516a3d9055111230cb4409004ae600740114206a8900a002003120150f292f94273116 [TRP_WRA=048,TRP_RDA=022,unused1=000,TRP=019,unused2=000,TRFC=148,unused3=000]
162500 0 999000000000000022559d0010de7b4480551312b78c450a004c0601750414206a8900a00200312018112d34a42a3816 [TRP_WRA=055,TRP_RDA=025,unused1=000,TRP=022,unused2=000,TRFC=164,unused3=000]
175000 0 999000000000000022559d0031627c489055131339cdd50a004c06017d0514206a8900a00200312019123037ad2c3a17 [TRP_WRA=057,TRP_RDA=026,unused1=000,TRP=023,unused2=000,TRFC=173,unused3=000]
200000 0 bbb000000000000022889d0073ee8d53805515133ecf560c004e26017e0514206a8900a0020031201c143840c5303f17 [TRP_WRA=062,TRP_RDA=030,unused1=000,TRP=027,unused2=000,TRFC=197,unused3=000]
R9390(Hynix H5GC4H24AJR)
40000 0 555133200000000084941212f0540b07150973020020410022dd1c081b0414209a8800a00000012006050d0e270f160e [TRP_WRA=021,unused1=000,TRP_RDA=009,unused2=000,TRP=006,TRFC=039,unused3=000]
80000 0 7771332000000000e7ac352210550d0a208ef5040024810022dd1c08340914209a8800a0000001200c08171b4f172110 [TRP_WRA=032,unused1=000,TRP_RDA=014,unused2=000,TRP=011,TRFC=079,unused3=000]
90000 0 77713320000000002931462620550e0b220f96050026a20022dd1c083c0a1420aa8800a0000001200d0a1a1d59192311 [TRP_WRA=034,unused1=000,TRP_RDA=015,unused2=000,TRP=012,TRFC=089,unused3=000]
100000 0 777133200000000029b5462930550e0c249026060026a20022dd1c08440b1420aa8800a0000001200e0a1c20621b2511 [TRP_WRA=036,unused1=000,TRP_RDA=016,unused2=000,TRP=013,TRFC=098,unused3=000]
112500 0 77713320000000006bbd572f40550f0d2892f7060048c50022ff1c084c0d14205a8900a000000120100c20246f1e2912 [TRP_WRA=040,unused1=000,TRP_RDA=018,unused2=000,TRP=015,TRFC=111,unused3=000]
125000 0 77713320000000008cc5583460550f0f2c94b8070048c50022ff1c085c0f14205a8900a000000120120d23287b222d13 [TRP_WRA=044,unused1=000,TRP_RDA=020,unused2=000,TRP=017,TRFC=123,unused3=000]
137500 0 7771332000000000cecd5939805511112e1589080048c60022339d086c0014206a8900a002000120140f262b88252f15 [TRP_WRA=046,unused1=000,TRP_RDA=021,unused2=000,TRP=018,TRFC=136,unused3=000]
142500 0 7771332000000000ce516a3b805511112f96d908004ae60022339d086c0014206a8900a002000120150f272d8d263015 [TRP_WRA=047,unused1=000,TRP_RDA=022,unused2=000,TRP=019,TRFC=141,unused3=000]
150000 0 7771332000000000ce516a3d9055111230964909004ae60022339d08740114206a8900a002000120150f292f94273116 [TRP_WRA=048,unused1=000,TRP_RDA=022,unused2=000,TRP=019,TRFC=148,unused3=000]
162500 0 999133200000000010de7b448055131237194b0a004c060122559d08750414206a8900a00200012018112d34a42a3816 [TRP_WRA=055,unused1=000,TRP_RDA=025,unused2=000,TRP=022,TRFC=164,unused3=000]
175000 0 999133200000000031627c4890551313399adb0a004c060122559d087d0514206a8900a00200012019123037ad2c3a17 [TRP_WRA=057,unused1=000,TRP_RDA=026,unused2=000,TRP=023,TRFC=173,unused3=000]
200000 0 bbb133200000000073ee8d53805515133e9e5d0c004e260122889d087e0514206a8900a0020001201c143840c5303f17 [TRP_WRA=062,unused1=000,TRP_RDA=030,unused2=000,TRP=027,TRFC=197,unused3=000]
The registers in RX and pre-RX cards are obviously at different offsets, but additionally there is no way to decode MISC with the same decoder and get reasonably similar values for the same memory type in RX and R9 cards.
EDIT: For those who are wondering why TRP_WRA=006 for Elpida in R9, my theory is that it is a bug in the BIOS: 6 bits were designated for the field, so the 64 in 70=64+6 was cut off.
EDIT: Thinking about it, I think the realignment of the MISC parts was done to give TRP one additional bit. In Elpida at higher straps TRP=029, which is almost an overflow.
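The truncation theory is easy to illustrate. The sketch below decodes MISC_TIMING from the R9 390 Elpida 200000 entry using the bit ranges of the "Expected format" quoted at the top of this post, then shows what happens when 70 is stored in a 6-bit field. The byte offset of 16 is an assumption (it is where the decoded values in the dump line up for the R9 straps), and `extract`, `decode_pre_rx_misc` and `truncate` are hypothetical helper names.

```python
import struct

# (high bit, low bit) per field, from the "Expected format" in the thread.
PRE_RX_FIELDS = {
    "TRP_WRA": (5, 0),
    "TRP_RDA": (13, 8),
    "TRP": (19, 15),
    "TRFC": (28, 20),
}

# Assumed byte offset of MC_SEQ_MISC_TIMING inside an R9 (pre-RX) strap.
PRE_RX_MISC_OFFSET = 16

R9_390_ELPIDA_200000 = ("999133200000000018f77e4f0054121a06a01e08006c0701"
                        "22aa1c08350c1420fa8980ac030000c01e1932378139471a")


def extract(word: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_pre_rx_misc(strap_hex: str) -> dict:
    raw = bytes.fromhex(strap_hex)
    (word,) = struct.unpack_from("<I", raw, PRE_RX_MISC_OFFSET)
    return {name: extract(word, hi, lo) for name, (hi, lo) in PRE_RX_FIELDS.items()}


def truncate(value: int, width: int) -> int:
    """What is left of value after it is stored in a field of the given width."""
    return value & ((1 << width) - 1)


if __name__ == "__main__":
    print(decode_pre_rx_misc(R9_390_ELPIDA_200000))
    print(truncate(70, 6))   # the RX480 entry has TRP_WRA=070; 6 bits keep only 6
```

This does not prove the BIOS bug, but it shows the numbers are consistent with it: the RX480 value of 70 loses its top bit in a 6-bit field and comes out as 6.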
|
|
|
|
doktor83
|
|
March 21, 2017, 05:56:33 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
|
|
|
|
nerdralph
|
|
March 21, 2017, 05:58:18 PM |
|
I've started doing the detailed analysis on memory timing for Eth mining.
With tRRD=6, tRC=62, tCL=21 and 2000 mem clock, I can get almost 27Mh/s mining ETH. Each hash takes 64 random DAG reads of 128 bytes each, and since they are random, each read should be from a different page. As well, the L2 cache hit rate should be near 0, so each DAG access requires a read from GDDR (2x32-byte reads from 2 GDDR chips).
Before reading, a page (row) has to be activated (opened), so 27Mh * 64 activates = 1728M activates per second. The Rx470/480 has 4 independent cache controllers, so a single GDDR5 chip will open 432M pages per second. With a 2GHz mem clock, that's about 5 (4.63) clocks per activate. The closer that gets to 4, the better. Lower than 4 is not possible with ETH mining, since it takes 4 clocks to transfer 64 bytes (half a DAG entry). Note that if tRRD=6 means 6 clocks, some other timing factor is allowing the RAM to sustain <5 clocks per activate.
I tried tRRD=5, and it only makes a small (~1%) improvement. That makes sense, since RRD is the delay between 2 activate commands when they are going to different banks. With only 16 banks, the memory controller has lots of opportunity to batch activate commands together in the same bank. However, tRC is defined as "the minimum time interval between two successive ACTIVE commands on the same bank". With tRC=62, the fastest access pattern would be to spread the accesses across different banks rather than batching them in the same bank.
So it seems I'm missing something about how the RAM timing works. I know there are multiple clocks for GDDR5, and some run at double data rate (i.e. WCK). If tRRD=6 means six DDR address clocks, that would be 3 SDR command clocks (2GHz is the command clock rate).
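The activate-rate arithmetic in this post can be written out as a short Python sketch. It takes the post's own model at face value (64 activates per hash, spread evenly over 4 controllers, and a floor of 4 clocks per activate), so the ceiling it prints is only as good as those assumptions. The function names are made up for illustration.

```python
READS_PER_HASH = 64          # random DAG reads, one activate each (per the post)
CONTROLLERS = 4              # the post's figure for the Rx470/480
MIN_CLOCKS_PER_ACTIVATE = 4  # 4 clocks to move 64 bytes, per the post


def clocks_per_activate(hashrate_mh_s: float, mem_clock_mhz: float) -> float:
    """Memory clocks available per activate on one chip."""
    activates_m_per_s = hashrate_mh_s * READS_PER_HASH   # millions per second
    per_chip_m_per_s = activates_m_per_s / CONTROLLERS
    return mem_clock_mhz / per_chip_m_per_s


def hashrate_ceiling_mh_s(mem_clock_mhz: float) -> float:
    """Hashrate at which every activate takes exactly the minimum clocks."""
    per_chip_m_per_s = mem_clock_mhz / MIN_CLOCKS_PER_ACTIVATE
    return per_chip_m_per_s * CONTROLLERS / READS_PER_HASH


if __name__ == "__main__":
    print(round(clocks_per_activate(27, 2000), 2))
    print(hashrate_ceiling_mh_s(2000))
```

Under this model, 27Mh/s at a 2000 mem clock leaves a little over 4.6 clocks per activate, and the 4-clock floor corresponds to a bit over 31Mh/s.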
|
|
|
|
niko2004x
|
|
March 21, 2017, 06:03:14 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
The 3 highest bits are unused anyway (so the difference between 31 and 32 is irrelevant).
|
|
|
|
nerdralph
|
|
March 21, 2017, 06:06:30 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
I think the Linux kernel asic reg headers are misleading. As far as I can tell the straps are copied into 32-bit registers, and therefore the mask and offset definitions have no functional effect. Some of the old register names can't even be found in the GDDR5 datasheets. For example, you won't find tR2R in the Hynix datasheet, but you will find tCCDL and tCCDS. I suspect what the Linux headers refer to as tR2R may actually be tCCDS.
|
|
|
|
laik2
|
|
March 21, 2017, 06:44:26 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
3 highest bits are unused anyway (so difference between 31 and 32 is irrelevant).
And what's the correct structure for MC_SEQ_MISC_TIMING according to your decoding tool for the RX series?
|
|
|
|
niko2004x
|
|
March 21, 2017, 06:45:22 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
I think the linux kernel asic reg headers are misleading. As far as I can tell the straps are copied into 32-bit registers, and therefore the mask and offset definitions have no functional effect. Some of the old register names can't even be found in the GDDR5 datasheets. For example you won't find tR2R in the Hynix datasheet, but you will find tCCDL and tCCDS. I suspect what the Linux headers refer to as tR2R may actually be tCCDS.
Well, you could be right. But the linked Hynix H5GQ2H24AFR datasheet (last seen in the R9 290) is dated 2009 and the Linux header is more recent (although whether the data there is up to date is questionable), so from my point of view it comes down to which one is more outdated.
|
|
|
|
niko2004x
|
|
March 21, 2017, 06:46:01 PM |
|
yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32
3 highest bits are unused anyway (so difference between 31 and 32 is irrelevant).
And whats the correct structure for MC_SEQ_MISC_TIMING according to your decoding tool for RX series?
As stated in atom_rom_timings.py in git.
|
|
|
|
|