Bitcoin Forum
May 12, 2024, 05:19:33 AM *
Author Topic: Custom RAM Timings for GPU's with GDDR5 - DOWNLOAD LINKS - UPDATED  (Read 155460 times)
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 21, 2017, 06:47:14 PM
 #341

yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32 Smiley

3 highest bits are unused anyway (so difference between 31 and 32 is irrelevant).

And what's the correct structure for MC_SEQ_MISC_TIMING according to your decoding tool for RX series?

As stated in atom_rom_timings.py in git.
Oh gosh... I'm an idiot, I forgot you'd made it public :D

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 21, 2017, 07:37:11 PM
 #342

yeah, just wanted to ask which one is accurate, ohgod's or niko's MISC_TIMING cause one is 31 bits the other one is 32 Smiley

I think the linux kernel asic reg headers are misleading.  As far as I can tell the straps are copied into 32-bit registers, and therefore the mask and offset definitions have no functional effect.
Some of the old register names can't even be found in the GDDR5 datasheets.  For example you won't find tR2R in the Hynix datasheet, but you will find tCCDL and tCCDS.  I suspect what the Linux headers refer to as tR2R may actually be tCCDS.

Well, you could be right.
But the linked Hynix H5GQ2H24AFR datasheet (last seen in the R9 290) is dated 2009, and
the Linux header is more recent (although whether its data is up to date is questionable),
so from my point of view the question is which of the two is more out of date.

I'm confident tCCDS is part of the JEDEC GDDR5 spec, and therefore not unique to the H5GQ2H24AFR.
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 21, 2017, 08:01:21 PM
 #343

I've been looking at more of the strap data that most people ignore, and I'm thinking there could be some important data there.  Based on some information in a PM, after MC_SEQ_MISC_TIMING2 comes MC_SEQ_MISC1, which supposedly contains mode register 1/0, and the next 32 bits is mode register 5/4.  GDDR5 specifies 10 Mode Registers to define the specific mode of operation.  Some of the mode register data appears to be duplicated elsewhere in the strap, while some is not.  For example MC_SEQ_MISC_TIMING2 has CRC read/write latency, and that is part of MR4 (2 bits for read latency and 3 bits for write latency).

I seem to recall both Wolf and Eliovp mentioning there is some important timing in the mode registers.
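
To make the mode register idea concrete, here is a minimal Python sketch of pulling the CRC latency fields out of a 32-bit strap word. Everything about the layout is an assumption: that MR4 sits in the low 16 bits of the "mode register 5/4" word, and that CRC write latency is at bits 6:4 and CRC read latency at bits 8:7 of MR4 (my reading of the GDDR5 mode register description; check it against the datasheet for your memory). The word value and helper names are made up for illustration.

```python
def extract_field(word: int, offset: int, width: int) -> int:
    """Return `width` bits of `word`, starting at bit `offset`."""
    return (word >> offset) & ((1 << width) - 1)


def split_halves(word32: int) -> tuple[int, int]:
    """Split a 32-bit strap word into (low 16 bits, high 16 bits)."""
    return word32 & 0xFFFF, (word32 >> 16) & 0xFFFF


# Hypothetical strap word, NOT taken from a real BIOS.
MR5_MR4_WORD = 0x00000130

# Assumption: MR4 is the low half, MR5 the high half.
mr4, mr5 = split_halves(MR5_MR4_WORD)

# Assumed MR4 layout: 3 bits of CRC write latency, 2 bits of CRC read latency.
crc_write_latency = extract_field(mr4, offset=4, width=3)
crc_read_latency = extract_field(mr4, offset=7, width=2)

print(f"MR4=0x{mr4:04x} CRCWL field={crc_write_latency} CRCRL field={crc_read_latency}")
```

The extracted values are raw field encodings, not clock counts; the mapping from encoding to latency is in the datasheet.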
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 21, 2017, 08:26:32 PM
 #344

I've been looking at more of the strap data that most people ignore, and I'm thinking there could be some important data there.  Based on some information in a PM, after MC_SEQ_MISC_TIMING2 comes MC_SEQ_MISC1, which supposedly contains mode register 1/0, and the next 32 bits is mode register 5/4.  GDDR5 specifies 10 Mode Registers to define the specific mode of operation.  Some of the mode register data appears to be duplicated elsewhere in the strap, while some is not.  For example MC_SEQ_MISC_TIMING2 has CRC read/write latency, and that is part of MR4 (2 bits for read latency and 3 bits for write latency).

I seem to recall both Wolf and Eliovp mentioning there is some important timing in the mode registers.

If you've read the document you mentioned then you should know what's in there already.
ElioVP exposed Mode registers to you.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 21, 2017, 09:20:06 PM
 #345

I've started doing the detailed analysis on memory timing for Eth mining.

With tRRD=6, tRC=62, tCL=21 and 2000 mem clock, I can get almost 27Mh/s mining eth.  Each hash takes 64 random DAG reads of 128 bytes each, and since they are random, each read should be from a different page.  As well, the L2 cache hit rate should be near 0, so each DAG access requires a read from GDDR (2x32-byte reads from 2 GDDR chips).

Before reading, a page (row) has to be activated (opened), so 27Mh * 64 activates = 1728M activates per second.  The Rx470/480 has 4 independent cache controllers, so a single GDDR5 chip will open 432M pages per second.  With a 2GHz mem clock, that's about 5 (4.63) clocks per activate.  The closer that gets to 4, the better.  Lower than 4 is not possible with Eth mining, since it takes 4 clocks to transfer 64 bytes (half a DAG entry).  Note that if tRRD=6 means 6 clocks, some other timing factor must be allowing the RAM to sustain <5 clocks per activate.
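
The arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch using the figures quoted in the post (27Mh/s, 64 reads per hash, 4 controllers, 2000MHz); the constant and function names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative only.
HASHRATE = 27e6            # hashes per second (27 Mh/s)
DAG_READS_PER_HASH = 64    # random DAG reads per hash
CHANNELS = 4               # independent controllers, per the post
MEM_CLOCK_HZ = 2000e6      # 2000 MHz command clock


def clocks_per_activate(hashrate, reads_per_hash, channels, mem_clock_hz):
    """Average command clocks available per row activate on one channel."""
    activates_per_sec = hashrate * reads_per_hash   # 1728M for the figures above
    per_channel = activates_per_sec / channels      # 432M
    return mem_clock_hz / per_channel


cpa = clocks_per_activate(HASHRATE, DAG_READS_PER_HASH, CHANNELS, MEM_CLOCK_HZ)
print(f"{cpa:.2f} clocks per activate")  # 4.63
```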

I tried tRRD=5, and it only makes a small (~1%) improvement.  That makes sense, since RRD is the delay between 2 activate commands when they are going to different banks.  With only 16 banks, the memory controller has lots of opportunity to batch activate commands together in the same bank.  However tRC is defined as, "The minimum time interval between two successive ACTIVE commands on the same bank".  With tRC=62, the fastest access pattern would be to spread the accesses across different banks rather than batching them in the same bank.

So it seems I'm missing something about how the RAM timing works.  I know there are multiple clocks for GDDR5, and some run at double data rate (e.g. WCK).  If tRRD=6 means six DDR address clocks, that would be 3 SDR command clocks (2GHz is the command clock rate).

The GDDR5 specs refer to tRRDL (same bank group) and tRRDS (different bank group or bank groups disabled).  Maybe what people are labeling tRRD is tRRDS, and some other data in the strap is tRRDL=4.
I tried reducing tRRD in SEQ_RAS_TIMING from 5 to 4, and don't see any improvement.  I should be able to get ~29Mh with fully optimized timing at 2000 mem clock, but so far I can't get much more than 27.
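
Turning the calculation around gives a hashrate ceiling for a given activate spacing, which shows why tRRD=6 cannot literally mean 6 command clocks between activates if the card sustains 27Mh/s. The sketch below uses the same figures as above (4 controllers, 64 reads per hash, 2000MHz command clock); the function name is made up, and it ignores refresh and every other overhead.

```python
def max_hashrate_mhs(clocks_per_activate, mem_clock_mhz=2000.0,
                     channels=4, reads_per_hash=64):
    """Upper bound on Mh/s if every DAG read costs one activate per channel."""
    activates_per_sec_m = mem_clock_mhz / clocks_per_activate * channels
    return activates_per_sec_m / reads_per_hash


for clocks in (6, 5, 4):
    print(f"{clocks} clocks/activate -> {max_hashrate_mhs(clocks):.2f} Mh/s")
```

With one activate every 6 command clocks the ceiling is about 20.8Mh/s, well below the observed 27; 4 clocks (the bus-transfer limit mentioned above) gives 31.25Mh/s, and 3 clocks would not help because the data transfer itself becomes the limit.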
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 21, 2017, 09:34:38 PM
 #346

I've started doing the detailed analysis on memory timing for Eth mining.

With tRRD=6, tRC=62, tCL=21 and 2000 mem clock, I can get almost 27Mh/s mining eth.  Each hash takes 64 random DAG reads of 128 bytes each, and since they are random, each read should be from a different page.  As well, the L2 cache hit rate should be near 0, so each DAG access requires a read from GDDR (2x32-byte reads from 2 GDDR chips).

Before reading, a page (row) has to be activated (opened), so 27Mh * 64 activates = 1728M activates per second.  The Rx470/480 has 4 independent cache controllers, so a single GDDR5 chip will open 432M pages per second.  With a 2GHz mem clock, that's about 5 (4.63) clocks per activate.  The closer that gets to 4, the better.  Lower than 4 is not possible with Eth mining, since it takes 4 clocks to transfer 64 bytes (half a DAG entry).  Note that if tRRD=6 means 6 clocks, some other timing factor must be allowing the RAM to sustain <5 clocks per activate.

I tried tRRD=5, and it only makes a small (~1%) improvement.  That makes sense, since RRD is the delay between 2 activate commands when they are going to different banks.  With only 16 banks, the memory controller has lots of opportunity to batch activate commands together in the same bank.  However tRC is defined as, "The minimum time interval between two successive ACTIVE commands on the same bank".  With tRC=62, the fastest access pattern would be to spread the accesses across different banks rather than batching them in the same bank.

So it seems I'm missing something about how the RAM timing works.  I know there are multiple clocks for GDDR5, and some run at double data rate (e.g. WCK).  If tRRD=6 means six DDR address clocks, that would be 3 SDR command clocks (2GHz is the command clock rate).

The GDDR5 specs refer to tRRDL (same bank group) and tRRDS (different bank group or bank groups disabled).  Maybe what people are labeling tRRD is tRRDS, and some other data in the strap is tRRDL=4.
I tried reducing tRRD in SEQ_RAS_TIMING from 5 to 4, and don't see any improvement.  I should be able to get ~29Mh with fully optimized timing at 2000 mem clock, but so far I can't get much more than 27.

Keep in mind that there is a huge difference between Linux and Windows, and between amdgpu-pro 16.60 and older versions. I wrote to you on zawawa's thread to update the kernel to 4.10/4.11 and install only the amdgpu-pro 16.60 OpenCL packages and their dependencies. Hashrate will increase by 1.2MH, guaranteed. Also, RAS and CAS timings must be calculated consistently with each other for better stability. MC_SEQ_MISC_TIMING contains the tRP value, which combined with tCL equals tRAS. When raising the memory clock above 2000 you should increase the refresh rate and keep read/write operations at the same level or close to it. ARB_DRAM timings can also improve stability at the driver level. If u get 29MH on 2000 it means your timings are too tight and won't be as stable as you think.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 21, 2017, 09:40:53 PM
 #347

Keep in mind that there is a huge difference between Linux and Windows, and between amdgpu-pro 16.60 and older versions. I wrote to you on zawawa's thread to update the kernel to 4.10/4.11 and install only the amdgpu-pro 16.60 OpenCL packages and their dependencies. Hashrate will increase by 1.2MH, guaranteed.

I saw that but I didn't catch that performance is significantly better on 4.10/16.60 than 4.8/16.40.  I've only got 2 Rx470 cards in that rig, and have been meaning to drop in a R9 380 to test out AMDGPU-Pro's Tonga support.  Maybe I'll upgrade to 16.60 as well...
lexele
Full Member
***
Offline Offline

Activity: 190
Merit: 100


View Profile
March 21, 2017, 10:18:33 PM
 #348


The GDDR5 specs refer to tRRDL (same bank group) and tRRDS (different bank group or bank groups disabled).  Maybe what people are labeling tRRD is tRRDS, and some other data in the strap is tRRDL=4.
I tried reducing tRRD in SEQ_RAS_TIMING from 5 to 4, and don't see any improvement.  I should be able to get ~29Mh with fully optimized timing at 2000 mem clock, but so far I can't get much more than 27.


On Rx 470/480 I get close to 29Mh by using 1375 straps on 2000 mem clock. By comparing the mem straps you could get some hints.
kilo17 (OP)
Legendary
*
Offline Offline

Activity: 980
Merit: 1001

aka "whocares"


View Profile
March 21, 2017, 10:25:13 PM
 #349

I've started doing the detailed analysis on memory timing for Eth mining.

With tRRD=6, tRC=62, tCL=21 and 2000 mem clock, I can get almost 27Mh/s mining eth.  Each hash takes 64 random DAG reads of 128 bytes each, and since they are random, each read should be from a different page.  As well, the L2 cache hit rate should be near 0, so each DAG access requires a read from GDDR (2x32-byte reads from 2 GDDR chips).

Before reading, a page (row) has to be activated (opened), so 27Mh * 64 activates = 1728M activates per second.  The Rx470/480 has 4 independent cache controllers, so a single GDDR5 chip will open 432M pages per second.  With a 2GHz mem clock, that's about 5 (4.63) clocks per activate.  The closer that gets to 4, the better.  Lower than 4 is not possible with Eth mining, since it takes 4 clocks to transfer 64 bytes (half a DAG entry).  Note that if tRRD=6 means 6 clocks, some other timing factor must be allowing the RAM to sustain <5 clocks per activate.

I tried tRRD=5, and it only makes a small (~1%) improvement.  That makes sense, since RRD is the delay between 2 activate commands when they are going to different banks.  With only 16 banks, the memory controller has lots of opportunity to batch activate commands together in the same bank.  However tRC is defined as, "The minimum time interval between two successive ACTIVE commands on the same bank".  With tRC=62, the fastest access pattern would be to spread the accesses across different banks rather than batching them in the same bank.

So it seems I'm missing something about how the RAM timing works.  I know there are multiple clocks for GDDR5, and some run at double data rate (e.g. WCK).  If tRRD=6 means six DDR address clocks, that would be 3 SDR command clocks (2GHz is the command clock rate).

The GDDR5 specs refer to tRRDL (same bank group) and tRRDS (different bank group or bank groups disabled).  Maybe what people are labeling tRRD is tRRDS, and some other data in the strap is tRRDL=4.
I tried reducing tRRD in SEQ_RAS_TIMING from 5 to 4, and don't see any improvement.  I should be able to get ~29Mh with fully optimized timing at 2000 mem clock, but so far I can't get much more than 27.

Keep in mind that there is a huge difference between Linux and Windows, and between amdgpu-pro 16.60 and older versions. I wrote to you on zawawa's thread to update the kernel to 4.10/4.11 and install only the amdgpu-pro 16.60 OpenCL packages and their dependencies. Hashrate will increase by 1.2MH, guaranteed. Also, RAS and CAS timings must be calculated consistently with each other for better stability. MC_SEQ_MISC_TIMING contains the tRP value, which combined with tCL equals tRAS. When raising the memory clock above 2000 you should increase the refresh rate and keep read/write operations at the same level or close to it. ARB_DRAM timings can also improve stability at the driver level. If u get 29MH on 2000 it means your timings are too tight and won't be as stable as you think.

The other option is to loosen them up a bit, increase the clock, and add in a + offset.  I am mostly referring to loosening tRC to increase stability and then changing tRAS and tRP accordingly.  Trying to maintain the "normal" relationships between parameters is also helpful when altering the other values.  For example, the common equation tRAS = tCL + tRCD + tRP - 1 will help maintain stability as well.
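
As a small worked example of that rule of thumb, here is a Python sketch. The formula is the one quoted above; tCL=21 comes from earlier in the thread, while the tRCD and tRP values are placeholders picked only to show the arithmetic, not values read from any strap. The function name is made up.

```python
def tras_rule_of_thumb(tcl: int, trcd: int, trp: int) -> int:
    """tRAS from the rule of thumb quoted above: tCL + tRCD + tRP - 1."""
    return tcl + trcd + trp - 1


# tCL from the thread; tRCD and tRP are placeholder values for illustration.
tras = tras_rule_of_thumb(tcl=21, trcd=16, trp=16)
print(f"tRAS = {tras}")  # tRAS = 52
```

If tCL is loosened by 1, the same rule moves tRAS up by 1 as well, which is the point of keeping the values in sync.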

I have been messing around with these values for a while now, and in my experience the best results come from first getting the stability I need for a given target and then adjusting the "advanced" timings.

Lastly, I haven't had much success or seen a lot of benefit changing the Read to Write delay or Write to Read delay but I am sure it can be helpful if the other timings are synced.

Bitcoin Will Only Succeed If The Community That Supports It Gets Support - Support Home Miners & Mining
kilo17 (OP)
Legendary
*
Offline Offline

Activity: 980
Merit: 1001

aka "whocares"


View Profile
March 21, 2017, 10:26:04 PM
 #350

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

Great minds think alike ;)

Bitcoin Will Only Succeed If The Community That Supports It Gets Support - Support Home Miners & Mining
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 21, 2017, 10:54:17 PM
 #351

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

Great minds think alike ;)
And are modest :D
tRAS on Hynix and Samsung is 0; only Elpida and Micron seem to have a tRAS value by default.
I haven't tried adding it on Hynix, but it might give some improvement on Samsung.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
kilo17 (OP)
Legendary
*
Offline Offline

Activity: 980
Merit: 1001

aka "whocares"


View Profile
March 22, 2017, 12:00:41 AM
 #352

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

Great minds think alike ;)
And are modest :D
tRAS on Hynix and Samsung is 0; only Elpida and Micron seem to have a tRAS value by default.
I haven't tried adding it on Hynix, but it might give some improvement on Samsung.

I cannot remember off the top of my head, but I think I have changed it on Samsung, or maybe I changed the other values to essentially imply a value.  I will log on later and look.

Bitcoin Will Only Succeed If The Community That Supports It Gets Support - Support Home Miners & Mining
nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 22, 2017, 12:43:29 AM
 #353

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings.  I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100.  So is it just loosening tCL that usually does the trick, or something else too?

nerdralph
Sr. Member
****
Offline Offline

Activity: 588
Merit: 251


View Profile
March 22, 2017, 12:46:17 AM
 #354

Keep in mind that there is huge diff linux/windows and amdgpu-pro <16.60, I've wrote u on zawawa's thread to update kernel to 4.10/4.11 and install only amdgpu-pro 16.60 ocl packages and their deps. Hashrate will increase +1.2MH guaranteed. Also ras/cas timings must be equally calculated for better stability. MC_SEQ_MISC_TIMING contains tRP value which combined with tCL equals tRAS. By raising memory above 2000 you should increase refresh rate and keep read/write operations at the same level or close. ARB_DRAM timings can improve stability on driver level also. If u get 29MH on 2000 it means your timings are too tight and won't be as stable as you think.

Looks like part of the problem was throttling.  I bumped up the TDP & TDC to 90W/90A, and am getting just over 27.5@2000.  If 16.60 really gives me a 1.2Mh boost, I'll be pretty close to 29.
OhGodAGirl
Full Member
***
Offline Offline

Activity: 199
Merit: 108

Look, I'm really not that interesting. Promise.


View Profile WWW
March 22, 2017, 02:33:43 AM
 #355

"If u get 29MH on 2000 it means your timings are too tight and won't be as stable"

lpedretti
Full Member
***
Offline Offline

Activity: 152
Merit: 100


View Profile
March 22, 2017, 04:36:40 AM
 #356

Sometimes loosening the timings is better, and clocking higher - specifically on Eth.

I've seen you mention loosening the CAS timings.  I tried bumping up tCL by 1, but still get crashes on the K4G4 at 2100.  So is it just loosening tCL that usually does the trick, or something else too?



You have to loosen it on the DRAM, too - you're loosening the tCL on the ASIC, but not the DRAM, throwing them off.

Aren't the straps precisely what controls the DRAM settings? Do you have to change values somewhere else?

AC: ANuRoFPkCjZSxsw2S41djrrA1D4xMMmwhs
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 22, 2017, 06:52:26 AM
 #357

"If u get 29MH on 2000 it means your timings are too tight and won't be as stable"





This is not

And we cannot do it


Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
doktor83
Hero Member
*****
Offline Offline

Activity: 2520
Merit: 626


View Profile WWW
March 22, 2017, 07:19:21 AM
 #358

"If u get 29MH on 2000 it means your timings are too tight and won't be as stable"





This is not

And we cannot do it








:D

SRBMiner-MULTI thread - HERE
http://www.srbminer.com
laik2
Sr. Member
****
Offline Offline

Activity: 652
Merit: 266



View Profile WWW
March 22, 2017, 07:41:32 AM
 #359

I was referring to nerdralph's current strap.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/
doktor83
Hero Member
*****
Offline Offline

Activity: 2520
Merit: 626


View Profile WWW
March 22, 2017, 07:42:54 AM
 #360

Don't be rude, you were too generic: "If u get 29MH on 2000 it means your timings are too tight and won't be as stable"

SRBMiner-MULTI thread - HERE
http://www.srbminer.com