Bitcoin Forum
Topic: optimal eth mining core:mem ratio 0.56 (Read 840 times)
nerdralph (OP)
Sr. Member
Activity: 588
Merit: 251
October 26, 2016, 04:44:32 PM
 #1

Eth mining uses the ethash algorithm, which does a Keccak hash (essentially SHA3-512), then 64 random 128-byte DAG reads, then another Keccak hash. The Keccak hashing is compute-intensive (GPU core), while the DAG reads are memory-intensive. All the public miners use a similar OpenCL implementation, and so have similar core/memory requirements. For AMD cards with a 256-bit memory interface (R9 380, RX 470/480...), the optimal core:memory clock ratio works out to 0.56. So with a 1500 MHz memory clock, the optimal core clock is 840 MHz (1500 × 0.56). Increasing the core clock beyond 840 MHz will not increase hashrate, and just uses more power. Similarly, an RX 470 with a 1750 MHz memory clock should have a core clock of 980 MHz.
One additional condition is that the GPU needs enough compute units to saturate the memory bandwidth. With the publicly available miners, that minimum is around 22-24 compute units, so cards like the R7 370 do not max out their memory bandwidth.
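To make the ethash structure from the first paragraph concrete (Keccak, 64 × 128-byte DAG reads, Keccak), here is a rough Python sketch of an ethash-style inner loop. It is a simplified toy, not real ethash: hashlib's SHA3-512/SHA3-256 stand in for Keccak (they differ only in padding), the hypothetical toy_dag helper builds a tiny table instead of the multi-GB DAG, and page selection is simplified. What it does show is where the memory traffic comes from: 64 dependent reads of 128 bytes each, i.e. 8 KiB per hash, where each read address depends on data returned by the earlier reads.

```python
import hashlib
import struct

FNV_PRIME = 0x01000193
ACCESSES = 64            # random DAG reads per hash
MIX_BYTES = 128          # bytes fetched by each read
MIX_WORDS = MIX_BYTES // 4


def fnv(a, b):
    # FNV-style mixing step as used by ethash, truncated to 32 bits.
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF


def to_words(data):
    # Interpret bytes as little-endian 32-bit words.
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def toy_dag(n_pages, seed=b"toy-dag"):
    # Hypothetical stand-in for the real DAG: n_pages pages of 128 bytes (32 words).
    pages = []
    for i in range(n_pages):
        half = hashlib.sha3_512(seed + struct.pack("<I", i)).digest()
        pages.append(to_words(half + hashlib.sha3_512(half).digest()))
    return pages


def toy_hashimoto(header, nonce, dag):
    # Step 1: 512-bit hash of header + nonce (SHA3-512 as a Keccak-512 stand-in).
    seed = hashlib.sha3_512(header + struct.pack("<Q", nonce)).digest()
    seed_words = to_words(seed)                 # 16 words
    mix = seed_words * (MIX_WORDS // 16)        # replicated to 32 words = 128 bytes
    bytes_read = 0
    # Step 2: 64 pseudo-random 128-byte reads; each page index depends on the current mix.
    for i in range(ACCESSES):
        page = fnv(i ^ seed_words[0], mix[i % MIX_WORDS]) % len(dag)
        mix = [fnv(m, d) for m, d in zip(mix, dag[page])]
        bytes_read += MIX_BYTES
    # Compress the 32-word mix down to 8 words.
    cmix = [fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
            for i in range(0, MIX_WORDS, 4)]
    # Step 3: final 256-bit hash (SHA3-256 as a Keccak-256 stand-in).
    result = hashlib.sha3_256(seed + struct.pack("<8I", *cmix)).digest()
    return result, bytes_read


dag = toy_dag(1024)
digest, traffic = toy_hashimoto(b"example header", 42, dag)
print(digest.hex(), traffic)  # traffic is 64 * 128 = 8192 bytes per hash
```

The two hashes are cheap per byte but run on the core; the loop in the middle is what ties hashrate to memory bandwidth once the core is fast enough.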

This ratio is not a fundamental limit of the ethash algorithm, so a kernel with a highly optimized Keccak implementation (such as Wolf's private kernel) likely has a lower ratio and needs fewer compute units to reach optimal performance. This means that while you won't see any miner get more than 28 MH/s from an RX 470 at 1750 MHz, someone could release a miner that reaches its maximum hashrate at a core clock much lower than 980 MHz, and therefore uses less power.
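A back-of-the-envelope check of the bandwidth ceilings mentioned in this thread, as a Python sketch. It assumes GDDR5 moves 4 bits per pin per reported memory clock (so 1500 MHz means 6 Gbps per pin) and that every hash needs exactly 64 × 128 = 8192 bytes from memory with no overhead; the helper names are made up for illustration. The 0.56 ratio is the empirical figure from the post, not something derived here.

```python
BYTES_PER_HASH = 64 * 128  # 64 DAG reads of 128 bytes each


def gddr5_bandwidth_gbs(bus_bits, mem_mhz, bits_per_pin_per_clock=4):
    # Assumption: 4 transfers per reported memory clock (GDDR5 as shown by AMD tools).
    return bus_bits / 8 * mem_mhz * bits_per_pin_per_clock / 1000


def max_hashrate_mhs(bus_bits, mem_mhz):
    # Theoretical upper bound only: ignores refresh, bank conflicts and bus contention.
    bytes_per_s = gddr5_bandwidth_gbs(bus_bits, mem_mhz) * 1e9
    return bytes_per_s / BYTES_PER_HASH / 1e6


def suggested_core_mhz(mem_mhz, ratio=0.56):
    # Empirical core:mem ratio for 256-bit AMD cards with the public OpenCL miners.
    return round(mem_mhz * ratio)


for mem in (1500, 1750):
    print(f"{mem} MHz mem: ceiling {max_hashrate_mhs(256, mem):.1f} MH/s, "
          f"core ~{suggested_core_mhz(mem)} MHz")
# 1500 MHz mem: ceiling 23.4 MH/s, core ~840 MHz
# 1750 MHz mem: ceiling 27.3 MH/s, core ~980 MHz
```

Real hashrates land below these figures because of memory refresh and contention, so the ceiling is a bound to tune toward, not a target you will hit.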

adaseb
Legendary
Activity: 3752
Merit: 1710
October 26, 2016, 04:51:31 PM
 #2

With a 280X, setting a core clock of 800 MHz reduces the speed to about 10 MH/s from 17 MH/s.

Castreat
Newbie
Activity: 38
Merit: 0
October 26, 2016, 04:52:09 PM
 #3

But that ratio also depends on the memory timing. With lower latency, the ratio can be increased further.
nerdralph (OP)
Sr. Member
Activity: 588
Merit: 251
October 26, 2016, 06:09:32 PM
 #4

Quote from: adaseb on October 26, 2016, 04:51:31 PM
With a 280X setting a core clock of 800Mhz reduces the speed to like 10MH/s from 17MH/s.

The 280X has a 384-bit memory interface, not 256.
nerdralph (OP)
Sr. Member
Activity: 588
Merit: 251
October 26, 2016, 06:16:21 PM
 #5

Quote from: Castreat on October 26, 2016, 04:52:09 PM
But that ratio also depends on the memory timing. With lower latency, the ratio can be increased further.

With stock BIOS and slower memory timings, you can get away with 0.54 or 0.55. With a tuned BIOS, going over 0.56 is a waste of power. It's impossible to get more than 24 MH/s with a 1500 MHz memory clock, no matter how much you tweak the memory timings. And once you add the overhead for memory refresh and memory bus contention between the compute units, 22.5 MH/s is about the max you can actually get.
deadsix
Hero Member
Activity: 751
Merit: 517
October 26, 2016, 06:55:14 PM
 #6

Can confirm: a lot of trial and error over weeks got me to 1050 core / 1870 mem as the most efficient setting for my RX 470s, and it fits the 0.56 ratio.

adaseb
Legendary
Activity: 3752
Merit: 1710
October 26, 2016, 07:11:46 PM
 #7

Quote from: nerdralph on October 26, 2016, 06:09:32 PM
The 280x has a 384-bit memory interface, not 256.


So what's the formula for the 280X? The 290?

nerdralph (OP)
Sr. Member
Activity: 588
Merit: 251
October 26, 2016, 11:25:59 PM
 #8

Quote
How did you work out the ratio?

Trial and error with some interpolation, starting months ago when I wanted to reduce the power use of some R9 380 cards. I remember starting at around 750 MHz core with a couple of MSI R9 380 cards running stock BIOS with memory at 1500 MHz. Despite the >20% drop in core clock from 980 MHz, hashrate only dropped ~10% (to around 19.5 from 21.5 MH/s). Doing a little math suggested the optimal core clock was in the low 800s, so I tried 10 MHz increments from 800 to 850, finding that ~820 was the best. With another R9 380 that had better stock memory timings, 840/1500 was the best (giving 22-22.5 MH/s). After tweaking the BIOS memory timings, with memory at 1600 MHz, 900 MHz was the optimal core. I recently got an R9 385 and an RX 470, and the 0.56 ratio worked for them as well.
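One way the "little math" above could have gone, as a hedged Python sketch: assume hashrate scales linearly with core clock while the kernel is compute-bound, and is capped by memory bandwidth above that (a simple roofline-style model; the post doesn't say exactly what math was used, and the function names are hypothetical). Plugging in the R9 380 numbers from the post (about 19.5 MH/s at 750 MHz core, 21.5 MH/s at the stock 980 MHz) puts the knee in the low 800s.

```python
def knee_core_mhz(core_low_mhz, hashrate_low, hashrate_cap):
    """Core clock where a linear (compute-bound) hashrate reaches the memory cap.

    Hypothetical model: hashrate = min(k * core, hashrate_cap), with k taken
    from a single measurement below the knee.
    """
    k = hashrate_low / core_low_mhz  # MH/s per MHz of core clock
    return hashrate_cap / k


def modeled_hashrate(core_mhz, k, hashrate_cap):
    # Compute-bound below the knee, memory-bound (flat) above it.
    return min(k * core_mhz, hashrate_cap)


# R9 380, stock BIOS, memory at 1500 MHz (numbers from the post):
# ~19.5 MH/s at 750 MHz core, ~21.5 MH/s at the stock 980 MHz core.
knee = knee_core_mhz(750, 19.5, 21.5)
print(f"estimated knee ~{knee:.0f} MHz core, ratio {knee / 1500:.3f}")
# estimated knee ~827 MHz core, ratio 0.551

# The estimate only narrows the search; the post then tested 800-850 MHz in 10 MHz steps.
sweep = list(range(800, 851, 10))
```

A ratio of about 0.55 against 1500 MHz memory is in line with the 0.54-0.55 figure for stock-BIOS cards. Real cards bend rather than hitting a sharp corner, which is why measuring around the estimate still matters.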

I suspect that with your private GCN assembler kernel the optimal ratio would be lower, in the 0.4-0.5 range.
nerdralph (OP)
Sr. Member
Activity: 588
Merit: 251
October 26, 2016, 11:34:17 PM
 #9

Quote from: adaseb on October 26, 2016, 07:11:46 PM
So whats the formula for the 280x? 290?

I only ever had one 280X and sold it months ago. I have a 290X and have had some 290s before, and with a Stilt BIOS, the optimal ratio is close to 1:1. I have my 290X clocked at 950/1050, which gives me ~26 MH/s with Genoil's miner and a fraction over 27 MH/s with Wolf's sgminer fork.
restless
Legendary
Activity: 1151
Merit: 1001
October 27, 2016, 08:26:26 AM
 #10

I guess older cards (7xxx) have a higher optimal ratio, because their memory subsystem (cache? latency?) is worse.
For me the optimal 7950/7970 ratio is near 2:3: definitely above 0.6, maybe 0.62-0.64 (like 950/1500).