Bitcoin Forum
Topic: XMR RandomX mining with Ryzen 3900x
JayDDee
March 29, 2021, 05:00:32 PM
 #21

I've found noticeable uplifts in performance from tuning RAM timings on my 9900K, 3950X and 5950X systems.

I'll fish out some exact numbers later from the 5950X system. I'm not sure it's 25%, but it's quite a hefty bump, I'm sure it's over 10%.

A breakdown would be interesting to measure the effects of enabling huge pages, disabling next line prefetch,
and adjusting DRAM timing.

Walrusbonzo
March 29, 2021, 07:21:46 PM
Last edit: March 29, 2021, 09:12:04 PM by Walrusbonzo
 #22

For 24/7 running of XMRig I set a fixed CPU Vcore of 0.975 V. This allows fixed clocks of 4.2 GHz on the first 8 cores and 3.85 GHz on the last 8 cores, and a relatively low CPU package power reading of 133 W in HWiNFO64. RAM frequency is 1866 MHz (3733 MT/s) and IF is set 1:1 with the RAM frequency. There shouldn't be any random core boost behaviour affecting these results.

I'm using XMRig 6.10.0 with all 32 threads activated as this provides the best performance.

My 24/7 settings: tuned RAM timings + huge pages ON - MSR Mod ON - 19,830 h/s

Tuned RAM timings + huge pages ON - MSR Mod OFF - 18,350 h/s

Tuned RAM timings + huge pages OFF - MSR Mod ON - 16,250 h/s

Tuned RAM timings + huge pages OFF - MSR Mod OFF - 10,750 h/s

Stock RAM timings + MSR Mod ON - 17,480 h/s

Stock RAM timings + MSR Mod OFF - 14,275 h/s

Ordering the performance uplift from each change combination:

MSR Mod off to on for tuned RAM timings gives a 1480 h/s uplift, 8%

Stock RAM timings to tuned RAM timings with MSR Mod on gives a 2350 h/s uplift, 13.4%

MSR Mod off to on for stock RAM timings gives a 3205 h/s uplift, 22.5%

Stock RAM timings to tuned RAM timings with MSR Mod off gives a 4075 h/s uplift, 28.5%


For me, it seems tuned RAM timings provide a bigger performance uplift than the MSR Mod. Turning the MSR Mod on gives a bigger benefit where stock or poor RAM timings are used.
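The uplift figures above can be reproduced from the raw hashrates. This is a minimal sketch; the dictionary layout and the function name `uplift` are illustrative only, and it assumes huge pages were ON for all four runs being compared (the stock-timing lines do not say so explicitly).

```python
# Hashrates in h/s, copied from the results listed in this post.
# Keys are (RAM timings, MSR Mod). Assumption: huge pages ON in all four runs.
hashrate = {
    ("tuned", "on"): 19830,
    ("tuned", "off"): 18350,
    ("stock", "on"): 17480,
    ("stock", "off"): 14275,
}


def uplift(before, after):
    """Return (absolute gain in h/s, relative gain in percent)."""
    gain = after - before
    return gain, 100.0 * gain / before


comparisons = [
    ("MSR Mod off -> on, tuned RAM", ("tuned", "off"), ("tuned", "on")),
    ("Stock -> tuned RAM, MSR Mod on", ("stock", "on"), ("tuned", "on")),
    ("MSR Mod off -> on, stock RAM", ("stock", "off"), ("stock", "on")),
    ("Stock -> tuned RAM, MSR Mod off", ("stock", "off"), ("tuned", "off")),
]

for label, before_key, after_key in comparisons:
    gain, pct = uplift(hashrate[before_key], hashrate[after_key])
    print(f"{label}: +{gain} h/s ({pct:.1f}%)")
```

The same `uplift` helper can be pointed at the huge pages ON/OFF pairs to compare that setting as well.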

You're probably wondering what RAM timings I've tuned. I change nearly all primaries and secondaries, and some tertiaries too. Command Rate is 1T and GDM is on in both cases.

Stock primaries are 18-20-20-44, with tRC 92. Tuned, I'm running 14-14-14-30, with tRC 44.

I can't remember the stock values for the following, but I tune them to: tRRDS 4, tRRDL 6, tWTRS 4, tWTRL 12, tWR 12, tCWL 14, tFAW 16, tRTP 8, tRFC 308
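To put the primary timings in absolute terms, cycle counts can be converted to nanoseconds. The sketch below assumes both the stock and tuned sets run at the 3733 MT/s stated earlier, and that the primaries are listed in the conventional order tCL, tRCD, tRP, tRAS, tRC; the function name `cycles_to_ns` is made up for illustration.

```python
def cycles_to_ns(cycles, transfer_rate_mts):
    """Convert a DRAM timing from memory-clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is MT/s / 2.
    """
    clock_mhz = transfer_rate_mts / 2
    return cycles * 1000.0 / clock_mhz


RATE_MTS = 3733  # assumption: same transfer rate for stock and tuned timings

names = ["tCL", "tRCD", "tRP", "tRAS", "tRC"]  # assumed order of the primaries
stock = [18, 20, 20, 44, 92]
tuned = [14, 14, 14, 30, 44]

for name, s, t in zip(names, stock, tuned):
    s_ns = cycles_to_ns(s, RATE_MTS)
    t_ns = cycles_to_ns(t, RATE_MTS)
    print(f"{name}: stock {s} cycles = {s_ns:.2f} ns, tuned {t} cycles = {t_ns:.2f} ns")
```

Under those assumptions, tRC drops by more than half, which is the kind of row-cycle cost that random accesses keep paying.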

You've probably guessed I'm running RAM sticks with Samsung B-die ;) I'm running 4 sticks as well, so rank interleaving is also enabled. I'm not sure how much that is helping me.

I wasn't 100% sure what you meant by next line prefetch. Did you mean disabling hardware prefetch in the BIOS or setting scratchpad prefetch mode in the XMRig config.json file to 0 from 1?

EDIT: Something that just came to mind: I used to use XMR-Stak, which was a lot more sensitive to RAM timings than XMRig. This was why I switched.
EDIT2: Swapped "huge pages" to "MSR Mod" and added some real huge page on/off results.
JayDDee
March 29, 2021, 08:07:58 PM
 #23

Thanks. RAM timing did better than I expected; however, one has to know what one is doing to get it right.

Next line prefetch is when the CPU speculatively prefetches the next cache line to optimize sequential memory access.
It's no good for random access. XMRig reports it as "MSR MOD", and you need to run as admin for it to be applied.
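For anyone wanting to repeat the huge pages / MSR Mod comparison, the four combinations could be generated as config.json fragments like this. It is only a sketch: the key names `huge-pages` (under `cpu`) and `wrmsr` (under `randomx`) are given as they appear in XMRig 6.x default configs to the best of my knowledge, so check them against your own config.json. The helper name `xmrig_overrides` is made up.

```python
import json
from itertools import product


def xmrig_overrides(huge_pages, msr_mod):
    """Build a partial config.json fragment for one test combination.

    Assumption: "huge-pages" and "wrmsr" are the relevant XMRig keys;
    verify against the config.json shipped with your XMRig version.
    """
    return {
        "cpu": {"huge-pages": huge_pages},
        "randomx": {"wrmsr": msr_mod},
    }


# Print all four huge pages / MSR Mod combinations.
for huge_pages, msr_mod in product([True, False], repeat=2):
    hp = "ON" if huge_pages else "OFF"
    msr = "ON" if msr_mod else "OFF"
    print(f"huge pages {hp}, MSR Mod {msr}:")
    print(json.dumps(xmrig_overrides(huge_pages, msr_mod), indent=4))
```

Each fragment would be merged into an otherwise unchanged config so that only one setting differs between runs.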

XMRig is smart enough to optimize the number of threads. Are you sure you were using all 32?

The best performance is usually 1 thread for every 2MB of cache. This matches up with the number
of physical cores on most mainstream CPUs, meaning hyperthreading isn't necessary or helpful.
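The 2 MB rule of thumb can be written down as a quick calculation. The L3 sizes and logical CPU counts below are taken from public spec sheets rather than from this thread, and `suggested_threads` is a hypothetical helper, not something XMRig exposes.

```python
SCRATCHPAD_MB = 2  # RandomX scratchpad size per mining thread


def suggested_threads(l3_cache_mb, logical_cpus):
    """Rule of thumb: one thread per 2 MB of L3, capped at the logical CPU count."""
    return min(logical_cpus, l3_cache_mb // SCRATCHPAD_MB)


# (L3 cache in MB, logical CPUs); values assumed from public specifications.
cpus = {
    "Ryzen 7 1700": (16, 16),
    "Core i9-9900K": (16, 16),
    "Ryzen 9 3900X": (64, 24),
    "Ryzen 9 3950X": (64, 32),
}

for name, (l3_mb, logical) in cpus.items():
    print(f"{name}: {suggested_threads(l3_mb, logical)} threads")
```

With these numbers the 16 MB parts land on one thread per physical core, while the 64 MB Zen 2 parts have enough cache for every logical CPU.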

Walrusbonzo
March 29, 2021, 08:51:04 PM
Last edit: March 29, 2021, 09:10:30 PM by Walrusbonzo
 #24

You know what, I realise I've made a big mistake... I've confused the MSR Mod and huge page options! I'll edit my existing post and test with huge pages off tomorrow.

I'm definitely using all 32, same with my 3950X. As you suggest though, this isn't optimal for all CPUs: my 9900K performs better using 8 threads (as opposed to 16), preferably with affinity set to 1 per physical core.

It seems to be a Ryzen thing where performance is better with all threads used.

EDIT: Added some results to original results post.

Tuned RAM timings + huge pages OFF - MSR Mod ON - 16,250 h/s

Tuned RAM timings + huge pages OFF - MSR Mod OFF - 10,750 h/s

Seems you are right, enabling huge pages is the biggest performance factor here. Then RAM timings, then MSR mod.
JayDDee
March 29, 2021, 10:16:47 PM
 #25

Yeah, I forgot Ryzen doubled the cache with Zen 2; a 3950X can definitely use all threads.
My only Ryzen is a 1700.

Regarding the gain by fine tuning the DRAM timing, I was thinking more of DRAM OC which only provides
a marginal improvement in the best of cases. I think of DRAM timing as more of a penalty when it's wrong.
But the difference between wrong and right can be significant.

sech1
March 29, 2021, 11:17:36 PM
 #26

RandomX is dependent on latency, which is why cache performance is more important than DRAM speed. There's no way
increasing DRAM performance, even by 25%, will increase the hashrate by the same amount. Huge pages, on the other
hand, can have that much of an effect by reducing effective memory latency with fewer TLB misses.

I stand by my advice. There's nothing wrong with faster DRAM, it's just low on the priority list in this case.
If the CPU is bottlenecked by RAM, like the 3900X/3950X, then the performance increase from lower-latency RAM is almost 1:1. Slower CPUs like the 6- and 8-core Ryzens are less affected, but they also benefit from low-latency RAM.
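On the huge pages point quoted above, a rough page count shows why the TLB matters so much here. The sketch assumes a RandomX dataset of about 2080 MiB (the approximate fast-mode size) and compares 4 KiB pages with 2 MiB huge pages; `pages_needed` is an illustrative name.

```python
KIB = 1024
MIB = 1024 * KIB

DATASET_BYTES = 2080 * MIB  # assumption: approximate RandomX fast-mode dataset size


def pages_needed(total_bytes, page_bytes):
    """Number of pages required to map total_bytes, rounded up."""
    return -(-total_bytes // page_bytes)


small = pages_needed(DATASET_BYTES, 4 * KIB)
huge = pages_needed(DATASET_BYTES, 2 * MIB)

print(f"4 KiB pages: {small}")
print(f"2 MiB pages: {huge}")
print(f"ratio: {small // huge}x fewer translations to cache")
```

Random reads across that many 4 KiB pages are likely to miss the TLB almost every time, whereas the 2 MiB mapping needs far fewer entries.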
shinji.link
April 03, 2021, 07:31:30 PM
 #27

FYI, I checked my friend's rig with a 3900X. Running all 32 with huge pages ON (with RAM timings left at default) gave better performance than tuning the RAM timings first. I can verify that your approach of enabling huge pages gives better performance ;)
Metroid
April 05, 2021, 09:12:12 AM
 #28

I just bought a 5900X to replace my old 3600. I found it to be stable 24/7 at 4.4 GHz and 1.14 V, with temps around 80 °C. I know, that's an insane temperature; I have a 240 mm watercooling setup with dual 2400 rpm fans. Reviewers even said that with their 360 mm triple-fan watercooling setups their temps were around 86 °C. Anyway, up to 80 °C is all right; more is a no-go. I used to keep my 3600 around 70 °C at full load, and this one runs 10 degrees Celsius hotter. I'm mining Monero for a bit now because it is a very good stability test.

So with those settings, 24 threads give 15,300 h/s with CL20 memory. As most of you said, latency is important, but I found CL20 at 3333 MHz to be a lot better than CL16 at 2666 MHz; to run CL16 I had to downclock to 2666 MHz. I guess in my system bandwidth was more important than latency. Buying a new binned 3600 CL16 kit could improve this hashrate, but I don't think by a lot. 32 GB of 2133 MHz CL14 1.2 V RAM overclocked all the way to 3333 MHz CL20 at 1.36 V is pretty good. Also, mining Monero is just a stability test for me; mining it 24/7 is a waste of time and effort at the moment. It's not worth it for 1.30 USD per day when I paid 600 USD for it; if profit started at 5 USD per day then I would think about it. Monero is and has always been a botnet cave.
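The CL20 vs CL16 comparison above is easier to read in absolute units. This sketch assumes the quoted "3333 MHz" and "2666 MHz" are transfer rates in MT/s and that the system runs dual-channel with 64-bit channels; the function names are illustrative.

```python
def cas_latency_ns(cl, transfer_rate_mts):
    """CAS latency in ns: CL cycles at a memory clock of MT/s / 2 (in MHz)."""
    return cl * 2000.0 / transfer_rate_mts


def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (assumes dual-channel, 64-bit channels)."""
    return transfer_rate_mts * bytes_per_transfer * channels / 1000.0


for cl, rate in [(20, 3333), (16, 2666)]:
    latency = cas_latency_ns(cl, rate)
    bandwidth = peak_bandwidth_gbs(rate)
    print(f"CL{cl} @ {rate} MT/s: CAS {latency:.2f} ns, peak {bandwidth:.1f} GB/s")
```

Under these assumptions the two settings have nearly the same CAS latency in nanoseconds, so the faster setting wins on bandwidth without giving up first-word latency, which fits the observation in the post.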
