Bitcoin Forum
June 25, 2024, 12:07:46 PM *
Author Topic: RANDOM-X on XEON... CACHE, FREQ'S OR CORES?  (Read 490 times)
wacko
Legendary
Activity: 1106
Merit: 1014
March 11, 2021, 04:10:42 PM
 #21

Could someone explain why 10-core Ivy Bridge CPUs seem to show the same hashrates as the 12-core CPUs? I'm trying to pick processors for a couple of simple home servers (dual 2011), that I'd also use for mining while it's profitable, and looking at the benchmarks on the xmrig website - I'm confused. E5-2680V2 is a 115W TDP 10-core 2.8 GHz base that's supposed to hit 3.1 GHz all-core turbo, and E5-2696V2 is a 120W TDP 12-core 2.5 GHz that is also supposed to hit 3.1 GHz all-core turbo. So they look identical other than 10 threads vs 12 threads, but xmrig benchmarks suggest that they both only hash at ~ 5k each. I figured if the 10-core does 5k, then the 12-core should be about 6k?

Am I missing something here? I thought maybe the 2696V2 doesn't reach its 3.1 GHz turbo during mining for some reason, and that's why it shows about the same hashrate as the 2680V2, since the latter has a higher base clock. But then I looked at the benchmarks for the 2697V2 (which has a higher base clock than the 2696V2 at 2.7 GHz, but a lower turbo of 3.0 GHz), and it's the same thing: it also hovers around 5k per CPU. They all have 256KB of L2 cache per core and 25/30MB of L3 cache, yet the extra 2 cores don't seem to bring any hashrate improvement. Is this a RandomX thing, or something with the XMRig miner or its benchmarks?
sech1
Member
Activity: 116
Merit: 66
March 12, 2021, 08:46:10 AM
 #22

It's something to do with L3 cache on Intel CPUs, it doesn't scale well with more cores. I observed similar problems with 6-core vs 4-core Xeons.
wacko
Legendary
Activity: 1106
Merit: 1014
March 12, 2021, 02:03:47 PM
 #23

Quote from: sech1
"It's something to do with L3 cache on Intel CPUs, it doesn't scale well with more cores. I observed similar problems with 6-core vs 4-core Xeons."
What could it have to do with L3 cache? RandomX needs 256KB of L2 and 2MB of L3 for every thread. None of the Ivy Bridge EP CPUs are limited by L3 as far as I can see, every single E5 v2 CPU has more than 2MB of L3 per core, so should scale with increased core count just fine. I'm looking at 4-cores vs 6-cores v2, and the scale is pretty much linear.

Here's a quad 2637v2 (3.5GHz base, 3.6GHz turbo):
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2637+v2+%40+3.50GHz
And here's a hexa 2643v2 (also 3.5GHz base, 3.6GHz turbo):
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2643+v2+%40+3.50GHz

Each thread hashes at ~ 550-600 H/s, resulting in ~ 4.5Kh for a pair of quads and 6.8-7Kh for a pair of hexa CPUs. Compare these to the 8-core 2667V2, and it also does ~ 550H per thread, or 8.7-9KH for a pair:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2667+v2+%40+3.30GHz

None of the 10-core Ivy EP CPUs are clocked as high as 3.6 GHz, so no direct comparison can be done here, the fastest 2690V2 is 3.0GHz base / 3.3GHz turbo, and it does 460H per thread, or 9.2KH for a pair:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2690+v2+%40+3.00GHz
So it also seems to scale, especially if the turbo wasn't working right on that one (and the memory was at 1066MHz). A slower 2680v2 10-core does up to 515H per thread, and it's only 2.8/3.1GHz cpu:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2680+v2+%40+2.80GHz

So at least up to 10 cores the scaling looks pretty much linear. It's the 12-core CPUs that aren't any faster for some reason, at least by looking at those benchmarks. 2696V2 is supposed to work at the same 3.1GHz turbo as 2680V2. If the latter does 10KH for a pair, then why wouldn't a pair of 2696V2 do 12KH? Do they throttle and not reach 3.1GHz all-core turbo? It's only 5W difference in TDP on paper between 2696v2 and 2680v2, but most likely a bit more in real power draw. Too bad xmrig benchmarks don't show the actual clocks.
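
The per-thread arithmetic above can be sanity-checked with a few lines of Python. This is only a rough sketch: the pair hashrates are the approximate figures quoted in this post (midpoints where a range was given), one miner thread per physical core is assumed, and the helper names `per_thread` and `expected_pair` are made up for illustration.

```python
# Per-thread RandomX hashrate for dual-socket E5 v2 systems.
# Inputs are the approximate pair totals quoted in the post (H/s), not measurements.

def per_thread(pair_hashrate, cores_per_cpu, sockets=2):
    """Hashrate per miner thread, assuming one thread per physical core."""
    return pair_hashrate / (cores_per_cpu * sockets)

def expected_pair(per_thread_rate, cores_per_cpu, sockets=2):
    """Pair hashrate if per-thread speed stayed constant (linear scaling)."""
    return per_thread_rate * cores_per_cpu * sockets

# model: (cores per CPU, approximate hashrate for a pair of CPUs)
pairs = {
    "E5-2637 v2": (4, 4500),
    "E5-2643 v2": (6, 6900),   # midpoint of 6.8-7 kH/s
    "E5-2667 v2": (8, 8850),   # midpoint of 8.7-9 kH/s
    "E5-2690 v2": (10, 9200),
}

for model, (cores, rate) in pairs.items():
    print(f"{model}: {per_thread(rate, cores):.0f} H/s per thread")

# If a 12-core kept the 2680 v2's ~515 H/s per thread, a pair would do:
print(f"12-core pair at 515 H/s per thread: {expected_pair(515, 12)} H/s")
```

Under that linear-scaling assumption a pair of 12-core CPUs would land a bit above 12 kH/s, which is the gap this post is asking about.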
JayDDee
Full Member
Activity: 1400
Merit: 222
March 12, 2021, 04:36:07 PM
 #24

You're counting physical cores but the CPUs are hyperthreaded.

Divide the L3 cache size by 2M and that's the optimum number of threads to run.
Any more and total hashrate starts to drop.
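
As a rough illustration of that rule, here is a minimal Python sketch. It assumes 2 MB of L3 per RandomX thread as stated above; the function name `threads_by_l3` is hypothetical and this is not how any particular miner computes its thread count.

```python
# "Divide the L3 cache size by 2M" expressed as code.
# The 2 MB per-thread figure comes from the rule above; everything else is illustrative.

def threads_by_l3(l3_mb, per_thread_mb=2):
    """Largest thread count whose per-thread L3 share is at least per_thread_mb."""
    return int(l3_mb // per_thread_mb)

for l3_mb in (25, 30):
    print(f"{l3_mb} MB L3 -> {threads_by_l3(l3_mb)} threads")
```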

The question was answered early on in this thread.

wacko
Legendary
Activity: 1106
Merit: 1014
March 12, 2021, 04:45:45 PM
 #25

Quote from: JayDDee
"You're counting physical cores but the CPUs are hyperthreaded.

Divide the L3 cache size by 2M and that's the optimum number of threads to run.
Any more and total hashrate starts to drop.

The question was answered early on in this thread."
I'm not following. What does HT have to do with this? It's never used for mining with E5 v1/v2 - either it's disabled in BIOS, or the miner's threads are bound to the physical cores. Of course I'm "counting physical cores", cause that's what matters in RandomX mining with these Xeons. Every single E5 V2 CPU in existence has 256KB of L2 and 2M+ of L3 per physical core, so that's how they're used for mining - with the number of miner's threads equal to the number of cores. All the benchmarks out there are like that, number of threads = number of physical cores. I don't understand your post and my question was not answered in this thread, it has nothing to do with HT whatsoever.
JayDDee
Full Member
Activity: 1400
Merit: 222
March 12, 2021, 05:33:48 PM
 #26

Quote from: wacko
"It's never used for mining with E5 v1/v2 - either it's disabled in BIOS, or the miner's threads are bound to the physical cores."

That's just you. Don't assume everyone disables HT, most don't because it helps compute bound algos. The number of physical cores
is irrelevant, it's the number of miner threads, whether HT is enabled or not.

Whether by design or by coincidence most CPUs have around 2MB of cache per physical core and RandomX has been
engineered to that spec.


wacko
Legendary
Activity: 1106
Merit: 1014
March 12, 2021, 07:40:12 PM
 #27

Quote from: JayDDee
"That's just you. Don't assume everyone disables HT, most don't because it helps compute bound algos."
I'm still not sure what HT has to do with this. Whether it's on or off doesn't really matter since we're talking E5 CPUs and xmrig. Every E5 -EP CPU from v1 to v4 is the same in regards to cache sizes: 256KB L2 per every core, and 2+MB L3 per every core. It's like this for literally every single E5 Xeon: Sandy Bridge-EP (v1), Ivy Bridge-EP (v2), Haswell-EP (v3) and Broadwell-EP (v4) - only Skylake has brought change to this, but that's also when they dropped "E5" name. So when we're talking E5 Xeons, they're all the same and they're all limited by L2 cache. L3 cache is basically irrelevant, yet everyone and their mom keeps talking about 2MB of L3 cache like it matters - it doesn't.

Quote from: JayDDee
"Whether by design or by coincidence most CPUs have around 2MB of cache per physical core"
But they don't. 99% of E5 CPUs have exactly 2.5MB of L3 cache per core, not 2MB, only a few oddballs like E5-1650 or E5-4607 have 2MB per core. Some have even 3+ MB of L3 per core, like E5-2667v2, 2673v2, 2687wv2 etc. It doesn't matter anyway, cause all of them are limited by L2 cache size: it's 256KB per physical core, and therefore number of threads for the miner is exactly the same as the number of physical cores.

Quote from: JayDDee
"The number of physical cores is irrelevant, it's the number of miner threads, whether HT is enabled or not."
The number of physical cores is not irrelevant, it's everything actually, cause each miner's thread needs 256KB of L2, and all E5 CPUs have only 256KB of L2 per physical core. Which means that if it's a 6-core CPU, then it's gonna be 6 threads in the miner, if it's a 10-core, then it's 10 miner threads etc. Even though almost all of them have 2.5MB of L3 per core, and some have even more (like those 8-cores with 3.125MB per core) - it doesn't matter cause they're all still limited by L2. A 10-core E5-2680v2 has 25MB of L3 cache, so if one would blindly follow the "2MB of L3 per thread" rule, and tried to run 12 threads - the hashrate would not be higher than with 10 threads. Same with something like E5-2667v2 - it also has 25MB of L3 cache, so up to 12 miner threads is ok then? No - like every other E5, it's limited by L2, and thus the highest hashrate is gonna be with 8 threads.

Number of miner threads = number of physical cores - that's the rule for every single E5 Xeon, cause they're all limited by those 256KB of L2, and not by L3 size (since none of them have less than 2MB of L3 per core). Hyper-Threading is something that is completely irrelevant here, it doesn't matter whether it's on or off, highest hashrate is achieved with number of miner threads = number of physical cores and mining software (at least xmrig) automatically detects it and sets the proper number of threads based on the cpu model, whether HT is on or off.
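
The rule argued here (threads capped by L2 first, L3 second) can be written as a short Python sketch. It assumes the per-thread needs stated earlier in the thread (256 KB of L2 and 2 MB of L3) and the cache sizes mentioned for these models; `miner_threads` is a made-up helper, not xmrig's actual autoconfig logic.

```python
# Thread count limited by both L2 and L3.
# Assumes 256 KB L2 and 2 MB L3 per RandomX thread, as stated in this thread.

def miner_threads(cores, l3_mb, l2_kb_per_core=256):
    """Take the smaller of the L2-limited and L3-limited thread counts."""
    by_l2 = cores * (l2_kb_per_core // 256)  # one thread per 256 KB of L2
    by_l3 = int(l3_mb // 2)                  # one thread per 2 MB of L3
    return min(by_l2, by_l3)

# model: (physical cores, L3 in MB) as mentioned in the thread
cpus = {
    "E5-2667 v2": (8, 25),
    "E5-2680 v2": (10, 25),
    "E5-2696 v2": (12, 30),
}

for model, (cores, l3_mb) in cpus.items():
    print(f"{model}: {l3_mb / cores:.3f} MB L3 per core, "
          f"{miner_threads(cores, l3_mb)} threads")
```

For all three models the L2 limit is the smaller one, so the result equals the physical core count even though the L3-only rule would allow 12 or 15 threads.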

None of that is new, it's been said in this thread before. But my question has not been "answered early on in this thread", as you said. The question is: why does the 10-core E5-2680v2 (3.1GHz all-core turbo) show the same hashrate as the 12-core E5-2696v2 (3.1 GHz all-core turbo)? Why is there linear scaling from 4 cores to 10 cores, but not from 10 cores to 12 cores? Both E5-2696v2 and E5-2697v2 show about the same (or even lower) hashrates than E5-2680v2, so their single-thread performance is lower for some reason. I just thought maybe someone here has an idea why.
JayDDee
Full Member
Activity: 1400
Merit: 222
March 12, 2021, 08:11:53 PM
 #28

You've got the CPUs, do the testing yourself instead of speculating about what they "would" do.

wacko
Legendary
Activity: 1106
Merit: 1014
March 12, 2021, 08:22:23 PM
 #29

Quote from: JayDDee
"You've got the CPUs, do the testing yourself instead of speculating about what they "would" do."
I don't have the CPUs, that's why I'm here, as I said in my first post - I'm trying to choose processors for a few home servers, and current candidates are the 10-core 2680v2 and 12-core 2696v2. I do have a few E5 systems around, but none of them have 12-core CPUs, and the only 10-core models I've got are slow 2648Lv2 - so I can't really test anything comparable right now. I'll probably end up with 6 servers in total, so it's 12 CPUs, and the price difference adds up. One of the ways to get back at least some part of the money spent is to use them for mining while it's profitable, hence the question about the hashrates. The benchmarks on xmrig site show ~ the same hashrates between these 2, and I just don't understand why. Thought someone here might have an idea.
JayDDee
Full Member
Activity: 1400
Merit: 222
March 12, 2021, 10:26:50 PM
 #30

That's overthinking if mining is not the primary purpose. I certainly wouldn't rely on uncontrolled
mining benchmarks posted by random users.

Also keep in mind the 2MB rule only applies to RandomX and Cryptonight, in case you might want to mine
something else.

Regardless, the decision isn't between one Xeon model vs another, or even Xeon vs Core, but Intel vs AMD.
You could probably get the same level of performance for a much lower price, regardless of the application,
with 12 or 16 core Ryzens.

astraleureka
Member
Activity: 236
Merit: 16
March 13, 2021, 10:45:03 AM
Merited by wacko (1)
 #31

E5-2667v2 is still pretty good for RandomX (especially on lighter versions like for Wownero) - 5 KH/s per CPU is possible