gridecon
Newbie
Offline
Activity: 35
Merit: 0
|
|
August 16, 2010, 03:15:44 AM |
|
I have two quad-core Phenom II 64-bit Linux machines (both Ubuntu 9.10), and the -4way option increases my hashing speed so much that I'm suspicious. Without the -4way option I previously got about 5-6khash/sec on these boxes. With -4way I get over 11khash/sec! In other words, the -4way switch almost DOUBLES the reported hashing speed. This level of improvement seems more than expected and makes me wonder whether my boxes are really hashing that much faster, or whether there could possibly be an issue where the math operations are actually being skipped for some reason, causing illusory speed and an inability to actually generate blocks.
|
|
|
|
Vasiliev
Newbie
Offline
Activity: 55
Merit: 0
|
|
August 16, 2010, 03:17:07 AM |
|
I propose to compile sha256.cpp with -O3 -march=amdfamk10 (will work on 32bit and 64bit) as only CPUs supporting this instruction set (AMD Phenom, Intel i5 and newer) benefit from -4way and it'll improve performance by ~9%.
GCC 4.3.3 doesn't support -march=amdfamk10. I get:
sha256.cpp:1: error: bad value (amdfamk10) for -march= switch
try -march=amdfam10
|
|
|
|
satoshi (OP)
Founder
Sr. Member
Offline
Activity: 364
Merit: 7193
|
|
August 16, 2010, 03:23:04 AM |
|
try -march=amdfam10
That works. That's strange... are we sure that's the same thing? tcatm, try amdfam10 and make sure you get the same speed measurement.
|
|
|
|
|
lfm
|
|
August 16, 2010, 03:30:35 AM |
|
model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz, linux 64
no difference at about 4950 khash/s
|
|
|
|
jgarzik
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
August 16, 2010, 03:35:28 AM |
|
Update for
cpu family : 6
model : 26
model name : Genuine Intel(R) CPU 000 @ 3.20GHz
stepping : 4
Machine has 4 cores, each with 2 hyperthreads. /proc/cpuinfo shows 8 virtual processors.
without -4way, setgen 4: 5.7 Mhash/sec
without -4way, setgen 8: 5.0 Mhash/sec
with -4way, setgen 4: 7.0 Mhash/sec
with -4way, setgen 8: 9.3 Mhash/sec
So, the old wisdom of "hyperthreading slows things down" is now shattered, on this machine.
|
Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own. Visit bloq.com / metronome.io Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
|
|
|
Ground Loop
Member
Offline
Activity: 111
Merit: 10
|
|
August 16, 2010, 04:34:20 AM |
|
No winners for 4way in my other three Intel machines either:
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (64-bit Linux) 4way: 1565 std: 3002
Intel(R) Xeon(TM) CPU 3.00GHz (32-bit Linux) 4way: 1243 std: 2048
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz 4way: 932 std: 1733
(All running 0.3.10, -1 proclimit) Experiments with proclimit weren't any better.
|
Bitcoin accepted here: 1HrAmQk9EuH3Ak6ugsw3qi3g23DG6YUNPq
|
|
|
satoshi (OP)
Founder
Sr. Member
Offline
Activity: 364
Merit: 7193
|
|
August 16, 2010, 04:36:59 AM |
|
cpu family : 6
model : 26
model name : Genuine Intel(R) CPU 000 @ 3.20GHz
stepping : 4
cpu family 6 model 26 stepping 4 is an Intel Core i7. That's a 23% speedup with -4way, 63% total speedup with -4way + hyperthreading. 33% faster with hyperthreading than without it.
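For anyone who wants to check those percentages, here is a minimal sketch that recomputes them from the Mhash/sec figures jgarzik posted. The function name `speedup_percent` is made up for illustration, and the pairing of figures (setgen 4 without -4way as the baseline) is my reading of the post.

```python
def speedup_percent(baseline: float, improved: float) -> float:
    """Return the percentage gain of `improved` over `baseline`."""
    return (improved / baseline - 1.0) * 100.0


# Mhash/sec figures reported above for the Core i7 (4 cores, 8 hyperthreads).
no_4way_setgen4 = 5.7
with_4way_setgen4 = 7.0
with_4way_setgen8 = 9.3

# -4way alone, -4way plus hyperthreading, and hyperthreading on top of -4way.
print(f"-4way alone: {speedup_percent(no_4way_setgen4, with_4way_setgen4):.0f}%")
print(f"-4way + HT:  {speedup_percent(no_4way_setgen4, with_4way_setgen8):.0f}%")
print(f"HT vs no HT: {speedup_percent(with_4way_setgen4, with_4way_setgen8):.0f}%")
```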
|
|
|
|
NewLibertyStandard
|
|
August 16, 2010, 05:02:31 AM |
|
I have two quad-core Phenom II 64-bit Linux machines (both Ubuntu 9.10), and the -4way option increases my hashing speed so much that I'm suspicious. Without the -4way option I previously got about 5-6khash/sec on these boxes. With -4way I get over 11khash/sec! In other words, the -4way switch almost DOUBLES the reported hashing speed. This level of improvement seems more than expected and makes me wonder whether my boxes are really hashing that much faster, or whether there could possibly be an issue where the math operations are actually being skipped for some reason, causing illusory speed and an inability to actually generate blocks.
o_O... good luck hashing, you're gonna need it!
With 4way, I get significantly better performance when I have all my virtual cores enabled. I think I get about the same amount of hashes when hyper threading is turned off with or without 4way.
Hey, you may be onto something! Hyperthreading didn't help before because all the work was in the arithmetic and logic units, which the hyperthreads share. tcatm's SSE2 code must be a mix of normal x86 instructions and SSE2 instructions, so while one is doing x86 code, the other can do SSE2. How much of an improvement do you get with hyperthreading? Some numbers? What CPU is that?
Here are the results from my very poor memory on an i7 860 2.8 GHz with Ubuntu 10.04 amd64. Some of the numbers may be a bit off.
Without 4way, with HT, 4/8 virtual cores: 4.5-5 Mhash/sec
Without 4way, with HT, 8/8 virtual cores: a bit less than above, but basically the same
With 4way, with HT, 8/8 virtual cores: 6.5-8 Mhash/sec (It may be my imagination, but it seems noticeably more variable.)
With 4way, with HT, 4/8 virtual cores: 5-6 Mhash/sec
Without 4way, without HT, 4/4 physical cores: 4.5-5 Mhash/sec (But a bit slower than the first result.)
With 4way, without HT, 4/4 physical cores: 5-6 Mhash/sec
|
Treazant: A Fullever Rewarding Bitcoin - Backup Your Wallet TODAY to Double Your Money! - Dual Currency Donation Address: 1Dnvwj3hAGSwFPMnkJZvi3KnaqksRPa74p
|
|
|
gridecon
Newbie
Offline
Activity: 35
Merit: 0
|
|
August 16, 2010, 05:30:15 AM |
|
I have two quad-core Phenom II 64-bit Linux machines (both Ubuntu 9.10), and the -4way option increases my hashing speed so much that I'm suspicious. Without the -4way option I previously got about 5-6khash/sec on these boxes. With -4way I get over 11khash/sec! In other words, the -4way switch almost DOUBLES the reported hashing speed. This level of improvement seems more than expected and makes me wonder whether my boxes are really hashing that much faster, or whether there could possibly be an issue where the math operations are actually being skipped for some reason, causing illusory speed and an inability to actually generate blocks.
o_O... good luck hashing, you're gonna need it!
I guess that should read either Mhash/sec or THOUSANDS of khash/sec... but hey, what's 3 orders of magnitude among friends? Perhaps that typographical error is why nobody has answered whether or not a nearly 100% speedup from the -4way option is at all realistic? I'm not convinced the crypto hashing is really taking place at the rate of 11000khash/sec on my desktop box.
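One way to sanity-check a reported rate is to count hashes that were actually computed against the wall clock. The sketch below is only an illustration of that idea in Python with `hashlib`, not the client's code: `measure_hash_rate` is a made-up name, the 80-byte all-zero header is placeholder data, and the rate it prints reflects Python's speed, not what the -4way miner achieves.

```python
import hashlib
import time
from typing import Tuple


def measure_hash_rate(seconds: float = 0.2, batch: int = 1000) -> Tuple[int, float]:
    """Count double-SHA-256 hashes completed in roughly `seconds` of wall time."""
    header = bytearray(80)  # placeholder block header (assumption: all zeros)
    count = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        for nonce in range(count, count + batch):
            # Vary the last 4 bytes so every hash input is different.
            header[76:80] = (nonce & 0xFFFFFFFF).to_bytes(4, "little")
            hashlib.sha256(hashlib.sha256(header).digest()).digest()
        count += batch
    elapsed = time.perf_counter() - start
    return count, count / elapsed


hashes, rate = measure_hash_rate()
print(f"{hashes} hashes computed, about {rate / 1000:.0f} khash/sec")
```

Because every digest is really computed, a counter like this cannot be inflated by skipped work, which is the worry raised above.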
|
|
|
|
jgarzik
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
August 16, 2010, 05:33:56 AM |
|
cpu family : 6
model : 26
model name : Genuine Intel(R) CPU 000 @ 3.20GHz
stepping : 4
cpu family 6 model 26 stepping 4 is an Intel Core i7. That's a 23% speedup with -4way, 63% total speedup with -4way + hyperthreading. 33% faster with hyperthreading than without it.
Does bitcoin perform any self-tests at startup, to verify that hashing is working?
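To make the question concrete, a startup self-test could be a known-answer check like the sketch below. This is a hypothetical illustration in Python, not something the thread says the client does: `sha256_self_test` and `KNOWN_ANSWERS` are invented names, and the two digests are the standard SHA-256 values for the empty string and for "abc".

```python
import hashlib

# Standard SHA-256 known answers: the empty message and the message "abc".
KNOWN_ANSWERS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}


def reference_sha256(data: bytes) -> str:
    """Hex digest from the standard library, used as the default hasher."""
    return hashlib.sha256(data).hexdigest()


def sha256_self_test(hash_fn=reference_sha256) -> bool:
    """Return True only if `hash_fn` reproduces every known answer."""
    return all(hash_fn(msg) == digest for msg, digest in KNOWN_ANSWERS.items())


print("self-test passed" if sha256_self_test() else "self-test FAILED")
```

An optimized routine such as a 4-way implementation could be passed in as `hash_fn` and refused at startup if it disagrees with the known answers.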
|
|
|
|
NewLibertyStandard
|
|
August 16, 2010, 06:16:44 AM Last edit: August 16, 2010, 06:40:13 AM by NewLibertyStandard |
|
More importantly, about how long should it take 10 Mhash/sec to verify difficulty 1 blocks? After the 64-bit Linux hashing bug was fixed I generated a block or two in short order, but since that one or two blocks, I have not generated a single block. It's starting to seem a little fishy. I'm currently testing Bitcoin on two Linux 64-bit computers. Is there anything in the code blocking early block verification?
Edit: Never mind. I used the Bitcoin Generation Calculator and divided out the difficulty. Everything is fine here, I've generated a couple blocks with 4way. About to start testing without 4way.
Another Edit: My test only verifies that hashing works. It does not verify whether I'm really getting the displayed speed.
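For reference, the arithmetic behind that kind of estimate can be sketched as follows. It assumes the usual approximation that a block at difficulty D takes about D * 2^32 hashes on average, and that block finds behave like a Poisson process; the function names are made up, and this is not the calculator's actual code.

```python
import math


def expected_seconds_per_block(hashes_per_sec: float, difficulty: float) -> float:
    """Average time to find one block, assuming ~difficulty * 2**32 hashes per block."""
    return difficulty * 2**32 / hashes_per_sec


def prob_no_block(seconds: float, hashes_per_sec: float, difficulty: float) -> float:
    """Chance of finding nothing in `seconds`, treating finds as a Poisson process."""
    return math.exp(-seconds / expected_seconds_per_block(hashes_per_sec, difficulty))


rate = 10e6  # 10 Mhash/sec, the figure asked about above
minutes = expected_seconds_per_block(rate, 1.0) / 60
print(f"difficulty 1: about {minutes:.1f} minutes per block on average")
print(f"chance of no block in one hour at difficulty 1: {prob_no_block(3600, rate, 1.0):.4%}")
```

At the real network difficulty the expected time scales up by that factor, so a long dry spell after one or two early blocks is not by itself a sign of broken hashing.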
|
|
|
|
tcatm
|
|
August 16, 2010, 11:15:04 AM |
|
@satoshi: Oops, I meant -march=amdfam10. Sorry.
@everyone confused about improvement on Phenoms: I developed the code on a Phenom (940) and verified it (at least in 64bit mode) and the improvement you see is real.
Concerning Hyperthreading: It seems to give a little performance gain, maybe from running load/store instructions in parallel with arithmetic instructions. There are only a few plain x86 instructions, for gluing the function into the ABI. They take less than ~2% of the total CPU time (measured with gprof).
|
|
|
|
teknohog
|
|
August 16, 2010, 12:31:51 PM |
|
On a Core 2 Duo T7200, the default code gives about 1.8 Mhash/s, and 4way is slower at 1.0 Mhash/s. It has 4 MB of L2 cache, so it is probably not a question of cache size, as suggested at some point.
Unfortunately, the code (from svn) no longer compiles on ARM, as it now has SSE intrinsics hardcoded. I have removed the -msse2 and -DFOURWAYSSE2 flags from the makefile, and it still produces errors like this
sha256.cpp:8:23: error: xmmintrin.h: No such file or directory
sha256.cpp:34: error: ‘__m128i’ does not name a type
but hopefully this is easy to fix.
|
|
|
|
satoshi (OP)
Founder
Sr. Member
Offline
Activity: 364
Merit: 7193
|
|
August 16, 2010, 01:38:01 PM |
|
I wrapped sha256.cpp in
#ifdef FOURWAYSSE2
#endif // FOURWAYSSE2
try it now.
|
|
|
|
tommy
Newbie
Offline
Activity: 44
Merit: 0
|
|
August 16, 2010, 03:42:55 PM Last edit: August 16, 2010, 04:08:35 PM by tommy |
|
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
w/o -4way "hashespersec" : 2539397
with -4way "hashespersec" : 2108791
Linux, Debian, 32 bit.
|
|
|
|
teknohog
|
|
August 16, 2010, 04:41:31 PM |
|
I wrapped sha256.cpp in
#ifdef FOURWAYSSE2
#endif // FOURWAYSSE2
try it now.
Thanks, works fine now.
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
|
|
August 17, 2010, 06:26:27 PM |
|
Model: Intel Atom n330 (2 cores, 4 virtual).
OS: Ubuntu 10.04 64bit
Using the -4way option I get half the speed I get without it.
|
|
|
|
denaje
Newbie
Offline
Activity: 2
Merit: 0
|
|
August 18, 2010, 07:06:04 PM |
|
64-bit Gentoo / Intel Core i7
W/O 4way: 4324294 With 4way: 7649415
32-bit Ubuntu VM on XP host / Intel Core 2 Duo
W/O 4way: 1751518 With 4way: 793100
|
|
|
|
Ground Loop
Member
Offline
Activity: 111
Merit: 10
|
|
August 18, 2010, 11:00:08 PM |
|
So is it accurate to say that, so far, only Intel Core i7 processors and certain (Phenom?) AMD processors enjoy a speed bump from -4way?
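To put the reports side by side, here is a rough tabulation of the with/without ratios using only the exact pairs of numbers posted in this thread. The list layout and the name `fourway_ratio` are made up for illustration; units differ between posts but cancel in the ratio. The Phenom and Atom reports are left out because only approximate figures were given.

```python
# (machine, rate without -4way, rate with -4way), as posted in this thread.
reports = [
    ("Core i7, 64-bit, setgen 8", 5.0, 9.3),
    ("Core i7, 64-bit Gentoo", 4324294, 7649415),
    ("Core 2 Duo E8500, 64-bit", 3002, 1565),
    ("Xeon 3.00GHz, 32-bit", 2048, 1243),
    ("Core 2 6300", 1733, 932),
    ("Core 2 Duo T7200", 1.8, 1.0),
    ("Athlon 64 X2 5600+, 32-bit", 2539397, 2108791),
    ("Core 2 Duo, 32-bit VM", 1751518, 793100),
]


def fourway_ratio(std: float, fourway: float) -> float:
    """Ratio above 1.0 means -4way was faster on that machine."""
    return fourway / std


for name, std, fourway in reports:
    ratio = fourway_ratio(std, fourway)
    verdict = "faster" if ratio > 1.0 else "slower"
    print(f"{name:28s} {ratio:4.2f}x  ({verdict} with -4way)")

faster = [name for name, std, fourway in reports if fourway_ratio(std, fourway) > 1.0]
```

On these numbers only the two Core i7 reports land above 1.0, which matches the summary in the question.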
|
|
|
|
|