Bitcoin Forum
Author Topic: Auto-detect for 128-bit 4-way SSE2  (Read 1834 times)
satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364


View Profile
September 09, 2010, 01:04:05 AM
 #1

SVN rev 150 has some code to try to auto-detect whether to use 4-way SSE2.  We need this because 4-way SSE2 is only faster on certain newer CPUs that have 128-bit SSE2, not on ones with 64-bit SSE2.

It uses the CPUID instruction to get the CPU brand, family, model number and stepping.  That's the easy part.  Knowing what to do with the model number is the hard part.  I was not able to find any table of family, model and stepping numbers for CPUs.  I had to go by various random reports I saw.
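As a rough illustration of the "easy part", here is a minimal sketch of decoding family, model and stepping from the EAX value that CPUID leaf 1 returns. The names `CpuSignature` and `DecodeCpuSignature` are made up for this example and are not taken from the SVN source. The bit layout follows Intel's documented rule, where the extended model is used for base families 6 and 15; AMD documents it for base family 15 only, so treat the family 6 branch as an Intel-side assumption.

```cpp
#include <cstdint>

// Fields decoded from the EAX value returned by CPUID leaf 1.
// Hypothetical type for illustration only.
struct CpuSignature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

// Decode a raw CPUID leaf 1 EAX value into display family/model/stepping.
// Hypothetical helper; the raw value would come from the CPUID instruction.
CpuSignature DecodeCpuSignature(std::uint32_t eax)
{
    unsigned stepping   = eax & 0xF;          // bits 3..0
    unsigned baseModel  = (eax >> 4) & 0xF;   // bits 7..4
    unsigned baseFamily = (eax >> 8) & 0xF;   // bits 11..8
    unsigned extModel   = (eax >> 16) & 0xF;  // bits 19..16
    unsigned extFamily  = (eax >> 20) & 0xFF; // bits 27..20

    CpuSignature sig;
    // The extended family is only added when the base family is 15.
    sig.family = (baseFamily == 0xF) ? baseFamily + extFamily : baseFamily;
    // The extended model supplies the high nibble for base families 6 and 15.
    sig.model = (baseFamily == 0x6 || baseFamily == 0xF)
                    ? ((extModel << 4) | baseModel)
                    : baseModel;
    sig.stepping = stepping;
    return sig;
}
```

For example, a raw value of 0x000106A5 decodes to family 6, model 26, stepping 5, which matches the i5/i7 entry in the table further down.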

Here's what I ended up with:
Code:
  // We need Intel Nehalem or AMD K10 or better for 128-bit SSE2
  // Nehalem = i3/i5/i7 and some Xeon
  // K10 = Opterons with 4 or more cores, Phenom, Phenom II, Athlon II
  //  Intel Core i5  family 6, model 26 or 30
  //  Intel Core i7  family 6, model 26 or 30
  //  Intel Core i3  family 6, model 37
  //  AMD Phenom    family 16, model 10
  bool fUseSSE2 = ((fIntel && nFamily * 10000 + nModel >=  60026) ||
                   (fAMD   && nFamily * 10000 + nModel >= 160010));

I saw some sporadic inconsistent model numbers for AMD CPUs, so I'm not sure if this will catch all capable AMDs.
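To make the edge cases of that threshold easier to see, here is the same comparison wrapped in a standalone function. This is only a sketch: the function name `ShouldUse4WaySSE2` is invented for the example, and the parameters reuse the variable names from the snippet above.

```cpp
// Mirrors the threshold from the snippet above: family and model are packed
// into one key so a single >= comparison orders by family first, then model.
// Hypothetical wrapper for illustration; not the code in SVN.
bool ShouldUse4WaySSE2(bool fIntel, bool fAMD, int nFamily, int nModel)
{
    int nKey = nFamily * 10000 + nModel;
    return (fIntel && nKey >= 60026) ||   // Intel: family 6 model 26 and up
           (fAMD   && nKey >= 160010);    // AMD: family 16 model 10 and up
}
```

Because the key is ordered by family first, any Intel family above 6 passes regardless of model, and an AMD family 16 part that reports a model below 10 is rejected. Both are worth keeping in mind given the inconsistent model numbers.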

If it's wrong, you can still override it with -4way or -4way=0.
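The override amounts to a three-way choice: no switch means trust the auto-detect, -4way forces it on, -4way=0 forces it off. A minimal sketch of that logic follows, assuming the command line has already been parsed into a map where a bare -4way is stored with an empty value. `ArgMap` and `Resolve4Way` are hypothetical names for this example, not the client's actual argument handling.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for parsed command-line arguments:
// "-4way" is stored with an empty value, "-4way=0" with the value "0".
typedef std::map<std::string, std::string> ArgMap;

// Decide whether to use 4-way SSE2, letting an explicit switch
// override the auto-detected value.
bool Resolve4Way(const ArgMap& args, bool fAutoDetected)
{
    ArgMap::const_iterator it = args.find("-4way");
    if (it == args.end())
        return fAutoDetected;     // no switch given: keep the auto-detect result
    if (it->second.empty())
        return true;              // bare -4way: force on
    return it->second != "0";     // -4way=0: force off, other values: on
}
```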

It prints what it finds in debug.log.  Search for CPUID.

This is only enabled if built with GCC.
tcatm
Sr. Member
****
qt
Offline Offline

Activity: 337


View Profile
September 09, 2010, 03:02:27 PM
 #2

You should benchmark all implementations (using CPU time, not real time) and choose the fastest. While benchmarking, also check whether each implementation actually produces correct results.
nelisky
Legendary
*
Offline Offline

Activity: 1554


View Profile
September 09, 2010, 03:39:08 PM
 #3

> You should benchmark all implementations (using CPU time, not real time) and choose the fastest. While benchmarking, also check whether each implementation actually produces correct results.

Yeah, while implementing the CUDA hasher I thought about this. There should be an interface to the hashing handler (or even a full miner per implementation), and we should have a simple way of giving it a known block, asking it to hash 1000 nonces and comparing the results, while benchmarking at the same time. It shouldn't be too hard to implement and would help when developing new algorithms.

The interface would also help if we were to plug in an FPGA-based engine or something of the kind, since it would provide specific entry points into the code without having to tweak the default mining code.
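Something along these lines, as a rough sketch only. Everything here is hypothetical: `IHasher`, `Benchmark` and `PickFastest` are names invented for the example, and `ToyHash` is a cheap mixing function standing in for the real block hash (it is not SHA-256). Timing uses `std::clock`, which the C++ standard defines as processor time rather than wall-clock time.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <vector>

// Hypothetical interface: one entry point per hashing backend.
class IHasher {
public:
    virtual ~IHasher() {}
    virtual std::uint32_t Hash(std::uint32_t nSeed, std::uint32_t nNonce) const = 0;
};

// Toy mixing function standing in for the real block hash (NOT SHA-256).
inline std::uint32_t ToyHash(std::uint32_t nSeed, std::uint32_t nNonce)
{
    std::uint32_t x = nSeed ^ (nNonce * 2654435761u);
    x ^= x >> 16;
    x *= 2246822519u;
    x ^= x >> 13;
    return x;
}

// Reference backend: assumed correct.
class ReferenceHasher : public IHasher {
public:
    std::uint32_t Hash(std::uint32_t s, std::uint32_t n) const { return ToyHash(s, n); }
};

// Deliberately wrong backend, to show that the check rejects it.
class BrokenHasher : public IHasher {
public:
    std::uint32_t Hash(std::uint32_t s, std::uint32_t n) const { return ToyHash(s, n) + 1; }
};

struct BenchResult {
    bool fCorrect;      // every nonce matched the known result
    double cpuSeconds;  // processor time spent hashing
};

// Hash nonces 0..expected.size()-1 for a known seed, compare against the
// known results, and measure CPU time over the same loop.
BenchResult Benchmark(const IHasher& hasher, std::uint32_t nSeed,
                      const std::vector<std::uint32_t>& expected)
{
    BenchResult r = { true, 0.0 };
    std::clock_t start = std::clock();
    for (std::size_t n = 0; n < expected.size(); ++n)
        if (hasher.Hash(nSeed, static_cast<std::uint32_t>(n)) != expected[n])
            r.fCorrect = false;
    r.cpuSeconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    return r;
}

// Return the fastest backend that produced correct results, or NULL if none did.
const IHasher* PickFastest(const std::vector<const IHasher*>& candidates,
                           std::uint32_t nSeed,
                           const std::vector<std::uint32_t>& expected)
{
    const IHasher* best = NULL;
    double bestTime = 0.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        BenchResult r = Benchmark(*candidates[i], nSeed, expected);
        if (!r.fCorrect)
            continue;  // never select a backend that hashes wrongly
        if (best == NULL || r.cpuSeconds < bestTime) {
            best = candidates[i];
            bestTime = r.cpuSeconds;
        }
    }
    return best;
}
```

With only 1000 nonces the measured time may be too small to rank backends reliably, so a real harness would probably loop longer; the correctness check works the same either way.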
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1470


View Profile
September 09, 2010, 04:07:20 PM
 #4

> You should benchmark all implementations (using CPU time, not real time) and choose the fastest. While benchmarking, also check whether each implementation actually produces correct results.

+1 agreed.  It's not difficult or time-consuming for each user to do this at startup.


Jeff Garzik, bitcoin core dev team and BitPay engineer; opinions are my own, not my employer.
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
teknohog
Sr. Member
****
Offline Offline

Activity: 410


minor developer


View Profile WWW
September 09, 2010, 07:32:05 PM
 #5

Since the CallCPUID function contains x86 assembler, it breaks the build on other architectures. I've changed line 2770 in main.cpp to

#if defined(__GNUC__) && defined(CRYPTOPP_X86_ASM_AVAILABLE)

to make it compile again, at least on ARM.
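For anyone wanting to see the pattern in isolation, here is a self-contained sketch of the same kind of guard. It is not the code in main.cpp: it swaps the Crypto++ macro for the compiler's own `__i386__` / `__x86_64__` defines so it builds without Crypto++, and it uses GCC's `<cpuid.h>` helper `__get_cpuid` instead of inline assembler. `QueryCpuSignature` and `HAVE_X86_CPUID` are made-up names.

```cpp
#include <cstdint>

// Only pull in the x86-specific header when building with a GCC-compatible
// compiler for an x86 target; other architectures get the fallback path.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

// Read the CPUID leaf 1 signature into eaxOut.
// Returns false when CPUID is not available, so the caller can skip
// auto-detection and stay on the generic code path.
bool QueryCpuSignature(std::uint32_t& eaxOut)
{
#if HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  // leaf 1 not supported
    eaxOut = eax;
    return true;
#else
    (void)eaxOut;      // unused on non-x86 builds
    return false;
#endif
}
```

On ARM and other non-x86 targets the function simply returns false, so nothing x86-specific reaches the compiler.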

lfm
Full Member
***
Offline Offline

Activity: 196



View Profile
September 10, 2010, 02:47:31 AM
 #6

I wonder if we could get the VIA C7 code included with an auto-detect in the standard clients? Or is this just too rare a beast to trouble the main code over? The C7 does work with the standard clients using the regular Pentium or SSE2 code, albeit slower.

nimnul
Sr. Member
****
Offline Offline

Activity: 255


View Profile WWW
September 10, 2010, 12:34:11 PM
 #7

> There should be an interface to the hashing handler (or even a full miner per implementation)

+1. That would help anyone who wants to plug in an FPGA or whatever specific accelerator they have.

satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364


View Profile
September 10, 2010, 06:11:06 PM
 #8

> Since the CallCPUID function contains x86 assembler, it breaks the build on other architectures. I've changed line 2770 in main.cpp to
>
> #if defined(__GNUC__) && defined(CRYPTOPP_X86_ASM_AVAILABLE)
>
> to make it compile again, at least on ARM.

Added in SVN rev 152.