Bitcoin Forum
November 15, 2024, 04:02:32 PM *
News: Check out the artwork 1Dq created to commemorate this forum's 15th anniversary
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 [5]  All
  Print  
Author Topic: 4 hashes parallel on SSE2 CPUs for 0.3.6  (Read 22049 times)
tcatm (OP)
Sr. Member
****
qt
Offline Offline

Activity: 337
Merit: 285


View Profile
August 14, 2010, 12:50:28 AM
 #81

1. Do we know why it doesn't work on 32bit? Is is it because it's using 128bits and if so, would it help if we dropped it to 64?

No idea, maybe some alignment problem. Someone was trying to figure it out on IRC. I don't have a SSE2 capable 32bit system. The additional registers in 64bit mode are also useful. I don't know if your PE2650 has a recent enough CPU. You might experience a performance drop of 50% if the CPU is too old.

Btw, did anyone with Intel CPU compare performance with Hyperthreading enabled/disabled? The SSE2 loop keeps the arithmetic units and pipelines pretty busy and I can imagine Hyperthreading might decrease performance.
tcatm (OP)
Sr. Member
****
qt
Offline Offline

Activity: 337
Merit: 285


View Profile
August 14, 2010, 12:53:07 AM
 #82

MinGW on Windows has trouble compiling it:

g++ -c -mthreads -O2 -w -Wno-invalid-offsetof -Wformat -g -D__WXDEBUG__ -DWIN32 -D__WXMSW__ -D_WINDOWS -DNOPCH -I"/boost" -I"/db/build_unix" -I"/openssl/include" -I"/wxwidgets/lib/gcc_lib/mswud" -I"/wxwidgets/include" -msse2 -O3 -o obj/sha256.o sha256.cpp

sha256.cpp: In function `long long int __vector__ Ch(long long int __vector__, long long int __vector__, long long int __vector__)':
sha256.cpp:31: internal compiler error: in perform_integral_promotions, at cp/typeck.c:1454
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://www.mingw.org/bugs.shtml> for instructions.
make: *** [obj/sha256.o] Error 1


Looks like we're triggering a compiler bug in the tree optimizer. Can you try to compile it -O0?
satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364
Merit: 7248


View Profile
August 14, 2010, 04:22:29 AM
Last edit: August 14, 2010, 05:25:50 AM by satoshi
 #83

If you haven't already, try aligning thash.  It might matter.  Couldn't hurt.

Looks like we're triggering a compiler bug in the tree optimizer. Can you try to compile it -O0?
No help from -O0, same error.

MinGW is GCC 3.4.5.  Probably the problem.

I'll see if I can get a newer version of MinGW.

satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364
Merit: 7248


View Profile
August 14, 2010, 05:55:37 PM
 #84

Got the test working on 32-bit with MinGW GCC 4.5.  Exactly 50% slower than stock with Core 2.
satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364
Merit: 7248


View Profile
August 14, 2010, 10:06:13 PM
Last edit: August 15, 2010, 03:43:47 AM by satoshi
 #85

MinGW GCC 4.5.0:
Crypto++ doesn't work, X86_SHA256_HashBlocks() never returns
I only got 4-way working with test.cpp but not when called by BitcoinMiner

MinGW GCC 4.4.1:
Crypto++ works
4-way SIGSEGV

GCC is definitely not aligning __m128i.

Even if we align our own __m128i variables, the compiler may decide to use a __m128i behind the scenes as a temporary variable.

By making our __m128i variables aligned and changing these inlines to defines, I was able to get it to work on 4.4.1 with -O0 only:
#define Ch(b, c, d)  ((b & c) ^ (~b & d))
#define Maj(b, c, d)  ((b & c) ^ (b & d) ^ (c & d))
#define ROTR(x, n) (_mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n))
#define SHR(x, n)  _mm_srli_epi32(x, n)

But that's with -O0.

sgtstein
Member
**
Offline Offline

Activity: 61
Merit: 10


View Profile
August 15, 2010, 03:19:31 AM
 #86

Well, reporting back.

I got it to compile by specifying -msse and -msse2 to gcc when compiling. I first was hashing about 692kh/s (50% of SVN r130[1400kh/s]) but recompiled and am now receiving about ~1120kh/s. This is currently the equivalent of using both of my CPUs without HyperThreading, though I can verify that it IS using HyperThreading. With HyperThreading turned off, I get ~1350kh/s. Pretty close to the stock build.

Also, does the git contain the patched and updated code?

Code:
// SVN r130 Using HT.
08/14/10 19:02 hashmeter   4 CPUs   1392 khash/s
08/14/10 19:32 hashmeter   4 CPUs   1387 khash/s
08/14/10 20:02 hashmeter   4 CPUs   1386 khash/s
08/14/10 20:32 hashmeter   4 CPUs   1380 khash/s
08/14/10 21:02 hashmeter   4 CPUs   1363 khash/s
// With -msse -msse2, first run. Using HT.
08/14/10 21:32 hashmeter   4 CPUs    692 khash/s
08/14/10 22:06 hashmeter   4 CPUs   1011 khash/s
08/14/10 22:11 hashmeter   4 CPUs   1104 khash/s
08/14/10 22:16 hashmeter   4 CPUs   1120 khash/s
// NOT using HT.
08/14/10 22:21 hashmeter   2 CPUs   1359 khash/s
08/14/10 22:26 hashmeter   2 CPUs   1340 khash/s


Just wanted to tell my story and help with whatever information I could.
satoshi
Founder
Sr. Member
*
qt
Offline Offline

Activity: 364
Merit: 7248


View Profile
August 15, 2010, 03:40:29 AM
 #87

On both MinGW GCC 4.4.1 and 4.5.0 I have it working with test.cpp but SIGSEGV when called by BitcoinMiner.  So now it doesn't look like it's the version of GCC, it's something else, maybe just the luck of how the stack is aligned.

I have it working fine on GCC 4.3.3 on Ubuntu 32-bit.

I found the problem with Crypto++ on MinGW 4.5.0.  Here's the patch for that:
Code:
--- \old\sha.cpp	Mon Jul 26 13:31:11 2010
+++ \new\sha.cpp Sat Aug 14 20:21:08 2010
@@ -336,7 +336,7 @@
  ROUND(14, 0, eax, ecx, edi, edx)
  ROUND(15, 0, ecx, eax, edx, edi)
 
- ASL(1)
+    ASL(label1)   // Bitcoin: fix for MinGW GCC 4.5
  AS2(add WORD_REG(si), 4*16)
  ROUND(0, 1, eax, ecx, edi, edx)
  ROUND(1, 1, ecx, eax, edx, edi)
@@ -355,7 +355,7 @@
  ROUND(14, 1, eax, ecx, edi, edx)
  ROUND(15, 1, ecx, eax, edx, edi)
  AS2( cmp WORD_REG(si), K_END)
- ASJ( jne, 1, b)
+    ASJ(    jne,    label1,  )   // Bitcoin: fix for MinGW GCC 4.5
 
  AS2( mov WORD_REG(dx), DATA_SAVE)
  AS2( add WORD_REG(dx), 64)
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
August 17, 2010, 06:23:11 PM
 #88

Tryed Bitcoin 3.10 on Ubuntu Lucid 64 bit on Intel Atom 330.

Using the option -4way produces half the hash/s than not using the option. I tried using 1 to 4 (virtual) cores and -4way option produces less than no option always (arround half). Its probably due to the Intel thing.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
Pages: « 1 2 3 4 [5]  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!