Olipro (OP)
Member

Offline
Activity: 70
Merit: 10
July 26, 2010, 08:17:20 AM
Figured I'd make a new topic, since anyone on x86 Windows probably won't bother to read the x64 thread. Since you're here, though, I suggest you read page 5 of that thread after reading this.
OK, so essentially I've compiled 2 builds of Bitcoin with the new SHA caching optimisation. One build has full optimisation for the SSE instruction sets and requires a modern CPU; the other is compiled without any SSE optimisation at all and should therefore run on pretty much any CPU capable of running XP or higher.
The SSE version is a bit faster than the non-SSE version, and both are slower than the x64 builds, so if you have a 64-bit OS, don't bother with these.
Beware, however, that the libeay32.dll I included may use SSE. If you can't get either build to run on your machine, replace that DLL with the one bundled with stock Bitcoin.
You can grab the builds here
|
FreeMoney
Legendary
Offline
Activity: 1246
Merit: 1020
Strength in numbers
July 26, 2010, 11:48:14 AM (Last edit: July 26, 2010, 12:01:51 PM by FreeMoney)
Amazing. ~1250 up to ~2200.
Is this for real? Other machine from ~600 to ~1350
|
Play Bitcoin Poker at sealswithclubs.eu. We're active and open to everyone.
|
BlackEye
Newbie
Offline
Activity: 17
Merit: 0
July 26, 2010, 12:12:41 PM
I don't see the source code included with any of your builds. It would be great to be able to independently verify and compile the sources.
|
Olipro (OP)
Member

Offline
Activity: 70
Merit: 10
July 26, 2010, 12:13:45 PM
Quote from BlackEye: "I don't see the source code included with any of your builds. It would be great to be able to independently verify and compile the sources."
If you don't trust my builds, don't use them. If you want to set up your own build environment, do the work yourself; all the source code in the app is publicly available.
|
BlackEye
Newbie
Offline
Activity: 17
Merit: 0
July 26, 2010, 12:38:27 PM
No need to be hostile. I do have a build environment set up, but I am completely unable to compile your changes without your source code. This just sends up a big red flag for me as you've obviously modified the source for some of your builds and are unwilling to provide those changes for peer review. I think this goes against the spirit of open source. Modified binary only releases don't help progress the software and create an unnecessary dependency on an individual to provide those binaries.
|
Olipro (OP)
Member

Offline
Activity: 70
Merit: 10
July 26, 2010, 01:04:41 PM
Quote from BlackEye: "No need to be hostile. I do have a build environment set up, but I am completely unable to compile your changes without your source code. This just sends up a big red flag for me as you've obviously modified the source for some of your builds and are unwilling to provide those changes for peer review. I think this goes against the spirit of open source. Modified binary only releases don't help progress the software and create an unnecessary dependency on an individual to provide those binaries."
Crypto++ 5.6.0: http://www.cryptopp.com/
Cached SHA256: http://pastebin.com/rJAYZJ32 (although I'm pretty sure this is publicly posted elsewhere; I was linked to it on IRC)
|
knightmb
July 26, 2010, 02:15:21 PM
Thanks for thinking about those still on the 32bit chips 
|
Timekoin - The World's Most Energy Efficient Encrypted Digital Currency
|
sgtstein
Member

Offline
Activity: 61
Merit: 10
July 26, 2010, 02:28:25 PM
Or those with servers stuck on 32-bit OSes. :-D
Quad core Xeon@1.6GHz
Stock: 1100kh/s
Full Opt: 2600kh/s
THANKS!
|
BitCoinPurse
Newbie
Offline
Activity: 34
Merit: 0
July 26, 2010, 02:53:27 PM
More than doubled my khash/sec, from 800 to 1900.
AMD Phenom II X2 550
|
Olipro (OP)
Member

Offline
Activity: 70
Merit: 10
July 26, 2010, 03:42:01 PM
Quote from sgtstein: "Or those with servers stuck on 32-bit OSes. :-D Quad core Xeon@1.6GHz Stock: 1100kh/s Full Opt: 2600kh/s THANKS!"
BitCoins are always appreciated, address in my sig
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 8282
July 27, 2010, 01:29:42 AM
I added the cached SHA256 state idea to the SVN, rev 113. The speedup is about 70%. I credited it to tcatm based on your post in the x64 thread.
I can compile the Crypto++ 5.6.0 ASM SHA code with MinGW, but as soon as it runs it crashes. It says it's for MASM (Microsoft's assembler) and the sample command line they give looks like Visual C++. Does it only work with the MSVC and Intel compilers?
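For readers unfamiliar with the cached SHA256 state idea, here is a minimal Python sketch of the general technique. It is not the code from SVN rev 113 or from the pastebin link. It assumes an 80-byte block header whose last 4 bytes are the nonce, so the first 64-byte SHA-256 block stays the same while the nonce is searched and only needs to be absorbed once. The function names and the dummy header are hypothetical.

```python
import hashlib
import struct


def double_sha256(data):
    """Plain double SHA-256, with no caching (reference implementation)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def make_cached_hasher(header_without_nonce):
    """Return a function nonce -> double SHA-256 of the full 80-byte header.

    header_without_nonce is assumed to be the first 76 bytes of the header.
    The first 64 bytes do not depend on the nonce, so the hash object that
    has absorbed them is built once and copied for every attempt.
    """
    if len(header_without_nonce) != 76:
        raise ValueError("expected 76 bytes (80-byte header minus the nonce)")
    cached = hashlib.sha256(header_without_nonce[:64])  # the cached state
    tail = header_without_nonce[64:]                    # 12 bytes before the nonce

    def hash_with_nonce(nonce):
        h = cached.copy()                          # reuse the cached state
        h.update(tail + struct.pack("<I", nonce))  # only the last 16 bytes are new
        return hashlib.sha256(h.digest()).digest() # second SHA-256 pass
    return hash_with_nonce


if __name__ == "__main__":
    dummy_header = bytes(range(76))  # placeholder data, not a real block header
    hasher = make_cached_hasher(dummy_header)
    print(hasher(0).hex())
```

The result is identical to hashing the whole header from scratch each time; the saving comes from skipping the repeated work on the unchanged first block.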
|
knightmb
July 27, 2010, 02:01:11 AM
Quote from sgtstein: "Or those with servers stuck on 32-bit OSes. :-D Quad core Xeon@1.6GHz Stock: 1100kh/s Full Opt: 2600kh/s THANKS!"
Quote from Olipro: "BitCoins are always appreciated, address in my sig"
I just sent you a wad of coin
|
dkaparis
Newbie
Offline
Activity: 53
Merit: 0
July 27, 2010, 09:59:06 AM
Quote from satoshi: "I can compile the Crypto++ 5.6.0 ASM SHA code with MinGW but as soon as it runs it crashes. It says it's for MASM (Microsoft's assembler) and the sample command line they give looks like Visual C++. Does it only work with the MSVC and Intel compilers?"
I also recently tried to use Crypto++ 5.6.0 (as an external library) instead of the old integrated code, with the same result: it crashed on the first invocation of CryptoPP::SHA256::Transform. The difference is that I built everything with VC++ 2008. I haven't investigated in depth, but someone mentioned that Crypto++'s routine requires aligned input. Maybe that's the reason, or there may be another bug we haven't figured out.
|
BlackEye
Newbie
Offline
Activity: 17
Merit: 0
July 27, 2010, 12:43:35 PM
You need to change the assembly instructions that require aligned input to unaligned - http://bitcointalk.org/index.php?topic=453.msg5774#msg5774, or make the blocks that are being hashed aligned. I haven't tried yet, but this assembly code combined with the state caching modification should make this blazing fast.
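To illustrate the second option (making the hashed data aligned), here is a rough Python sketch using ctypes. MOVDQA faults when its memory operand is not 16-byte aligned, while MOVDQU accepts any address. The sketch only shows how an aligned buffer can be carved out of a slightly larger allocation; the helper names are hypothetical, and the actual change in Bitcoin is C++ and is not shown here.

```python
import ctypes

ALIGNMENT = 16  # MOVDQA needs 16-byte-aligned memory operands


def is_aligned(address, alignment=ALIGNMENT):
    """True if the address is a multiple of the alignment."""
    return address % alignment == 0


def aligned_buffer(size, alignment=ALIGNMENT):
    """Return a ctypes char array of `size` bytes whose address is aligned.

    Over-allocate by alignment - 1 bytes, then take a view that starts at
    the first aligned address inside the allocation. The view keeps the
    underlying allocation alive.
    """
    raw = ctypes.create_string_buffer(size + alignment - 1)
    offset = (-ctypes.addressof(raw)) % alignment
    return (ctypes.c_char * size).from_buffer(raw, offset)


if __name__ == "__main__":
    block = aligned_buffer(64)  # one 64-byte SHA-256 block
    print(is_aligned(ctypes.addressof(block)))
```

Switching the instruction to MOVDQU avoids the need for this, at the cost of using the unaligned load.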
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 8282
July 27, 2010, 06:27:30 PM (Last edit: July 27, 2010, 07:44:48 PM by satoshi)
I was able to integrate the SHA256 functionality from Crypto++ 5.6.0 into Bitcoin. This is the fastest SHA256 yet using the SSE2 assembly code. Since Bitcoin was sending unaligned data to the block hash function, I had to change the MOVDQA instruction to MOVDQU.
I think using the SHA256 functionality from Crypto++ 5.6.0 is the way forward right now.
I added a subset of the Crypto++ 5.6.0 library to the SVN. I stripped it down to just SHA and 11 general dependency files. There shouldn't be any other crypto in there other than SHA.
I aligned the data fields and it worked. The ASM SHA-256 is about 48% faster. The combined speedup is about 2.5x faster than version 0.3.3.
I guess it's using SSE2. It automatically sets its build configuration at compile time based on the compiler environment. It looks like it has some SSE2 detection at runtime, but it's hard to tell if it actually uses it to fall back if it's not available.
I want the release builds to have SSE2. SSE2 has been around since the first Pentium 4. A Pentium 3 or older would be so slow, you'd be wasting your electricity trying to generate on it anyway.
This is SVN rev 114.
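The combined figure is consistent with multiplying the two speedups quoted in this thread: about 70% from the cached SHA256 state and about 48% from the ASM SHA-256. A quick check in Python, assuming the two gains are independent and simply multiply (the function name is hypothetical):

```python
def combined_speedup(*gains_percent):
    """Multiply successive percentage speedups into one overall factor."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return factor


if __name__ == "__main__":
    # 1.70 * 1.48 = 2.516, i.e. roughly 2.5x the original hash rate
    print(round(combined_speedup(70, 48), 2))  # prints 2.52
```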
|
knightmb
July 27, 2010, 06:36:03 PM (Last edit: July 27, 2010, 07:45:40 PM by satoshi)
Quote from satoshi: "... I guess it's using SSE2. It automatically sets its build configuration at compile time based on the compiler environment. It looks like it has some SSE2 detection at runtime, but it's hard to tell if it actually uses it to fall back if it's not available. I do want the release builds to have SSE2. SSE2 has been around since the first Pentium 4. A Pentium 3 or older would be so slow, you'd be wasting your electricity trying to generate on it anyway."
I've got some older machines (for the Windows and Linux clients) to test with that don't support SSE2. When I tried some of the experimental builds from here on them, the program just crashed. I'll be glad to test future official builds to see whether the "detect SSE2" part works or the program goes belly up.
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 8282
July 27, 2010, 07:47:42 PM
OK, thanks. I'd also like to know if it runs fine as long as you don't turn on Generate. You'd think as long as it doesn't actually execute any SSE2 instructions, it would still load. At least Pentium 3's could run it without generating.
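As an illustration of checking for SSE2 before any SSE2 code path is taken, here is a small Python sketch that reads the CPU feature flags Linux exposes in /proc/cpuinfo. This is an assumption-laden example for x86 Linux only, with hypothetical helper names; it is not how Crypto++ or Bitcoin performs its detection.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def has_sse2(cpuinfo_text):
    """True if the flags line lists sse2 as a separate flag."""
    return "sse2" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; return an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if has_sse2(read_cpuinfo()):
        print("SSE2 reported: the fast hashing path could be enabled")
    else:
        print("SSE2 not reported: stay on the plain code path")
```

A program that gates the SSE2 routine behind a check like this can still load and run on older CPUs as long as the SSE2 instructions are never executed.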
|
knightmb
July 27, 2010, 08:09:54 PM
Quote from satoshi: "OK, thanks. I'd also like to know if it runs fine as long as you don't turn on Generate. You'd think as long as it doesn't actually execute any SSE2 instructions, it would still load. At least Pentium 3's could run it without generating."
I thought the SSE2 mode would just be for those with processors that support it. So far the release client runs just fine on older PI, PII, PIII machines, just very slow on the khash/s part. For example, a PIII 933MHz Linux machine can muster about 125 khash/s, and some old 1.1GHz Celerons I have can reach 172 khash/s. I wouldn't want to exclude those that have older processors from at least trying to generate coin, hehe.
|
BlackEye
Newbie
Offline
Activity: 17
Merit: 0
July 27, 2010, 08:47:10 PM
Quote from satoshi: "I added a subset of the Crypto++ 5.6.0 library to the SVN. I stripped it down to just SHA and 11 general dependency files. There shouldn't be any other crypto in there other than SHA."
I think you should be able to pare it down to at most 5 files from Crypto++: config.h, cpu.h, cpu.cpp, sha.h, and sha.cpp. Take a look at the zip file I posted on the other thread. It just excludes some headers that aren't needed, and I think I had to move 1 or 2 functions as well, ByteSwap being one, if I remember.
|
Olipro (OP)
Member

Offline
Activity: 70
Merit: 10
July 29, 2010, 04:28:25 PM
Quote from sgtstein: "Or those with servers stuck on 32-bit OSes. :-D Quad core Xeon@1.6GHz Stock: 1100kh/s Full Opt: 2600kh/s THANKS!"
Quote from Olipro: "BitCoins are always appreciated, address in my sig"
Quote from knightmb: "I just sent you a wad of coin"
So you weren't the guy who sent me 0.02, I take it?