Bitcoin Forum
November 15, 2019, 06:15:30 PM *
Author Topic: Does core have any SHA256 SIMD parallelization code for "ONE" message?  (Read 146 times)
Coding Enthusiast
Hero Member
*****
Offline Offline

Activity: 701
Merit: 1159


Novice C♯ Coder


View Profile WWW
November 08, 2019, 06:22:36 AM
Merited by ETFbitcoin (1)
 #1

I am currently exploring how to parallelize the SHA256 algorithm with SIMD, based on a paper I found. The paper essentially parallelizes the "message scheduling" step, which according to the authors takes up 26% of the computation time.
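For readers unfamiliar with the "message scheduling" step mentioned above, here is a minimal scalar sketch in C of how one 64-byte block is expanded into 64 schedule words, following the recurrence in FIPS 180-4. The function names (`rotr32`, `small_sigma0`, `small_sigma1`, `sha256_schedule`) are made up for illustration and are not taken from the paper or from Bitcoin Core. Note that the schedule depends only on the message words and not on the running hash state, which is what makes it a candidate for SIMD; the `w[t - 2]` term means consecutive words are not fully independent, so a vectorized version has to handle that dependency in some way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

/* The two "small sigma" functions of FIPS 180-4; only the schedule uses them. */
static uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Expand one 64-byte block into the schedule words w[0..63]. */
static void sha256_schedule(const uint8_t block[64], uint32_t w[64]) {
    assert(block != NULL && w != NULL);

    /* The first 16 words are the block itself, read as big-endian words. */
    for (int t = 0; t < 16; t++) {
        w[t] = ((uint32_t)block[4 * t] << 24) |
               ((uint32_t)block[4 * t + 1] << 16) |
               ((uint32_t)block[4 * t + 2] << 8) |
               (uint32_t)block[4 * t + 3];
    }

    /* Each later word is a function of four earlier schedule words only. */
    for (int t = 16; t < 64; t++) {
        w[t] = small_sigma1(w[t - 2]) + w[t - 7] +
               small_sigma0(w[t - 15]) + w[t - 16];
    }
}
```

The 64 compression rounds then consume `w[0..63]` one word per round, so the schedule can be computed ahead of the rounds.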

If I understand Bitcoin Core's code correctly (e.g. the AVX2 version), it does not seem to support using SIMD to compute the SHA256 of a single large message (e.g. one message of 512+ bytes). It only has code for computing the SHA256 of multiple messages in parallel (i.e. SHA256 of m1, m2, ..., m8) and returning multiple hashes (i.e. h1, h2, ..., h8).

If I am reading the code wrong, please explain how it does that.
And if I am right, is there any reason why they didn't add this feature? It seems useful for computing the message digest of a big transaction, especially the legacy ones, which could easily be bigger than 512 bytes.
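To put "bigger than 512 bytes" in terms of work: SHA256 processes a message in 64-byte blocks, and each block's compression takes the previous block's output state as its input, so the blocks of one message have to be processed in order. A small sketch in C of the block count for a given message length (the helper name `sha256_block_count` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-byte blocks SHA256 compresses for a message of len bytes.
 * Padding appends one 0x80 byte, then zero bytes, then an 8-byte length
 * field, so the padded size is the next multiple of 64 at or above len + 9. */
static size_t sha256_block_count(size_t len) {
    assert(len <= SIZE_MAX - 72); /* guard against overflow in the sum below */
    return (len + 1 + 8 + 63) / 64;
}
```

For example, a 512-byte message needs 9 sequential compressions (8 blocks of data plus one block of padding), while a 64-byte message needs 2.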

P.S. If you have any scientific paper about this topic that is newer than 2012 please let me know.

Projects List+Suggestion box
Donation link using BIP21
Bech32 Donation link!
BitcoinTransactionTool (0.9.2):  Ann - Source Code
Watch Only Bitcoin Wallet (supporting SegWit) (3.1.0):  Ann - Source Code
SharpPusher (broadcast transactions) (0.10.0): Ann - Source Code

gmaxwell
Moderator
Legendary
*
qt
Offline Offline

Activity: 2870
Merit: 2608



View Profile
November 09, 2019, 02:30:44 PM
Merited by Coding Enthusiast (5), Welsh (2), ETFbitcoin (1)
 #2

Quote
I am currently exploring how to parallelize the SHA256 algorithm with SIMD, based on a paper I found. The paper essentially parallelizes the "message scheduling" step, which according to the authors takes up 26% of the computation time.

If I understand Bitcoin Core's code correctly (e.g. the AVX2 version), it does not seem to support using SIMD to compute the SHA256 of a single large message (e.g. one message of 512+ bytes). It only has code for computing the SHA256 of multiple messages in parallel (i.e. SHA256 of m1, m2, ..., m8) and returning multiple hashes (i.e. h1, h2, ..., h8).

If I am reading the code wrong, please explain how it does that.
And if I am right, is there any reason why they didn't add this feature? It seems useful for computing the message digest of a big transaction, especially the legacy ones, which could easily be bigger than 512 bytes.

P.S. If you have any scientific paper about this topic that is newer than 2012 please let me know.

You're looking in the wrong place.

https://github.com/bitcoin/bitcoin/commit/c1ccb15b0e847eb95623f9d25dc522aa02dbdbe8#diff-58b88805302ed488ea34900368aab920

Most of the hashing in Bitcoin is of small messages (e.g. 64 bytes), and the N-message parallelization is much faster when it's available.
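As a rough illustration of what N-message parallelization means, the sketch below expands the message schedule for 8 independent messages in lockstep, with the words stored "transposed" so that word t of all 8 messages sits side by side. It is plain C rather than intrinsics, and the layout `w[t][lane]`, the constant `LANES`, and the function names are assumptions made for this example, not Bitcoin Core's actual code. The point is that every lane executes exactly the same operations on its own data, so each inner loop maps naturally onto one 8 x 32-bit SIMD register operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* number of independent messages processed together */

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

static uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* w[t][lane] holds schedule word t of message number `lane`.
 * The caller fills rows 0..15 (one 64-byte block per lane, already
 * converted to big-endian words); this fills rows 16..63. */
static void sha256_schedule_x8(uint32_t w[64][LANES]) {
    assert(w != NULL);
    for (int t = 16; t < 64; t++) {
        /* No lane reads another lane's data, so this inner loop has
         * no cross-iteration dependency and can run as one vector op. */
        for (int lane = 0; lane < LANES; lane++) {
            w[t][lane] = small_sigma1(w[t - 2][lane]) + w[t - 7][lane] +
                         small_sigma0(w[t - 15][lane]) + w[t - 16][lane];
        }
    }
}
```

The compression rounds can be laid out the same way, which is why this approach produces 8 hashes (h1, ..., h8) at once rather than speeding up a single message.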

But for big messages there is SIMD too, it's just in different files.

State of the art is ... get a CPU that doesn't suck. :) SHA-NI is much faster than any of these SIMD techniques, especially in the one-message case.
Coding Enthusiast
Hero Member
*****
Offline Offline

Activity: 701
Merit: 1159


Novice C♯ Coder


View Profile WWW
November 09, 2019, 04:48:23 PM
 #3

Damn, I was afraid of this. It looks like assembly code and I can't read it. Gotta put it in the to-do list now.

Quote
State of the art is ... get a CPU that doesn't suck. :) SHA-NI is much faster than any of these SIMD techniques, especially in the one-message case.
Yeah, I've been reading some benchmarks on this. It's fantastic. It is surprising that only a handful of CPUs have SHA Extensions although Intel introduced it in 2013!

Dabs
Legendary
*
Offline Offline

Activity: 2506
Merit: 1324


The Concierge of Crypto


View Profile WWW
November 10, 2019, 04:20:14 AM
 #4

Even older Xeons have AES-NI, ... is that different or part of SHA-NI? I think 5th or 6th generation Xeons can be bought for cheap and support AES-NI. I'm a fan of these refurb or off-lease rack servers and workstations.

Coding Enthusiast
Hero Member
*****
Offline Offline

Activity: 701
Merit: 1159


Novice C♯ Coder


View Profile WWW
November 10, 2019, 05:07:25 AM
 #5

Even older Xeons have AES-NI, ... is that different or part of SHA-NI?

Yes, they are different (in case it wasn't clear, AES is Advanced Encryption Standard). There are hundreds of CPU intrinsics available. You can see Intel's intrinsics here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
AES-NI is one group of these instructions, and it has shipped in far more CPUs than the SHA extensions, so it is normal to see old or cheap CPUs support it.
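Since AES-NI and the SHA extensions are separate features, each has its own CPUID feature bit: AES-NI is reported in leaf 1, ECX bit 25, and the SHA extensions in leaf 7 (subleaf 0), EBX bit 29. The following is a minimal sketch in C of checking both at runtime. It assumes GCC or Clang on x86 (it relies on their `<cpuid.h>` helpers) and reports both features as absent on anything else; the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h> /* GCC/Clang helper header; an assumption of this sketch */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* AES-NI is reported in CPUID leaf 1, register ECX, bit 25. */
static int ecx_leaf1_has_aesni(uint32_t ecx) {
    return (int)((ecx >> 25) & 1u);
}

/* SHA extensions are reported in CPUID leaf 7 subleaf 0, register EBX, bit 29. */
static int ebx_leaf7_has_sha(uint32_t ebx) {
    return (int)((ebx >> 29) & 1u);
}

/* Sets *aesni and *sha to 1 or 0. Both are 0 when CPUID cannot be queried
 * (non-x86 target or a compiler without <cpuid.h>). */
static void query_cpu_features(int *aesni, int *sha) {
    assert(aesni != NULL && sha != NULL);
    *aesni = 0;
    *sha = 0;
#if HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        *aesni = ecx_leaf1_has_aesni(ecx);
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        *sha = ebx_leaf7_has_sha(ebx);
    }
#endif
}
```

Calling `query_cpu_features(&aesni, &sha)` on an older Xeon would typically give `aesni == 1` and `sha == 0`, which matches the point above that the two features are unrelated.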

Dabs
Legendary
*
Offline Offline

Activity: 2506
Merit: 1324


The Concierge of Crypto


View Profile WWW
November 10, 2019, 02:15:10 PM
 #6

Quote
sha1msg1
__m128i _mm_sha1msg1_epu32 (__m128i a, __m128i b)
sha1msg2
__m128i _mm_sha1msg2_epu32 (__m128i a, __m128i b)
sha1nexte
__m128i _mm_sha1nexte_epu32 (__m128i a, __m128i b)
sha1rnds4
__m128i _mm_sha1rnds4_epu32 (__m128i a, __m128i b, const int func)
sha256msg1
__m128i _mm_sha256msg1_epu32 (__m128i a, __m128i b)
sha256msg2
__m128i _mm_sha256msg2_epu32 (__m128i a, __m128i b)
sha256rnds2
__m128i _mm_sha256rnds2_epu32 (__m128i a, __m128i b, __m128i k)

Oh, that's what you need? Didn't know Intel made sorta built in functions for some chips, almost like an ASIC.

AMD seems to have them too for Ryzen processors. It's not easy to sort through the chips list to find them, they could be under SSE or something else.

HardwalletAttacker1
Jr. Member
*
Online Online

Activity: 56
Merit: 8


View Profile
November 10, 2019, 07:17:21 PM
 #7

Quote
sha1msg1
__m128i _mm_sha1msg1_epu32 (__m128i a, __m128i b)
sha1msg2
__m128i _mm_sha1msg2_epu32 (__m128i a, __m128i b)
sha1nexte
__m128i _mm_sha1nexte_epu32 (__m128i a, __m128i b)
sha1rnds4
__m128i _mm_sha1rnds4_epu32 (__m128i a, __m128i b, const int func)
sha256msg1
__m128i _mm_sha256msg1_epu32 (__m128i a, __m128i b)
sha256msg2
__m128i _mm_sha256msg2_epu32 (__m128i a, __m128i b)
sha256rnds2
__m128i _mm_sha256rnds2_epu32 (__m128i a, __m128i b, __m128i k)

Oh, that's what you need? Didn't know Intel made sorta built in functions for some chips, almost like an ASIC.

AMD seems to have them too for Ryzen processors. It's not easy to sort through the chips list to find them, they could be under SSE or something else.


https://www.officedaytime.com/simd512e/simdimg/sha256.html
Dabs
Legendary
*
Offline Offline

Activity: 2506
Merit: 1324


The Concierge of Crypto


View Profile WWW
November 10, 2019, 10:10:27 PM
 #8

I'm just going to ask it, since I can't easily find the list, maybe it's just in front of me and I can't see it. Anyone know how to get the list from Intel which chips have SHA? (and from AMD too.) ... there used to be a flag or tick box or option to select these things on Intel's ARK site...

Here is the one where you can find them by features:
https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html

So I can go there and select AES-NI ... but can't find one specific for SHA-NI. (Ok, granted, I'm not even sure why I'd want one, but if you're working on something that uses it, then this would be a good processor to play with your software right? hehe.)

AES-NI is useful for third party full disk encryption that takes advantage of it, such as DiskCryptor, TrueCrypt, VeraCrypt (Bitlocker? I don't use that.)

gmaxwell
Moderator
Legendary
*
qt
Offline Offline

Activity: 2870
Merit: 2608



View Profile
November 10, 2019, 10:43:13 PM
Merited by ETFbitcoin (1), aliashraf (1)
 #9

I'm just going to ask it, since I can't easily find the list, maybe it's just in front of me and I can't see it. Anyone know how to get the list from Intel which chips have SHA? (and from AMD too.)

https://en.wikipedia.org/wiki/Intel_SHA_extensions

Intel Goldmont chips (server-market Atom) and Ice Lake. (I haven't used it on Ice Lake, but it's finally reported there.) Intel has been pre-announcing it on architectures going back to Skylake and then failing to deliver.

Anything AMD Zen, Zen+ or Zen 2 (so all the Threadripper and EPYC parts), which is what all of Bitcoin's development using SHA-NI has been done on.

The instruction latency of SHA-NI is such that you're still better off interleaving independent processing of several messages... but even without that it's much faster than anything else, except maybe a super-wide many-message AVX512 version.