A lot of processes inside Bitcoin are using SHA256 in various places, from creating an address to signing a transaction to mining a block. So in this post I hope to make it clear how most of the internal procedures around SHA256 works so that you can better understand exactly why it is so resistant to cryptanalysis attacks.

This thread is intended to be a resource for anybody whose trying to implement a custom SHA256 operation inside a CPU, GPU, FPGA, or whatever.

What is it useful for?

- Brainflayer

- Vanitysearch

- Recovering lost private keys

- Mining (were SHA256 is done twice)

- Any application where you need to do SHA256

*really* fast

Interactive demo (not mine):

https://sha256algorithm.com/**Warning - don't try to make an implementation for a secure application. This is for speed only.**SHA256 variablesIn SHA256, every variable is 32 bits long except for the message itself. Even the arrays have 32-bit elements.

We have 8 variables denoting state. Usually this would be in an array practically speaking, but theoretically you could call the variables

`a`,

`b`,

`c`,

`d`,

`e`,

`f`,

`g`, and

`h`. they are all 32 bits long. Initially, they are set to the first 32 bits of the square roots of the first 8 prime numbers.

When the SHA256 operation is finished then these variables are all concatenated together and it is returned as the final checksum.

How does that work, you might ask? Well, let's take the first prime for example: 2. The first digits of the square root of 2 are: 1.4142135... but we are not interested in these decimal digits. We want to convert this into binary first.

Also it is worth noting that we don't use the numbers before the decimal point. In this case it is 1, so we subtract the integer part from the number to get only the fractional part.

Well, how do we convert this fractional number into binary, and is it even possible?

The easiest way to do this is to multiply the number by 2**32 and truncate the result in order to get the first 32 bits on their own. You can adjust the exponent if you want to get a different number of bits. e.g. 2**64.

There are also 64 round constants

`K`, although we almost always make this an array. Their initial values are the first 32 bits of the cube roots of the first 64 prime numbers. Calculated in a similar way to above.

Finally we have a list of words. As in, in binary nomenclature "words". And by the way, in the SHA256 algorithm, a word is defined as 32 bits, and not having its own meaning. We call these

`W` and there are 64 of them.

Input preparationSHA256 works on chunks of 512 bits i.e. 64 bytes. So, the input has to be padded by 64 bytes and this is done automatically by the operation.

SHA256 takes an arbitrary string as an input, even an empty string.

*However*, we always add 0x80 to the end of the string (technically a single 1 bit is all the implementation needs), and then after that we keep adding 0x00 bytes to the end until the length is a multiple of 512 bits, or 64 bytes actually.

Then we are going to 56 more 0x00 bytes at the end, followed by 8 bytes containing the original length of the message. (SHA256 supports messages up to 2**64 bits large).

Operations on chunksFor each 512-bit chunk, we are going to run a particular algorithm on the chunk 64 times. In the process, it is going to update the variables

`a` through

`h`. The reason why these variables are also changed is because they are used in the algorithm as well in the processing of the next chunk. When you have the final result depending on the values of each 512-bit chunk to create a particular result, it becomes much harder to forge hashes.

The first thing that is done before the rounds begins is that we are going to set the values of

`W`. As I have mentioned before, there are 64

`W` values. The 512-bit chunk is broken into parts of 32-bits each to set the first 16

`W` values. The other 48 W values are calculated by the following formula, assuming zero based index:

`W`[

`i`]=σ1(

`W`[

`i-2`])+

`W`[

`i-7`]+σ0(

`W`[

`i-15`])+

`W`[

`i-16`] 16 <=

`i` <= 63

We will get to the sigma functions in a minute. But first I would like to note that this equation can be unrolled out of a loop, can can be ran in batches of 16. Effectively this means that words 16-31 of

`W` can be calculated after the first 16 rounds, words 32-47 of

`W` can be calculated after the second 16 rounds, and words 48-63 after the third 16 rounds which is before the fourth and last 16 rounds.

This process is called expansion in SHA256.

Here is some

code (that is not mine) which shows how this is done in practice:

// uint32_t * w; // Function parameter that contains the 512-bit message chunk i.e. the first 16 words of W

SHA256_RND(0); // This does 16 rounds on a chunk

WMIX(); // generate the next 16 words of W

SHA256_RND(16); // Do another 16 rounds

WMIX(); // etc...

SHA256_RND(32);

WMIX();

SHA256_RND(48);

// https://github.com/XopMC/CudaBrainSecp/blob/main/GPU/GPUHash.h#L170-#L187

#define WMIX() { \

w[0] += s1(w[14]) + w[9] + s0(w[1]);\

w[1] += s1(w[15]) + w[10] + s0(w[2]);\

w[2] += s1(w[0]) + w[11] + s0(w[3]);\

w[3] += s1(w[1]) + w[12] + s0(w[4]);\

w[4] += s1(w[2]) + w[13] + s0(w[5]);\

w[5] += s1(w[3]) + w[14] + s0(w[6]);\

w[6] += s1(w[4]) + w[15] + s0(w[7]);\

w[7] += s1(w[5]) + w[0] + s0(w[8]);\

w[8] += s1(w[6]) + w[1] + s0(w[9]);\

w[9] += s1(w[7]) + w[2] + s0(w[10]);\

w[10] += s1(w[8]) + w[3] + s0(w[11]);\

w[11] += s1(w[9]) + w[4] + s0(w[12]);\

w[12] += s1(w[10]) + w[5] + s0(w[13]);\

w[13] += s1(w[11]) + w[6] + s0(w[14]);\

w[14] += s1(w[12]) + w[7] + s0(w[15]);\

w[15] += s1(w[13]) + w[8] + s0(w[0]);\

}

As you can see, although the algorithm prescribes that the last 48 words are generated by the formula, it only uses numbers between 1..16 in the subtraction part. Technically that means you can get away with generating 16 words at a time and overwriting the previous words as you go. For example:

w[0] += s1(w[14]) + w[9] + s0(w[1]);

Here you can define

`i` to be 16 so that

`W`[

`i-2`] is

`W`[

`14`],

`W`[

`i-7`] is

`W`[

`9`],

`W`[

`i-15`] is

`W`[

`1`] and the last one

`W`[

`i-16`] is just

`W`[

`0`] which we take care of by adding it and then overwriting its value via the += operator.

So effectively this means

`W`[

`0`] in the program now contains

`W`[

`16`] from the math.

Subsequent lines that reference

`W`[

`0`] are doing so because their value of t-2, t-7, t-15 or whatever evaluates to 16, which is what is inside

`W`[

`0`] right now.

The s0 and s1 are the lower sigma functions (named so because there are also upper sigma functions - I will explain in a minute) and they are defined like this:

#define s0(x) (ROR(x,7) ^ ROR(x,18) ^ (x >> 3))

#define s1(x) (ROR(x,17) ^ ROR(x,19) ^ (x >> 10))

ROR is just the Right Rotate function for 32 bits, i.e.

`((x>>n)|(x<<(32-n)))`.

These lower sigma functions are designed in such a way that the expansion process creates words that are as non-linear (unpredictable) as possible.

Anatomy of a roundAfter initializing the values of

`a` through

`h`, which is done at the beginning of the function by the way,

*not* at the beginning of a chunk,

we apply another formula to update these eight variables.

#define S0(x) (ROR(x,2) ^ ROR(x,13) ^ ROR(x,22))

#define S1(x) (ROR(x,6) ^ ROR(x,11) ^ ROR(x,25))

#define Maj(x,y,z) ((x & y) | (z & (x | y))) // Majority function i.e. (x & y) ^ (x & z) ^ (y & z)

#define Ch(x,y,z) (z ^ (x & (y ^ z))) // Choice function i.e. (x & y) ^ (~x & z)

// SHA-256 inner round

#define S2Round(a, b, c, d, e, f, g, h, k, w) \

t1 = h + S1(e) + Ch(e,f,g) + k + (w); \

t2 = S0(a) + Maj(a,b,c); \

d += t1; \

h = t1 + t2;

#define SHA256_RND(k) {\

S2Round(a, b, c, d, e, f, g, h, K[k], w[0]);\

S2Round(h, a, b, c, d, e, f, g, K[k + 1], w[1]);\

S2Round(g, h, a, b, c, d, e, f, K[k + 2], w[2]);\

S2Round(f, g, h, a, b, c, d, e, K[k + 3], w[3]);\

S2Round(e, f, g, h, a, b, c, d, K[k + 4], w[4]);\

S2Round(d, e, f, g, h, a, b, c, K[k + 5], w[5]);\

S2Round(c, d, e, f, g, h, a, b, K[k + 6], w[6]);\

S2Round(b, c, d, e, f, g, h, a, K[k + 7], w[7]);\

S2Round(a, b, c, d, e, f, g, h, K[k + 8], w[8]);\

S2Round(h, a, b, c, d, e, f, g, K[k + 9], w[9]);\

S2Round(g, h, a, b, c, d, e, f, K[k + 10], w[10]);\

S2Round(f, g, h, a, b, c, d, e, K[k + 11], w[11]);\

S2Round(e, f, g, h, a, b, c, d, K[k + 12], w[12]);\

S2Round(d, e, f, g, h, a, b, c, K[k + 13], w[13]);\

S2Round(c, d, e, f, g, h, a, b, K[k + 14], w[14]);\

S2Round(b, c, d, e, f, g, h, a, K[k + 15], w[15]);\

The capital S stands for upper sigma function, ignore these for now we will see them later. Ignore the majority and choice functions as well, we will look at them later.

Each round (which is implemented in S2Round) switches around the variables. So the variables used in the current round will not be in the same order as the previous round. You can see that it wraps around to its starting order after every 8 rounds.

This particular implementation only shows 16 rounds, the reason which I have explained in the previous section (it only calculates 16 words at a time).

At any rate...

The upper sigma functions have a similar definition to the lower sigma functions, except they are not used in the word expansion process, but inside the rounds.

Their purpose is similar- to make it really hard for anyone to guess what the previous value of the input was.

The majority and choice functions are elementary logical functions. The majority function takes three variables, and in each bit position, takes the one which is most occurring in the three variables. For example, if two variables have a one in a bit position but the third has a zero, that bit is going to be a one. And vice versa. So if you have three variables like 0x0010010, 0x10001000, and 0x01010101, the majority function of all these variables is going to be just 0x0000000 because in each bit position more variables have a 0 bit there than a 1 bit.

The choice function also takes three variables, but uses the first one as a conditional, like an if statement or something. So basically for each bit position:

- If the first variable has a 1 bit there, the result has that bit position set to the second variable's bit

- Otherwise if the first variable has a 0 bit there, the result has the bit position set to the third variable's bit.

You don't actually have to understand how all these helper functions work in order to implement SHA256, as long as you have the function definitions you're good.

Going back to the variables themselves, you can see that before the variables are rotated, that d and h are updated as follows:

h = S1(e) + Ch(e, f, g) + k + w + S0(a) + Maj(a, b, c)

d = d + S1(e) + Ch(e, f, g) + k + w

This adds some additional obfuscation to the state variables before the rotation (i.e. b = a, c = b, etc). Changing the order of the variables like this brings about a major security advantage that makes it difficult to reverse-engineer the previous values of the variables from the final SHA256 hash - which, as you recall, are just the

`a` through

`h` variables stuck together.

Under no circumstances are the values of the round constants

`K` changed to new values.

Wrapping upAfter the 64 rounds are finished, you have the eight variables for the chunk. After you get these variables for all the chunks, you need to add them together to get the final eight variables.

What I mean is, You need to have another array of size 8 that is initialized to zero so that you can perform addition-assignment with the state variables.As you are not going to be storing the variables after you discard the chunk. Practically speaking, you do something like this:

output[0] += a;

output[1] += b;

output[2] += c;

output[3] += d;

output[4] += e;

output[5] += f;

output[6] += g;

output[7] += h;

A reminder that since these variables are all 32-bits or 4 bytes, you need to concatenate them together to get the final 256-bit digest.

Credits:

https://github.com/XopMC/CudaBrainSecphttps://sha256algorithm.com/https://en.wikipedia.org/wiki/SHA-2https://blog.daisie.com/sha-256-explained-comprehensive-secure-hash-tutorial/