Print Page - Is the output of hash function evenly distributed?

Title: Is the output of hash function evenly distributed?
Post by: arulbero on November 29, 2016, 03:45:05 PM

Hash function (sha256) is used to mine and identify a block. The digest is a 256 bit string.

We don't know whether this function is surjective or not (whether there is an input-block header for each output-digest).

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?

Title: Re: Is the output of hash function evenly distributed?
Post by: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Shouldn't minners keep digging in that case? and shouldn't CheckBlock verify that case (e.g., if another block has the same hash but a lower height then this block is invalid)?

https://github.com/bitcoin/bitcoin/blob/5488514b901d807a98544618cb441a28e0b28bda/src/main.cpp

Title: Re: Is the output of hash function evenly distributed?
Post by: Syke on November 30, 2016, 05:55:53 PM

Quote from: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Title: Re: Is the output of hash function evenly distributed?
Post by: shorena on November 30, 2016, 09:03:29 PM

Quote from: Syke on November 30, 2016, 05:55:53 PM

Quote from: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Is that so?

Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen.

Edit: a more in depth description of this problem can be found in post #8 by Danny below.

Edit2: answered:

Quote from: amaclin on December 01, 2016, 03:52:21 PM

Quote from: DannyHamilton on December 01, 2016, 03:47:05 AM

Is there something in Bitcoin Core that prevents the block at height 441372 (with a
Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?

Yes.
No one node will receive it
Because will ignore INV command thinking that already has this block in the blockchain

reference -> https://bitcoin.org/en/developer-reference#getblocks

Title: Re: Is the output of hash function evenly distributed?
Post by: Fuserleer on November 30, 2016, 09:43:22 PM

Quote from: shorena on November 30, 2016, 09:03:29 PM

Quote from: Syke on November 30, 2016, 05:55:53 PM

Quote from: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. There would be a network fork until the next valid blocks on all of the fork chains were created.

The network would then revert to the fork where the next block/s had the most work performed upon it.

A (not very good) diagram:

CHAIN (WORK = 100) -> Block +1 (HASH X : WORK = 10) -> Block +2 (HASH Z : WORK 12)
CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win

Title: Re: Is the output of hash function evenly distributed?
Post by: shorena on November 30, 2016, 10:25:44 PM

Quote from: Fuserleer on November 30, 2016, 09:43:22 PM

Quote from: shorena on November 30, 2016, 09:03:29 PM

Quote from: Syke on November 30, 2016, 05:55:53 PM

Quote from: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

So block X' cant follow X?

Title: Re: Is the output of hash function evenly distributed?
Post by: Fuserleer on November 30, 2016, 10:39:11 PM

Quote from: shorena on November 30, 2016, 10:25:44 PM

Quote from: Fuserleer on November 30, 2016, 09:43:22 PM

Quote from: shorena on November 30, 2016, 09:03:29 PM

Quote from: Syke on November 30, 2016, 05:55:53 PM

Quote from: manuelra on November 30, 2016, 02:43:44 PM

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

So block X' cant follow X?

Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network.

Title: Re: Is the output of hash function evenly distributed?
Post by: DannyHamilton on December 01, 2016, 03:47:05 AM

Quote from: Fuserleer on November 30, 2016, 10:39:11 PM

Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network.

I'm not understanding what you are saying. Can you point out the code in Bitcoin Core that explains what you are saying?

As a new example (saying what I think is the same thing as shorena in a different way)...

Lets say we are at block height 441370 with a hash of 0000...08ac (as of the writing of this post, we actually are)

Now lets say a few minutes later a mining pool solves a block. The new block at block height 441371 has:

Code:

Block Hash:   0000...075c
Previous Block:  0000...08ac (referring to block height 441370)

Now, lets say 15 minutes later a mining pool solves a block. The new block at block height 441372 has:

Code:

Block Hash:   0000...075c
Previous Block:  0000...75c (referring to block height 441371)

Is there something in Bitcoin Core that prevents the block at height 441372 (with a Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?

If not, then when a miner creates the next block having:

Code:

Block Hash:   0000...08ac
Previous Block:  0000...075c

Then how will any Bitcoin Core node know if this new block is at height 441373 (Previous Block: 0000...075c refers to block height 441372) or a fork competing with the other block at height 441372 (Previous Block: 0000...075c refers to block height 441371)?

Title: Re: Is the output of hash function evenly distributed?
Post by: Dabs on December 01, 2016, 01:08:20 PM

Quote from: arulbero on November 29, 2016, 03:45:05 PM

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?

I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

Title: Re: Is the output of hash function evenly distributed?
Post by: arulbero on December 01, 2016, 02:39:41 PM

I rephrase my question:

1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed?

2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed?

**block header = Version (4 bytes) + Previous Block Hash (32 bytes) + Merkle Root (32 bytes) + Timestamp (4 bytes) + Difficulty Target (4 bytes) + Nonce (4 bytes) = 80 byte = 640 bit --> fixed length

block hash = block header hash = SHA256(SHA256(Block_Header))

Quote from: Dabs on December 01, 2016, 01:08:20 PM

Quote from: arulbero on November 29, 2016, 03:45:05 PM

I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not?

Title: Re: Is the output of hash function evenly distributed?
Post by: amaclin on December 01, 2016, 03:52:21 PM

Quote from: DannyHamilton on December 01, 2016, 03:47:05 AM

Is there something in Bitcoin Core that prevents the block at height 441372 (with a
Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?

Yes.
No one node will receive it
Because will ignore INV command thinking that already has this block in the blockchain

Title: Re: Is the output of hash function evenly distributed?
Post by: arulbero on December 03, 2016, 05:00:28 PM

I found out some interesting information about hash function:

http://www.denimgroup.com/know_artic_secure_hash_functions.html

Quote

A secure hash function has three properties: preimage resistance, collision resistance, and second preimage resistance.

A good collision resistant hash function should have each hash value be about as evenly distributed as possible

http://crypto.stackexchange.com/questions/12822/are-the-sha-family-hash-outputs-practically-random

Quote

The SHA-256 (as well as any cryptographically secure hash algorithm) produces output that will appear like an uniformly random sequence to observer who does not know the input.

Quite a few random number generators, for example ANSI X9.31's RNG and NIST SP 800-90 Hash_DRBG use SHA family hash functions for the reason that resulting sequence is hard to distinguish from random.

If the RNG is able to produce entropy, but has large bias, SHA-256 is a very good function for collecting entropy. The output of the function has nearly 1 bit of entropy per output bit, if the input to the function contained at least 256 bits of entropy. Only very little entropy gets lost in this process.

SHA-256 using constructs like Hash_DRBG in NIST SP 800-90 can be used in situation where true entropy is available, but is slow to collect. Once you have been able to collect 256 bits of entropy (and preferable a bit more to be safe), you can use the entropy to instantiate Hash_DRBG/SHA-256, which will be able to serve billions of random numbers.

Remember these are not good uses:

If you feed SHA-256 with too little entropy (or even smaller input than 256 bits), the output may appear random, but it is not. Smart adversary can be able to abuse this
In other words: if your input to hash is short or too predictable, so shall be the output.

Pseudo-Random Number Generator using SHA-256:
https://www.stat.berkeley.edu/~stark/Java/Html/sha256Rand.htm

But in general randomness is not the same as collision avoidance: http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Title: Re: Is the output of hash function evenly distributed?
Post by: Dabs on December 04, 2016, 03:06:36 AM

Quote from: arulbero on December 01, 2016, 02:39:41 PM

Quote from: Dabs on December 01, 2016, 01:08:20 PM

Quote from: arulbero on November 29, 2016, 03:45:05 PM

I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not?

Why don't you try your #2 and see... we have more than 400k blocks already.

I believe my inputs were more than 256 bits. I used a bunch of "random" characters ... at least 60 in length.

Title: Re: Is the output of hash function evenly distributed?
Post by: Mitchell4265727279 on December 05, 2016, 08:55:17 AM

Quote from: Fuserleer on November 30, 2016, 09:43:22 PM

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.

It is still zero regardless of time. The human brain isnt equipped to deal with very large/small numbers. It's just too far from the realm of normal experience.

SHA2 is regarded as being evenly distributed. Even if that's wrong it's uniform enough for current purposes, makes little difference in the end as far as a collision is concerned.

Title: Re: Is the output of hash function evenly distributed?
Post by: Fuserleer on December 05, 2016, 12:09:43 PM

Quote from: Mitchell4265727279 on December 05, 2016, 08:55:17 AM

Quote from: Fuserleer on November 30, 2016, 09:43:22 PM

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.

and if I said it was zero there would be posts stating that in theory it isn't :p

Title: Re: Is the output of hash function evenly distributed?
Post by: Rick Storm on December 05, 2016, 12:22:31 PM

Quote from: arulbero on December 01, 2016, 02:39:41 PM

Yes, yes. If not, it would not be a good hash function and it would be open to attacks.

Title: Re: Is the output of hash function evenly distributed?
Post by: arulbero on December 05, 2016, 04:20:11 PM

Obviously block hashes from blockchain are not evenly distributed (because of difficulty and retarget):

Code:

BLOCKS : from 0 to 442033 (TOT = 442034)
height                    hash 
0 	 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
1 	 00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048
...
10000    0000000099c744455f58e6c6e98b671e1bf7f37346bfd4cf5d0274ad8ee660cb
10001    00000000f01df1dbc52bce6d8d31167a8fef76f1a8eb67897469cf92205e806b
...
50000    000000001aeae195809d120b5d66a39c83eb48792e068f8ea1fea19d84a4278a
50001    000000001c920d495e1eeef2452b6d1c6c229a919b28196c103ecffebabee141
...
100000   000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506
100001   00000000000080b66c911bd5ba14a74260057311eaeb1982802f7010f1a9f090
...
200000   000000000000034a7dedef4a161fa058a2d67a173a90155f3a2fe6fc132e0ebf
200001   00000000000002e3269b8a00caf315115297c626f954770e8398470d7f387e1c
...
300000   000000000000000082ccf8f1557c5d40b21edabb18d2d691cfbf87118bac7254
300001   000000000000000049a0914d83df36982c77ac1f65ade6a52bdced2ce312aba9
...
400000   000000000000000004ec466ce4732fe6f1ed1cddc2ed4b328fff5224276e3f6f
400001   000000000000000005421b1b2ee6d06d037557d7f5ec96852542413cfed40c22
...
442032   000000000000000000b6a5c60d2509768b29f4b49ceda2679bc6d33f99c8adba
442033   000000000000000000bf88951b0bd245474265cc6a2a1b38ac8ef78c8612158a

So I have checked the last digit distribution of every hash (16 possible values):

Code:

BLOCKS : from 0 to 442033 (TOT = 442034)
last
digit         fr.         prob.           unif. -->  1/16=0.0625
0           27520   0.0622576543886                    
1           27881   0.0630743336485
2           27952   0.0632349547772       MAX -->  =+0.00073495777 = +1,16%
3           27892   0.0630992186121
4           27543   0.0623096865852
5           27339   0.0618481836239
6           27674   0.062606043879
7           27691   0.0626445024591
8           27428   0.0620495256021
9           27203   0.061540514983        MIN --> -0.000959485 = -1.54%
10 (a)      27594   0.0624250623255
11 (b)      27554   0.0623345715488
12 (c)      27863   0.063033612799
13 (d)      27616   0.0624748322527
14 (e)      27619   0.062481619061
15 (f)      27665   0.0625856834542

I think it's ok.

Out of curiosity, I have checked the last 2 digits distribution (256 possible values) too:

Code:

BLOCKS : from 0 to 442033 (TOT = 442034)
last 2
digits        fr.            prob.            unif. -->  1/256=0.00390625
71  (47)    1891    0.0042779514698    MAX -->  =+0.0003717 = +9,516%
...................................................
...................................................
...................................................
138 (8a)    1622    0.0036694009963   MIN --> =-0.000236849 = -6,06%

and the last 3 digits distribution (4096 possible values):

Code:

BLOCKS : from 0 to 442033 (TOT = 442034)
last 3
digits        fr.            prob.            unif. -->  1/4096=0.000024414
2686 (A7E)   147     0.000332553604474   MAX --> 0.00008841 = +36,21%
.................................................
.................................................
.................................................
3139 (C43)   75	     0.000169670206364   MIN --> -0.000074470419 = -30,5%

How do you think about these results? These distributions are okay?

Title: Re: Is the output of hash function evenly distributed?
Post by: Dabs on December 05, 2016, 05:41:08 PM

Try a loop. Count from 1 to a billion. Hash those. Include a large random salt.

For example:

xxxxxxxxxxxxxxxxxxxxxxxxxx00000001
xxxxxxxxxxxxxxxxxxxxxxxxxx00000002
xxxxxxxxxxxxxxxxxxxxxxxxxx00000003
xxxxxxxxxxxxxxxxxxxxxxxxxx00000004
...
xxxxxxxxxxxxxxxxxxxxxxxxxx99999996
xxxxxxxxxxxxxxxxxxxxxxxxxx99999997
xxxxxxxxxxxxxxxxxxxxxxxxxx99999998
xxxxxxxxxxxxxxxxxxxxxxxxxx99999999

Or, let it run for a day or a few hours, then stop whatever you have generated. That should be more than enough for you or most other people. I've already done it, but it's better for you to try it yourself.

Title: Re: Is the output of hash function evenly distributed?
Post by: arulbero on December 07, 2016, 03:37:38 PM

Quote from: Dabs on December 05, 2016, 05:41:08 PM

Ok, I took your advice.
I picked a block (# 277316, hash 0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4) and I made a loop with the nonce from 0 to 2^20:

Code:

#!/usr/bin/env python
import hashlib

nomefile="dist_hash"
maxnonce=2**20 

version='00000002'
previousblockhash='0000000000000002a7bbd25a417c0374cc55261021e8a9ca74442b01284f0569'
merkleroot='c91c008c26e50763e9f548bb8b2fc323735f73577effbc55502c51eb4cc7cf2e'
timestamp='52be093a'
difficultytarget='1903a30c'
nonce=0

def byteswap(a):
    b='';    
    for i in range(0,len(a)/2+1): b=a[2*i:2*i+2]+b
    return b

header=byteswap(version)+byteswap(previousblockhash)+byteswap(merkleroot)+byteswap(timestamp)+byteswap(difficultytarget)

with open(nomefile, 'w') as f:
	f.write("NONCE" + "\t\t" + "BLOCK HEADER HASH" + "\n")
	for nonce in range(0,maxnonce) :  #924591752 is the best 
		nonce_st = byteswap("{0:0{1}x}".format(nonce,8)) #format: hex (8 bytes), little endian	
		hash_result = hashlib.sha256((header+nonce_st).decode('hex')).digest()
		hash_result = hashlib.sha256(hash_result).hexdigest()
		f.write("{0:0{1}}".format(nonce,8) + "\t" + byteswap(hash_result) + "\n")

Output sorted by ascending hash:

Code:

  NONCE             BLOCK HEADER HASH
00073675         00001161c27946bfbc99e631e5cfd1bf979c27bd68207991cb97e3d054c3192b
00563775         00001f485a9ad567aeb6a18a514f28b807f6a5c43a166debdc74b7ac46a765ea
00108164         000024a77201aaa21f86a43878bc50ef8be511105b403cad9c4b99e6f6b5563c
00172165         00002f3ea713eef6411ca3b2ead29ce859b1330d05df76a298bed3cf9807d666
00661616         000033f44a25e97b6dca43e4ba8dc9fe0e0b48777cc936f774433ce6244c71d4
...
...
00052351 	 ffffaa0a77c61b614e110c280bbb06c6d48561df1d6b064e4718a75ca97d509f
00845204 	 ffffc35f3b2a105c5a9d057b03e5eea26bb83f5d6482070be0eaa28a4dcb2f6b
00999281 	 ffffc4ee9128c730094017f32614562b434c3c782263b4adeb2a64fdc14a8584
00245490 	 ffffe3579d3529b5fe46b12db6eb472a6d5df707d4a738b7259ea7b2c62bae9b
00282384 	 ffffef1566a137349da9e6e02ecd638c7508933c08428f099a7f0808f10d7c6f

The last digit distribution:

Code:

TOT = 2^20 = 1048576
last
digit         fr.      prob.       unif. -->  1/16=0.0625
0            62401   0.062401
1            62764   0.062764
2            62608   0.062608
3            62638   0.062638
4            62465   0.062465
5            62695   0.062695
6            62351   0.062351
7            62706   0.062706
8            62093   0.062093   --> MIN = -0.000407 = -0.65%
9            62340   0.06234
10 (a)       62884   0.062884   --> MAX = +0.000384 = +0.61%
11 (b)       62412   0.062412
12 (c)       62455   0.062455
13 (d)       62204   0.062204
14 (e)       62220   0.06222
15 (f)       62764   0.062764

and the last 2 digits distribution:

Code:

last 2
digits      fr.       prob.    unif. -->  1/256=0.00390625

78 (4E)    3751    0.003751    MIN -->  -0.00015525 = -3,97%
...........................................
...........................................
47 (8A)    4117    0.004117    MAX -->  +0.00021075 = +5.40%

Title: Re: Is the output of hash function evenly distributed?
Post by: Dabs on December 07, 2016, 07:14:44 PM

Looks good. For more than 2 digits distribution, you should do a few million more, or a few billion more, or 2^32 or something, to get a better picture of the distribution. At one point, you'll see that, maybe there's no point going further.

No one needs pi to more than 100 decimal places, but geeks have been computing it to billions of digits.

Title: Re: Is the output of hash function evenly distributed?
Post by: btc_enigma on December 09, 2016, 06:20:30 AM

Quote from: Rick Storm on December 05, 2016, 12:22:31 PM

Quote from: arulbero on December 01, 2016, 02:39:41 PM

Yes, yes. If not, it would not be a good hash function and it would be open to attacks.

Good point. Collision resistance comes from the fact that its uniformly distributed.

Title: Re: Is the output of hash function evenly distributed?
Post by: arulbero on December 09, 2016, 07:56:33 AM

Quote from: btc_enigma on December 09, 2016, 06:20:30 AM

Good point. Collision resistance comes from the fact that its uniformly distributed.

good distribution --> collision resistance ? No, example: SuperFastHash

http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Quote

I think the more important takeaway is that there are two classes of algorithms when it comes to collisions:

   collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM
   collisions common: SuperFastHash, Loselose

And then there's the how evenly distributed the hashes are:

   outstanding distribution: Murmur2, FNV-1a, SuperFastHash
   excellent distribution: FNV-1
   good distribution: SDBM, DJB2, DJB2a
   horrible distribution: Loselose

collision resistance --> uniformity but uniformity != random uniformity

https://research.neustar.biz/2012/02/02/choosing-a-good-hash-function-part-3/

Quote

A hash function ought to distribute its keys uniformly across its output range.
Now, uniformity is different from random uniformity.

Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: arulbero on November 29, 2016, 03:45:05 PM