Bitcoin Forum
November 18, 2024, 02:06:49 AM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1] 2 »  All
  Print  
Author Topic: Is the output of hash function evenly distributed?  (Read 2264 times)
arulbero (OP)
Legendary
*
Offline Offline

Activity: 1942
Merit: 2094


View Profile
November 29, 2016, 03:45:05 PM
Last edit: December 01, 2016, 02:41:50 PM by arulbero
Merited by ABCbits (2)
 #1

Hash function (sha256) is used to mine and identify a block. The digest is a 256 bit string.

We don't know whether this function is surjective or not (whether there is an input-block header for each output-digest).

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?
manuelra
Newbie
*
Offline Offline

Activity: 6
Merit: 3


View Profile
November 30, 2016, 02:43:44 PM
 #2

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Shouldn't minners keep digging in that case? and shouldn't CheckBlock verify that case (e.g., if another block has the same hash but a lower height then this block is invalid)?

https://github.com/bitcoin/bitcoin/blob/5488514b901d807a98544618cb441a28e0b28bda/src/main.cpp

Syke
Legendary
*
Offline Offline

Activity: 3878
Merit: 1193


View Profile
November 30, 2016, 05:55:53 PM
 #3

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Buy & Hold
shorena
Copper Member
Legendary
*
Offline Offline

Activity: 1498
Merit: 1540


No I dont escrow anymore.


View Profile
November 30, 2016, 09:03:29 PM
Last edit: December 02, 2016, 08:31:49 AM by shorena
 #4

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Is that so?

Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen.

Edit: a more in depth description of this problem can be found in post #8 by Danny below.

Edit2: answered:

Is there something in Bitcoin Core that prevents the block at height 441372 (with a
Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?
Yes.
No one node will receive it
Because will ignore INV command thinking that already has this block in the blockchain


reference -> https://bitcoin.org/en/developer-reference#getblocks

Im not really here, its just your imagination.
Fuserleer
Legendary
*
Offline Offline

Activity: 1064
Merit: 1020



View Profile WWW
November 30, 2016, 09:43:22 PM
 #5

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Is that so?

Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen.

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.  There would be a network fork until the next valid blocks on all of the fork chains were created.

The network would then revert to the fork where the next block/s had the most work performed upon it.

A (not very good) diagram:

CHAIN (WORK = 100) -> Block +1 (HASH X  : WORK = 10) -> Block +2 (HASH Z : WORK 12)
CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win

shorena
Copper Member
Legendary
*
Offline Offline

Activity: 1498
Merit: 1540


No I dont escrow anymore.


View Profile
November 30, 2016, 10:25:44 PM
 #6

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Is that so?

Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen.

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.  There would be a network fork until the next valid blocks on all of the fork chains were created.

The network would then revert to the fork where the next block/s had the most work performed upon it.

A (not very good) diagram:

CHAIN (WORK = 100) -> Block +1 (HASH X  : WORK = 10) -> Block +2 (HASH Z : WORK 12)
CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win

So block X' cant follow X?

Im not really here, its just your imagination.
Fuserleer
Legendary
*
Offline Offline

Activity: 1064
Merit: 1020



View Profile WWW
November 30, 2016, 10:39:11 PM
 #7

And while in the subject...

Although extremely unlikely, what happens if a new block has the same hash as a previous block?

Bitcoin continues normally. It's a perfectly valid occurrence.

Is that so?

Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen.

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.  There would be a network fork until the next valid blocks on all of the fork chains were created.

The network would then revert to the fork where the next block/s had the most work performed upon it.

A (not very good) diagram:

CHAIN (WORK = 100) -> Block +1 (HASH X  : WORK = 10) -> Block +2 (HASH Z : WORK 12)
CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win

So block X' cant follow X?

Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network.

DannyHamilton
Legendary
*
Offline Offline

Activity: 3486
Merit: 4851



View Profile
December 01, 2016, 03:47:05 AM
Merited by ABCbits (1)
 #8

Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network.

I'm not understanding what you are saying.  Can you point out the code in Bitcoin Core that explains what you are saying?



As a new example (saying what I think is the same thing as shorena in a different way)...


Lets say we are at block height 441370 with a hash of 0000...08ac (as of the writing of this post, we actually are)


Now lets say a few minutes later a mining pool solves a block.  The new block at block height 441371 has:
Code:
Block Hash:   0000...075c
Previous Block:  0000...08ac (referring to block height 441370)


Now, lets say 15 minutes later a mining pool solves a block.  The new block at block height 441372 has:
Code:
Block Hash:   0000...075c
Previous Block:  0000...75c (referring to block height 441371)

Is there something in Bitcoin Core that prevents the block at height 441372 (with a Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?


If not, then when a miner creates the next block having:
Code:
Block Hash:   0000...08ac
Previous Block:  0000...075c

Then how will any Bitcoin Core node know if this new block is at height 441373 (Previous Block: 0000...075c refers to block height 441372) or a fork competing with the other block at height 441372 (Previous Block: 0000...075c refers to block height 441371)?
Dabs
Legendary
*
Offline Offline

Activity: 3416
Merit: 1912


The Concierge of Crypto


View Profile
December 01, 2016, 01:08:20 PM
 #9

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?
I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

arulbero (OP)
Legendary
*
Offline Offline

Activity: 1942
Merit: 2094


View Profile
December 01, 2016, 02:39:41 PM
 #10

I rephrase my question:

1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed?

2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed?

**block header = Version (4 bytes) + Previous Block Hash (32 bytes) + Merkle Root (32 bytes) + Timestamp (4 bytes) + Difficulty Target (4 bytes) + Nonce (4 bytes) = 80 byte = 640 bit  --> fixed length

block hash = block header hash = SHA256(SHA256(Block_Header))

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?
I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not?
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
December 01, 2016, 03:52:21 PM
 #11

Is there something in Bitcoin Core that prevents the block at height 441372 (with a
Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain?
Yes.
No one node will receive it
Because will ignore INV command thinking that already has this block in the blockchain
arulbero (OP)
Legendary
*
Offline Offline

Activity: 1942
Merit: 2094


View Profile
December 03, 2016, 05:00:28 PM
 #12

I found out some interesting information about hash function:

http://www.denimgroup.com/know_artic_secure_hash_functions.html
Quote
A secure hash function has three properties: preimage resistance, collision resistance, and second preimage resistance.

A good collision resistant hash function should have each hash value be about as evenly distributed as possible

http://crypto.stackexchange.com/questions/12822/are-the-sha-family-hash-outputs-practically-random
Quote
The SHA-256 (as well as any cryptographically secure hash algorithm) produces output that will appear like an uniformly random sequence to observer who does not know the input.

Quite a few random number generators, for example ANSI X9.31's RNG and NIST SP 800-90 Hash_DRBG use SHA family hash functions for the reason that resulting sequence is hard to distinguish from random.

If the RNG is able to produce entropy, but has large bias, SHA-256 is a very good function for collecting entropy. The output of the function has nearly 1 bit of entropy per output bit, if the input to the function contained at least 256 bits of entropy. Only very little entropy gets lost in this process.

SHA-256 using constructs like Hash_DRBG in NIST SP 800-90 can be used in situation where true entropy is available, but is slow to collect. Once you have been able to collect 256 bits of entropy (and preferable a bit more to be safe), you can use the entropy to instantiate Hash_DRBG/SHA-256, which will be able to serve billions of random numbers.

Remember these are not good uses:

If you feed SHA-256 with too little entropy (or even smaller input than 256 bits), the output may appear random, but it is not. Smart adversary can be able to abuse this
In other words: if your input to hash is short or too predictable, so shall be the output.

Pseudo-Random Number Generator using SHA-256:
https://www.stat.berkeley.edu/~stark/Java/Html/sha256Rand.htm

But in general randomness is not the same as collision avoidancehttp://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Dabs
Legendary
*
Offline Offline

Activity: 3416
Merit: 1912


The Concierge of Crypto


View Profile
December 04, 2016, 03:06:36 AM
 #13

I rephrase my question:

1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed?

2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed?

**block header = Version (4 bytes) + Previous Block Hash (32 bytes) + Merkle Root (32 bytes) + Timestamp (4 bytes) + Difficulty Target (4 bytes) + Nonce (4 bytes) = 80 byte = 640 bit  --> fixed length

block hash = block header hash = SHA256(SHA256(Block_Header))

But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm).

How can we know for sure the output is evenly distributed?
I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.

In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not?


Why don't you try your #2 and see... we have more than 400k blocks already.

I believe my inputs were more than 256 bits. I used a bunch of "random" characters ... at least 60 in length.

Mitchell4265727279
Newbie
*
Offline Offline

Activity: 1
Merit: 0


View Profile
December 05, 2016, 08:55:17 AM
Last edit: December 05, 2016, 09:39:41 AM by Mitchell4265727279
 #14

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.

It is still zero regardless of time. The human brain isnt equipped to deal with very large/small numbers. It's just too far from the realm of normal experience.

SHA2 is regarded as being evenly distributed. Even if that's wrong it's uniform enough for current purposes,  makes little difference in the end as far as a collision is concerned.
Fuserleer
Legendary
*
Offline Offline

Activity: 1064
Merit: 1020



View Profile WWW
December 05, 2016, 12:09:43 PM
 #15

Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero.

It is still zero regardless of time. The human brain isnt equipped to deal with very large/small numbers. It's just too far from the realm of normal experience.

SHA2 is regarded as being evenly distributed. Even if that's wrong it's uniform enough for current purposes,  makes little difference in the end as far as a collision is concerned.

and if I said it was zero there would be posts stating that in theory it isn't :p

Rick Storm
Newbie
*
Offline Offline

Activity: 17
Merit: 0


View Profile
December 05, 2016, 12:22:31 PM
 #16

I rephrase my question:

1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed?

2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed?


Yes, yes. If not, it would not be a good hash function and it would be open to attacks.
arulbero (OP)
Legendary
*
Offline Offline

Activity: 1942
Merit: 2094


View Profile
December 05, 2016, 04:20:11 PM
Last edit: December 05, 2016, 04:43:35 PM by arulbero
 #17

Obviously block hashes from blockchain are not evenly distributed (because of difficulty and retarget):

Code:
BLOCKS : from 0 to 442033 (TOT = 442034)
height                    hash
0 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
1 00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048
...
10000    0000000099c744455f58e6c6e98b671e1bf7f37346bfd4cf5d0274ad8ee660cb
10001    00000000f01df1dbc52bce6d8d31167a8fef76f1a8eb67897469cf92205e806b
...
50000    000000001aeae195809d120b5d66a39c83eb48792e068f8ea1fea19d84a4278a
50001    000000001c920d495e1eeef2452b6d1c6c229a919b28196c103ecffebabee141
...
100000   000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506
100001   00000000000080b66c911bd5ba14a74260057311eaeb1982802f7010f1a9f090
...
200000   000000000000034a7dedef4a161fa058a2d67a173a90155f3a2fe6fc132e0ebf
200001   00000000000002e3269b8a00caf315115297c626f954770e8398470d7f387e1c
...
300000   000000000000000082ccf8f1557c5d40b21edabb18d2d691cfbf87118bac7254
300001   000000000000000049a0914d83df36982c77ac1f65ade6a52bdced2ce312aba9
...
400000   000000000000000004ec466ce4732fe6f1ed1cddc2ed4b328fff5224276e3f6f
400001   000000000000000005421b1b2ee6d06d037557d7f5ec96852542413cfed40c22
...
442032   000000000000000000b6a5c60d2509768b29f4b49ceda2679bc6d33f99c8adba
442033   000000000000000000bf88951b0bd245474265cc6a2a1b38ac8ef78c8612158a

So I have checked the last digit distribution of every hash (16 possible values):

Code:
BLOCKS : from 0 to 442033 (TOT = 442034)
last
digit         fr.         prob.           unif. -->  1/16=0.0625
0           27520   0.0622576543886                    
1           27881   0.0630743336485
2           27952   0.0632349547772       MAX -->  =+0.00073495777 = +1,16%
3           27892   0.0630992186121
4           27543   0.0623096865852
5           27339   0.0618481836239
6           27674   0.062606043879
7           27691   0.0626445024591
8           27428   0.0620495256021
9           27203   0.061540514983        MIN --> -0.000959485 = -1.54%
10 (a)      27594   0.0624250623255
11 (b)      27554   0.0623345715488
12 (c)      27863   0.063033612799
13 (d)      27616   0.0624748322527
14 (e)      27619   0.062481619061
15 (f)      27665   0.0625856834542

I think it's ok.

Out of curiosity, I have checked the last 2 digits distribution (256 possible values) too:

Code:
BLOCKS : from 0 to 442033 (TOT = 442034)
last 2
digits        fr.            prob.            unif. -->  1/256=0.00390625
71  (47)    1891    0.0042779514698    MAX -->  =+0.0003717 = +9,516%
...................................................
...................................................
...................................................
138 (8a)    1622    0.0036694009963   MIN --> =-0.000236849 = -6,06%


and the last 3 digits distribution (4096 possible values):

Code:
BLOCKS : from 0 to 442033 (TOT = 442034)
last 3
digits        fr.            prob.            unif. -->  1/4096=0.000024414
2686 (A7E)   147     0.000332553604474   MAX --> 0.00008841 = +36,21%
.................................................
.................................................
.................................................
3139 (C43)   75     0.000169670206364   MIN --> -0.000074470419 = -30,5%

How do you think about these results? These distributions are okay?
Dabs
Legendary
*
Offline Offline

Activity: 3416
Merit: 1912


The Concierge of Crypto


View Profile
December 05, 2016, 05:41:08 PM
Merited by ABCbits (2)
 #18

Try a loop. Count from 1 to a billion. Hash those. Include a large random salt.

For example:

xxxxxxxxxxxxxxxxxxxxxxxxxx00000001
xxxxxxxxxxxxxxxxxxxxxxxxxx00000002
xxxxxxxxxxxxxxxxxxxxxxxxxx00000003
xxxxxxxxxxxxxxxxxxxxxxxxxx00000004
...
xxxxxxxxxxxxxxxxxxxxxxxxxx99999996
xxxxxxxxxxxxxxxxxxxxxxxxxx99999997
xxxxxxxxxxxxxxxxxxxxxxxxxx99999998
xxxxxxxxxxxxxxxxxxxxxxxxxx99999999

Or, let it run for a day or a few hours, then stop whatever you have generated. That should be more than enough for you or most other people. I've already done it, but it's better for you to try it yourself.

arulbero (OP)
Legendary
*
Offline Offline

Activity: 1942
Merit: 2094


View Profile
December 07, 2016, 03:37:38 PM
Merited by ABCbits (3)
 #19

Try a loop. Count from 1 to a billion. Hash those. Include a large random salt.

For example:

xxxxxxxxxxxxxxxxxxxxxxxxxx00000001
xxxxxxxxxxxxxxxxxxxxxxxxxx00000002
xxxxxxxxxxxxxxxxxxxxxxxxxx00000003
xxxxxxxxxxxxxxxxxxxxxxxxxx00000004
...
xxxxxxxxxxxxxxxxxxxxxxxxxx99999996
xxxxxxxxxxxxxxxxxxxxxxxxxx99999997
xxxxxxxxxxxxxxxxxxxxxxxxxx99999998
xxxxxxxxxxxxxxxxxxxxxxxxxx99999999

Ok, I took your advice.
I picked a block (# 277316,  hash 0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4) and I made a loop with the nonce from 0 to 2^20:

Code:
#!/usr/bin/env python
import hashlib

nomefile="dist_hash"
maxnonce=2**20

version='00000002'
previousblockhash='0000000000000002a7bbd25a417c0374cc55261021e8a9ca74442b01284f0569'
merkleroot='c91c008c26e50763e9f548bb8b2fc323735f73577effbc55502c51eb4cc7cf2e'
timestamp='52be093a'
difficultytarget='1903a30c'
nonce=0

def byteswap(a):
    b='';    
    for i in range(0,len(a)/2+1): b=a[2*i:2*i+2]+b
    return b

header=byteswap(version)+byteswap(previousblockhash)+byteswap(merkleroot)+byteswap(timestamp)+byteswap(difficultytarget)

with open(nomefile, 'w') as f:
f.write("NONCE" + "\t\t" + "BLOCK HEADER HASH" + "\n")
for nonce in range(0,maxnonce) :  #924591752 is the best
nonce_st = byteswap("{0:0{1}x}".format(nonce,8)) #format: hex (8 bytes), little endian
hash_result = hashlib.sha256((header+nonce_st).decode('hex')).digest()
hash_result = hashlib.sha256(hash_result).hexdigest()
f.write("{0:0{1}}".format(nonce,8) + "\t" + byteswap(hash_result) + "\n")

Output sorted by ascending hash:
Code:
  NONCE             BLOCK HEADER HASH
00073675         00001161c27946bfbc99e631e5cfd1bf979c27bd68207991cb97e3d054c3192b
00563775         00001f485a9ad567aeb6a18a514f28b807f6a5c43a166debdc74b7ac46a765ea
00108164         000024a77201aaa21f86a43878bc50ef8be511105b403cad9c4b99e6f6b5563c
00172165         00002f3ea713eef6411ca3b2ead29ce859b1330d05df76a298bed3cf9807d666
00661616         000033f44a25e97b6dca43e4ba8dc9fe0e0b48777cc936f774433ce6244c71d4
...
...
00052351 ffffaa0a77c61b614e110c280bbb06c6d48561df1d6b064e4718a75ca97d509f
00845204 ffffc35f3b2a105c5a9d057b03e5eea26bb83f5d6482070be0eaa28a4dcb2f6b
00999281 ffffc4ee9128c730094017f32614562b434c3c782263b4adeb2a64fdc14a8584
00245490 ffffe3579d3529b5fe46b12db6eb472a6d5df707d4a738b7259ea7b2c62bae9b
00282384 ffffef1566a137349da9e6e02ecd638c7508933c08428f099a7f0808f10d7c6f

The last digit distribution:
Code:
TOT = 2^20 = 1048576
last
digit         fr.      prob.       unif. -->  1/16=0.0625
0            62401   0.062401
1            62764   0.062764
2            62608   0.062608
3            62638   0.062638
4            62465   0.062465
5            62695   0.062695
6            62351   0.062351
7            62706   0.062706
8            62093   0.062093   --> MIN = -0.000407 = -0.65%
9            62340   0.06234
10 (a)       62884   0.062884   --> MAX = +0.000384 = +0.61%
11 (b)       62412   0.062412
12 (c)       62455   0.062455
13 (d)       62204   0.062204
14 (e)       62220   0.06222
15 (f)       62764   0.062764

and the last 2 digits distribution:
Code:
last 2
digits      fr.       prob.    unif. -->  1/256=0.00390625

78 (4E)    3751    0.003751    MIN -->  -0.00015525 = -3,97%
...........................................
...........................................
47 (8A)    4117    0.004117    MAX -->  +0.00021075 = +5.40%
Dabs
Legendary
*
Offline Offline

Activity: 3416
Merit: 1912


The Concierge of Crypto


View Profile
December 07, 2016, 07:14:44 PM
 #20

Looks good. For more than 2 digits distribution, you should do a few million more, or a few billion more, or 2^32 or something, to get a better picture of the distribution. At one point, you'll see that, maybe there's no point going further.

No one needs pi to more than 100 decimal places, but geeks have been computing it to billions of digits.

Pages: [1] 2 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!