Title: Is the output of hash function evenly distributed? Post by: arulbero on November 29, 2016, 03:45:05 PM Hash function (sha256) is used to mine and identify a block. The digest is a 256 bit string.
We don't know whether this function is surjective or not (whether there is an input-block header for each output-digest). But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm). How can we know for sure the output is evenly distributed? Title: Re: Is the output of hash function evenly distributed? Post by: manuelra on November 30, 2016, 02:43:44 PM And while in the subject...
Although extremely unlikely, what happens if a new block has the same hash as a previous block? Shouldn't minners keep digging in that case? and shouldn't CheckBlock verify that case (e.g., if another block has the same hash but a lower height then this block is invalid)? https://github.com/bitcoin/bitcoin/blob/5488514b901d807a98544618cb441a28e0b28bda/src/main.cpp Title: Re: Is the output of hash function evenly distributed? Post by: Syke on November 30, 2016, 05:55:53 PM And while in the subject... Although extremely unlikely, what happens if a new block has the same hash as a previous block? Bitcoin continues normally. It's a perfectly valid occurrence. Title: Re: Is the output of hash function evenly distributed? Post by: shorena on November 30, 2016, 09:03:29 PM And while in the subject... Although extremely unlikely, what happens if a new block has the same hash as a previous block? Bitcoin continues normally. It's a perfectly valid occurrence. Is that so? Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen. Edit: a more in depth description of this problem can be found in post #8 by Danny below. Edit2: answered: Is there something in Bitcoin Core that prevents the block at height 441372 (with a Yes. Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain? No one node will receive it Because will ignore INV command thinking that already has this block in the blockchain reference -> https://bitcoin.org/en/developer-reference#getblocks Title: Re: Is the output of hash function evenly distributed? Post by: Fuserleer on November 30, 2016, 09:43:22 PM And while in the subject... Although extremely unlikely, what happens if a new block has the same hash as a previous block? Bitcoin continues normally. It's a perfectly valid occurrence. Is that so? Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen. Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. There would be a network fork until the next valid blocks on all of the fork chains were created. The network would then revert to the fork where the next block/s had the most work performed upon it. A (not very good) diagram: CHAIN (WORK = 100) -> Block +1 (HASH X : WORK = 10) -> Block +2 (HASH Z : WORK 12) CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win Title: Re: Is the output of hash function evenly distributed? Post by: shorena on November 30, 2016, 10:25:44 PM And while in the subject... Although extremely unlikely, what happens if a new block has the same hash as a previous block? Bitcoin continues normally. It's a perfectly valid occurrence. Is that so? Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen. Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. There would be a network fork until the next valid blocks on all of the fork chains were created. The network would then revert to the fork where the next block/s had the most work performed upon it. A (not very good) diagram: CHAIN (WORK = 100) -> Block +1 (HASH X : WORK = 10) -> Block +2 (HASH Z : WORK 12) CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win So block X' cant follow X? Title: Re: Is the output of hash function evenly distributed? Post by: Fuserleer on November 30, 2016, 10:39:11 PM And while in the subject... Although extremely unlikely, what happens if a new block has the same hash as a previous block? Bitcoin continues normally. It's a perfectly valid occurrence. Is that so? Lets say there are blocks A, B, C and D with B and C having the same hash. B refers to A as normal. C refers to B which may work even though its the same hash, Im not sure. D refers to C and B because they have the same hash. What am I missing? Besides that a 256-bit collision will probably never happen. Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. There would be a network fork until the next valid blocks on all of the fork chains were created. The network would then revert to the fork where the next block/s had the most work performed upon it. A (not very good) diagram: CHAIN (WORK = 100) -> Block +1 (HASH X : WORK = 10) -> Block +2 (HASH Z : WORK 12) CHAIN (WORK = 100) -> Block +1 (HASH X' : WORK = 10) -> Block +2 (HASH Y : WORK 13) <- This one would win So block X' cant follow X? Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network. Title: Re: Is the output of hash function evenly distributed? Post by: DannyHamilton on December 01, 2016, 03:47:05 AM Nope, the hash X is already used for a block X, so block X' using the same hash X would get "soft" rejected by the network. I'm not understanding what you are saying. Can you point out the code in Bitcoin Core that explains what you are saying? As a new example (saying what I think is the same thing as shorena in a different way)... Lets say we are at block height 441370 with a hash of 0000...08ac (as of the writing of this post, we actually are) Now lets say a few minutes later a mining pool solves a block. The new block at block height 441371 has: Code: Block Hash: 0000...075c Now, lets say 15 minutes later a mining pool solves a block. The new block at block height 441372 has: Code: Block Hash: 0000...075c Is there something in Bitcoin Core that prevents the block at height 441372 (with a Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain? If not, then when a miner creates the next block having: Code: Block Hash: 0000...08ac Then how will any Bitcoin Core node know if this new block is at height 441373 (Previous Block: 0000...075c refers to block height 441372) or a fork competing with the other block at height 441372 (Previous Block: 0000...075c refers to block height 441371)? Title: Re: Is the output of hash function evenly distributed? Post by: Dabs on December 01, 2016, 01:08:20 PM But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm). I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.How can we know for sure the output is evenly distributed? Title: Re: Is the output of hash function evenly distributed? Post by: arulbero on December 01, 2016, 02:39:41 PM I rephrase my question:
1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed? 2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed? **block header = Version (4 bytes) + Previous Block Hash (32 bytes) + Merkle Root (32 bytes) + Timestamp (4 bytes) + Difficulty Target (4 bytes) + Nonce (4 bytes) = 80 byte = 640 bit --> fixed length block hash = block header hash = SHA256(SHA256(Block_Header)) But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm). I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.How can we know for sure the output is evenly distributed? In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not? Title: Re: Is the output of hash function evenly distributed? Post by: amaclin on December 01, 2016, 03:52:21 PM Is there something in Bitcoin Core that prevents the block at height 441372 (with a Yes. Previous Block Hash that is identical to it's own Block hash) from being added to the blockchain? No one node will receive it Because will ignore INV command thinking that already has this block in the blockchain Title: Re: Is the output of hash function evenly distributed? Post by: arulbero on December 03, 2016, 05:00:28 PM I found out some interesting information about hash function:
http://www.denimgroup.com/know_artic_secure_hash_functions.html Quote A secure hash function has three properties: preimage resistance, collision resistance, and second preimage resistance. A good collision resistant hash function should have each hash value be about as evenly distributed as possible http://crypto.stackexchange.com/questions/12822/are-the-sha-family-hash-outputs-practically-random Quote The SHA-256 (as well as any cryptographically secure hash algorithm) produces output that will appear like an uniformly random sequence to observer who does not know the input. Quite a few random number generators, for example ANSI X9.31's RNG and NIST SP 800-90 Hash_DRBG use SHA family hash functions for the reason that resulting sequence is hard to distinguish from random. If the RNG is able to produce entropy, but has large bias, SHA-256 is a very good function for collecting entropy. The output of the function has nearly 1 bit of entropy per output bit, if the input to the function contained at least 256 bits of entropy. Only very little entropy gets lost in this process. SHA-256 using constructs like Hash_DRBG in NIST SP 800-90 can be used in situation where true entropy is available, but is slow to collect. Once you have been able to collect 256 bits of entropy (and preferable a bit more to be safe), you can use the entropy to instantiate Hash_DRBG/SHA-256, which will be able to serve billions of random numbers. Remember these are not good uses: If you feed SHA-256 with too little entropy (or even smaller input than 256 bits), the output may appear random, but it is not. Smart adversary can be able to abuse this In other words: if your input to hash is short or too predictable, so shall be the output. Pseudo-Random Number Generator using SHA-256: https://www.stat.berkeley.edu/~stark/Java/Html/sha256Rand.htm But in general randomness is not the same as collision avoidance: http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed Title: Re: Is the output of hash function evenly distributed? Post by: Dabs on December 04, 2016, 03:06:36 AM I rephrase my question: 1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed? 2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed? **block header = Version (4 bytes) + Previous Block Hash (32 bytes) + Merkle Root (32 bytes) + Timestamp (4 bytes) + Difficulty Target (4 bytes) + Nonce (4 bytes) = 80 byte = 640 bit --> fixed length block hash = block header hash = SHA256(SHA256(Block_Header)) But we assume that the output of the hash function is evenly distributed (so we can calculate target and difficulty for proof-of-work algorithm). I did a few million hashes of number sequences and other random stuff, for playing cards and dice simulations. It looked evenly distributed.How can we know for sure the output is evenly distributed? In your case, what were your inputs? What kind of number sequences? Fixed length (more or less than 256 bit) or not? Why don't you try your #2 and see... we have more than 400k blocks already. I believe my inputs were more than 256 bits. I used a bunch of "random" characters ... at least 60 in length. Title: Re: Is the output of hash function evenly distributed? Post by: Mitchell4265727279 on December 05, 2016, 08:55:17 AM Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. It is still zero regardless of time. The human brain isnt equipped to deal with very large/small numbers. It's just too far from the realm of normal experience. SHA2 is regarded as being evenly distributed. Even if that's wrong it's uniform enough for current purposes, makes little difference in the end as far as a collision is concerned. Title: Re: Is the output of hash function evenly distributed? Post by: Fuserleer on December 05, 2016, 12:09:43 PM Ignoring the fact that the likelihood of 2 blocks outputting the same hash, in the same consensus round (10 mins) is practically zero. It is still zero regardless of time. The human brain isnt equipped to deal with very large/small numbers. It's just too far from the realm of normal experience. SHA2 is regarded as being evenly distributed. Even if that's wrong it's uniform enough for current purposes, makes little difference in the end as far as a collision is concerned. and if I said it was zero there would be posts stating that in theory it isn't :p Title: Re: Is the output of hash function evenly distributed? Post by: Rick Storm on December 05, 2016, 12:22:31 PM I rephrase my question: 1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed? 2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed? Yes, yes. If not, it would not be a good hash function and it would be open to attacks. Title: Re: Is the output of hash function evenly distributed? Post by: arulbero on December 05, 2016, 04:20:11 PM Obviously block hashes from blockchain are not evenly distributed (because of difficulty and retarget):
Code: BLOCKS : from 0 to 442033 (TOT = 442034) So I have checked the last digit distribution of every hash (16 possible values): Code: BLOCKS : from 0 to 442033 (TOT = 442034) I think it's ok. Out of curiosity, I have checked the last 2 digits distribution (256 possible values) too: Code: BLOCKS : from 0 to 442033 (TOT = 442034) and the last 3 digits distribution (4096 possible values): Code: BLOCKS : from 0 to 442033 (TOT = 442034) How do you think about these results? These distributions are okay? Title: Re: Is the output of hash function evenly distributed? Post by: Dabs on December 05, 2016, 05:41:08 PM Try a loop. Count from 1 to a billion. Hash those. Include a large random salt.
For example: xxxxxxxxxxxxxxxxxxxxxxxxxx00000001 xxxxxxxxxxxxxxxxxxxxxxxxxx00000002 xxxxxxxxxxxxxxxxxxxxxxxxxx00000003 xxxxxxxxxxxxxxxxxxxxxxxxxx00000004 ... xxxxxxxxxxxxxxxxxxxxxxxxxx99999996 xxxxxxxxxxxxxxxxxxxxxxxxxx99999997 xxxxxxxxxxxxxxxxxxxxxxxxxx99999998 xxxxxxxxxxxxxxxxxxxxxxxxxx99999999 Or, let it run for a day or a few hours, then stop whatever you have generated. That should be more than enough for you or most other people. I've already done it, but it's better for you to try it yourself. Title: Re: Is the output of hash function evenly distributed? Post by: arulbero on December 07, 2016, 03:37:38 PM Try a loop. Count from 1 to a billion. Hash those. Include a large random salt. For example: xxxxxxxxxxxxxxxxxxxxxxxxxx00000001 xxxxxxxxxxxxxxxxxxxxxxxxxx00000002 xxxxxxxxxxxxxxxxxxxxxxxxxx00000003 xxxxxxxxxxxxxxxxxxxxxxxxxx00000004 ... xxxxxxxxxxxxxxxxxxxxxxxxxx99999996 xxxxxxxxxxxxxxxxxxxxxxxxxx99999997 xxxxxxxxxxxxxxxxxxxxxxxxxx99999998 xxxxxxxxxxxxxxxxxxxxxxxxxx99999999 Ok, I took your advice. I picked a block (# 277316, hash 0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4) and I made a loop with the nonce from 0 to 2^20: Code: #!/usr/bin/env python Output sorted by ascending hash: Code: NONCE BLOCK HEADER HASH The last digit distribution: Code: TOT = 2^20 = 1048576 and the last 2 digits distribution: Code: last 2 Title: Re: Is the output of hash function evenly distributed? Post by: Dabs on December 07, 2016, 07:14:44 PM Looks good. For more than 2 digits distribution, you should do a few million more, or a few billion more, or 2^32 or something, to get a better picture of the distribution. At one point, you'll see that, maybe there's no point going further.
No one needs pi to more than 100 decimal places, but geeks have been computing it to billions of digits. Title: Re: Is the output of hash function evenly distributed? Post by: btc_enigma on December 09, 2016, 06:20:30 AM I rephrase my question: 1) given arbitrary inputs, does the SHA-256 function really generate outputs that are evenly distributed? 2) given block headers** as inputs, does the SHA-256 function really generate outputs that are evenly distributed? Yes, yes. If not, it would not be a good hash function and it would be open to attacks. Good point. Collision resistance comes from the fact that its uniformly distributed. Title: Re: Is the output of hash function evenly distributed? Post by: arulbero on December 09, 2016, 07:56:33 AM Good point. Collision resistance comes from the fact that its uniformly distributed. good distribution --> collision resistance ? No, example: SuperFastHash http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed Quote I think the more important takeaway is that there are two classes of algorithms when it comes to collisions: collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM collisions common: SuperFastHash, Loselose And then there's the how evenly distributed the hashes are: outstanding distribution: Murmur2, FNV-1a, SuperFastHash excellent distribution: FNV-1 good distribution: SDBM, DJB2, DJB2a horrible distribution: Loselose collision resistance --> uniformity but uniformity != random uniformity https://research.neustar.biz/2012/02/02/choosing-a-good-hash-function-part-3/ Quote A hash function ought to distribute its keys uniformly across its output range. Now, uniformity is different from random uniformity. |