Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: ehoffman on July 20, 2018, 02:43:29 AM



Title: Question about extra data in blockchain. Malicious tempering vulnerability?
Post by: ehoffman on July 20, 2018, 02:43:29 AM
Hello

This is a bit technical.

I searched but did not find anything really specific about this.

I was analyzing the blockchain, mainly using the blkxxxxx.dat files generated by the Bitcoin Core client.

If you look at the structure of a block in this file, you have:

- magic number (4 bytes) - 0xD9B4BEF9
- Block length (4 bytes) - length of the block that follows
- Block header (80 bytes)
- Transaction count (var int)
- Transactions (multiple bytes) - A list of all the transactions for this block
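
To make the layout above concrete, here is a minimal Python sketch that reads one such record from a binary stream. The names read_varint and read_record are made up for illustration, and the sketch assumes the mainnet magic and little-endian fields as listed; it does not decode the individual transactions.

```python
import io
import struct

# Mainnet magic as a little-endian uint32; on disk the bytes are F9 BE B4 D9.
MAINNET_MAGIC = 0xD9B4BEF9


def read_varint(stream):
    """Read a Bitcoin variable-length integer (var int) from a binary stream."""
    prefix = stream.read(1)
    if len(prefix) != 1:
        raise ValueError("unexpected end of data while reading var int")
    first = prefix[0]
    if first < 0xFD:
        return first
    # 0xFD, 0xFE and 0xFF announce a 2-, 4- or 8-byte little-endian value.
    size, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    raw = stream.read(size)
    if len(raw) != size:
        raise ValueError("unexpected end of data while reading var int")
    return struct.unpack(fmt, raw)[0]


def read_record(stream):
    """Read one record; return (header, tx_count, tx_bytes), or None at end of stream."""
    prefix = stream.read(8)
    if len(prefix) < 8:
        return None
    magic, length = struct.unpack("<II", prefix)
    if magic != MAINNET_MAGIC:
        raise ValueError("bad magic number: 0x%08X" % magic)
    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("record is shorter than its stated block length")
    body = io.BytesIO(payload)
    header = body.read(80)        # block header
    tx_count = read_varint(body)  # transaction count
    tx_bytes = body.read()        # everything that remains inside the stated length
    return header, tx_count, tx_bytes
```

Note that tx_bytes here is simply "whatever is left inside the stated length". Whether all of it is really transaction data can only be known by decoding the transactions one by one.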

Now, the block length above could be larger than the size of the 'useful' data that follows (header/transactions).  So there may be extra data at the end of the block, and indeed it has surfaced in the news that some dodgy data was found there (but what was found is not the subject of this topic).

Now, my question is about the immutability of this 'extra' data.  The immutability of the blockchain is primarily based on the hash of the block header alone (only the 80-byte block header is used when computing the block's cryptographic hash).  However, the header itself contains the Merkle root, a cryptographic hash over all the transactions, so effectively this also makes the transactions immutable.  The magic number is also fixed by consensus, so that is immutable too.
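
As a rough illustration of what the hash does and does not cover, here is a small Python sketch. The function names are my own, and the Merkle computation is the plain pairwise double-SHA256 scheme over transaction ids in internal byte order.

```python
import hashlib


def double_sha256(data):
    """SHA-256 applied twice, as used for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(header):
    """Hash of an 80-byte block header, in the byte-reversed hex form usually displayed."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    return double_sha256(header)[::-1].hex()


def merkle_root(txids):
    """Merkle root of a non-empty list of 32-byte transaction ids (internal byte order)."""
    if not txids:
        raise ValueError("a block has at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # an odd level duplicates its last entry
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Only the header bytes go into block_hash, and only the transaction ids go into merkle_root. Any bytes that belong to neither are not covered by either hash.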

But apart from that, the immutability pretty much ends there, meaning that the following data could be modified:
- The block length
- The extra data.

So, effectively, a specific block could be represented on the blockchain in a 'practically' infinite number of ways.  One could, for example, grow any block received from the network, store arbitrary data there, and re-broadcast this new block.  Any other node would then accept it (as long as it is under 1MB in size).

Am I right in my assumption?  What are the implications?  Could someone 'pollute' the network and flood it with 1MB blocks (which would effectively give a 500GB blockchain)?  Possibly not to that extent, because all but the newest peers (who don't have the blockchain yet) already have their own copy of the blocks, but from an academic point of view, if you were to query this rogue node, you would get a 500GB blockchain.

On a darker side, I could see some parties hiding data there in the hope that enough peers on the network eventually propagate it.  For example, some political parties or spy agencies could take some blocks and append state secrets.  With the current Bitcoin clients all relaying the block 'as is', this would effectively make the appended data semi-immutable (by majority consensus).  This would be more effective for new blocks, where a party controlling many nodes could broadcast the 'new' tampered block very quickly.  In that case, the network would be split, with possibly a large percentage of nodes having one version of a block and the rest having a different version.  Technically both versions of that block would be valid, and that alone would not create any fork.

However, this could be fixed very easily and effectively with a soft fork, by ensuring that any node receiving a block strips any extra data (making the 'block length' the real size of the header and transaction set).  That would be backward compatible with existing clients, and would really be a 'free' fix.

I have not tried, but this could easily be verified by querying many nodes for a specific block and checking whether they all return the exact same block (using a block which was found to have dodgy data in it).

Any thoughts or comments?
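
The comparison step of such a check could look roughly like this in Python. It assumes the raw bytes of the block have already been fetched from each node by some other means; group_by_serialization and the node labels are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_serialization(responses):
    """Group node labels by the SHA-256 digest of the raw block bytes they returned.

    responses: mapping of node label -> raw block bytes (fetching them is not shown).
    Returns a dict of hex digest -> sorted list of node labels.
    """
    groups = defaultdict(list)
    for node, raw in responses.items():
        groups[hashlib.sha256(raw).hexdigest()].append(node)
    return {digest: sorted(nodes) for digest, nodes in groups.items()}
```

If every node returns byte-identical data, the result has a single key. More than one key would mean that different serializations of the same block are in circulation.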

Regards,
Eric Hoffman


Title: Re: Question about extra data in blockchain. Malicious tempering vulnerability?
Post by: achow101 on July 21, 2018, 02:33:47 AM
Quote from: ehoffman on July 20, 2018, 02:43:29 AM
Now, the block length above could be larger than the size of the 'useful' data that follows (header/transactions).  So there may be extra data at the end of the block, and indeed it has surfaced in the news that some dodgy data was found there (but what was found is not the subject of this topic).

No, this is absolutely wrong. There is no "extra" data at the end of the block. If the deserializer sees that there is data following the stated length of the block, it will throw an error and the block will be invalid. If the stated length is longer than the data in the block (the data within a block is self-describing in length), an error will likewise be thrown and the block will be invalid.
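
To illustrate the idea of a strict deserializer, here is a minimal Python sketch. It is not Bitcoin Core's code: the names Reader, skip_legacy_tx and parse_block_strict are invented for this example, and it assumes transactions in the legacy (non-SegWit) serialization only.

```python
import struct


class Reader:
    """Cursor over a byte string that raises instead of returning short reads."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise ValueError("block is shorter than its contents claim")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def varint(self):
        first = self.take(1)[0]
        if first < 0xFD:
            return first
        size, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
        return struct.unpack(fmt, self.take(size))[0]


def skip_legacy_tx(r):
    """Advance past one transaction (assumption: legacy serialization, no witness data)."""
    r.take(4)                    # version
    for _ in range(r.varint()):  # inputs
        r.take(36)               # previous output (32-byte txid + 4-byte index)
        r.take(r.varint())       # scriptSig
        r.take(4)                # sequence
    for _ in range(r.varint()):  # outputs
        r.take(8)                # value
        r.take(r.varint())       # scriptPubKey
    r.take(4)                    # lock time


def parse_block_strict(payload):
    """Return (header, tx_count, tx_bytes); raise ValueError on short or trailing data."""
    r = Reader(payload)
    header = r.take(80)
    tx_count = r.varint()
    start = r.pos
    for _ in range(tx_count):
        skip_legacy_tx(r)
    if r.pos != len(payload):
        raise ValueError("%d unexpected trailing bytes" % (len(payload) - r.pos))
    return header, tx_count, payload[start:]
```

Because every field either has a fixed size or carries its own length, the parser knows exactly where the last transaction ends, so both a short block and a block with appended bytes are detected.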


Title: Re: Question about extra data in blockchain. Malicious tempering vulnerability?
Post by: pebwindkraft on July 21, 2018, 06:17:38 AM
Here are two sources, which show how blocks are assembled and then verified:

https://en.bitcoin.it/wiki/Protocol_documentation#Differential_encoding

https://en.bitcoin.it/wiki/Protocol_rules#Block_creation_fee

You say the block size could be larger; which data structure do you have in mind?
Based on the specs it must be a transaction, and if it doesn't fit the rules on the pages mentioned (see the rules for blocks and the rules for tx), it would be invalid data posing as a transaction, making the block invalid...


Title: Re: Question about extra data in blockchain. Malicious tempering vulnerability?
Post by: Kallisteiros on July 21, 2018, 01:19:01 PM
What achow101 said. Even on the off chance that the deserializer does not throw an error but just stops reading the input stream, I doubt that it would then propagate the raw, undeserialized binary data as is, rather than the parsed block structure it accepted.


Title: Re: Question about extra data in blockchain. Malicious tempering vulnerability?
Post by: odolvlobo on July 21, 2018, 04:07:37 PM
Quote from: Kallisteiros on July 21, 2018, 01:19:01 PM
What achow101 said. Even on the off chance that the deserializer does not throw an error but just stops reading the input stream, I doubt that it would then propagate the raw, undeserialized binary data as is, rather than the parsed block structure it accepted.

I agree. Extraneous data does not need to be stored or propagated, and the policy of rejecting a poorly formed message is better for security.