July 20, 2018, 02:43:29 AM
Hello
This is a bit technical.
I tried searching and did not find anything really specific about this.
I was looking at analyzing the blockchain, mainly using the blkxxxxx.dat files generated by the Bitcoin Core client.
If you look at the structure of a block in this file, you have:
- Magic number (4 bytes): 0xD9B4BEF9
- Block length (4 bytes): length of the block that follows
- Block header (80 bytes)
- Transaction count (var int)
- Transactions (multiple bytes): a list of all the transactions for this block
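To make the layout above concrete, here is a minimal sketch of walking the records in a blk file that has been read into memory. The names `Record` and `iter_records` are made up for this example, and it assumes the magic number appears on disk as the bytes f9 be b4 d9 (0xD9B4BEF9 stored little-endian). It simply stops at the first thing that does not look like a record, which may not be the right behaviour for every file.

```python
import struct
from typing import Iterator, NamedTuple

# 0xD9B4BEF9 as it is assumed to appear on disk (little-endian byte order).
MAGIC = bytes.fromhex("f9beb4d9")
HEADER_SIZE = 80


class Record(NamedTuple):
    offset: int           # position of the magic number in the file
    declared_length: int  # value of the 4-byte block length field
    header: bytes         # the 80-byte block header
    body: bytes           # everything after the header, up to declared_length


def iter_records(data: bytes) -> Iterator[Record]:
    """Yield one Record per block found in the raw bytes of a blk file."""
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAGIC:
            break  # padding or unknown data: stop rather than guess
        (length,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + length
        if length < HEADER_SIZE or end > len(data):
            break  # truncated or implausible record
        yield Record(pos, length,
                     data[start:start + HEADER_SIZE],
                     data[start + HEADER_SIZE:end])
        pos = end
```

Usage would be something like `for rec in iter_records(open(path, "rb").read()): ...`, where `path` points at one of the blkxxxxx.dat files.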
Now, the block length above could be larger than the size of the 'useful' data that follows (header plus transactions). So there may be extra data at the end of the block, and indeed it has surfaced in the news that some dodgy data has been found there (but what was found is not the subject of this topic).
Now, my question is about the immutability of this 'extra' data. The immutability of the blockchain is primarily based on the hash of the block header alone (only the block header is used when computing the cryptographic hash). However, the header itself contains the Merkle root hash of all the transactions, so effectively this also makes the transactions immutable. The magic number is also fixed by consensus, so that is immutable as well.
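As an illustration of the point above, here is a small sketch of the two hashes involved, assuming the usual double SHA-256 over the 80-byte header and the usual pairwise Merkle construction (duplicating the last entry when a level has an odd count). The function names are my own, and txids are taken in internal byte order, not the reversed order shown by block explorers.

```python
import hashlib
from typing import Sequence


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(header: bytes) -> str:
    """Hash of an 80-byte header, in the byte-reversed hex form usually displayed."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    return double_sha256(header)[::-1].hex()


def merkle_root(txids: Sequence[bytes]) -> bytes:
    """Merkle root of a list of 32-byte txids (internal byte order)."""
    level = list(txids)
    if not level:
        raise ValueError("a block has at least one transaction")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last entry with itself
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Nothing outside the 80 header bytes is an input to `block_hash`, which is why any bytes stored after the transactions would not be covered by it.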
But apart from that, the immutability pretty much ends there, meaning that the following data could be modified:
- The block length
- The extra data
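One way to measure how much 'extra' data a record carries is to parse the transactions and compare the number of bytes consumed with the declared block length. Below is a rough sketch of that idea. The helper names are invented for this example, and it assumes the standard transaction serialization, including the segwit marker/flag bytes (0x00 0x01) after the version. It only skips over fields and does no validation.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a variable-length integer; return (value, new position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(buf[pos + 1:pos + 1 + size], "little"), pos + 1 + size


def skip_tx(buf: bytes, pos: int) -> int:
    """Return the position just past the transaction starting at pos."""
    pos += 4                                        # version
    segwit = buf[pos] == 0 and buf[pos + 1] != 0    # marker 0x00, non-zero flag
    if segwit:
        pos += 2
    n_in, pos = read_varint(buf, pos)
    for _ in range(n_in):
        pos += 36                                   # previous txid + output index
        script_len, pos = read_varint(buf, pos)
        pos += script_len + 4                       # script + sequence
    n_out, pos = read_varint(buf, pos)
    for _ in range(n_out):
        pos += 8                                    # value
        script_len, pos = read_varint(buf, pos)
        pos += script_len
    if segwit:
        for _ in range(n_in):                       # one witness stack per input
            n_items, pos = read_varint(buf, pos)
            for _ in range(n_items):
                item_len, pos = read_varint(buf, pos)
                pos += item_len
    return pos + 4                                  # locktime


def trailing_bytes(payload: bytes) -> int:
    """Bytes left in payload (header + tx count + txs + extra) after the last tx."""
    pos = 80                                        # skip the block header
    n_tx, pos = read_varint(payload, pos)
    for _ in range(n_tx):
        pos = skip_tx(payload, pos)
    if pos > len(payload):
        raise ValueError("transactions run past the declared block length")
    return len(payload) - pos
```

Here `payload` is the slice of the file covered by the block length field, so a non-zero result would mean the declared length is larger than the header and transactions actually need.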
So, effectively, a specific block on the blockchain could be represented in a 'practically' infinite number of ways. One could, for example, grow any block received from the network, store arbitrary data there, and re-broadcast this new block. Any other node would then accept it (as long as it stays under 1MB in size).
Am I right in my assumption? What are the implications? Could someone 'pollute' the network and flood it with 1MB blocks (which would effectively give a 500GB blockchain)? Possibly not to that extent, because all peers except the newer ones (which don't have the blockchain yet) already have their own copy of the blocks. But from an academic point of view, if you were to query this rogue node, you would get a 500GB blockchain.
On a darker side, I could see, for example, some parties hiding data there in the hope that enough peers on the network eventually propagate it. For example, some political parties or spy agencies could take some blocks and append state secrets. With the current Bitcoin clients all relaying the block 'as is', this would effectively make the appended data semi-immutable (by majority consensus). This would be more effective for new blocks, where a party controlling many nodes could broadcast the 'new' tampered block very quickly. In that case, the network would be mixed, with possibly a large percentage of nodes having one version of a block and the rest having a different version. Technically both versions of that block would be valid, and that alone would not create any fork.
However, this could be fixed very easily and effectively with a soft fork, by ensuring that any node receiving a block strips any extra data (making the 'block length' the real size of the header/transactions set). That would be backward compatible with any existing clients, and would really be a 'free' fix.
I have not tried, but this could easily be verified by querying many nodes for a specific block and checking whether they all return the exact same block (taking a block which was found to have dodgy data in it).
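The comparison step of that check could be sketched as below. It assumes the raw bytes each node returned for the block have already been collected somehow; fetching them is not shown. The function name and the node labels are placeholders for this example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_serialization(responses: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group node labels by the SHA-256 of the exact bytes each one returned."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, raw in responses.items():
        groups[hashlib.sha256(raw).hexdigest()].append(node)
    return dict(groups)
```

If every node returns byte-identical data, the result has a single group; more than one group would mean that different serializations of the same block are in circulation.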
Any thoughts or comments?
Regards, Eric Hoffman