Bitcoin Forum
Topic: Question about extra data in blockchain. Malicious tampering vulnerability?
ehoffman (OP)
July 20, 2018, 02:43:29 AM
 #1

Hello

This is a bit technical.

I searched but did not find anything really specific about this.

I have been analyzing the blockchain, mainly using the blkxxxxx.dat files generated by the Bitcoin Core client.

If you look at the structure of a block in these files, you have:

- Magic number (4 bytes) - 0xD9B4BEF9 (stored on disk as the bytes F9 BE B4 D9)
- Block length (4 bytes) - length of the block that follows
- Block header (80 bytes)
- Transaction count (var int)
- Transactions (multiple bytes) - A list of all the transactions for this block
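
As a rough illustration of the layout listed above, here is a minimal Python sketch that walks a blk-style byte stream record by record. The names `read_varint` and `read_records` are made up for this example, the mainnet magic is assumed, and real files may contain padding that this sketch does not handle.

```python
import io
import struct

# Assumption: mainnet magic 0xD9B4BEF9, which appears on disk little-endian.
MAGIC = bytes.fromhex("f9beb4d9")


def read_varint(stream):
    """Read a Bitcoin variable-length integer (1, 3, 5 or 9 bytes)."""
    first = stream.read(1)[0]
    if first < 0xFD:
        return first
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(stream.read(size), "little")


def read_records(stream):
    """Yield (header, tx_count, raw_transactions) for each record in the stream."""
    while True:
        magic = stream.read(4)
        if len(magic) < 4:
            return  # end of stream
        if magic != MAGIC:
            raise ValueError("unexpected magic bytes")
        (length,) = struct.unpack("<I", stream.read(4))
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record")
        body = io.BytesIO(block)
        header = body.read(80)          # fixed-size block header
        tx_count = read_varint(body)    # number of transactions that follow
        yield header, tx_count, body.read()
```

Passing a file opened in binary mode (`open(path, "rb")`) to `read_records` should work the same way as the in-memory stream, since only `read` is used.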

Now, the block length above could be larger than the size of the 'useful' data that follows (header/transactions).  So there may be extra data at the end of the block, and indeed it has surfaced in the news that some dodgy data has been found there (though what was found is not the subject of this topic).

Now, my question is about the immutability of this 'extra' data.  The immutability of the blockchain is primarily based on the hash of the block header alone (only the block header is used when computing the cryptographic hash).  However, the header itself contains the Merkle root hash of all the transactions, so this effectively makes the transactions immutable as well.  The magic number is also fixed by consensus, so that is immutable too.
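
To make that hashing chain concrete, here is a small Python sketch. It assumes the standard 80-byte header layout (version, previous block hash, Merkle root, time, bits, nonce), so the Merkle root sits at bytes 36 to 67. The function names are illustrative only, and transaction IDs are assumed to be in internal byte order, i.e. reversed relative to how block explorers print them.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for headers and transactions."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(header: bytes) -> str:
    """Hash only the 80-byte header and return it in the usual display order."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    return sha256d(header)[::-1].hex()


def merkle_root(txids: list) -> bytes:
    """Fold a list of 32-byte txids into a single Merkle root."""
    if not txids:
        raise ValueError("a block has at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: the last hash is paired with itself
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_root_in_header(header: bytes) -> bytes:
    """Extract the Merkle root committed to by the header."""
    return header[36:68]
```

Any byte that is neither in the header nor hashed into a txid has no influence on `block_hash`, which is the gap the question is about.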

But apart from that, the immutability pretty much ends there, meaning that the following data could be modified:
- The block length
- The extra data.

So, effectively, a specific block could be represented on the blockchain in a practically infinite number of ways.  One could, for example, grow any block received from the network, store arbitrary data there, and re-broadcast this new block.  Any other node would then accept it (as long as it is under 1MB in size).

Am I right in my assumption?  What are the implications?  Could someone 'pollute' the network and flood it with 1MB blocks (which would effectively give a 500GB blockchain)?  Possibly not to that extent, because all but the newest peers (which don't have the blockchain yet) already have their own copy of the blocks, but from an academic point of view, if you were to query this rogue node, you would get a 500GB blockchain.

On a darker side, I could see some parties hiding data there in the hope that enough peers on the network eventually propagate it.  For example, some political parties or spy agencies could take some blocks and append state secrets.  With the current Bitcoin clients all relaying the block 'as is', this would effectively make the appended data semi-immutable (by majority consensus).  This would be more effective for new blocks, where a party controlling many nodes could broadcast the 'new' tampered block very quickly.  In that case, the network would be mixed, with possibly a large percentage of nodes having one version of a block and the rest having a different version.  Technically both versions of that block would be valid, and that alone would not create any fork.

However, this could be fixed very easily and effectively with a soft fork, by ensuring that any node receiving a block strips any extra data (making the 'block length' the real size of the header/transaction set).  That would be backward compatible with existing clients, and really be a 'free' fix.

I have not tried, but this could easily be verified by querying many nodes for a specific block and checking whether they all return the exact same block (taking a block which was found to have dodgy data in it).
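
The comparison step of such a check could look roughly like the Python sketch below. Fetching the block from each node is left out; `responses` is a hypothetical mapping from a node label to the raw bytes that node returned, with the 80-byte header first. Responses are grouped by block hash (header only) and by a fingerprint of the full serialization.

```python
import hashlib
from collections import defaultdict


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 as used for the block hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def group_by_serialization(responses):
    """Group node labels by (block hash, hash of every byte returned)."""
    groups = defaultdict(list)
    for node, raw in responses.items():
        block_id = sha256d(raw[:80])[::-1].hex()       # depends on the header only
        fingerprint = hashlib.sha256(raw).hexdigest()  # depends on all bytes
        groups[(block_id, fingerprint)].append(node)
    return dict(groups)
```

If every node returns identical bytes there is exactly one group. Several groups sharing the same block hash would mean that nodes serve different serializations of the same block.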

Any thoughts or comments?

Regards,
Eric Hoffman

achow101
July 21, 2018, 02:33:47 AM
Merited by Foxpup (2), ABCbits (1)
 #2

Quote from: ehoffman on July 20, 2018, 02:43:29 AM
Now, the block length above could be larger than the size of the 'useful' data that follows (header/transactions).  So there may be extra data at the end of the block, and indeed it has surfaced in the news that some dodgy data has been found there (though what was found is not the subject of this topic).

No, this is absolutely wrong. There is no "extra" data at the end of the block. If the deserializer sees data following the stated length of the block, it will throw an error and the block will be invalid. Likewise, if the stated length is longer than the data in the block (the fields within a block are all self-describing in length), an error will be thrown and the block will be invalid.
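
A strict parser of the kind described here can be sketched in Python as follows. This is not Bitcoin Core's deserializer, only an illustration of the behavior: every field carries or implies its own length, so the parser knows exactly where the block ends. The function names are invented for this example, and only legacy (non-SegWit) transactions are handled to keep it short.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes or fail, so a short block is always detected."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated block: stated length exceeds the data")
    return data


def read_varint(stream):
    """Read a Bitcoin variable-length integer."""
    first = read_exact(stream, 1)[0]
    if first < 0xFD:
        return first
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(read_exact(stream, size), "little")


def read_legacy_tx(stream):
    """Consume one legacy transaction and return its raw bytes."""
    start = stream.tell()
    read_exact(stream, 4)                        # version
    for _ in range(read_varint(stream)):         # inputs
        read_exact(stream, 36)                   # previous outpoint
        read_exact(stream, read_varint(stream))  # scriptSig
        read_exact(stream, 4)                    # sequence
    for _ in range(read_varint(stream)):         # outputs
        read_exact(stream, 8)                    # value
        read_exact(stream, read_varint(stream))  # scriptPubKey
    read_exact(stream, 4)                        # locktime
    end = stream.tell()
    stream.seek(start)
    return read_exact(stream, end - start)


def parse_block_strict(payload: bytes):
    """Return (header, transactions); reject short or over-long payloads."""
    stream = io.BytesIO(payload)
    header = read_exact(stream, 80)
    txs = [read_legacy_tx(stream) for _ in range(read_varint(stream))]
    if stream.read(1):
        raise ValueError("trailing data after the last transaction")
    return header, txs
```

With this approach a payload that is one byte too short or one byte too long raises `ValueError` instead of being accepted.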

pebwindkraft
July 21, 2018, 06:17:38 AM
 #3

Here are two sources that show how blocks are assembled and then verified:

https://en.bitcoin.it/wiki/Protocol_documentation#Differential_encoding

https://en.bitcoin.it/wiki/Protocol_rules#Block_creation_fee

You say the block size could be larger; which data structure do you have in mind?
Based on the specs it would have to be a transaction, and if it doesn’t fit the rules on the pages mentioned (see the rules for blocks and the rules for tx), it would be invalid data in a supposed transaction, making the block invalid...
Kallisteiros
July 21, 2018, 01:19:01 PM
Merited by achow101 (2), ABCbits (2)
 #4

What achow101 said. And even on the off chance that the deserializer does not throw an error but just stops reading the input stream, I doubt it would then propagate the raw, undeserialized binary data forward as is, rather than the parsed block structure it accepted.
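
To illustrate the point about forwarding the parsed structure: if a node rebuilds the bytes it sends from the fields it parsed, there is no slot in which stray bytes could travel. The Python sketch below uses invented names (`encode_varint`, `serialize_block`) and makes no claim about how any particular client implements relay.

```python
def encode_varint(n: int) -> bytes:
    """Encode an integer as a Bitcoin variable-length integer."""
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_block(header: bytes, txs: list) -> bytes:
    """Rebuild the wire bytes from parsed fields only."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    # Header, transaction count, then each transaction; nothing else is emitted.
    return header + encode_varint(len(txs)) + b"".join(txs)
```

Whatever a lenient reader might have skipped on input never reaches `serialize_block`, so it cannot appear in the output.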
odolvlobo
July 21, 2018, 04:07:37 PM
Merited by ABCbits (1)
 #5

Quote from: Kallisteiros on July 21, 2018, 01:19:01 PM
What achow101 said. And even on the off chance that the deserializer does not throw an error but just stops reading the input stream, I doubt it would then propagate the raw, undeserialized binary data forward as is, rather than the parsed block structure it accepted.

I agree. Extraneous data does not need to be stored or propagated, and the policy of rejecting a poorly formed message is better for security.
