Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: bullioner on August 18, 2012, 09:58:20 AM



Title: specification of blockchain format
Post by: bullioner on August 18, 2012, 09:58:20 AM
I'm trying to write a parser for the blockchain, but I can't find a written specification of the format.  Does such a thing exist?  I don't really fancy reverse engineering it from the data or another parser's source code, but I guess I'll have to if it's not specified.


Title: Re: specification of blockchain format
Post by: organofcorti on August 18, 2012, 09:59:57 AM
If you mean the db, it's BerkeleyDB.


Title: Re: specification of blockchain format
Post by: 2112 on August 18, 2012, 01:32:19 PM
0) There's no "blockchain format" for the on-the-disk file.

1) blkNNNN.dat files are a simple concatenation of the blocks as seen on the network wire.

2) Because of the above, and because the Satoshi bitcoin client can crash mid-append, those files may contain partially written blocks. There will be a header and at least a portion of the transaction part written, but not all the way to the end.

3) blkindex.dat is just an index, nothing more. Currently it is in BerkeleyDB, but there's a planned switch to LevelDB. None of this matters for your parser, because the actual block chain will continue to be stored in the simple format described above.

4) If you are trying to write your parser in C++ I suggest first looking into the parser written by the user znort987.

https://bitcointalk.org/index.php?topic=88584.0
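
Points 1 and 2 can be sketched roughly as follows in Python. This is only an illustration, not how any particular client reads its files. It assumes each block in the file is preceded by a 4-byte magic value and a 4-byte little-endian length, as described further down this thread, and that the magic is the main-network value f9 be b4 d9. The names `iter_blocks` and `MAINNET_MAGIC` are made up for the example.

```python
import io
import struct

# Assumption: main-network magic bytes; other networks use other values.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")

def iter_blocks(stream, magic=MAINNET_MAGIC):
    """Yield (offset, raw_block) for each complete record in a blkNNNN.dat-style
    stream, stopping at the first record that is missing or cut short."""
    while True:
        offset = stream.tell()
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file, or a prefix cut short by a crash
        if prefix[:4] != magic:
            return  # not positioned on a record boundary
        (size,) = struct.unpack("<I", prefix[4:])
        raw = stream.read(size)
        if len(raw) < size:
            return  # partially written block: fewer bytes than announced
        yield offset, raw

# In-memory demo: one complete record followed by a truncated one.
demo = (MAINNET_MAGIC + struct.pack("<I", 3) + b"abc"
        + MAINNET_MAGIC + struct.pack("<I", 9) + b"xy")
complete = list(iter_blocks(io.BytesIO(demo)))  # only the first record survives
```

Note the simplification: the sketch gives up at the first incomplete record. If a partial block were followed by later appends, a real parser would have to resynchronise by searching for the next magic value, which this does not attempt.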


Title: Re: specification of blockchain format
Post by: maaku on August 18, 2012, 08:42:54 PM
This might help:

http://james.lab6.com/2012/01/12/bitcoin-285-bytes-that-changed-the-world/


Title: Re: specification of blockchain format
Post by: etotheipi on August 18, 2012, 08:52:11 PM
More specifically, each new block is appended to the current blkXXXX.dat file as it is received.  The file format is pretty simple:


Quote
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
...
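
A minimal Python sketch of splitting one such record into its fields might look like this. It assumes the 4-byte size is little-endian and counts the bytes that follow it (header, transaction count and transactions, but not the 8-byte prefix), and that VAR_INT is the variable-length integer from the protocol specification (one byte below 0xfd, otherwise a 0xfd/0xfe/0xff marker followed by a 2-, 4- or 8-byte little-endian value). `read_varint` and `split_record` are illustrative names, and the transactions are left unparsed.

```python
import struct

def read_varint(buf, pos):
    """Decode a VAR_INT starting at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    # Marker byte selects the width of the little-endian integer that follows.
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + width

def split_record(buf, pos=0):
    """Split the record starting at buf[pos] into the fields listed above."""
    magic = buf[pos:pos + 4]
    (size,) = struct.unpack_from("<I", buf, pos + 4)
    block = buf[pos + 8:pos + 8 + size]
    tx_count, tx_start = read_varint(block, 80)  # VAR_INT follows the 80-byte header
    return {
        "magic": magic,
        "size": size,
        "header": block[:80],
        "tx_count": tx_count,
        "raw_txs": block[tx_start:],   # Tx1..TxN, still concatenated
        "next_pos": pos + 8 + size,    # where the next record's magic bytes start
    }
```

Calling `split_record(data, rec["next_pos"])` repeatedly walks the file record by record; no bounds checking is done here, so truncated input needs to be handled by the caller.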


Title: Re: specification of blockchain format
Post by: bullioner on August 19, 2012, 03:01:30 PM
That's what I was after, thanks.  Based on https://en.bitcoin.it/wiki/Protocol_specification#block I could see that it was almost a concatenation of block data structures.  But the first 8 bytes before each block in the file were a mystery to me, and I had been discarding them.  The magic bytes plus blocksize now accounts for that.
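
For completeness, here is a rough Python sketch of decoding the 80-byte raw header once the 8 prefix bytes have been skipped. It assumes the field layout given in the wiki's protocol specification (version, previous block hash, merkle root, timestamp, bits, nonce, all little-endian) and that the block hash is the double SHA-256 of the header, shown byte-reversed as block explorers display it. `parse_header` is just a name chosen for this example.

```python
import hashlib
import struct

def parse_header(header):
    """Decode an 80-byte raw block header into a dict of its fields."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    # 4-byte version, two 32-byte hashes, then three 4-byte unsigned integers.
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<i32s32sIII", header
    )
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return {
        "version": version,
        # Hashes are stored little-endian; reverse them for the usual hex display.
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,  # seconds since the Unix epoch
        "bits": bits,            # compact encoding of the difficulty target
        "nonce": nonce,
        "hash": digest[::-1].hex(),
    }
```

A quick sanity check for a parser is that the `prev_hash` of each block on the main chain equals the `hash` computed for its parent.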