Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: etotheipi on July 16, 2011, 03:19:56 PM



Title: File formats -- blkindex.dat and blk0001.dat
Post by: etotheipi on July 16, 2011, 03:19:56 PM
I wanted to be able to read the block-index and blockchain in my client, as well as create a possible "recovery/repair" script for these files.  I know that everything is in these two files, yet I can't find any documentation about how they are stored. 

blkindex.dat:   first 10 kB is mostly zeros with some scattered non-zero bytes.  I stopped checking after that, realizing I have no idea what I'm looking for

blk0001.dat:   first four bytes are the main network magic number, but then after that I don't know what I'm looking at.  Here's the byte breakdown of the first forty 32-bit words:

Code:
     
blk0001.dat:

01     f9 be b4 d9    (main network magic number)
02     1d 01 00 00
03     01 00 00 00
04     00 00 00 00
05     00 00 00 00
06     00 00 00 00
07     00 00 00 00
08     00 00 00 00
09     00 00 00 00
10     00 00 00 00
11     00 00 00 00
12     3b a3 ed fd
13     7a 7b 12 b2
14     7a c7 2c 3e
15     67 76 8f 61
16     7f c8 1b c3
17     88 8a 51 32
18     3a 9f b8 aa
19     4b 1e 5e 4a
20     29 ab 5f 49
21     ff ff 00 1d
22     1d ac 2b 7c
23     01 01 00 00
24     00 01 00 00
25     00 00 00 00
26     00 00 00 00
27     00 00 00 00
28     00 00 00 00
29     00 00 00 00
30     00 00 00 00
31     00 00 00 00
32     00 00 ff ff
33     ff ff 4d 04
34     ff ff 00 1d
35     01 04 45 54
36     68 65 20 54
37     69 6d 65 73
38     20 30 33 2f
39     4a 61 6e 2f
40     32 30 30 39

Can anyone offer some guidance on what I'm looking at?  Or more likely, link me to somewhere that actually explains the file format?  I looked through the reference client code, but I couldn't figure out the reading and writing code.  Ugh... I work in C++ for a living, and I can't figure out the reference client code!

-Eto


Title: Re: File formats -- blkindex.dat and blk0001.dat
Post by: zamgo on July 16, 2011, 06:01:32 PM


blkindex.dat is a Berkeley DB database.

blk0001.dat is a binary file.

The 'main' database in blkindex.dat holds pointers to the position of each block within the binary blk0001.dat file.

I think ;)



Title: Re: File formats -- blkindex.dat and blk0001.dat
Post by: patvarilly on July 16, 2011, 07:29:01 PM



The blk0001.dat file (and whatever 0002, etc. files eventually get tacked on) is just a long list of block raw data as transferred over the network.  The format is:

4 bytes: network message start (0xf9, 0xbe, 0xb4, 0xd9)
4 bytes: block size N (little endian)
N bytes: block raw data (see https://en.bitcoin.it/wiki/Protocol_specification#block)

You can just read the blocks one after another (I built my own watered-down block explorer doing just this a while ago).  Be aware that blk0001.dat can contain blocks that branch off the main chain, so you'll need to either know what the last block in the main block chain is and work backwards using an index you keep in memory, or reconstruct the block chain yourself.  Neither of these is terribly difficult.
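
The "work backwards" approach can be sketched roughly as follows. This assumes you have already built an in-memory map from each block's hash to its previous-block hash (hashes are plain strings here for brevity) and that you know the hash of the main chain's last block. `MainChain` and `prevOf` are made-up names, not anything from the reference client.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Walk from `tip` back through the previous-block links and return the chain
// ordered from the earliest known block to the tip. Blocks on side branches
// are never visited, so they drop out automatically.
std::vector<std::string> MainChain(
        const std::map<std::string, std::string>& prevOf,
        const std::string& tip) {
    std::vector<std::string> chain;
    std::map<std::string, std::string>::const_iterator it = prevOf.find(tip);
    // The size check only guards against malformed input containing a cycle.
    while (it != prevOf.end() && chain.size() < prevOf.size()) {
        chain.push_back(it->first);
        it = prevOf.find(it->second);  // step to the previous block
    }
    std::reverse(chain.begin(), chain.end());
    return chain;
}
```

The walk stops at the first hash that is not in the map, which for a complete file is the all-zero previous hash of the genesis block.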

blkindex.dat, as the post above says, is a Berkeley DB database, and it shouldn't be too hard to figure out the format of the keys and values (presumably something like the block hash for the key, and the file number (0001) plus the position in the block data file for the value).

Good luck!


Title: Re: File formats -- blkindex.dat and blk0001.dat
Post by: etotheipi on July 16, 2011, 07:36:24 PM
That's great.  I just hope my block serialization is identical to the format in the file!   I know my headers are... I guess this will be a good test for whether all my serialization code works.

The one last question I have is:  if blk0001.dat has all the block data, what does blkindex.dat hold?  I would guess it's just headers, but there should only be 12 MB worth of headers.  The file I downloaded is 170 MB.

-Eto


Title: Re: File formats -- blkindex.dat and blk0001.dat
Post by: patvarilly on July 16, 2011, 07:50:07 PM
OK, I looked into this, and blkindex.dat stores both the index to the blocks and the index to the transactions + a few other things (!!!).  Here's the relevant code that records a block into the index:

Code:
bool CTxDB::WriteBlockIndex(const CDiskBlockIndex& blockindex)
{
    return Write(make_pair(string("blockindex"), blockindex.GetBlockHash()), blockindex);
}

In other words, the key is whatever pair<string,uint256> serializes to, and the data is everything in the IMPLEMENT_SERIALIZE block of CDiskBlockIndex (see main.h).  The serialized data seems to be an index into the blk*.dat file, a block height, a link to the next block on the main chain, and a copy of the block header.  For comparison, the code to store other things into this *same* database is

Code:
bool CTxDB::WriteHashBestChain(uint256 hashBestChain)
{
    return Write(string("hashBestChain"), hashBestChain);
}

bool CTxDB::WriteBestInvalidWork(CBigNum bnBestInvalidWork)
{
    return Write(string("bnBestInvalidWork"), bnBestInvalidWork);
}

bool CTxDB::AddTxIndex(const CTransaction& tx, const CDiskTxPos& pos, int nHeight)
{
    assert(!fClient);

    // Add to tx index
    uint256 hash = tx.GetHash();
    CTxIndex txindex(pos, tx.vout.size());
    return Write(make_pair(string("tx"), hash), txindex);
}
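
As a rough illustration of what a key built with make_pair(string(...), hash) might look like on disk, here is a sketch. It assumes the string is written as a one-byte length followed by its characters, and the uint256 as 32 raw bytes; check that against serialize.h before relying on it. `MakeIndexKey` is a made-up helper, not a function from the client.

```cpp
#include <array>
#include <stdexcept>
#include <string>
#include <vector>

// Build the assumed on-disk key for a (tag, hash) pair:
//   1 byte   length of tag (assumption: tag is shorter than 253 bytes)
//   n bytes  tag characters, e.g. "blockindex" or "tx"
//   32 bytes hash, in the byte order the client stores it
std::vector<unsigned char> MakeIndexKey(
        const std::string& tag,
        const std::array<unsigned char, 32>& hash) {
    if (tag.size() >= 253) {
        throw std::length_error("tag too long for a one-byte length prefix");
    }
    std::vector<unsigned char> key;
    key.push_back(static_cast<unsigned char>(tag.size()));
    key.insert(key.end(), tag.begin(), tag.end());
    key.insert(key.end(), hash.begin(), hash.end());
    return key;
}
```

If the assumption holds, a "blockindex" key is 43 bytes long (1 + 10 + 32) and a "tx" key is 35 bytes (1 + 2 + 32), which should make the two kinds of records easy to tell apart when dumping the database.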