Bitcoin Forum
Topic: File formats -- blkindex.dat and blk0001.dat
etotheipi (OP)
July 16, 2011, 03:19:56 PM
 #1

I wanted to be able to read the block-index and blockchain in my client, as well as create a possible "recovery/repair" script for these files.  I know that everything is in these two files, yet I can't find any documentation about how they are stored. 

blkindex.dat:   first 10 kB is mostly zeros with some scattered non-zero bytes.  I stopped checking after that, realizing I have no idea what I'm looking for

blk0001.dat:   first four bytes are the main network magic number, but after that I don't know what I'm looking at.  Here's the byte breakdown of the first forty 32-bit words:

Code:
     
blk0001.dat:

01     f9 be b4 d9    (main network magic number)
02     1d 01 00 00
03     01 00 00 00
04     00 00 00 00
05     00 00 00 00
06     00 00 00 00
07     00 00 00 00
08     00 00 00 00
09     00 00 00 00
10     00 00 00 00
11     00 00 00 00
12     3b a3 ed fd
13     7a 7b 12 b2
14     7a c7 2c 3e
15     67 76 8f 61
16     7f c8 1b c3
17     88 8a 51 32
18     3a 9f b8 aa
19     4b 1e 5e 4a
20     29 ab 5f 49
21     ff ff 00 1d
22     1d ac 2b 7c
23     01 01 00 00
24     00 01 00 00
25     00 00 00 00
26     00 00 00 00
27     00 00 00 00
28     00 00 00 00
29     00 00 00 00
30     00 00 00 00
31     00 00 00 00
32     00 00 ff ff
33     ff ff 4d 04
34     ff ff 00 1d
35     01 04 45 54
36     68 65 20 54
37     69 6d 65 73
38     20 30 33 2f
39     4a 61 6e 2f
40     32 30 30 39

Can anyone offer some guidance on what I'm looking at?  Or more likely, link me to somewhere that actually explains the file format?  I looked through the reference client code, but I couldn't figure out the reading and writing code.  Ugh... I work in C++ for a living, and I can't figure out the reference client code!

-Eto

zamgo
July 16, 2011, 06:01:32 PM
 #2



blkindex.dat is a Berkeley DB database.

blk0001.dat is a binary file.

The 'main' database in blkindex.dat holds pointers to the positions of blocks within the binary blk0001.dat file.

I think ;)

patvarilly
July 16, 2011, 07:29:01 PM
 #3



Quote from: zamgo on July 16, 2011, 06:01:32 PM
blkindex.dat is a Berkeley DB database.

blk0001.dat is a binary file.

The 'main' database in blkindex.dat holds pointers to the positions of blocks within the binary blk0001.dat file.

I think ;)



The blk0001.dat file (and whatever 0002, etc. files eventually get tacked on) is just a long list of block raw data as transferred over the network.  The format is:

4 bytes: network message start (0xf9, 0xbe, 0xb4, 0xd9)
4 bytes: block size N (little endian)
N bytes: block raw data (see https://en.bitcoin.it/wiki/Protocol_specification#block)
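
A minimal sketch of walking that layout, assuming the whole file has already been read into memory and that records follow each other with no padding in between. The names readLE32 and splitBlocks are made up for illustration; they are not from the reference client.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Raw bytes of one block, i.e. the N bytes that follow the 8-byte prefix.
using RawBlock = std::vector<std::uint8_t>;

// Read a 32-bit little-endian integer from p[0..3].
inline std::uint32_t readLE32(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Split an in-memory copy of blk0001.dat into raw blocks.
// Assumption: records are back to back. Parsing stops at the first
// record whose magic does not match or whose payload is truncated.
inline std::vector<RawBlock> splitBlocks(const std::vector<std::uint8_t>& file) {
    static const std::uint8_t kMagic[4] = {0xf9, 0xbe, 0xb4, 0xd9};
    std::vector<RawBlock> blocks;
    std::size_t pos = 0;
    while (file.size() - pos >= 8) {
        const std::uint8_t* rec = file.data() + pos;
        if (!std::equal(kMagic, kMagic + 4, rec)) break;   // wrong magic
        const std::uint32_t size = readLE32(rec + 4);       // block size N
        pos += 8;
        if (file.size() - pos < size) break;                // truncated block
        blocks.emplace_back(file.data() + pos, file.data() + pos + size);
        pos += size;
    }
    return blocks;
}
```

For the dump in the first post, the size word 1d 01 00 00 decodes to 285, so the first record's raw block is 285 bytes long.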

You can just read the blocks one after another (I built my own watered-down block explorer doing just this a while ago).  Be careful: there can be blocks in blk0001.dat that branch off the main chain, so you'll need to either know what the last block in the main block chain is and work backwards from an index you have in memory, or reconstruct the blockchain yourself.  Neither of these is terribly difficult.
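
Working backwards through the chain needs each block's previous-block hash, which sits in the 80-byte header at the start of the raw block data. Here is a rough sketch of pulling the header fields out, following the field order on the wiki page linked above (version, previous block hash, Merkle root, time, bits, nonce). BlockHeader, le32At and parseHeader are names invented for this example, and computing a block's own hash (double SHA-256 of those 80 bytes) is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Header fields in the order they appear in the raw block data.
struct BlockHeader {
    std::uint32_t version;
    std::array<std::uint8_t, 32> prevBlockHash;  // byte order as stored on disk
    std::array<std::uint8_t, 32> merkleRoot;     // byte order as stored on disk
    std::uint32_t time;                          // Unix timestamp
    std::uint32_t bits;                          // compact difficulty target
    std::uint32_t nonce;
};

// Read a 32-bit little-endian integer starting at raw[off].
inline std::uint32_t le32At(const std::vector<std::uint8_t>& raw, std::size_t off) {
    return  static_cast<std::uint32_t>(raw[off])
         | (static_cast<std::uint32_t>(raw[off + 1]) << 8)
         | (static_cast<std::uint32_t>(raw[off + 2]) << 16)
         | (static_cast<std::uint32_t>(raw[off + 3]) << 24);
}

// Parse the first 80 bytes of a raw block into a BlockHeader.
// Throws if fewer than 80 bytes are available.
inline BlockHeader parseHeader(const std::vector<std::uint8_t>& raw) {
    if (raw.size() < 80) throw std::runtime_error("raw block shorter than 80 bytes");
    BlockHeader h{};
    h.version = le32At(raw, 0);
    for (std::size_t i = 0; i < 32; ++i) {
        h.prevBlockHash[i] = raw[4 + i];
        h.merkleRoot[i]    = raw[36 + i];
    }
    h.time  = le32At(raw, 68);
    h.bits  = le32At(raw, 72);
    h.nonce = le32At(raw, 76);
    return h;
}
```

Fed the 80 header bytes from the dump in the first post (words 03 through 22), this gives version 1, an all-zero previous-block hash, time 1231006505, bits 0x1d00ffff and nonce 2083236893.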

blkindex.dat, as the post above says, is a Berkeley DB database, and it shouldn't be too hard to figure out the format of the keys and values (presumably something like the block hash for the key, and the file number (0001) plus the position in the block data file for the value).

Good luck!
etotheipi (OP)
July 16, 2011, 07:36:24 PM
 #4

That's great.  I just hope my block serialization is identical to the format in the file!   I know my headers are... I guess this will be a good test for whether all my serialization code works.

The one last question I have is:  if blk0001.dat has all the block data, what does blkindex.dat hold?  I would guess it's just headers, but there should only be 12 MB worth of headers.  What I downloaded has 170MB.

-Eto

patvarilly
July 16, 2011, 07:50:07 PM
 #5

Quote from: etotheipi on July 16, 2011, 07:36:24 PM
The one last question I have is:  if blk0001.dat has all the block data, what does blkindex.dat hold?  I would guess it's just headers, but there should only be 12 MB worth of headers.  What I downloaded has 170MB.

OK, I looked into this, and blkindex.dat stores both the index to the blocks and the index to the transactions + a few other things (!!!).  Here's the relevant code that records a block into the index:

Code:
bool CTxDB::WriteBlockIndex(const CDiskBlockIndex& blockindex)
{
    return Write(make_pair(string("blockindex"), blockindex.GetBlockHash()), blockindex);
}

In other words, the key is whatever pair<string,uint256> serializes to, and the data is everything in the IMPLEMENT_SERIALIZE block of CDiskBlockIndex (see main.h).  That seems to be an index into the blk*.dat file, a block height, a link to the next block on the main chain, and a copy of the block header.  For comparison, the code to store other things into this *same* database is:

Code:
bool CTxDB::WriteHashBestChain(uint256 hashBestChain)
{
    return Write(string("hashBestChain"), hashBestChain);
}

bool CTxDB::WriteBestInvalidWork(CBigNum bnBestInvalidWork)
{
    return Write(string("bnBestInvalidWork"), bnBestInvalidWork);
}

bool CTxDB::AddTxIndex(const CTransaction& tx, const CDiskTxPos& pos, int nHeight)
{
    assert(!fClient);

    // Add to tx index
    uint256 hash = tx.GetHash();
    CTxIndex txindex(pos, tx.vout.size());
    return Write(make_pair(string("tx"), hash), txindex);
}
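
To make "whatever pair<string,uint256> serializes to" a bit more concrete, here is a sketch of the key bytes as I understand the client's serialization: a length prefix for the string, then the string's characters, then the 32 hash bytes as held in memory. Treat that layout as an assumption to check against serialize.h; makeKey and Hash256 are invented names, and only labels short enough for a one-byte length prefix are handled.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Hash256 = std::array<std::uint8_t, 32>;

// Build the byte string assumed to serve as the Berkeley DB key for a
// (label, hash) pair such as ("blockindex", blockHash) or ("tx", txHash).
// Assumed layout: 1-byte length, label bytes, then 32 hash bytes.
inline std::vector<std::uint8_t> makeKey(const std::string& label, const Hash256& hash) {
    if (label.size() >= 253) {
        // Longer strings would need a multi-byte length prefix (not handled here).
        throw std::length_error("label too long for a 1-byte length prefix");
    }
    std::vector<std::uint8_t> key;
    key.reserve(1 + label.size() + hash.size());
    key.push_back(static_cast<std::uint8_t>(label.size()));
    key.insert(key.end(), label.begin(), label.end());
    key.insert(key.end(), hash.begin(), hash.end());
    return key;
}
```

Under that assumption a "blockindex" key is 1 + 10 + 32 = 43 bytes and a "tx" key is 35 bytes, so the two kinds of record can be told apart by the leading length byte and label when iterating over the database.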