Bitcoin Forum
April 27, 2024, 10:10:05 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: specification of blockchain format  (Read 16492 times)
bullioner (OP)
Full Member
***
Offline Offline

Activity: 166
Merit: 101


View Profile
August 18, 2012, 09:58:20 AM
 #1

I'm trying to write a parser for the blockchain, but I can't find a written specification of the format.  Does such a thing exist?  I don't really fancy reverse engineering it from the data or another parser's source code, but I guess I'll have to if it's not specified.
Even if you use Bitcoin through Tor, the way transactions are handled by the network makes anonymity difficult to achieve. Do not expect your transactions to be anonymous unless you really know what you're doing.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
organofcorti
Donator
Legendary
*
Offline Offline

Activity: 2058
Merit: 1007


Poor impulse control.


View Profile WWW
August 18, 2012, 09:59:57 AM
 #2

If you mean the db, it'e Berkeleydb

Bitcoin network and pool analysis 12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r
follow @oocBlog for new post notifications
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
August 18, 2012, 01:32:19 PM
 #3

0) There's no "blockchain format" for the on-the-disk file.

1) blkNNNN.dat files are simple concatenation of the blocks as seen on the network wire.

2) because of the above and the possibility of Satoshi bitcoin client crashing mid-append, there is a possibility that those files contain partially-written blocks. There will be a header and at least portion of the transaction part written, but not all the way to the end.

3) blkindex.dat is just an index, nothing more. Currently it is in BerkeleyDB but there's a planned switch to LevelDB. None of this matters for your parser because the actual block chain will stay being stored in the above described simple format.

4) If you are trying to write your parser in C++ I suggest first looking into the parser written by the user znort987.

https://bitcointalk.org/index.php?topic=88584.0

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
August 18, 2012, 08:42:54 PM
 #4

This might help:

http://james.lab6.com/2012/01/12/bitcoin-285-bytes-that-changed-the-world/

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 18, 2012, 08:52:11 PM
 #5

More specifically, each new block is appended to the blkXXXX.dat files as they are received.  Their format is pretty simple:


Quote
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
...

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
bullioner (OP)
Full Member
***
Offline Offline

Activity: 166
Merit: 101


View Profile
August 19, 2012, 03:01:30 PM
 #6

More specifically, each new block is appended to the blkXXXX.dat files as they are received.  Their format is pretty simple:


Quote
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
Magic Bytes (4 bytes)
BlockSize w/ header (4 bytes)
Raw Header (80 bytes)
Number of Tx, N (VAR_INT)
Raw Tx1
Raw Tx2
...
Raw TxN
...

That's what I was after, thanks.  Based on https://en.bitcoin.it/wiki/Protocol_specification#block I could see that it was almost a concatenation of block data structures.  But the first 8 bytes before each block in the file were a mystery to me, and I had been discarding them.  The magic bytes plus blocksize now accounts for that.
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!