Snapshot.bin File Format V0.3
This post represents the third revision to the snapshot.bin file format specification proposal. The following file format encodes 100% of the spendable bitcoin wealth. Spin-off developers are free to support claims of only a subset of this total.
Snapshot.bin File Structure
Field              Description                                                 Size
================================================================================================
Version            01 00 00 00                                                 4 bytes (uint32)
MerkleRoot         Merkle root of claim entries (hash160)                      20 bytes
Blockhash          hash of Bitcoin block that snapshot was taken from          32 bytes
TotalClaimValue    the sum of all the claims (in satoshis)                     8 bytes (uint64)
NumClaims          the number of claims to follow                              8 bytes (uint64)
P2PkHOffset        the file offset in bytes for P2PkH claim section            8 bytes (uint64)
P2SHOffset         the file offset in bytes for P2SH claim section             8 bytes (uint64)
NatMultisigOffset  the file offset in bytes for native multisig claim section  8 bytes (uint64)
RawScriptOffset    the file offset in bytes for raw script claim section       8 bytes (uint64)
ClaimEntries       the list of claims (sorted)                                 <Claimsize> number of claims
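The fixed-width header above can be read with a single unpack call. The following is a minimal Python sketch, assuming the fields are packed back to back with no padding, in the order listed, and little-endian as stated in the endianness section. The names SnapshotHeader and parse_header are hypothetical and are not part of the proposal.

```python
import struct
from typing import NamedTuple

# "<" = little-endian, no padding: uint32, 20 bytes, 32 bytes, six uint64 values.
HEADER_FORMAT = "<I20s32s6Q"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 20 + 32 + 6 * 8 = 104 bytes


class SnapshotHeader(NamedTuple):
    """Hypothetical container mirroring the header table, in file order."""
    version: int
    merkle_root: bytes
    block_hash: bytes
    total_claim_value: int
    num_claims: int
    p2pkh_offset: int
    p2sh_offset: int
    nat_multisig_offset: int
    raw_script_offset: int


def parse_header(data: bytes) -> SnapshotHeader:
    """Decode the first HEADER_SIZE bytes of a snapshot.bin image."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    return SnapshotHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
```

Under these assumptions the header occupies 104 bytes, so a file whose claim entries start immediately after the header would carry a P2PkHOffset of 104.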
Spin-off developers would likely include only the Merkle root and blockhash in the spin-off's genesis block if they chose to use Smooth's proposal, but are free to include the entire file.
Although outside the scope of snapshot.bin, spin-off developers should also include the Bitcoin blockhash when the spin-off goes live (to act as a time-stamp to avoid pre-mine accusations) and other implementation-dependent genesis block data as required (e.g., a pubkey to prove that you are the developer in order to allow you to claim the spin-off bounty).
Claim Entries
Claim types
All claim entries begin with a type-identifier byte. The format of each entry depends on the value of this type-identifier byte. There are four possible claim entry types.
Type Description
=======================
0x01 P2PkH
0x02 P2SH
0x03 Native multisig
0x04 Raw script
When constructing the snapshot file, pay2PubKey claims, 1-of-1 multisig, and multisig claims where only 1 pubkey is valid (and only 1 is required) shall be refactored as P2PkH claims. Unspent outputs claimable by the same claimer shall be combined into a single claim. Unspendable claims shall be removed.
Format for Type 0x01 (P2PkH) claim entries
99.84% of the valid unspent outputs are either P2PkH or can be recast as P2PkH claim entries. These claims shall be encoded in the following format:
0x01 <hash of pubkey (20 bytes)> <claim value (8 bytes/uint64)>
Format for Type 0x02 (P2SH) claim entries
0.16% of the valid unspent outputs are of P2SH type. These claims shall be encoded in snapshot.bin in the following format:
0x02 <hash of redeem script (20 bytes)> <claim value (8 bytes/uint64)>
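Type 0x01 and type 0x02 entries share the same 29-byte layout (type byte, 20-byte hash, uint64 value), so one pair of helpers can handle both. This is a minimal Python sketch, assuming little-endian values as described in the endianness section. The function names are hypothetical.

```python
import struct

TYPE_P2PKH = 0x01
TYPE_P2SH = 0x02
HASH_ENTRY_FORMAT = "<B20sQ"  # type byte, 20-byte hash, little-endian uint64
HASH_ENTRY_SIZE = struct.calcsize(HASH_ENTRY_FORMAT)  # 1 + 20 + 8 = 29 bytes


def encode_hash_claim(claim_type: int, hash20: bytes, value: int) -> bytes:
    """Serialize a P2PkH or P2SH claim entry (value in satoshis)."""
    if claim_type not in (TYPE_P2PKH, TYPE_P2SH):
        raise ValueError("not a hash-based claim type")
    if len(hash20) != 20:
        raise ValueError("hash must be exactly 20 bytes")
    return struct.pack(HASH_ENTRY_FORMAT, claim_type, hash20, value)


def decode_hash_claim(entry: bytes):
    """Return (claim_type, hash20, value) from a 29-byte entry."""
    return struct.unpack(HASH_ENTRY_FORMAT, entry)
```

Because every entry in these two sections has the same size, the i-th entry of a section sits at the section offset plus i times 29 bytes.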
Format for Type 0x03 (Native multisig M-of-N) claim entries
0.0001% of the unspent outputs are native multisig (that cannot be refactored as type 0x01). These claims shall be encoded in the following format, where the list of pubkeys is sorted canonically from smallest to largest (the byte at the greatest offset in memory is considered most significant for sorting purposes):
0x03 <M (1 byte)> <N (1 byte)> <PubKey 0 (33 bytes)> … <PubKey N-1 (33 bytes)> <claim value (8 bytes/uint64)>
All pubKeys in the native multisig section shall be encoded in compact form. This is an internal form used by bitcoind for reducing the size of the UTXO and shouldn't be confused with compressed keys. In compact form all pubkeys are stored as 33 bytes in the form of <prefix:1 byte><x: 32 bytes>. The y values are not stored and are reconstructed as needed.
Compact PubKey prefixes
0x02 = Compressed PubKey (even)
0x03 = Compressed PubKey (odd)
0x04 = Uncompressed PubKey (even)
0x05 = Uncompressed PubKey (odd)
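A type 0x03 entry could be assembled as in the Python sketch below. It assumes that the canonical pubkey sort compares the full 33-byte compact key, prefix included, with the last byte as the most significant, which is equivalent to comparing the reversed byte strings. The proposal does not say whether the prefix takes part in the comparison, so that detail is an assumption, and encode_multisig_claim is a hypothetical name.

```python
import struct

TYPE_MULTISIG = 0x03
COMPACT_PREFIXES = (0x02, 0x03, 0x04, 0x05)  # prefixes listed in the spec


def encode_multisig_claim(m: int, pubkeys, value: int) -> bytes:
    """Serialize an M-of-N claim from compact 33-byte pubkeys."""
    n = len(pubkeys)
    if not 1 <= m <= n <= 255:
        raise ValueError("need 1 <= M <= N <= 255")
    for pk in pubkeys:
        if len(pk) != 33 or pk[0] not in COMPACT_PREFIXES:
            raise ValueError("pubkey is not in 33-byte compact form")
    # Last byte most significant: sorting on the reversed bytes gives
    # the same order as comparing the keys as little-endian integers.
    ordered = sorted(pubkeys, key=lambda pk: pk[::-1])
    return bytes([TYPE_MULTISIG, m, n]) + b"".join(ordered) + struct.pack("<Q", value)
```

The resulting entry is 3 + 33 * N + 8 bytes long, so a 2-of-3 claim takes 110 bytes.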
Format for Type 0x04 (raw script) claim entries
0.0000% of the unspent outputs (16 entries as of Block #305,303) cannot be easily refactored into one of the three previous formats, and shall thus be recorded verbatim as raw script:
0x04 <n_bytes (2 bytes/uint16)> <verbatim raw script> <claim value (8 bytes/uint64)>
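Raw script entries are the only variable-length entries whose size is not implied by a count of fixed-size items, so a reader has to consume the 2-byte length first. A minimal Python sketch follows, assuming n_bytes is little-endian like the other integers. The function names are hypothetical.

```python
import struct

TYPE_RAW_SCRIPT = 0x04


def encode_raw_script_claim(script: bytes, value: int) -> bytes:
    """Serialize a raw script claim: type, uint16 length, script, uint64 value."""
    if len(script) > 0xFFFF:
        raise ValueError("script does not fit in a uint16 length field")
    return struct.pack("<BH", TYPE_RAW_SCRIPT, len(script)) + script + struct.pack("<Q", value)


def decode_raw_script_claim(buf: bytes, offset: int = 0):
    """Return (script, value, next_offset) for the entry starting at offset."""
    claim_type, n_bytes = struct.unpack_from("<BH", buf, offset)
    if claim_type != TYPE_RAW_SCRIPT:
        raise ValueError("not a raw script entry")
    start = offset + 3
    script = buf[start:start + n_bytes]
    if len(script) != n_bytes:
        raise ValueError("truncated script")
    (value,) = struct.unpack_from("<Q", buf, start + n_bytes)
    return script, value, start + n_bytes + 8
```

Returning next_offset lets a caller walk the raw script section entry by entry, since these entries cannot be indexed by a fixed stride.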
Sorting claim entries into snapshot.bin
The claim entries are sorted by type, where the P2PkH (type=0x01) section comes first, and the RawScript (type=0x04) section comes last. The file header specifies the byte offset to each claim section.
<claim 1> <claim 2> … <claim A-1> <claim A> // type 0x01 claims
<claim A+1> <claim A+2> … <claim B-1> <claim B> // type 0x02 claims
<claim B+1> <claim B+2> … <claim C-1> <claim C> // type 0x03 claims
<claim C+1> <claim C+2> … <claim N-1> <claim N> // type 0x04 claims
Within each section, the claim entries are also sorted canonically to permit faster searching.
P2PkH-section: sorted by pubKeyHash from smallest to largest. It is assumed that the byte at the greatest offset in the hash value is the most significant (little endian convention).
P2SH-section: sorted by scriptHash from smallest to largest. It is assumed that the byte at the greatest offset in the hash value is the most significant (little endian convention).
NativeMultisig-section: Each claim entry in this section is hashed with RIPEMD-160. The claims are written to snapshot.bin in order of increasing hash value. It is assumed that the byte at the greatest offset in the hash value is the most significant (little endian convention).
RawScript-section: Each claim entry in this section is hashed with RIPEMD-160. The claims are written to snapshot.bin in order of increasing hash value. It is assumed that the byte at the greatest offset in the hash value is the most significant (little endian convention).
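The faster searching mentioned above can be illustrated for the P2PkH section, where every entry is 29 bytes. The Python sketch below sorts claims by pubKeyHash with the last byte treated as most significant (done here by comparing reversed byte strings) and then looks a hash up by binary search. The helper names and the in-memory section layout are assumptions for illustration.

```python
import struct

ENTRY_FORMAT = "<B20sQ"   # type byte, 20-byte hash, little-endian uint64
ENTRY_SIZE = 29


def build_p2pkh_section(claims: dict) -> bytes:
    """claims maps pubKeyHash (20 bytes) to value; returns the sorted section."""
    # Reversing the bytes makes plain bytes comparison match the
    # "greatest offset is most significant" rule.
    ordered = sorted(claims.items(), key=lambda kv: kv[0][::-1])
    return b"".join(struct.pack(ENTRY_FORMAT, 0x01, h, v) for h, v in ordered)


def find_claim(section: bytes, target: bytes):
    """Binary search for target in a sorted section; returns value or None."""
    lo, hi = 0, len(section) // ENTRY_SIZE
    want = target[::-1]
    while lo < hi:
        mid = (lo + hi) // 2
        _, h, value = struct.unpack_from(ENTRY_FORMAT, section, mid * ENTRY_SIZE)
        if h == target:
            return value
        if h[::-1] < want:
            lo = mid + 1
        else:
            hi = mid
    return None
```

The same approach should carry over to the P2SH section. The native multisig and raw script sections are sorted by a RIPEMD-160 hash of each entry and have variable-length entries, so they would need a different lookup strategy.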
Snapshot.bin's Merkle Tree
The claim entries are hashed into a Merkle tree in the same manner as the transactions in a bitcoin block are hashed into a Merkle tree. Hash160 is used instead of sha256d to reduce the size of the Merkle branches used when claiming one's share of the spin-off.
An example of how the Merkle tree for snapshot.bin is constructed is given in the post linked in the 26-Jul-14 revision notes below.
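As a rough illustration of the construction described above, the Python sketch below follows the Bitcoin block convention: each leaf is the hash of one serialized claim entry, adjacent hashes are concatenated and hashed pairwise, and the last hash of a level is duplicated when the level has an odd count. It assumes Hash160 means RIPEMD-160 of SHA-256 and that leaves are the Hash160 of each entry; the linked example post is the authority on the exact details. Note that hashlib only provides "ripemd160" when the underlying OpenSSL build enables it, so the hash function is passed in as a parameter.

```python
import hashlib


def hash160(data: bytes) -> bytes:
    """RIPEMD-160 of SHA-256 (requires ripemd160 support in hashlib)."""
    sha = hashlib.sha256(data).digest()
    return hashlib.new("ripemd160", sha).digest()


def merkle_root(entries, hash_fn=hash160) -> bytes:
    """Compute a Bitcoin-style Merkle root over serialized claim entries."""
    if not entries:
        raise ValueError("at least one claim entry is required")
    level = [hash_fn(e) for e in entries]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [hash_fn(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

With 20-byte hashes, a Merkle branch for a tree of depth d costs 20 * d bytes instead of 32 * d, which is the size saving the proposal refers to.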
Endianness and uints
All unsigned integers are serialized using the little-endian convention.
When sorting hash values canonically, the byte at greatest offset in memory is assumed most significant (little-endian convention).
===========================================
REVISIONS
26-Jul-14:
- Removed all varInts in favour of fixed-width integers.
- Clarified the manner in which claim entries are sorted.
- Added the note written by DeathAndTaxes regarding the pubkey format.
- Clarified endianness details.
- Posted example snapshot.bin file and Merkle tree construction here: https://bitcointalk.org/index.php?topic=563972.msg8040377#msg8040377
03-Aug-14:
- Removed section markers in favor of section byte offset specified in header.