I'm trying to understand the way Bitcoin stores block data - among other things I want to run some statistics on the block chain / transaction history and check just how anonymous Bitcoin really is. So I went to the source to see how Bitcoin reads/writes block data to file.
(ETA: this is in 0.3.2)
In main.h we have:
bool WriteToDisk(bool fWriteTransactions, unsigned int& nFileRet, unsigned int& nBlockPosRet)
{
// Open history file to append
CAutoFile fileout = AppendBlockFile(nFileRet);
if (!fileout)
return error("CBlock::WriteToDisk() : AppendBlockFile failed");
if (!fWriteTransactions)
fileout.nType |= SER_BLOCKHEADERONLY;
// Write index header
unsigned int nSize = fileout.GetSerializeSize(*this);
fileout << FLATDATA(pchMessageStart) << nSize;
// Write block
nBlockPosRet = ftell(fileout);
if (nBlockPosRet == -1)
return error("CBlock::WriteToDisk() : ftell failed");
fileout << *this;
// Flush stdio buffers and commit to disk before returning
fflush(fileout);
#ifdef __WXMSW__
_commit(_fileno(fileout));
#else
fsync(fileno(fileout));
#endif
return true;
}
and
bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
{
SetNull();
// Open history file to read
CAutoFile filein = OpenBlockFile(nFile, nBlockPos, "rb");
if (!filein)
return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
if (!fReadTransactions)
filein.nType |= SER_BLOCKHEADERONLY;
// Read block
filein >> *this;
// Check the header
if (CBigNum().SetCompact(nBits) > bnProofOfWorkLimit)
return error("CBlock::ReadFromDisk() : nBits errors in block header");
if (GetHash() > CBigNum().SetCompact(nBits).getuint256())
return error("CBlock::ReadFromDisk() : GetHash() errors in block header");
return true;
}
FLATDATA is defined in serialize.h like so:
//
// Wrapper for serializing arrays and POD
// There's a clever template way to make arrays serialize normally, but MSVC6 doesn't support it
//
#define FLATDATA(obj) REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
class CFlatData
{
protected:
char* pbegin;
char* pend;
public:
CFlatData(void* pbeginIn, void* pendIn) : pbegin((char*)pbeginIn), pend((char*)pendIn) { }
char* begin() { return pbegin; }
const char* begin() const { return pbegin; }
char* end() { return pend; }
const char* end() const { return pend; }
unsigned int GetSerializeSize(int, int=0) const
{
return pend - pbegin;
}
template<typename Stream>
void Serialize(Stream& s, int, int=0) const
{
s.write(pbegin, pend - pbegin);
}
template<typename Stream>
void Unserialize(Stream& s, int, int=0)
{
s.read(pbegin, pend - pbegin);
}
};
Now - and I apologize if I'm reading this wrong, this is a little more advanced C/C++ code than I'm used to - as I understand it, the FLATDATA call interprets the raw bytes of a CBlock object as an array (stream??) of characters. The CBlock::WriteToDisk method writes the constant 4-byte message header (0xf9, 0xbe, 0xb4, 0xd9), the size of the CBlock object in bytes, and then the FLATDATA of the CBlock it's writing to disk - which is just the raw bytes of the CBlock object. So after the header, the data written to file is byte-for-byte the same as the CBlock object represented in memory. Also, if I'm reading it correctly, CBlock::ReadFromFile copies those bytes directly into the space allocated for a CBlock object in memory to re-create the block. Is this correct?
Related question - I am under the impression that the exact way an instance of a C++ class is represented internally not guaranteed under standards; compiling a program with different compilers or different optimization flags can change the order in which member variables are stored in memory, and some debug mode compilers even add a few bytes between member variables to make memory inspection easier. I'm not positive about this, it's just something I've picked up and never seriously questioned.