Latest posts of: Hepatizon

Show Posts
Pages: [1]

Bitcoin / Development & Technical Discussion / Reading/Writing Blocks and FLATDATA

on: July 24, 2010, 01:27:32 AM

I'm trying to understand the way Bitcoin stores block data - among other things I want to run some statistics on the block chain / transaction history and check just how anonymous Bitcoin really is. So I went to the source to see how Bitcoin reads/writes block data to file.

(ETA: this is in 0.3.2)

In main.h we have:

Code: (CBlock::WriteToDisk)

bool WriteToDisk(bool fWriteTransactions, unsigned int& nFileRet, unsigned int& nBlockPosRet)
{
    // Open history file to append
    CAutoFile fileout = AppendBlockFile(nFileRet);
    if (!fileout)
        return error("CBlock::WriteToDisk() : AppendBlockFile failed");
    if (!fWriteTransactions)
        fileout.nType |= SER_BLOCKHEADERONLY;

    // Write index header
    unsigned int nSize = fileout.GetSerializeSize(*this);
    fileout << FLATDATA(pchMessageStart) << nSize;

    // Write block
    nBlockPosRet = ftell(fileout);
    if (nBlockPosRet == -1)
        return error("CBlock::WriteToDisk() : ftell failed");
    fileout << *this;

    // Flush stdio buffers and commit to disk before returning
    fflush(fileout);
#ifdef __WXMSW__
    _commit(_fileno(fileout));
#else
    fsync(fileno(fileout));
#endif

    return true;
}

and

Code: (CBlock::ReadFromDisk)

bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
{
    SetNull();

    // Open history file to read
    CAutoFile filein = OpenBlockFile(nFile, nBlockPos, "rb");
    if (!filein)
        return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
    if (!fReadTransactions)
        filein.nType |= SER_BLOCKHEADERONLY;

    // Read block
    filein >> *this;

    // Check the header
    if (CBigNum().SetCompact(nBits) > bnProofOfWorkLimit)
        return error("CBlock::ReadFromDisk() : nBits errors in block header");
    if (GetHash() > CBigNum().SetCompact(nBits).getuint256())
        return error("CBlock::ReadFromDisk() : GetHash() errors in block header");

    return true;
}

FLATDATA is defined in serialize.h like so:

Code: (FLATDATA)

//
// Wrapper for serializing arrays and POD
// There's a clever template way to make arrays serialize normally, but MSVC6 doesn't support it
//
#define FLATDATA(obj)   REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
class CFlatData
{
protected:
    char* pbegin;
    char* pend;
public:
    CFlatData(void* pbeginIn, void* pendIn) : pbegin((char*)pbeginIn), pend((char*)pendIn) { }
    char* begin() { return pbegin; }
    const char* begin() const { return pbegin; }
    char* end() { return pend; }
    const char* end() const { return pend; }

    unsigned int GetSerializeSize(int, int=0) const
    {
        return pend - pbegin;
    }

    template<typename Stream>
    void Serialize(Stream& s, int, int=0) const
    {
        s.write(pbegin, pend - pbegin);
    }

    template<typename Stream>
    void Unserialize(Stream& s, int, int=0)
    {
        s.read(pbegin, pend - pbegin);
    }
};

Now - and I apologize if I'm reading this wrong, this is a little more advanced C/C++ code than I'm used to - as I understand it, the FLATDATA call interprets the raw bytes of a CBlock object as an array (stream??) of characters. The CBlock::WriteToDisk method writes the constant 4-byte message header (0xf9, 0xbe, 0xb4, 0xd9), the size of the CBlock object in bytes, and then the FLATDATA of the CBlock it's writing to disk - which is just the raw bytes of the CBlock object. So after the header, the data written to file is byte-for-byte the same as the CBlock object represented in memory. Also, if I'm reading it correctly, CBlock::ReadFromFile copies those bytes directly into the space allocated for a CBlock object in memory to re-create the block. Is this correct?

Related question - I am under the impression that the exact way an instance of a C++ class is represented internally not guaranteed under standards; compiling a program with different compilers or different optimization flags can change the order in which member variables are stored in memory, and some debug mode compilers even add a few bytes between member variables to make memory inspection easier. I'm not positive about this, it's just something I've picked up and never seriously questioned.

Bitcoin / Development & Technical Discussion / Messing with queries

on: July 19, 2010, 05:56:44 AM

The Bitcoin network accepts blocks as valid if enough of the nodes accept the block as valid. The inherent security in this is that an attacker trying to manipulate which blocks are accepted as valid or not would need computing power approaching 50% of the entire network. But it is infeasible to actually query all or even most of the entire network (or at least it will be once the Bitcoin network becomes larger) so in reality each node must take a sample of the network. If the sample is large enough and randomly distributed over the entire network, there is no problem. But if a given node preferentially connects to a smaller subset of nodes, then that node is vulnerable - an attacker need only compromise that smaller subset. Obviously it is in the interests of each node operator to ensure that his node is truly unbiased in determining which node to get block data from. So, how does the Bitcoin client determine which nodes to connect to? I haven't gone through the source thoroughly enough to figure it out yet, but as far as I can tell it either asks the IRC channel who is online and queries those nodes, or sends a request to IRC directly and then gets a response from an arbitrary node? I think? I have two main concerns: 1. Is there any 'first-responder' bias in determining which node to ask for block data from? If there is, then a malicious user with access to a very low-latency connection could disrupt the network by responding extremely quickly to any request with some form of false data - perhaps simply by saying that no new block has been hashed yet - and get disproportionate influence over the consensus this way. If HighSpeedCheater can supply information so fast that he responds first 99% of the time to your query, then you need to do 68-69 queries just to get 50/50 odds that you reach a single node that isn't a HighSpeedCheater sockpuppet - and that node itself might be fooled by HighSpeedCheater and unwittingly be repeating false information. Probably unrealistic for an individual to pull this off, but I can see a big, widely distributed system like a botnet or Google being able to do it. Even then, nodes could get around it just by checking everything 140x as much as they used to. 2. Can a 'broadcaster' node influence which nodes it responds to? For example, could Alice set up her system so that every time Bob asks for the most recent block chain, she is the one (or disproportionately more likely to be the one) to answer him? If so, then Bob is basically at Alice's mercy unless he can figure out which addresses Alice is using faster than she can create new sockpuppets. There's also the possibility that the entire transaction chain could fork if the latency between any two approximately equal computing groups becomes larger than the average block-creation time. Suppose that there is roughly the same about of computing power dedicated to hashing Bitcoin blocks in two groups, say the US and Russia. If - and I can only see this happening if the time between new blocks is very short - it takes more time for a US node to ask a Russian node what the longest block chain is (and vice versa) than it takes for the Russian (or US) Bitcoin network to generate a new block, then chains could develop local to Russia and the US. Suppose a node in the US and Russia successfully hash the next block at about the same time, call them blocks 1001us and 1001ru. The network resolves the discrepancy by seeing whether 1002us or 1002ru gets hashed first. But since there's a delay in getting information from one country to the other, every node in the US that checks a Russian node for their longest chain length gets out-of-date information, so it will see an older, shorter chain than the chain it gets from local, lower latency nodes. And the same thing happens to the Russians, with the end effect that American nodes keep working on the branch containing 1001us because they can only see an older, shorter version of the 10001ru branch, while the Russians work on theirs for the same reason. As long as the chain grows faster than the speed of information, neither chain will ever dominate the other.

Bitcoin / Bitcoin Discussion / Which transactions go into which blocks?

on: July 18, 2010, 03:44:33 PM

I read the paper and did a brief forum search, so I apologize if this has already been asked/answered. How do nodes determine when to include a particular transaction into a block? There scenario I'm wondering about is this: Alice tries to double-spend her coins. As luck would have it, two different nodes simultaneously hash the next block, 1001, each containing one transaction of Alice's double spending attempt. Call them 1001a and 1001b. Then, a node working on 1001a hashes the next block, 1002, and every node sees that the chain containing 1000 -> 1001a -> 1002 is the longest (and most likely to be honest) and halts work on 1001b. One of Alice's two forks in her double-spending attempts is ignored, and her attempt therefore fails. The guy who was running the node that made 1001b sees that he almost made 50 bc and gets annoyed, but I guess that's acceptable. But what happens to the other transactions that were in block 1001b but not in 1001a? Do they get lost as well? Suppose Bob also bought something at about the same time as Alice, and his transaction ended up being hashed in 1001b (but not 1001a) as well. I assume that what happens next is that after chain 1002 gets accepted as legitimate, a node working on 1003 overhears Alice's and Bob's transactions, and then checks to see if the two are already in the chain. It will find that Alice's transaction is fake, and ignore it, and that Bob's transaction isn't in the chain yet, and add that to the block-in-progress. But then that means that every transaction has to keep echoing around for at least a few blocks to make sure that it ends up in the accepted chain - and every node that overhears a transaction has to check to make sure it isn't already in the list. The other question - and I know this must have been answered somewhere and I just couldn't find it - what determines transaction uniqueness? Say Alice buys a widget from Bob for 31.42 bc. Alice then decides she wants another widget, and places a new order - with the same keys for both buyer and seller - for another 31.42 bc. What makes these two transactions distinct? If a node sees one transaction for 31.42 from A to B, then sees a transaction for 31.42 from A to B, what tips it off that these are two distinct transactions rather than the same transaction detected twice?

Pages: [1]