: Hello, now that this was moved from the newbie forum, I have removed the section at the beginning in which I complained about some unrelated (and probably not interesting to anybody) aspects of bitcoin. If anyone feels it is important to see these, please let me know and I will re-post them.END EDITProposal: Bitcoin Elided BlocksIntroduction:
The bitcoin protocol in its most basic form does not scale for end users. Downloading the entire block chain currently, as of June 16, 2011, requires several hundred megabytes of data transfer, and the rate of generation of new data is increasing. Within a year it is likely that the block chain will be over 1 GB in size. It is impractical to expect that end users will be able to download this much data to start using bitcoin, and additionally that they will have sufficient resources to download new blocks on an ongoing basis at this increasing rate.
Furthermore, there are some current clients (smartphones as an example) for which the current multi-hundred-megabyte block chain download is already impractical. Therefore this problem is already relevant to the success of bitcoin (how can bitcoin succeed if new users are unable to practically join?) and will only become more relevant as time goes on. Right Now is a criticial time for bitcoin: the barriers to entry must be as minimal as possible in order for the user base to grow to a self-sustaining level.The Problem:
Currently the bitcoin client validates transactions locally by examining the entire block chain. This is the reason that the client must download the entire block chain - to be able to validate transactions. The client does not need to download blocks to initiate a transaction, but even then, it is likely that the user will want to maintain an up-to-date local block chain so that they can verify that any payments they send were received.
As of this writing (June 16 2011, 7:17 pm PDT), the total number of blocks is 131356, the average transaction size is 341 and the average block size is 2063, considering the entire block chain. However, these average values are misleading because they incorporate a large number of blocks that were generated during the first year and a half or so of bitcoin's existence where the blocks and transactions were very small as they were exclusively coin generation blocks. Using the last 1,000 blocks yields an average transaction size of 419 and an average block size of 22673.
Thus, currently the total size of all blocks is 270987428 bytes (258.433 MiB); if all blocks were as large as the average over the last 1,000 blocks, then the total size would be 2978234588 bytes, or 2.773 GiB. Even at today's relatively low volume, less than two more years of blocks would occupy almost 3 GiB. It is clearly impractical for all clients to download and store all of this data.A Proposed Solution:
A proposed solution is for clients to store "elided blocks". "Elided blocks" are blocks for which transactions not of interest have been stripped out, leaving only transactions interesting to the client. For most blocks, this will be all transactions, reducing the block to 80 bytes. Using the previous example of 131356 blocks, this reduces the storage to a lower bound of 10.021 MiB. Of course, some blocks will have transactions of interest, but these should add an average only the sizes of the transactions themselves plus the Merkle hash tree entries for that transaction; 1000 bytes seems generous given a 419 byte average transaction size. Thus a user with 10 active bitcoin addresses, each of which requires a chain of 100 transactions to verify their contents, might need only an extra 0.954 MiB of data.
So how do clients acquire "elided blocks"? For each client, the transactions of interest are unique, so each client's set of elided blocks is different. Therefore it is not possible to store elided blocks ahead of time in a shared pool; they must be fabricated on demand for each client.
One way to do this would be for each client to download the entire block chain and then elide transactions of interest. However, this leaves some problems unsolved:
1. The client still must download the entire block chain as input to the eliding operation.
2. The client would need to re-download the entire block chain whenever a new bitcoin address became interesting to the client, in order to re-elide with the new set of interesting transactions.
This solution is clearly not appropriate as it does not solve one of the fundamental problems (downloading the entire block chain) and it adds an additional problem of requiring re-downloading of the entire block chain over and over again with each new bitcoin address of interest.
A better solution would be for a client to ask its peers for "elided blocks", which would allow the client to download elided blocks in a progressive manner. The client would indicate bitcoin addresses of interest and its peer would provide the elided blocks. The client could come back later and query for all new elided blocks since the last time that the client queried in order to acquire the most up-to-date transactional information for bitcoin addresses of interest. The client could also come back later and query about an entirely new bitcoin address and receive only the elided blocks for that address.
Note that the client must always receive enough information in order to verify the transactions itself. It is not asked to trust the peer's verification of a transaction chain; instead it must only trust that the peer is delivering accurately elided versions of the true block chain blocks, and it would do the verification itself.
What is to prevent a peer from fabricating elided blocks in response to client queries? The client simply needs to validate each block header against the set of block headers that it acquired from the peer ahead of time (which themselves should be validated against other sources of the block headers), in order to ensure that the peer could not have fabricated any elided block on demand, as it would be impossible for the peer to fabricate a block and yet have the block header match the true bitcoin block header for that block.Client Identification:
Because there are costs of storing the entire block chain and eliding blocks on demand, peers are given the option to charge for this service. The means for accepting payment and verifying account balances are outside of the scope of the bitcoin protocol; but the protocol does include the key ingredient necessary to facilitate such a system: the requests may include an optional bitcoin address and be signed by that address, in order to associate the request with an account, that account being known to the peer by the bitcoin address that pays into it. This is optional - peers can be completely free and never require signed requests, or can be completely for-hire and only accept signed requests, or accept both kinds of requests and give priority to paid accounts, or anything in between.Postscript:
This last bit, where I talk about client signed requests, is part of the proposal. I think that all client requests ought to be optionally signable so that bitcoin peers can decide who they want to talk to (i.e. who has paid them for the service and is carrying a positive balance) and optionally charge. I think this should be some fundamental part of the protocol that can apply (at the client's discretion and the peer's acceptance) to any protocol message.
I was going to write up the protocol messages that would be added to the bitcoin protocol (to be added to the Bitcoin Protocol wiki page
), but I'm not going to bother. Like I said, maybe some developer will find what I have proposed useful, if not then ... please realize that the problem does need to be solved, even if what I have proposed is not the best way. But please take my advice: you have to find a way now
to solve the current bitcoin client data scalability problem. In a matter of months, bitcoin clients as they currently stand will be impractical due to the ever-growing size of the block chain.