Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: matt.collier on July 26, 2011, 08:27:45 PM



Title: Bitcoin client operating with a finite amount of disk space
Post by: matt.collier on July 26, 2011, 08:27:45 PM
Just finished reading some of the topics relating to concerns about the present and eventual size of the block chain.

Presently, the .bitcoin directory which contains the block chain is in excess of 600MB.  In my opinion, this is a non-trivial amount of disk space.  I among others have developed environments designed to assist with client security that involve running the client from a USB memory stick or other devices with resource limitations.

I read in one post that it is not necessary for every client to download/save the entire block chain in order to send/receive bitcoin.  Is a bitcoin client with this feature somewhere on the horizon?

Thanks for considering this issue and all your excellent work!

Matt


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: kjj on July 26, 2011, 08:45:15 PM
The exact shapes of lightweight clients and protocols are not yet known, but lots of people are thinking about / working on them.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: ctoon6 on July 26, 2011, 09:22:47 PM
the blockchain right now is not 600mb, its more like 400, excluding the index files. and that can be compressed to at least 80% of the original size. and (http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007960&IsNodeId=1&PropertyCodeValue=3094%3A29390%2C3094%3A25474&bop=And&Order=PRICE&PageSize=100). 16 gigs should be good for a linux install and 2 more years worth of blockchain worst case.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: matt.collier on July 26, 2011, 11:06:24 PM
Quote
the blockchain right now is not 600mb, its more like 400, excluding the index files. and that can be compressed to at least 80% of the original size. and (http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007960&IsNodeId=1&PropertyCodeValue=3094%3A29390%2C3094%3A25474&bop=And&Order=PRICE&PageSize=100). 16 gigs should be good for a linux install and 2 more years worth of blockchain worst case.

Can you say more about compressing the data?  How could this be accomplished in a transparent way?

I understand that there is a maximum number of transactions that can be included in a single block.  Do we know how much disk space a maxed-out block like this will consume?

You state that 16 gigs should be good for 2 years worth of block chain data.  Is this a wild-ass guess or based on some reasonable assumptions?  If the estimate is based on some assumptions, would you please share the data you used in your calculations?


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: Sukrim on July 26, 2011, 11:41:50 PM
All non-miner clients need are block headers and access to a trusted node that has the transactions cached. The headers are small enough to not be a real issue.
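To put a number on "small enough": a serialized Bitcoin block header is 80 bytes, so header storage grows linearly with block count. The sketch below is only back-of-the-envelope arithmetic; the function names are made up for illustration, and the block height is the one quoted later in this thread.

```python
# Rough storage estimate for a headers-only client.
HEADER_BYTES = 80  # size of one serialized Bitcoin block header


def headers_size_bytes(block_count: int) -> int:
    """Bytes needed to store one header per block."""
    return HEADER_BYTES * block_count


def headers_per_year_bytes(blocks_per_hour: int = 6) -> int:
    """Yearly header growth, assuming one block every ~10 minutes."""
    return HEADER_BYTES * blocks_per_hour * 24 * 365


# Block height 136496 is the figure mentioned later in this thread.
print(f"{headers_size_bytes(136_496) / 2**20:.1f} MiB")  # 10.4 MiB
print(f"{headers_per_year_bytes() / 2**20:.1f} MiB")     # 4.0 MiB
```

So all headers to date fit in roughly 10 MiB, with about 4 MiB added per year.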

Also, you can remove all zero-balance accounts from the local chain, which also yields very substantial space savings.

Please write a patch at least for the latter one, it should already be possible to do right now in the current bitcoin client.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: payb.tc on July 27, 2011, 12:06:51 AM
but the block chain can even be shrunk with actual ZIP compression (or similar). would there be a way of making it compressed like this while being able to be accessed at the same time?

then when people install bitcoin for the first time, it'd be good if the client could download the blockchain in compressed form


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: ctoon6 on July 27, 2011, 01:17:18 AM
Quote
Can you say more about compressing the data?  How could this be accomplished in a transparent way?

I understand that there is a maximum number of transactions that can be included in a single block.  Do we know how much disk space a maxed-out block like this will consume?

You state that 16 gigs should be good for 2 years worth of block chain data.  Is this a wild-ass guess or based on some reasonable assumptions?  If the estimate is based on some assumptions, would you please share the data you used in your calculations?

so far bitcoin has been out for around 30 months and has only just reached about 400mb, 500 if you count indexes. that's 2.5 years.

average block size over the most recent N blocks:
N        avg size
100000   4244b
50000    7900b
25000    13558b
10000    22871b
5000     24994b
1000     23705b
500      23170b

you can see that it doubles very roughly every 40000-60000 blocks. but this figure could very easily not hold, depending on bitcoin growth or death. so the average a year from now would be about 40000b, so let's just assume the size per block is 40000b for the whole first year:
40000*6*24*365 = 2,102,400,000
now double 40000 for the second year:
80000*6*24*365 = 4,204,800,000
add them and you get 6,307,200,000.
i don't know if these figures are bits or bytes, but i'll assume bytes.
that's 5.87 gigabytes, assuming worst case scenarios. this means i used the block size expected at the end of each year as if it applied from the beginning of that year, so under these assumptions the real numbers can not be higher than this.

again these numbers are probably wrong because of human behavior, mine and others', but it also seems to look like Moore's law a bit, except the numbers are doubling sooner than every 18 months.

i got the data from block explorer btw
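The two-year estimate above can be reproduced in a few lines. This is just the post's arithmetic restated, under its own assumptions (6 blocks per hour, 40000 bytes per block in year one, 80000 in year two, and the figures being bytes); `yearly_bytes` is a name made up for the sketch.

```python
# Reproduce the worst-case two-year block chain growth estimate.
BLOCKS_PER_YEAR = 6 * 24 * 365  # one block roughly every 10 minutes


def yearly_bytes(avg_block_bytes: int) -> int:
    """Bytes added in one year at a constant average block size."""
    return avg_block_bytes * BLOCKS_PER_YEAR


year1 = yearly_bytes(40_000)  # assumed average for the first year
year2 = yearly_bytes(80_000)  # assumed (doubled) average for the second year
total = year1 + year2

print(total)                    # 6307200000
print(round(total / 2**30, 2))  # 5.87 (GiB)
```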


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: NetTecture on July 27, 2011, 05:07:14 AM
Quote
All non-miner clients need are block headers and access to a trusted node that has the transactions cached.

Ah, no. All a non-miner client needs is a protocol for talking to a trusted server.

No need to store anything except addresses and private keys.

You can send signed transactions to the server and get balances and new transfers from the server.

Keep it THIN - so you don't need to sync anything. Laptop open, check, finished.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: kjj on July 27, 2011, 05:19:37 AM
The block chain is not very compressible, since most of it is hashes.  I got 21% space savings with gzip and 22% savings with bzip2.  (436690683 vs. 345345519 vs. 341013976 bytes)
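The percentages follow directly from the byte counts quoted above. The sketch below recomputes them and also shows why hash-heavy data compresses poorly, using random bytes as a stand-in for hashes (that substitution is an assumption for illustration, not a measurement of the real block file). `savings_percent` and `measure` are hypothetical helper names; `gzip.compress` and `bz2.compress` are from the Python standard library.

```python
import bz2
import gzip
import os


def savings_percent(original: int, compressed: int) -> float:
    """Space saved by compression, as a percentage of the original size."""
    return 100.0 * (1 - compressed / original)


def measure(data: bytes) -> dict:
    """Compress data in memory and report savings for gzip and bzip2."""
    return {
        "gzip": savings_percent(len(data), len(gzip.compress(data, compresslevel=9))),
        "bzip2": savings_percent(len(data), len(bz2.compress(data, 9))),
    }


# Byte counts quoted in the post: original, gzip, bzip2.
print(round(savings_percent(436690683, 345345519), 1))  # 20.9
print(round(savings_percent(436690683, 341013976), 1))  # 21.9

# Random bytes (a stand-in for hashes) do not compress at all.
print(measure(os.urandom(100_000)))
```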

The real magic comes in when you realize that you can prune old transactions.  Since any transaction in the chain can be the input for at most one new transaction, you can delete any transaction that was spent more than X blocks ago, with no ill effects.  Someone wrote a tool for that, and if I recall, he reported that something like 70% of the chain can be pruned already.
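A minimal sketch of that pruning rule, assuming a toy in-memory model: the `Output`, `Tx`, `prunable` and `prune` names and the `min_depth` default are invented for illustration and are not how the client or the tool mentioned above stores data. A transaction is dropped only when every one of its outputs was spent at least X blocks ago.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Output:
    value: int
    spent_at_height: Optional[int] = None  # None means still unspent


@dataclass
class Tx:
    txid: str
    outputs: List[Output] = field(default_factory=list)


def prunable(tx: Tx, tip_height: int, min_depth: int) -> bool:
    """True if every output was spent at least min_depth blocks ago."""
    return all(
        o.spent_at_height is not None
        and tip_height - o.spent_at_height >= min_depth
        for o in tx.outputs
    )


def prune(txs: Dict[str, Tx], tip_height: int, min_depth: int = 100) -> Dict[str, Tx]:
    """Return only the transactions that must still be kept."""
    return {
        txid: tx for txid, tx in txs.items()
        if not prunable(tx, tip_height, min_depth)
    }
```

The depth check is what keeps recently spent transactions around in case a short fork reverses the spend.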

Quote
All non-miner clients need are block headers and access to a trusted node that has the transactions cached.

Ah, no. All a non-miner client needs is a protocol for talking to a trusted server.

No need to store anything except addresses and private keys.

You can send signed transactions to the server and get balances and new transfers from the server.

Keep it THIN - so you don't need to sync anything. Laptop open, check, finished.

The trusted server part is currently difficult, so I expect a medium-weight client to pop up and be useful sooner than a fully stripped lightweight client.

That is, unless one of the 3 or more people/groups working on hardware wallets makes some big progress before a serious smartphone developer gets the itch to code up a medium client.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: timmey on July 27, 2011, 07:42:22 AM
Quote
so far bitcoin has been out for around 30 months and has only just reached about 400mb, 500 if you count indexes. thats 2.5 years.

yes, but bitcoin is also still waiting for its major breakthrough and doesn't really have many users and shops yet.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: payb.tc on July 27, 2011, 11:17:09 AM
Quote
so far bitcoin has been out for around 30 months and has only just reached about 400mb, 500 if you count indexes. thats 2.5 years.
yes, but bitcoin is also still waiting for its major breakthrough and doesn't really have many users and shops yet.

if the first 2.5 years made 400mb, i bet the *next* 2.5 years would easily make an extra 4000mb (if not compressed or pruned).


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: Bitsky on July 27, 2011, 11:51:46 AM
Quote
the blockchain right now is not 600mb, its more like 400, excluding the index files. and that can be compressed to at least 80% of the original size. and (http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007960&IsNodeId=1&PropertyCodeValue=3094%3A29390%2C3094%3A25474&bop=And&Order=PRICE&PageSize=100). 16 gigs should be good for a linux install and 2 more years worth of blockchain worst case.

so far bitcoin has been out for around 30 months and has only just reached about 400mb, 500 if you count indexes. thats 2.5 years.

My current bitcoin directory has 705MB right now.
A backup from the 18th has 612MB.
Another from the 14th has 592MB.

So, from 14th->18th, disk space increased by 20MB over 4 days; that's 5MB/day.
From 18th->27th, another 93MB were added over 9 days, which means about 10.3MB/day.
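The growth rates can be checked by dividing each size difference by the days elapsed between snapshots. The sketch assumes all three dates fall in July 2011 (the thread's own dates); `growth_per_day` is a made-up helper. Note that dividing by elapsed days (4 and 9) gives slightly higher rates than counting both endpoint days (5 and 10), which would yield 4 and 9.3 MB/day.

```python
from datetime import date

# (snapshot date, size of the bitcoin directory in MB), from the post above.
snapshots = [
    (date(2011, 7, 14), 592),
    (date(2011, 7, 18), 612),
    (date(2011, 7, 27), 705),
]


def growth_per_day(earlier, later) -> float:
    """Average MB added per elapsed day between two snapshots."""
    (d0, mb0), (d1, mb1) = earlier, later
    return (mb1 - mb0) / (d1 - d0).days


for a, b in zip(snapshots, snapshots[1:]):
    print(a[0], "->", b[0], round(growth_per_day(a, b), 1), "MB/day")
# 2011-07-14 -> 2011-07-18 5.0 MB/day
# 2011-07-18 -> 2011-07-27 10.3 MB/day
```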

Of course it's easy to say "quit whining, disk space is cheap", but when Bitcoin wants to enter the mobile/smartphone market, disk space and traffic do matter.
Also, if you are an optimist and hope that Bitcoin will catch on and grow quickly, then the number of transactions will increase, which in turn needs even more disk space.
So, the faster Bitcoin grows, the bigger the storage requirements. This means the initial download/verify time will also increase and most likely anger new users.
Last but not least, when reaching a certain number of transactions per second, the everyday John Doe simply won't have the bandwidth to deal with the blockchain growth.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: netrin on July 27, 2011, 03:09:13 PM
I would like to see a ewallet service that uses a finite set of keys for each user wallet and lets the user download a copy.

And/or a service that handles the block chain and protocol allowing my client to deal only with my transactions and keys such as Webcoin and BitcoinJS are attempting:

http://bitcoinjs.org/img/architecture/architecture.png


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: vector76 on July 27, 2011, 04:57:28 PM
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I'm not sure how much space savings this would provide, but I think it would be substantial.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: phillipsjk on July 27, 2011, 05:08:33 PM
Quote
the blockchain right now is not 600mb, its more like 400, excluding the index files. and that can be compressed to at least 80% of the original size. and (http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007960&IsNodeId=1&PropertyCodeValue=3094%3A29390%2C3094%3A25474&bop=And&Order=PRICE&PageSize=100). 16 gigs should be good for a linux install and 2 more years worth of blockchain worst case.

It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: ctoon6 on July 27, 2011, 06:06:25 PM
Quote
It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.

I have no idea about the test network, but my entire %appdata%\bitcoin dir has never gone over 800mb yet, though i would assume it will by the end of next month at most.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: phillipsjk on July 27, 2011, 06:13:08 PM
I was using the "real" network with the official client in -gen mode.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: kjj on July 27, 2011, 06:20:21 PM
Quote
the blockchain right now is not 600mb, its more like 400, excluding the index files. and that can be compressed to at least 80% of the original size. and (http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007960&IsNodeId=1&PropertyCodeValue=3094%3A29390%2C3094%3A25474&bop=And&Order=PRICE&PageSize=100). 16 gigs should be good for a linux install and 2 more years worth of blockchain worst case.

It is my understanding that the nodes save multiple copies of the block-chain in case of a split or one of the block-chains becomes the "longest" one. I have had a test-node running since June 9, 2011 (0.2.22 and 0.2.23) for a total of 55 days. It ran out of disk space today; consuming 5.8 GB. That works out to 105MB per day. Disk usage dropped to 4.9GB when the client exited. The client had 125 connections during peak times.

No, it doesn't save multiple copies.  When there is a fork, it keeps both blocks, but it doesn't need to make a copy of the rest of the chain to do it.

Check to see if you have a debug.log.  If I don't clear mine often it gets huge.  Currently around 1.5 GB.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: phillipsjk on July 27, 2011, 06:45:49 PM
So there is: 4.2GB. (Gedit tried to load the whole thing..)

Code:
ubuntu@ubuntu:/media/803819A438199A6C/bitcoins$ tail debug.log 
StopNode()
Running BitcoinMiner with 2 transactions in block
ThreadBitcoinMiner exiting, 0 threads remaining
DBFlush(true)
blkindex.dat refcount=0
blkindex.dat flush
wallet.dat refcount=0
wallet.dat flush
Bitcoin exiting


I suppose that is what I get for running the "beta" version: It is saving a lot of debugging information. Log rotation would probably help, but I doubt it is a priority if only used for debugging.
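In the absence of log rotation, one workaround is to trim debug.log by hand while the client is stopped. The sketch below keeps only the tail of the file; `trim_log` is a hypothetical helper, not a feature of the client, and it assumes nothing else has the file open while it runs.

```python
import os
import tempfile


def trim_log(path: str, keep_bytes: int) -> int:
    """Keep roughly the last keep_bytes of a log file; return the new size.

    Assumes the program writing the log is not running.
    """
    size = os.path.getsize(path)
    if size <= keep_bytes:
        return size  # already small enough, leave untouched

    with open(path, "rb") as f:
        f.seek(size - keep_bytes)
        tail = f.read()

    # Drop the first (probably partial) line so the file starts on a line boundary.
    newline = tail.find(b"\n")
    if newline != -1:
        tail = tail[newline + 1:]

    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as out:
        out.write(tail)
    os.replace(tmp, path)
    return len(tail)
```

For example, `trim_log("debug.log", 10 * 1024 * 1024)` would keep about the last 10 MB. Only the tail is read into memory, so a multi-gigabyte log is not loaded whole.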


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: kjj on July 27, 2011, 08:42:53 PM
http://forum.bitcoin.org/?topic=292.0


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: etotheipi on July 28, 2011, 09:19:55 PM
Quote
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I don't see why you need to keep that transaction.  The entire state of the network (and everyone's balances) can be determined solely by the set of unused TxOut objects and the hash/index of their parent Tx object.  And that's all the information you need to sign new transactions.  All the TxIns and previous TxOuts are only necessary for blockchain verification, but that is done by the miners before they include them in a block.  Your client can get away with storing just the TxOut information above, and trust that they are valid because they were part of the longest blockchain (which is difficult to fake), and would not be there if they weren't valid.

You can store all the Tx's in a tree data structure, whose values are arrays of TxOut objects.  As TxOut's are spent, you can remove them from the array, saving about 40 bytes for each of them.  When the last TxOut in the array is spent, you can also remove the Tx node, which saves another 40 bytes (approx).  You would only need to keep that data (and/or its hash) if you were concerned about verifying the transaction history.
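A minimal sketch of the structure described above, with two substitutions stated plainly: it uses Python dicts (hash maps) rather than a tree, and it keys each transaction's outputs by index instead of using an array, so indices stay stable as outputs are removed. The `UnspentSet` class and its methods are invented for illustration, and outputs are reduced to bare integer values.

```python
from typing import Dict, List


class UnspentSet:
    """Tracks only unspent outputs, keyed by parent tx hash and output index."""

    def __init__(self) -> None:
        self._txs: Dict[str, Dict[int, int]] = {}

    def add_tx(self, txid: str, values: List[int]) -> None:
        """Record a new transaction's outputs as unspent."""
        self._txs[txid] = dict(enumerate(values))

    def spend(self, txid: str, index: int) -> int:
        """Remove one output; raises KeyError if it is unknown or already spent."""
        outputs = self._txs[txid]
        value = outputs.pop(index)
        if not outputs:
            del self._txs[txid]  # last output spent, so drop the Tx node too
        return value

    def has_tx(self, txid: str) -> bool:
        return txid in self._txs

    def total(self) -> int:
        """Sum of all unspent output values."""
        return sum(v for outs in self._txs.values() for v in outs.values())


utxo = UnspentSet()
utxo.add_tx("tx1", [10, 20])
utxo.spend("tx1", 0)
print(utxo.total())        # 20
utxo.spend("tx1", 1)
print(utxo.has_tx("tx1"))  # False
```

A second `spend` of the same output raises `KeyError`, which is the only double-spend check this toy model performs.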


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: vector76 on July 28, 2011, 09:39:06 PM
Quote
I was thinking about how to remove transactions when all the outputs have been used, but it seems to me that at least the transaction hash must be kept, because without it, it's impossible to tell the difference between an orphan transaction and a double-spend.  Although since neither are fully valid, it might make sense to discard them both, and if the orphan later becomes a non-orphan, hope that it will be retransmitted.

I don't see why you need to keep that transaction.  The entire state of the network (and everyone's balances) can be determined solely by the set of unused TxOut objects and the hash/index of their parent Tx object.  And that's all the information you need to sign new transactions.

I agree everything that needs to be known about current balances exists in unspent outputs, and I agree you can authenticate valid transactions without keeping the spent outputs or even the hashes of the spent Tx.

Are you saying it's possible to distinguish between a double-spend and an orphan?  Or that you don't need to distinguish between them?  If it's the latter, I would agree for space-constrained nodes you can just discard an orphan/double-spend without worrying which case it happens to be.  Maybe you could even give it the benefit of the doubt and hang on to it until you see the next block or two before you toss it out.

I'd be curious to know quantitatively what fraction of the space is taken by transactions whose outputs have all been used.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: etotheipi on July 28, 2011, 09:53:24 PM
I'm saying the latter -- I don't see why the client ever needs to distinguish between those two, it only matters whether the transaction is valid. 

The only difference might be that a 0-confirmation transaction is a bit less trustworthy on a light node, because it doesn't have the ability to verify the transaction itself.  It only knows for sure when it sees that transaction in a block, or can guess with high confidence that it wouldn't have received that Tx data unless it was valid, since invalid Tx's don't get very far in the network.

I'm working on some block-chain analysis tools right now, and playing/testing with the 374 MB of data up to block 136496.  I'm not quite there yet, but I share your intrigue and might take a shot at that calculation in the next few days.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: krepta3000 on July 28, 2011, 11:10:56 PM
I'm not worried about block chain size nearly as much as log file size.  Is there some way to make bitcoin restrain its log file to a certain size, deleting older log data as it goes?  I've had to switch storage spaces several times to accommodate the log file, or delete the log file after shutting down the bitcoin client.


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: etotheipi on July 28, 2011, 11:21:03 PM
What is even in the log file?  I never knew that it existed (and was a concern) until this thread...

I am much more concerned about the blockchain file, because that's a critical part of the protocol.  Presumably, the log file can be trimmed... but the blk0001.dat cannot.  And the more successful bitcoin becomes, the more that filesize is going to spiral out of control.

I guess there are no options for the miners, they're going to have to hold the whole file no matter what.  But for the users, a reduced set will become necessary.  Maybe not just yet, but eventually.  Once the BTC network starts processing 1000 transactions per block, the blockchain is going to grow about 10-30 MB per day... or possibly 10 GB per year.  It can still be handled by the miners, but the average user isn't going to want to hold that much data just to use the program.
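The daily and yearly figures above are easy to relate. The sketch converts MB/day to GB/year (decimal units) and adds a parametrized version in which the average transaction size is purely an assumption chosen for illustration; both function names are made up.

```python
BLOCKS_PER_DAY = 6 * 24  # one block roughly every 10 minutes


def yearly_gb(mb_per_day: float) -> float:
    """Convert a daily growth rate in MB to GB per year (decimal units)."""
    return mb_per_day * 365 / 1000


def daily_growth_mb(tx_per_block: int, avg_tx_bytes: int) -> float:
    """Daily growth in MB; avg_tx_bytes is an assumed, illustrative value."""
    return tx_per_block * avg_tx_bytes * BLOCKS_PER_DAY / 1e6


print(yearly_gb(10), yearly_gb(30))  # 3.65 10.95
print(daily_growth_mb(1000, 200))    # 28.8 (assuming 200-byte transactions)
```

So 10-30 MB per day corresponds to roughly 3.7-11 GB per year, with the "10 GB per year" figure near the upper end of that range.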


Title: Re: Bitcoin client operating with a finite amount of disk space
Post by: vector76 on July 28, 2011, 11:40:14 PM
Quote
I'm not worried about block chain size nearly as much as log file size.  Is there some way to make bitcoin restrain its log file to a certain size, deleting older log data as it goes?  I've had to switch storage spaces several times to accommodate the log file, or delete the log file after shutting down the bitcoin client.

Just found that there is an undocumented -printtoconsole option that will attempt to write to stdout instead of to the log file.  It may or may not succeed in writing to stdout, but it does seem to suppress appending to the log file.