Bitcoin Forum

Bitcoin => Bitcoin Discussion => Topic started by: uyjulian on April 29, 2013, 08:49:35 PM



Title: Can't the blockchain be compressed?
Post by: uyjulian on April 29, 2013, 08:49:35 PM
I thought about making a minimal system just for managing bitcoins... but the blockchain is HUGE... Anyway to make it smaller ?


Title: Re: Can't the blockchain be compressed?
Post by: cypherdoc on April 29, 2013, 09:25:48 PM
it can be pruned by lopping off all spent tx's.


Title: Re: Can't the blockchain be compressed?
Post by: Frozenlock on April 29, 2013, 09:36:50 PM
What do you mean it's huge?

How large do you think your bank's databases are?


Title: Re: Can't the blockchain be compressed?
Post by: the founder on April 29, 2013, 11:14:26 PM
What do you mean it's huge?

How large do you think your bank's databases are?

I don't download the bank's database each time I login to see the 5 dollars I have in my personal account.



Title: Re: Can't the blockchain be compressed?
Post by: Frozenlock on April 29, 2013, 11:18:23 PM
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.


Title: Re: Can't the blockchain be compressed?
Post by: CasinoBit on April 30, 2013, 09:17:31 AM
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.

https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSyY_hSfLAYzJOJ4EiWXOYrVoAJsTI-II-s0uDMyHSxR9sO1G1m


Title: Re: Can't the blockchain be compressed?
Post by: oakpacific on April 30, 2013, 09:36:06 AM


http://en.wikipedia.org/wiki/Private_information_retrieval


Title: Re: Can't the blockchain be compressed?
Post by: CasinoBit on April 30, 2013, 09:41:16 AM
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.

https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSyY_hSfLAYzJOJ4EiWXOYrVoAJsTI-II-s0uDMyHSxR9sO1G1m

http://en.wikipedia.org/wiki/Private_information_retrieval

No I was actually genuinely saying that it is a valid point, people who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. Wonder what will they do in 3 years when this will only be viable for servers.


Title: Re: Can't the blockchain be compressed?
Post by: oakpacific on April 30, 2013, 09:42:39 AM
I don't download the blockchain everytime I login on a light wallet to see the 5 dollars I have in my personal account.

https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSyY_hSfLAYzJOJ4EiWXOYrVoAJsTI-II-s0uDMyHSxR9sO1G1m

http://en.wikipedia.org/wiki/Private_information_retrieval

No I was actually genuinely saying that it is a valid point, people who are new to the Bitcoin economy always seem to go ahead and download Bitcoin-qt and wait for the whole chain to download just to send a couple of coins. Wonder what will they do in 3 years when this will only be viable for servers.

Yup, fixed.

The idea is Bitcoin enables you to either run your own bank or have someone else running it for you.

And running a bank/clearing house is never easy, despite how corrupt a lot of banksters are.


Title: Re: Can't the blockchain be compressed?
Post by: JeromeS on April 30, 2013, 08:21:50 PM
I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:
  • Replacing the previous block's hash with the current block's height in the block format, saves 28 bytes/block, 6 megabytes overall, almost nothing.
  • Removing the Merkle root, can be deduced from the transactions in the block, saves 32 bytes/block, 7 megabytes, almost nothing.
  • Replacing the previous output field in each input with the output's block number and its transaction's order in the block, saves 26 bytes per input. assuming 1 input/transaction, and going by blockchain.info (https://blockchain.info/charts/n-transactions-total)'s 17000000 transactions, this would save 431 megabytes.
  • For 99.999% of outputs, there are four bytes (OP_DUP,OP_HASH160,OP_EQUALVERIFY,OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). again, assuming 1 output/transaction, this would save 68 megabytes.
  • Repeat uses of the same address. If the average address receives 5 payments, then 4 of those five can be replaced by a reference to the first time it appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5th), possibly another 172 megabytes.

With the current blockchain taking up 9 GIGABYTES of disk space and taking weeks to be downloaded and verified, this doesn't really change anything.

The size of the blockchain is a fundamental flaw in bitcoin and nothing is going to change that. It just happens to be the best technology we have now, but the moment something else is invented that has the same advantages and fewer drawbacks, bitcoin will be wiped out.


Title: Re: Can't the blockchain be compressed?
Post by: kjj on April 30, 2013, 08:45:04 PM
It really isn't hard to run the blockfiles through bzip2 to see how much compression is really possible.  If I recall correctly, when I did it, I  got about 30% compression on the block files.


Title: Re: Can't the blockchain be compressed?
Post by: scintill on April 30, 2013, 08:58:58 PM
I thought the same thing before and came up with a few ideas on how this can be done.
The problem, basically, is that the data in the blockchain is already very dense. I remember doing a few tests and finding out that it can't be compressed to less than 60% of its size.

For anyone who's interested, these are the ideas I had:
  • Replacing the previous block's hash with the current block's height in the block format, saves 28 bytes/block, 6 megabytes overall, almost nothing.
  • Removing the Merkle root, can be deduced from the transactions in the block, saves 32 bytes/block, 7 megabytes, almost nothing.
  • Replacing the previous output field in each input with the output's block number and its transaction's order in the block, saves 26 bytes per input. assuming 1 input/transaction, and going by blockchain.info (https://blockchain.info/charts/n-transactions-total)'s 17000000 transactions, this would save 431 megabytes.
  • For 99.999% of outputs, there are four bytes (OP_DUP,OP_HASH160,OP_EQUALVERIFY,OP_CHECKSIG) that can be removed (i.e. they would be "implied" in the compressed format unless otherwise indicated). again, assuming 1 output/transaction, this would save 68 megabytes.
  • Repeat uses of the same address. If the average address receives 5 payments, then 4 of those five can be replaced by a reference to the first time it appears in the blockchain (4-7 bytes long). With each address taking up 20 bytes, this saves 13 bytes per repeat output (4/5th), possibly another 172 megabytes.

1) I don't think this works.  It needs a pointer to the previous block to form the "chain" of blocks.
2) I think the merkle root is important to preserve for pruning, a theoretical way to cut down of blockchain size that hasn't been fully implemented.  Why would there be a second hash if it were redundant?
3) If you do this, then it's not possible to construct a spend until the funding tx is in a block, and is deep enough that it will not be reorganized into another block.  The person spending has to watch the network for reorgs and recreate their tx.  With block-agnostic transactions, the network can hold their tx and just rebroadcast it for them if necessary.
4) I would rather see Script abandoned in favor of hard-coded transaction types, than to put in some hack like "if the script just pushes a pubkey-sized thing onto the stack, infer that it means DUP HASH160 EQUALVERIFY CHECKSIG."
5) You could also get everyone to use compressed public keys (saves ~32 bytes per spend, I'm looking at you SatoshiDice -- last I looked they are still using uncompressed addresses), and consider a scheme where the client can somehow infer the publickey to be used for a spend if it has been seen on the network before.  This just feels dirty and could be difficult to make performant, though.