Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: mechs on June 30, 2013, 09:36:29 PM



Title: Blockchain Compression
Post by: mechs on June 30, 2013, 09:36:29 PM
Are there any plans at some point to add blockchain compression technology to the bitcoin client? The blockchain will continue to grow rapidly as adoption increases and risks becoming a victim of its own success.  It seems to me that adding some compression algorithms to the bitcoin-qt client would be of significant benefit and eventually an outright necessity to maintain the integrity of the system.


Title: Re: Blockchain Compression
Post by: nomailing on June 30, 2013, 10:38:17 PM
Did you see the threads about Ultimate Blockchain compression?
Here is the concept: https://bitcointalk.org/index.php?topic=88208.0
And here is the current implementation in progress: https://bitcointalk.org/index.php?topic=204283.0


Title: Re: Blockchain Compression
Post by: mechs on July 01, 2013, 12:04:11 AM
Why is this being done separately from the core development team?


Title: Re: Blockchain Compression
Post by: dillpicklechips on July 01, 2013, 05:28:04 PM
Why is this being done separately from the core development team?
Probably busy! Always lots of stuff to do.
And, why not?  ;)


Title: Re: Blockchain Compression
Post by: mechs on July 02, 2013, 01:06:46 AM
Wouldn't this work best if it was integrated into the official client as opposed to a secondary blockchain?


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 02, 2013, 08:01:03 AM
"Ultimate blockchain compression" does not compress the block chain. It's misnamed.

The most important piece of the chain is the unspent outputs set (UTXO set). This is already highly compressed using a custom compression algorithm. The rest of the chain is not compressed, but the solution to growth there is called pruning, which allows for deletion of old data.


Title: Re: Blockchain Compression
Post by: TierNolan on July 02, 2013, 09:15:48 AM
Wouldn't this work best if it was integrated into the official client as opposed to a secondary blockchain?

The idea is to make it backward compatible with all other clients.  If it becomes popular, it could become part of the main standard.

From the thread, there is an intention to merge mine it.  This means that every so often, there would be a block on the main chain that references the alt-chain.  It also means that miners can mine both chains at the same time, so hopefully it would have proof-of-work security similar to the main chain.
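
Roughly, the merge-mining link could work something like the sketch below; the commitment format is made up for illustration and is not the real AuxPoW spec (a real scheme also needs a merkle branch proving the coinbase is in the parent block):

Code:
# Toy illustration of a merge-mining commitment (hypothetical format, not the actual spec).
import hashlib

def h(data: bytes) -> bytes:
    """Double SHA-256, as used by Bitcoin for block and tx hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def build_parent_coinbase(normal_payload: bytes, alt_header: bytes) -> bytes:
    # The parent (Bitcoin) coinbase simply embeds a hash of the alt-chain header.
    return normal_payload + b"ALTCOMMIT" + h(alt_header)

def alt_chain_accepts(parent_header: bytes, parent_coinbase: bytes,
                      alt_header: bytes, alt_target: int) -> bool:
    # The alt chain accepts the parent block's proof of work as its own, provided
    # the parent coinbase commits to this alt-chain header and the parent header
    # hash meets the (easier or equal) alt-chain target.
    committed = b"ALTCOMMIT" + h(alt_header) in parent_coinbase
    pow_ok = int.from_bytes(h(parent_header), "big") <= alt_target
    return committed and pow_ok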

Having said that, it can't give block rewards, so there is less incentive to merge mine it.


Title: Re: Blockchain Compression
Post by: mechs on July 02, 2013, 12:18:42 PM
Has the main dev team discussed these ideas at all? I think we all see the blockchain size will increase exponentially as acceptance of bitcoin grows.


Title: Re: Blockchain Compression
Post by: colinistheman on July 02, 2013, 10:41:18 PM
Would pruning delete bitcoin ledger entries if the person hadn't spent them before the date of the pruning?

I guess the obvious answer is that of course they wouldn't just make people's bitcoins disappear... but how would you avoid that if you were "pruning" the blockchain?


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 12:10:04 AM
Has the main dev team discussed these ideas at all? I think we all see the blockchain size will increase exponentially as acceptance of bitcoin grows.

Unfortunately, Gavin has different priorities. He intends to solve a problem that private companies have already solved, by implementing it in the reference client, instead of focusing on scalability. He knows that when push comes to shove he will likely be able to lobby and get enough support for a hard fork that removes the blocksize limit, and he basically doesn't care how big the blockchain gets or how few full nodes we have.


Title: Re: Blockchain Compression
Post by: TierNolan on July 03, 2013, 09:18:11 AM
He intends to solve a problem that was already solved by private companies by implementing it into the reference client

So, what are they working on?


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 09:20:05 AM
He intends to solve a problem that was already solved by private companies by implementing it into the reference client

So, what are they working on?

An integrated secure invoice system: functionality that Bitpay and all the other similar platforms already offer.


Title: Re: Blockchain Compression
Post by: TierNolan on July 03, 2013, 09:21:48 AM
An integrated secure invoice system. Something the functionality of which Bitpay and all the other similar platforms already offer.

Hmm, I would tend to agree that focusing on "core" functionality of bitcoin should be highest priority.  Would the intention be that this includes some kind of protocol change?


Title: Re: Blockchain Compression
Post by: kjj on July 03, 2013, 10:53:57 AM
He intends to solve a problem that was already solved by private companies by implementing it into the reference client

So, what are they working on?

An integrated secure invoice system. Something the functionality of which Bitpay and all the other similar platforms already offer.

I think your understanding of the payment protocol is incorrect.


Title: Re: Blockchain Compression
Post by: Liquid on July 03, 2013, 11:24:06 AM
   Are there any plans at some point to add blockchain compression technology to the bitcoin client? The blockchain size will continue to increase exponentially as adoption increases and will become a victim of its own success.  It would seem to me adding some compression algorithms to the bitcoin-qt client would be of significant benefit and eventually an outright necessity to maintain the integrity of the system.


+1 running out of space here  :(

Also, I have a slightly off-topic question: can a wallet.dat file get corrupted?


Title: Re: Blockchain Compression
Post by: solex on July 03, 2013, 11:32:39 AM
"Ultimate blockchain compression" does not compress the block chain. It's misnamed.

The most important piece of the chain is the unspent outputs set (UTXO set). This is already highly compressed using a custom compression algorithm. The rest of the chain is not compressed, but the solution to growth there is called pruning, which allows for deletion of old data.

Yes, you are technically right, but "compression" captures the thrust of the Reiner-Maaku project for public consumption. I see this as the single most important piece of work being done on Bitcoin at the moment as it partly addresses the long-term scalability problem.


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 11:50:52 AM
He intends to solve a problem that was already solved by private companies by implementing it into the reference client

So, what are they working on?

An integrated secure invoice system. Something the functionality of which Bitpay and all the other similar platforms already offer.

I think your understanding of the payment protocol is incorrect.

You know, I really don't care if that's true. What I do know is that it wastes time on a problem we might have someday instead of spending vital time, which we don't have, on a problem we have RIGHT NOW, and that frustrates me to no end.

A CEO with his prioritization skills would have been fired long ago.


Title: Re: Blockchain Compression
Post by: kjj on July 03, 2013, 12:17:37 PM
You know I really don't care if that's true. Because what I do know is that it's wasting time on a problem that we might have over spending vital time we don't have on a problem that we have RIGHT NOW which frustrates me to no end.

A CEO with his prioritization skills would have been fired long ago.

There is certainly room for honest disagreement here.  One could argue that the lack of authentication for addresses and transactions is a problem right now, while the size of the blockchain does not appear to be a problem today, but might become one in the future.

At any rate, Gavin isn't a CEO, he is an engineer.  The payment protocol is something that can surely be solved, while the solution to the blockchain size is unclear.  You can hardly fault an engineer who wishes to solve a problem he knows he can solve in preference to spinning his wheels on something vague.


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 03, 2013, 12:59:07 PM
The solution to both problems is clear and Gavin's prioritization is correct. The block chain is only 8 gigabytes today. That's nothing. If you can't keep up with that then just switch to MultiBit and you'll not need to store the chain any more. It'll be much faster as well.

Again, the "ultimate blockchain compression" thing doesn't make any technical sense, that's why it's not happening. Pruning is something that Pieter Wiulle has said he wants to work on. Most of the work was already done. It requires some protocol upgrades and some other tricky work, then you will probably be able to tell Bitcoin-Qt/bitcoind how much disk space you want to give it. It'll store as much of the chain as it can within that space.

The payment protocol is foundational work, it is required for a lot of other features. That's why it's important. Pruning is a needed scalability fix but we're far from having some kind of crisis because people can't afford the disk space. A lot of people blow more than 8 gigs of disk space on apps they simply forgot about.


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 01:23:47 PM
The solution to both problems is clear and Gavin's prioritization is correct. The block chain is only 8 gigabytes today. That's nothing. If you can't keep up with that then just switch to MultiBit and you'll not need to store the chain any more. It'll be much faster as well.

Yes, let's screw decentralization while we are at it, right? I know you wouldn't mind Bitcoin becoming controlled by a handful of super nodes..


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 01:29:56 PM
The payment protocol is foundational work, it is required for a lot of other features. That's why it's important. Pruning is a needed scalability fix but we're far from having some kind of crisis because people can't afford the disk space. A lot of people blow more than 8 gigs of disk space on apps they simply forgot about.

Man, for a brilliant programmer I can't believe what utter shit you sometimes post.

Yes, the disk space currently isn't a problem. But you know damn well that it only takes a little over 2 more doublings of transaction volume, in terms of space, and we're going to reach the hard limit of 1 MB per block. And then what, chief? Not to mention that this will mean a 55 GB blockchain size increase per year, but undoubtedly you and Gavin will both lobby for a reckless hardfork then instead of trying to find a different solution now.

But I'm perfectly aware of where your Google-spoiled, centrally planning head is coming from. You want more control because it's easier from a technical standpoint; you don't care one bit about the decentralization part of Bitcoin, and you don't care what sort of consequences not focusing on this issue now, and forcing a hardfork later, will have.


Title: Re: Blockchain Compression
Post by: TierNolan on July 03, 2013, 01:45:40 PM
The payment protocol is foundational work, it is required for a lot of other features.

Is there a summary of what is being considered here?


Title: Re: Blockchain Compression
Post by: maaku on July 03, 2013, 07:31:57 PM
Again, the "ultimate blockchain compression" thing doesn't make any technical sense, that's why it's not happening.

I'd love to hear your technical arguments regarding that.


Title: Re: Blockchain Compression
Post by: d'aniel on July 03, 2013, 08:02:08 PM
Again, the "ultimate blockchain compression" thing doesn't make any technical sense, that's why it's not happening.

I'd love to hear your technical arguments regarding that.
He described them here once: https://bitcointalk.org/index.php?topic=93606.msg1607170#msg1607170


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 03, 2013, 08:03:38 PM
hazek, you are not a developer so you really shouldn't get angry about technical topics. I already explained that block chain pruning will allow you to run a full node that uses an arbitrary (user specified) amount of disk space. So when the chain gets to be 55 gigs, OK, if you can spare 55 gigs then keep all of it. Otherwise keep 30 gigs of it. Or 10. Or none. Regardless of what you can afford, you will still be able to run a full node. You just won't be able to serve the whole chain to a new node that's starting from scratch (so someone, somewhere will have to keep a full copy of the chain, but that's fundamental to how Bitcoin works).
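
To make that concrete, here is a minimal sketch of what trimming to a user-specified budget could look like, assuming blocks sit in files oldest-first and the UTXO database is never touched; the file layout and function below are purely illustrative, not an actual Bitcoin-Qt option:

Code:
# Hypothetical pruning loop: delete the oldest block files until usage fits the budget.
# The UTXO database (chainstate) is kept, so the node can still fully validate new blocks.
import os

def prune_to_budget(block_dir: str, target_bytes: int) -> None:
    files = sorted(f for f in os.listdir(block_dir) if f.startswith("blk"))  # oldest first
    usage = sum(os.path.getsize(os.path.join(block_dir, f)) for f in files)
    for f in files[:-1]:              # always keep the most recent file
        if usage <= target_bytes:
            break
        path = os.path.join(block_dir, f)
        usage -= os.path.getsize(path)
        os.remove(path)               # old blocks can no longer be served to syncing peers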

If you don't understand what I'm talking about, go do some research and come back when you do understand. Flaming people because you prefer to believe in conspiracy theories than learn hard technical facts just makes you come across as immature.

By the way, I've spent over two years implementing SPV mode so people can use the P2P network directly without having to download the block chain - and without relying on any trusted servers. Only a handful of people have put as much effort into keeping Bitcoin decentralised as I have. So by saying that I don't care about decentralisation you just sound even more uninformed.

Quote
Is there a summary of what is being considered here?

It'll be written up in BIP form soon. Until then, this link is the first result on Google for "bitcoin payment protocol":

https://github.com/bitcoin/bitcoin/pull/2539

Quote
I'd love to hear your technical arguments regarding that.

The idea was to put a commitment to the UTXO set into the coinbase transactions. The claim was that this lets you run a full node without processing the full chain, and that it helps lightweight clients. The first claim is not true, which is why it's now crossed out in the original post.

https://bitcointalk.org/index.php?topic=88208.0

You can't have full node security by bootstrapping off a miner-majority consensus, that's SPV level security by definition. To bootstrap a full node in a trust free manner you have to process the entire chain, or get a copy of the database from somewhere you "know" is correct (i.e. another one of your own nodes).

The second claim, that it helps lightweight clients, also isn't really right. I've actually implemented lightweight mode (the one described by Satoshi at least) and UTXO commitments don't really change much, at least not fundamentally. You still need all the block headers to track the consensus, and at that point you may as well ask for transactions that match your keys at the same time using the existing merkle trees. A UTXO commitment in the coinbase might make things faster if you were to import a private key, at the cost of significantly increasing resource usage of every single node on the network. Given that private key import is (or should be) rare, this isn't a good tradeoff.
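
For anyone unfamiliar with the existing merkle trees being referred to: an SPV client keeps only the block headers and checks a merkle branch supplied by a peer against the merkle root in one of those headers. A minimal sketch (simplified: the real thing has endianness quirks and duplicates the last node on odd levels):

Code:
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_merkle_branch(txid: bytes, branch: list, merkle_root: bytes) -> bool:
    """branch is a list of (sibling_hash, sibling_is_right) pairs from leaf to root."""
    node = txid
    for sibling, sibling_is_right in branch:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == merkle_root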

Fortunately, none of that is needed. Satoshi already described everything that is required to make Bitcoin scale in his white paper nearly 5 years ago. Most of it is already implemented. The ability to delete old blocks from the chain when you start running out of disk space is waiting merely for the protocol to be upgraded so nodes can advertise how many blocks they have, and for wallet apps to be upgraded so they find nodes that still have the range of blocks they need to catch up with.


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 08:49:04 PM
I apologize for my rudeness, my frustration got the best of me.

I already explained that block chain pruning will allow you to run a full node that uses an arbitrary (user specified) amount of disk space.

My complaint was that implementing it is not a priority when it should be and that implementing it too late might have grave consequences.

By the way, I've spent over two years implementing SPV mode so people can use the P2P network directly without having to download the block chain - and without relying on any trusted servers. Only a handful of people have put as much effort into keeping Bitcoin decentralised as I have.

Awesome job. However, how does the use of SPV clients prevent centralization of the Bitcoin network? Are SPV clients equally sovereign when it comes to deciding which blockchain, which new blocks, and which transactions are valid?

If you read back my post, you will notice that I called you a brilliant programmer because I really think you are one. And I never said you are not putting in a significant effort in order to improve Bitcoin. My complaint was that the development that you are contributing isn't the development that Bitcoin crucially needs in order to remain decentralized and prosper. I mean unless I'm mistaken about SPV clients in which case please correct me.


p.s.: while I admit that the tone of my post was not entirely constructive, you definitely didn't rise above me by calling me a conspiracy theorist


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 03, 2013, 09:31:49 PM
Well Gavin, at least for now, isn't too worried about disk space usage.  It'd only take a couple of months to fully implement the last parts of pruning if he worked on it seriously, and it's hard to imagine the block chain doubling in size in only a few months. That'd be a nice problem to have. Even then, 16 GB isn't all that much. I have 182 GB free on my computer and it's a pretty ordinary laptop.

At the moment Gavin's biggest worry is security. The payment protocol is designed to complement the Trezor project and give us better security than the best banks. Also, as I said, Pieter has said he wants to work on pruning, so that might be another reason Gavin's not prioritising it at the moment. Pieter has a full time job these days so it's harder for him to find the time, but even so, he's done amazing work on scalability and I'm sure he'll be able to finish off the pruning work some time this year. I'd guess the block chain will be under 12 gigs by the time pruning starts. So no risk of a crisis.

In short, yes it's important and will happen, but we're not going to see Bitcoin collapse if it doesn't happen this year. The numbers just don't work out that way.

Quote
Awesome job. However how does the use of SPV clients prevent centralization of the Bitcoin network? Are SPV clients equally sovereign when it comes to deciding which blockchain, which new blocks and transactions are valid?

I guess that's a rhetorical question? Anyway, at least for other people who are reading: no, as described in Satoshi's paper they only validate the block headers. So they pick the hardest chain and then assume the contents are likely to be correct. You can't mine with them, but in practice "best chain is correct" is usually good enough, at least for normal individual users.
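
To make "pick the hardest chain" concrete, here is a minimal sketch; it only sums the expected work per header and skips the proof-of-work and retargeting checks a real client performs, and the header representation is just an assumption for illustration:

Code:
# Expected work per header is 2^256 / (target + 1); the best chain has the most total work.
def work_from_target(target: int) -> int:
    return (1 << 256) // (target + 1)

def best_chain(candidate_chains):
    """candidate_chains: list of header chains, each header a dict with an expanded 'target'."""
    return max(candidate_chains,
               key=lambda headers: sum(work_from_target(hd["target"]) for hd in headers))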

If you can afford a full node, that's better. But most casual users won't want to run one. The next most decentralised after that is SPV. No central or trusted servers. You place your trust in the majority consensus instead.

If we didn't have SPV clients then Bitcoin usage would be dominated by services like blockchain.info or Coinbase. Both great companies and sites, but ultimately people would end up putting their money into "BitBanks" and relying on third party companies. Those organisations would then either get regulated out of existence, or become real banks and start lending out their bitcoins at interest. In some ways Bitcoin usage is already dominated by these sorts of wallets, and it's not even hard to run bitcoin-qt today, so you can imagine how common it'd be in future with much larger blocks. That's why my No. 1 priority is to try and build SPV wallets that are competitive with centralised bank-like services, to avoid that future and keep people using the P2P network directly without any middlemen (or more technically, with only p2p nodes and miners as middlemen).

Re: conspiracy theories, yes, sorry. Quite often people on this forum claim that because I work for Google, I must have some incentive to try and centralise Bitcoin or <insert theory here>. It's nonsense of course, Google is focused on Google Wallet and hardly cares about Bitcoin. In fact Google not only employs me but also Pieter, who has done more work on pruning than anyone. It's about as relevant as the fact that Jeff once worked for Red Hat.


Title: Re: Blockchain Compression
Post by: maaku on July 03, 2013, 09:32:19 PM
You can't have full node security by bootstrapping off a miner-majority consensus, that's SPV level security by definition. To bootstrap a full node in a trust free manner you have to process the entire chain, or get a copy of the database from somewhere you "know" is correct (i.e. another one of your own nodes).

Right now the choices are SPV or full node. The misnamed “ultimate blockchain compression” proposal would enable a wide spectrum of security modes in-between the two. Initial synchronization requires SPV level of trust in the checkpointed UTXO set. However from that point forward full validation can be performed, and history can be incrementally validated as far back as required (to the last checkpoint, for example, or the full history). The ability to construct fraud proofs provides an additional layer of security as it provides a compact way to inform other peers of an attack-in-progress. These are valuable options that enable secure limited-trust operation on constrained devices, or a graduated level of security as synchronization is performed.

The second claim that it helps lightweight clients also isn't really right. I've actually implemented lightweight mode (the one described by Satoshi at least) and UTXO commitments really change much, at least not fundamentally. You still need all the block headers to track the consensus, and at that point you may as well ask for transactions that match your keys at the same time using the existing merkle trees. A UTXO commitment in the coinbase might make things faster if you were to import a private key, at the cost of significantly increasing resource usage of every single node on the network. Given that private key import is (or should be) rare, this isn't a good tradeoff.

Mike, you seem content to trust that peers will send you transactions matching your bloom filter. I do not find that reasoning compelling. The UTXO index will give a short, verifiable proof that matching transactions are or are not reported.
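
To illustrate what a proof of absence can look like, here is a minimal sketch using a merkle tree over the sorted UTXO keys; the actual proposal uses an authenticated prefix tree, but the idea of a short proof of presence OR absence is the same, and everything below is illustrative only:

Code:
# Toy absence proof over a merkle tree whose leaves are the sorted UTXO keys.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def root_from_leaf(leaf: bytes, index: int, branch: list) -> bytes:
    """Recompute the root from a leaf, its index, and its sibling path."""
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node

def verify_absence(query_key: bytes, root: bytes,
                   left_leaf: bytes, left_index: int, left_branch: list,
                   right_leaf: bytes, right_index: int, right_branch: list) -> bool:
    return (root_from_leaf(left_leaf, left_index, left_branch) == root and
            root_from_leaf(right_leaf, right_index, right_branch) == root and
            right_index == left_index + 1 and        # adjacent leaves: nothing fits between
            left_leaf < query_key < right_leaf)      # the key would have to sit in that gap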


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 03, 2013, 09:41:46 PM
Sure, you can start with SPV security and then convert yourself to full security, but that could be done today with no protocol changes. Pieter calls this "headers first mode". Over time I became less convinced it was a good idea to convert between them like that though. I used to think it was obvious but eventually concluded users should make an explicit choice to run a full node. Once they made that choice it doesn't really matter how long it takes to bootstrap, they're in it for the long haul anyway.

As for a node doing a selective DoS on Bloom filtering, have you actually observed anyone do such a thing? What would be the motive, and why is querying multiple nodes not enough to fix that?

I can see that with a properly constructed set of Merkle trees a remote node could provide a proof that it hasn't excluded any transactions, assuming you trust the last block. But most real wallets want to know about pending transactions too, and you still need the Bloom filtering for that. So the cost/benefit analysis for using address index commitments to avoid malicious Bloom filter drops feels rather questionable to me.

For fraud proofs, I think d'aniel raised that before, I'm afraid I don't remember which proofs require the commitments. Double spends don't. At any rate, whilst those proofs would indeed be useful they weren't the rationale given for "ultimate blockchain compression" originally. A full design doc for different kinds of fraud proofs would be useful.


Title: Re: Blockchain Compression
Post by: TierNolan on July 03, 2013, 09:47:47 PM
It'll be written up in BIP form soon. Until then, this link is the first result on Google for "bitcoin payment protocol":

https://github.com/bitcoin/bitcoin/pull/2539

Heh, thanks, I guess I should have Googled.


Title: Re: Blockchain Compression
Post by: maaku on July 03, 2013, 10:19:17 PM
Mike, you're putting up strawmen with your insistence on Alan's original rationale(s). Authenticated, committed UTXO index structures provide a variety of new capabilities some of which Alan et al may not have anticipated. You seem to think these new capabilities are not useful, or at least do not justify their cost, for unstated reasons that you think are obvious. I disagree.

For example, it does matter how long it takes to bootstrap when you're talking about initial user experience, or a user who would prefer to run a full node but not halt operations while syncing. There's a wide variety of reasons people run bitcoin nodes, and I don't see any reason to expect the associated security requirements to neatly fall into one or two different models.

As for a selective DoS on Bloom filtering, no, I haven't observed such a thing, nor have I been looking either. But that's completely beside the point: in the security business it is our job to act preemptively and close attack vectors, ideally before the black-hat work is done to exploit them. As to querying multiple nodes, you are ignoring network operators who could filter or replace protocol messages. The index structure, on the other hand, would allow authenticated negative responses as well, so the querying node won't be happy until it receives a proof of presence OR absence.

No one's arguing for UTXO indices over bloom filters. They each have their uses.


Title: Re: Blockchain Compression
Post by: hazek on July 03, 2013, 10:33:38 PM
Pieter has a full time job these days so it's harder for him to find the time, but even so, he's done amazing work on scalability and I'm sure he'll be able to finish off the pruning work some time this year

I just want it done already, and I don't want to see it run into a huge problem that could blow up in all of our faces when we most need the solution finished, which is why I think it should have priority.

Quite often people on this forum claim that because I work for Google, I must have some incentive to try and centralise Bitcoin

I don't think that at all. What I do think is that working for Google has affected you. I'm sure you'll admit it changed how you view certain things, I mean how could it not.. and I think it affected you in such a way that your vision and priorities for Bitcoin and my vision and priorities for Bitcoin differ significantly. That's all. - And of course I could be completely off base here, something I'd be really happy to be corrected on.


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 03, 2013, 10:54:17 PM
I guess I have a few concerns with your utxo commitments project.

The first problem is that it still comes with the ultimate blockchain compression name. That's the reason I'm "insisting" on the original rationale. You've said yourself you know that's wrong and decided to keep it anyway - now look at the first posts in this thread. The name is misleading people into thinking it has something to do with reducing the disk space usage of running a full node. But because you don't actually have a full node if you start from a miner-consensus utxo set, that's not what it does. If you go ahead and rebuild the same database in parallel, that's fine, but unless pruning is implemented you'd eventually end up with the same disk space usage as today (well, more as you need space for the extra indexes).

The second problem is that you don't need to make these changes to have immediate initial startup. SPV wallets can already do the initial sync in a few seconds. If you want to run a full node too, just run both MultiBit and bitcoind in parallel until the latter is ready. If you want it to be seamless, just bundle bitcoind with MultiBit and make it connect to your new local node once it's finished syncing. SPV + local full node = full node security, except that it's a very simple implementation, with the advantage that you can seamlessly switch back and forth afterwards.

I'm not actually convinced it's a good idea though. The user experience of having a full node randomly start syncing in the background because something decided you had enough cpu/disk space is very poor. If the user ever shuts down their app they'll be surprised to discover that next time they start it up, it's a long way behind and takes ages to catch up. If the user is going to make an explicit decision to run a full node, they may as well just download Bitcoin-Qt and run it themselves - that's really a tiny action compared to the ongoing cost of being a full node.

You say, start from a miner-majority commitment to a UTXO set and do full verification from that point on whilst bringing up a full node in parallel. But is this small window of time where you get marginally better security for the small number of users who would run a full node worth all the extra overhead? It doesn't feel like it. Building the full indexes needed to calculate all those merkle trees has quite some overhead and every node would have to do it, or at least every mining node. That would raise the cost of running a node and require us to redo all the scalability calculations that were already done.

The final issue I have is that you've raised a fair bit of money to work on it, but I didn't see where you got consensus that it should be merged into existing codebases. Perhaps I missed it, but I don't think Gavin said he agreed with the necessary protocol changes. He may well have accepted your design in some thread I didn't see though.


Title: Re: Blockchain Compression
Post by: 2112 on July 03, 2013, 11:52:28 PM
If you go ahead and rebuild the same database in parallel, that's fine, but unless pruning is implemented you'd eventually end up with the same disk space usage as today (well, more as you need space for the extra indexes).
I don't know if Mike Hearn doesn't know this or just pretends not to know. But it clearly fits his pattern of producing misdirecting arguments. He applied nearly the same misdirection almost a year ago in the "[POLL] Multi-sig or scalability--which is more pressing?" thread.

https://bitcointalk.org/index.php?topic=94453.msg1046473#msg1046473

Full undo/redo logs for a UTxO database will indeed consume the same (or a slightly higher) amount of storage as raw blockchain storage. But unlike raw blockchain storage, undo/redo logs are temporally clustered. Older and older transaction logs can be put on slower and slower storage. All good DBMSes have a way of efficiently backing up old transaction logs and restoring them on demand.

This isn't the place to repeat the same old set of arguments that led to the creation of database management systems as an efficient way to exploit temporal clustering in data sets. Any decent DBMS textbook will have them.


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 01:51:52 AM
The final issue I have is that you've raised a fair bit of money to work on it, but I didn't see where you got consensus that it should be merged into existing codebases. Perhaps I missed it, but I don't think Gavin said he  agreed with the necessary protocol changes. He may well have accepted your design in some thread I didn't see though.

My conversations with Gavin in the past were that UTXO commitments and UTXO fraud proofs are a necessary precondition to raising the blocksize, because without them you have no way of knowing if miners are committing inflation fraud and no way of proving that to others. Other core devs like Gregory Maxwell share that opinion. If anyone succeeds in creating a robust implementation, doing a PR campaign to educate people about the risks of not making it part of a blocksize increase is something I would be more than happy to be involved with, and I'm sure you realize it'll be an even easier message to get across than saying limiting the blocksize is a good thing.

Who doesn't want auditing, fraud prevention and fresh home-made all-American apple pie?

One catch is that unfortunately UTXO commitments are in themselves very dangerous in that they allow you to safely mine without actually validating the chain at all. What's worse is there is a strong incentive to build the code to do so, because that capability can be turned into a way to do distributed mining, AKA P2Pool, where every participant has low bandwidth requirements even if the overall blockchain bandwidth requirement exceeds what the participants can keep up with. If miners start doing this, perhaps because mining has become a regulated activity and people want to mine behind Tor, it'll reduce the threshold for what is a 51% attack by whatever % of hashing power is mining without validation; we're going to have to require miners to prove they actually possess the UTXO set as part of the UTXO-commitment system. That said, I'm pretty sure it's possible to allow those possession proofs to be done in a way where participants maintain only part of the UTXO set, and thus only part of the P2P relay bandwidth, allowing low-bandwidth true mining + larger blocksizes without decreasing the 51% attack threshold, helping solve the censorship problem and thus keeping Bitcoin decentralized. The devil is in the details, but at worst it can be done as a merge-mined alt-coin.

Of course that still doesn't solve the fundamental economic problem that there needs to be an incentive to mine in the first place, but an alt-coin with sane long-term economic incentives (like a minimum inflation rate) can always emerge from Bitcoin's ashes after Bitcoin is destroyed by 51% attacks, and such an alt-coin can be bootstrapped by having participants prove their possession or sacrifice of Bitcoin UTXO's to create coins in the "Bitcoin 2" alt-coin, providing a reasonable migration path and avoiding arguments about early-adopters. Similarly, adding proof-of-stake to the proof-of-work can be done that way. (Note that proof-of-stake may be fundamentally incompatible with SPV nodes; right now that's an open research question.) None of this is an issue in the short term anyway, as the subsidy inflation won't hit 1% until around 2032 at the very earliest, and even later than that in real terms depending on how many coins turn out to be lost. Applications like fidelity bonds and merge-mined alt-coins that all subsidize mining will help extend the economic incentives to mine longer as well.

Yes, I'm less worried about the blocksize question these days because I'm fairly confident there can exist a decentralized cryptocurrency in the future, but I don't know and don't particularly care if Bitcoin ends up being that currency. I'll also point out that if Bitcoin isn't that currency, the alt-coin that is can be created in a way that destroys Bitcoin itself, for instance by making the PoW not only compatible with Bitcoin, but by requiring miners to create invalid, or better yet, valid but useless Bitcoin blocks, so that increased adoption automatically damages Bitcoin. For instance a useless block can be empty, or just filled with low or zero fee transactions for which the miner can prove they possess the private keys too; blocking non-empty blocks would require nodes to not only synchronize their mempools but also adopt a uniform algorithm for what transactions they must include in a block for the block to be relayed, and any deviation there can be exploited to fork Bitcoin. (Another argument for requiring miners to prove UTXO set possession.) Also note how a 51% attack by Bitcoin miners on the alt-coin becomes an x% attack on Bitcoin. Sure, you can mess with the Bitcoin rules to respond, but it becomes a nasty cat-and-mouse game...


Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 02:24:42 AM
For fraud proofs, I think d'aniel raised that before, I'm afraid I don't remember which proofs require the commitments. Double spends don't. At any rate, whilst those proofs would indeed be useful they weren't the rationale given for "ultimate blockchain compression" originally. A full design doc for different kinds of fraud proofs would be useful.
Actually, none of the fraud proofs/challenges I mentioned in that thread relied on utxo set commitments.  Transactions with nonexistent txins would benefit from them, since then there could be concise proofs of nonexistent txins, but the way I described it, a peer would issue a challenge to find a valid Merkle branch to an allegedly nonexistent txin from some other peer.  If at least one peer is honest (which Bitcoin generally assumes anyway), then they will find the branch if the challenger turned out to be lying, and can ignore him going forward.

Partially verifying nodes is what I ended up being interested in utxo set commitments for, and they seem to be really useful if/when we get efficient verifiable computing.  Though there definitely isn't any near-term need for this.


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 03:03:10 AM
For fraud proofs, I think d'aniel raised that before, I'm afraid I don't remember which proofs require the commitments. Double spends don't. At any rate, whilst those proofs would indeed be useful they weren't the rationale given for "ultimate blockchain compression" originally. A full design doc for different kinds of fraud proofs would be useful.
Actually, none of the fraud proofs/challenges I mentioned in that thread relied on utxo set commitments.  Transactions with nonexistent txins would benefit from them, since then there could be concise proofs of nonexistent txins, but the way I described it, a peer would issue a challenge to find a valid Merkle branch to an allegedly nonexistent txin from some other peer.  If at least one peer is honest (which Bitcoin generally assumes anyway), then they will find the branch if the peer turned out to be lying, and can ignore that peer going forward.

Yeah, you can go very far without UTXO set commitments, but without them your scenario only works if you assume your peers are full-nodes. If you are an SPV node with SPV peers - a completely valid scenario that we will need in the future and one that is useful with payment protocols - you're stuck and can't do anything with the fraud proofs.

The other issue is that only UTXO set commitments can prove inflation fraud without a copy of the blockchain, i.e. miners deciding to change the subsidy and fee rules and create coins out of thin air.


Title: Re: Blockchain Compression
Post by: maaku on July 04, 2013, 04:08:09 AM
Actually, none of the fraud proofs/challenges I mentioned in that thread relied on utxo set commitments.  Transactions with nonexistent txins would benefit from them, since then there could be concise proofs of nonexistent txins, but the way I described it, a peer would issue a challenge to find a valid Merkle branch to an allegedly nonexistent txin from some other peer.  If at least one peer and the network operator are honest (which Bitcoin generally assumes anyway), then they will find the branch if the challenger turned out to be lying, and can ignore him going forward.

Fixed that for you. Without nonexistence proofs, network operator attacks are trivial.


Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 04:12:09 AM
Yeah, you can go very far without UTXO set commitments, but without them your scenario only works if you assume your peers are full-nodes. If you are an SPV node with SPV peers - a completely valid scenario that we will need in the future and one that is useful with payment protocols - you're stuck and can't do anything with the fraud proofs.
I don't follow.  How could an SPV node function with only SPV peers?  How would it get the tx data that it needs?

Quote
The other issue is that only UTXO set commitments can prove inflation fraud without a copy of the blockchain, IE miners deciding to changing the subsidy and fee rules and create coins out of thin air.
I addressed this as well in that thread: https://bitcointalk.org/index.php?topic=131493.msg1407971#msg1407971.  The Merkle tree of transactions is used to authenticate the maximum fee reward calculation in each block.  It didn't seem to require utxo set commitments.



Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 04:19:29 AM
Actually, none of the fraud proofs/challenges I mentioned in that thread relied on utxo set commitments.  Transactions with nonexistent txins would benefit from them, since then there could be concise proofs of nonexistent txins, but the way I described it, a peer would issue a challenge to find a valid Merkle branch to an allegedly nonexistent txin from some other peer.  If at least one peer is honest and the network operator (which Bitcoin generally assumes anyway), then they will find the branch if the challenger turned out to be lying, and can ignore that him going forward.

Fixed that for you. Without nonexistance proofs, network operator attacks are trivial.
Okay, so the network operator could mislead a node onto his invalid chain by handing it fake fraud challenges.  Can't he do this regardless, by simply refusing to relay valid block headers?


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 04:26:48 AM
I don't follow.  How could an SPV node function with only SPV peers?  How would it get the tx data that it needs?

Payment protocols.

Anyway the whole idea of just assuming SPV nodes are going to be able to get the tx data they need from peers for free is really flawed. Running bloom filters against the blockchain data costs resources and at some point people aren't going to be willing to do that for free.

I've actually got writing "tit-for-tat" peering on my todo list: i.e. silently fail to relay transactions to peers that don't relay enough new and valid transactions to you, because they are leeches on the network. Pure SPV clients, like Android clients, can pay for the resources they consume via micropayment channels or at least proof-of-work. It'd also prevent attackers from DoSing the network by making large numbers of connections to large numbers of nodes to fill up the incoming peer slots of nodes on the network; you need very little in the way of resources to pull off that attack right now.
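
A minimal sketch of the sort of bookkeeping that could involve; the counters, threshold and functions below are made up for illustration, not anything that exists in the reference client:

Code:
# Hypothetical tit-for-tat relay policy: peers that never contribute stop receiving relays.

def send_tx(peer_id, tx) -> None:
    pass  # stand-in for the real network send

class PeerScore:
    def __init__(self):
        self.contributed = 0   # new valid transactions first heard from this peer
        self.received = 0      # transactions we have relayed to this peer

    def is_leech(self, min_ratio=0.01, grace=100) -> bool:
        if self.received < grace:   # give new peers the benefit of the doubt
            return False
        return self.contributed / self.received < min_ratio

def maybe_relay(tx, peers) -> None:
    """peers: dict mapping peer id -> PeerScore."""
    for peer_id, score in peers.items():
        if not score.is_leech():    # leeches silently stop getting our relays
            send_tx(peer_id, tx)
            score.received += 1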

I addressed this as well in that thread: https://bitcointalk.org/index.php?topic=131493.msg1407971#msg1407971.  The Merkle tree of transactions is used to authenticate the maximum fee reward calculation in each block.  It didn't seem to require utxo set commitments.

That's a good point, a changed merkle tree can achieve that too. However maaku is quite correct that network operator attacks are trivial without UTXO set commitments, whereas with them the worst a network operator can do is prevent you from getting a fraud proof message; creating fake confirms is extremely expensive.


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 04:30:10 AM
Okay, so the network operator could mislead a node onto his invalid chain by handing it fake fraud challenges.  Can't he do this regardless, by simply refusing to relay valid block headers?

Among other things, there is the problem that without a way to prove that a txout doesn't exist, the network operator can prevent an SPV node from ever knowing that it has been paid, and there is nothing the SPV node can do about it.


Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 04:49:10 AM
I don't follow.  How could an SPV node function with only SPV peers?  How would it get the tx data that it needs?

Payment protocols.

Anyway the whole idea of just assuming SPV nodes are going to be able to get the tx data they need from peers for free is really flawed. Running bloom filters against the blockchain data costs resources and at some point people aren't going to be willing to do that for free.

I've got writing "tit-for-tat" peering on my todo list actually: IE silently fail to relay transactions to peers if they don't relay enough new and valid transactions to you for being leeches on the network. Pure SPV clients, like Android clients, can pay for the resources they consume via micropayment channels or at least proof-of-work. It'd also prevent attackers from DoSing the network by making large numbers of connections to large numbers of nodes to fill the incoming peer slots of nodes on the network; you need very little in the way of resources to pull off that attack right now.
Ways to reduce the load on full nodes and to pay for their efforts seem like good ideas.  But do you expect some SPV nodes not to be able to connect to any full nodes at all at some point in the future?  If there's a market for the service, then surely they will always be able to find some willing to provide it, especially if their security depends on it.

Among other things there is the problem that without the way to prove that a txout doesn't exist the network operator can prevent a SPV node from ever knowing that they have been paid and there is nothing the SPV node can do about it.
That sounds a bit obnoxious, sure, but is it really that big a problem?


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 05:16:22 AM
I don't follow.  How could an SPV node function with only SPV peers?  How would it get the tx data that it needs?

Payment protocols.

Anyway the whole idea of just assuming SPV nodes are going to be able to get the tx data they need from peers for free is really flawed. Running bloom filters against the blockchain data costs resources and at some point people aren't going to be willing to do that for free.

I've got writing "tit-for-tat" peering on my todo list actually: IE silently fail to relay transactions to peers if they don't relay enough new and valid transactions to you for being leeches on the network. Pure SPV clients, like Android clients, can pay for the resources they consume via micropayment channels or at least proof-of-work. It'd also prevent attackers from DoSing the network by making large numbers of connections to large numbers of nodes to fill the incoming peer slots of nodes on the network; you need very little in the way of resources to pull off that attack right now.
Ways to reduce the load off of full nodes, and pay for their efforts seems like good ideas.  But do you expect some SPV nodes to not be able to connect to any full nodes at all at some point in the future?  If there's a market for the service, then surely they will always be able to find some willing to provide it, especially if their security depends on it.

Among other things there is the problem that without the way to prove that a txout doesn't exist the network operator can prevent a SPV node from ever knowing that they have been paid and there is nothing the SPV node can do about it.
That sounds a bit obnoxious, sure, but is it really that big a problem?

Yes: it allows someone to claim they have funds that they have since spent.

Also look at it in the broader sense: if you do have UTXO proofs, an SPV node can pay for a full-node connection and SPV-related services, either with real funds or an anti-DoS proof-of-work, and be sure that the node is being honest and that it is getting accurate data with nothing more than a source of block header info. (Relaying of block headers between SPV nodes is something I'm also planning on implementing.)


Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 06:20:07 AM
That sounds a bit obnoxious, sure, but is it really that big a problem?
Yes: it allows someone to claim they have funds that they have since spent.
So in the event that you are connecting through a network that someone spending coins to you controls, or is colluding with the controller of, then you can get screwed.  I agree, that is a sore spot.  But wouldn't it look suspicious if on some new network you weren't able to find any recognizable peers to connect to?

Quote
Also look at in the broader sense: if you do have UTXO proofs an SPV node can pay for a full-node connection and SPV-related services, either with real funds or a anti-DoS proof-of-work, and be sure that the node is being honest and they are getting accurate data with nothing more than a source of block header info. (relaying of block headers between SPV nodes is something I'm also planning on implementing)
To be fair, all the data is provably accurate, you just don't know if you're being told the whole story.

There are definitely some benefits to this, but there are also costs, and I'm just wondering if there are perhaps other cheaper ways to get practically the same benefits.  Thankfully maaku will be providing us with a sense of the costs pretty soon.


Title: Re: Blockchain Compression
Post by: TierNolan on July 04, 2013, 09:49:02 AM
The second problem is that you don't need to make these changes to have immediate initial startup. SPV wallets can already do the initial sync in a few seconds. If you want to run a full node too, just run both MultiBit and bitcoind in parallel until the latter is ready. If you want it to be seamless, just bundle bitcoind with a MultiBit and make it connect to your new local node once it's finished syncing. SPV+local full node = full node security, except that it's a very simple implementation, with the advantage that you can seamlessly switch back and forth afterwards.

The reference client should operate that way anyway.

- download headers
- verify by scanning backwards [ * ]

The client could give the length of the history that has been confirmed.

It could say that it has verified all blocks back to the inputs to all your transactions and how many total blocks have been scanned.

Each transaction could go through different states
0) not synced (RED)
1) scanned back to earliest input and 1k additional blocks (ORANGE)
2) scanned back to a checkpoint (YELLOW)
3) synced back to genesis block (GREEN)

[ * ] The client would have to keep a hashmap of all transactions that have occurred in later blocks, so it can detect a double spend.  Effectively, it would be an "unfunded input set" rather than an "unspent output set".

Even level 1 security would be pretty rock solid.  It would require a 1k re-org to be violated.
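
To make the idea concrete, a minimal sketch of how those per-transaction states could be tracked; the 1k figure and checkpoint notion come from the list above, everything else (the function, the assumption that the checkpoint lies below the inputs' depth) is illustrative only:

Code:
from enum import Enum

class TxSyncState(Enum):
    NOT_SYNCED = "RED"         # 0) nothing verified yet
    INPUTS_PLUS_1K = "ORANGE"  # 1) scanned back to earliest input plus 1k additional blocks
    TO_CHECKPOINT = "YELLOW"   # 2) scanned back to a checkpoint
    TO_GENESIS = "GREEN"       # 3) verified all the way back to the genesis block

def classify(scanned_back_to: int, earliest_input_height: int,
             checkpoint_height: int) -> TxSyncState:
    """scanned_back_to: lowest block height verified so far in the backwards scan."""
    if scanned_back_to == 0:
        return TxSyncState.TO_GENESIS
    if scanned_back_to <= checkpoint_height:
        return TxSyncState.TO_CHECKPOINT
    if scanned_back_to <= max(earliest_input_height - 1000, 0):
        return TxSyncState.INPUTS_PLUS_1K
    return TxSyncState.NOT_SYNCED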

Quote
I'm not actually convinced it's a good idea though. The user experience of having a full node randomly start syncing in the background because something decided you had enough cpu/disk space is very poor.

You could add CPU limits, so that it never uses more than 10% of any CPU (do 5ms of work and then sleep for 50ms).  Ideally, with a distributed verification engine, 10% of everyone's CPUs would be more than enough to verify the chain.


Title: Re: Blockchain Compression
Post by: TierNolan on July 04, 2013, 10:44:11 AM
One catch is that unfortunately UTXO commitments are in themselves very dangerous in that they allow you to safely mine without actually validating the chain at all.

You still need the UTXO tree though.  Updating the tree requires that you process the entire block.

Quote
That said I'm pretty sure it's possible to allow those possession proofs to be done in a way where participants can maintain only part of the UTXO set, and thus part of the P2P relay bandwidth, allowing low-bandwidth true mining + larger blocksizes without decreasing the 51% attack threshold and helping solve the censorship problem and keeping thus Bitcoin decentralized. The devil is in the details, but at worst it can be done as a merge-mined alt-coin.

Ideally, there would be some way to merge multiple sub-blocks into a higher level block.

For example, someone publishes a merkle root containing only transactions which start with some prefix.  A combiner could take 16 of those and produce a higher-level merkle root.  The prefix for the combined root would be 4 bits shorter.

Effectively, you would have 16 sub-chains, where each sub-chain deals with transactions that start with a particular prefix.  Each of those chains could have sub-chains too.

The only way to make it work with Bitcoin would be to have a defined number of transactions per block.  You can only combine two merkle trees into 1 merkle tree if both children have the same number of transactions.

There would also need to be some kind of trust system for the combiners.  If you have to verify, then it defeats the purpose.

A "trusted-identity" would certify the combination (and get fees).  If proof is given that the claim is false, then that id is eliminated.


Title: Re: Blockchain Compression
Post by: Peter Todd on July 04, 2013, 11:31:34 AM
One catch is that unfortunately UTXO commitments are in themselves very dangerous in that they allow you to safely mine without actually validating the chain at all.

You still need the UTXO tree though.  Updating the tree requires that you process the entire block.

That's not true unfortunately. You'll need to do some extra UTXO queries strategically to get the right information on the branches next to the transaction being changed, but with that information you have enough to calculate the next UTXO tree state without having any of the data. This is inherent to having a fast UTXO commitment system in the first place because part of making it cheap to provide those commitments is keeping the data changed for every new block at a minimum - exactly opposite the need to have miners prove they actually have the UTXO set at all.
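
To illustrate the point: with a plain binary merkle tree, the sibling path that proved the old leaf is enough to compute the new root after changing that leaf, without holding any other data. A minimal sketch (the actual proposal uses a different authenticated structure, so this is illustrative only):

Code:
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def update_root(new_leaf: bytes, index: int, sibling_path: list) -> bytes:
    """Recompute the root after replacing the leaf at `index`, using only its sibling path.
    The same path that proved the old leaf was in the tree is enough to derive the new root,
    which is why a commitment scheme optimised for cheap updates lets a miner update the
    committed set without actually storing it."""
    node = h(new_leaf)
    for sibling in sibling_path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node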

Ideally, there would be some way to merge multiple sub-blocks into a higher level block.

For example, someone publishes a merkle root containing only transactions which start with some prefix.  A combiner could take 16 of those any produce a higher level merkle root.  The prefix for the combined root would be 4 bits less.

Effectively, you would have 16 sub-chains, where each sub-chain deals with transactions that start with a particular prefix.  Each of those chains could have sub-chains too.

The only way to make it work with Bitcoin would be to have a defined number of transactions per block.  You can only combine two merkle trees into 1 merkle tree if both children have the same number of transactions.

There would also need to be some kind of trust system for the combiners.  If you have to verify, then it defeats the purpose.

A "trusted-identity" would certify the combination (and get fees).  If proof is given that the claim is false, then that id is eliminated.

You're missing a key point: transactions can touch multiple parts of the UTXO set, so once the UTXO set is split into subsets, participants must validate (at minimum) pairs of those subsets.


Title: Re: Blockchain Compression
Post by: TierNolan on July 04, 2013, 11:43:11 AM
That's not true unfortunately. You'll need to do some extra UTXO queries strategically to get the right information on the branches next to the transaction being changed, but with that information you have enough to calculate the next UTXO tree state without having any of the data.

Right. 

In fact, you could require all the extra data be attached to the transaction.

Quote
You're missing a key point: transactions can touch multiple parts of the UTXO set so once the UTXO set is split into subsets participants validate (at minimum) pairs of those subsets.

I was thinking that you share out the transactions based on the transaction's own hash.  However, you are right, to actually verify, you need all the info.

In theory, transactions could be passed between the 16 sub-chains, and a transaction would need a verification from each of the 16 sub-chains to count as actually verified.

You could have 16 blocks per actual "full" block.  A sub-block would contain

- all transactions from the previous block, except
-- Any transactions added 16 blocks previously (they count as confirmed)
-- Any transactions that have inputs from transactions in this sub-chain, but are invalid
- Any new transactions for this sub-set

This is similar to my suggestion to have odd/even transaction rules.  Basically, you guarantee that the next block won't alter certain aspects of the UTXO set.

This allows the next block to be prepared ahead of time.


Title: Re: Blockchain Compression
Post by: d'aniel on July 04, 2013, 09:51:08 PM
Partially validating nodes are what I ended up being interested in UTXO set commitments for, and they seem to be really useful if/when we get efficient verifiable computing.  Though there definitely isn't any near-term need for this.
Re: verifiable computing and UTXO set commitments, I noticed that committing the UTXO set digest into the block header wouldn't actually be necessary, since you'd already have a succinct proof of its validity, which is much stronger than simple inclusion in a block header.  Though the authenticated UTXO set is still necessary for being able to turn blockchain verification into verifying a chain of succinct verifiable computing proofs(?), and for knowing your copy of the UTXO (sub)set is valid at any given time.


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 05, 2013, 08:38:31 AM
It's not quite that simple - to combine provable computations with blocks you could "just" provide the proofs to the clients that verify them, which indeed is a huge upgrade over SPV security, assuming the verifications are sufficiently cheap. However, as all such schemes require encoding the entire input to the function and have complexity related to the size of the inputs, the right way to do it is to have the provable program actually work with a UTXO merkle tree and output deltas to that tree. At least, that's the scheme we came up with when I discussed this with Bryan Parno.

If someone wanted to start calculating UTXO merkle trees in order to experiment with provable verification, then I'd think that was cool, but it's a long way from "we should actually make this a part of Bitcoin". See it as Zerocoin-style research: cutting-edge stuff that has great potential, but not something for today's world.


Title: Re: Blockchain Compression
Post by: TierNolan on July 05, 2013, 08:56:55 AM
I think also there could be a difference between verification for mining and general verification.

The problem with mining is that you need verification to be extremely quick.  A distributed system (or at least a p2p distributed system) isn't likely to be able to keep up with a centralised system.

However, with bad block proofs, a distributed system may be able to detect a bad block within 5-10 blocks.  This wouldn't be sufficient to mine, but would be sufficient to keep miners honest.

Of course, there is still the problem of proving that data is missing.


Title: Re: Blockchain Compression
Post by: piotr_n on July 05, 2013, 07:09:14 PM
You know I really don't care if that's true. Because what I do know is that it's wasting time on a problem that we might have instead of spending vital time we don't have on a problem that we have RIGHT NOW, which frustrates me to no end.
I understand you very well, but all I can tell you is: get used to this - it's normal. They only fix/optimize things if people have reported (and proven) an issue.
And since, by their definition, uncompressed blockchain data is not an issue, they just keep following the usual road map, which is always about adding over-complicated things that:
1) nobody really needs and that don't actually help anyone with anything
2) could just as well be implemented outside the bitcoin node

But try to explain to them that no sane bitcoin user wants a technology that requires him to submit all his personal details (full name, address, etc.) to a certificate-issuing authority in order to receive a bitcoin... and good luck with it!
They have already decided it's important, and they are going to spend 12 months, while nothing else matters, implementing and testing it, repeating that this is a very important and widely requested feature. There is nothing any of us can do about it, except maybe laugh at how the suckers waste their time burying complex code for unnecessary features inside such a crucial piece of software as the bitcoin core. So yeah, at least it's funny :)


Title: Re: Blockchain Compression
Post by: IIOII on July 06, 2013, 06:32:37 PM
You know I really don't care if that's true. Because what I do know is that it's wasting time on a problem that we might have instead of spending vital time we don't have on a problem that we have RIGHT NOW, which frustrates me to no end.
I understand you very well, but all I can tell you is: get used to this - it's normal. They only fix/optimize things if people have reported (and proven) an issue.
And since, by their definition, uncompressed blockchain data is not an issue, they just keep following the usual road map, which is always about adding over-complicated things that:
1) nobody really needs and that don't actually help anyone with anything
2) could just as well be implemented outside the bitcoin node

But try to explain to them that no sane bitcoin user wants a technology that requires him to submit all his personal details (full name, address, etc.) to a certificate-issuing authority in order to receive a bitcoin... and good luck with it!
They have already decided it's important, and they are going to spend 12 months, while nothing else matters, implementing and testing it, repeating that this is a very important and widely requested feature. There is nothing any of us can do about it, except maybe laugh at how the suckers waste their time burying complex code for unnecessary features inside such a crucial piece of software as the bitcoin core. So yeah, at least it's funny :)


You're so right.

Instead of improving core functionalities, Bitcoin gets bloated with features that are optional "nice-to-haves" which could be more efficiently solved by third party services.

It's false to claim that blockchain size isn't a problem. Every single bitcoin user I know complains about blockchain download times. Implementing pruning would be a great leap forward in usability.


Title: Re: Blockchain Compression
Post by: mechs on July 06, 2013, 11:52:49 PM
It seems to me there is one quick solution, which would be to simply compress the blockchain with a known compression algorithm such as RAR (as an exercise, I was able to shrink mine to 65% of its original size) and have the client decompress it in memory on the fly as needed.  That's hardly a very CPU-intensive task for most modern processors, and it could be made a feature that can be disabled at the user's option.  It would result in 35% faster downloads and 35% less disk space used, at the cost of somewhat more CPU and memory use.

This is not a replacement for pruning at all; however, it strikes me as fairly quick to implement and something that should be part of any solution.  Is there any flaw here I do not see?
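For what it's worth, a quick sketch of what I mean, using zlib instead of RAR since that's what most languages ship with (this is only an illustration - it says nothing about how bitcoin-qt actually stores its block files):

Code:
import zlib

def compress_block_file(path_in: str, path_out: str, level: int = 6) -> float:
    """Compress a raw block file on disk and return the compression ratio."""
    with open(path_in, "rb") as f:
        raw = f.read()
    packed = zlib.compress(raw, level)
    with open(path_out, "wb") as f:
        f.write(packed)
    return len(packed) / len(raw)

def read_block_file(path: str) -> bytes:
    """Decompress in memory, on the fly, when the client needs the data back."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read())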


Title: Re: Blockchain Compression
Post by: piotr_n on July 07, 2013, 01:56:46 AM
Implementing pruning would be a great leap forward in usability.

The problem with pruning is that if all the nodes remove spent txs from their storage, who is going to serve the block chain? Charities? :)


Title: Re: Blockchain Compression
Post by: maaku on July 07, 2013, 04:28:59 AM
Implementing pruning would be a great leap forward in usability.

The problem isn't (yet) the size of the block chain on disk. 10GB isn't a lot of space. It's network bandwidth - downloading 10GB of data takes a while on any connection. Pruning would do nothing to fix that (you still have to download the blocks before selectively throwing portions away), and it may even make things worse since fewer peers will have the full blockchain to seed from.


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 08, 2013, 12:25:26 PM
Yes, exactly. Pruning is not a bandwidth saver. When the speed of chain download has been examined, it has mostly turned out that disk I/O and CPU were the limits, not bandwidth - people could usually download the chain faster than Bitcoin could digest it. That might have changed a bit since the 0.8.1+/ultraprune/leveldb/other optimisations, but you'd still rapidly get back to the point where disk and CPU are the limits rather than the network.

This is why it's so dangerous to throw around random assumptions about what changes should be made. You might end up not fixing the problem you wanted to fix.

If you ARE bandwidth limited, Jeff already runs a blockchain torrent. I don't remember if it's compressed but as it's just a file, it certainly could be. If you want to bootstrap a new node faster, downloading the torrent is one way to do it. You still have to build the indexes yourself of course, and that's going to take a long time no matter what.

For the vast majority of new users, the correct client to use is MultiBit, and since a few days ago that's now our recommendation on bitcoin.org. The Bitcoin-Qt client is now listed as an app you run if you want to help the network by donating resources. This is a scalable fix - MultiBit will always be fast no matter how popular Bitcoin gets, whereas anything you can do to Bitcoin-Qt either changes the security model to be more like MultiBit's (in which case, why bother) or makes a percentage improvement that takes us from "slow" to "slow". Once you make an informed decision to donate your resources to the network over the long term, the exact amount of time it takes to catch up with the chain head becomes less relevant. You're going to run it for months, we'd hope, so one or two days doesn't make a big difference.


Title: Re: Blockchain Compression
Post by: maaku on July 08, 2013, 05:06:19 PM
Mike, I will make one comment: if you are going to trust the checkpoint (which you do currently - validation in Bitcoin-Qt isn't performed until after the last checkpoint, right?), and if the checkpoint included an authenticated UTXO index hash under the soft-fork scenario, then you could just download that UTXO set and start verification from there, with *exactly* the same security you otherwise would have had. This lets us keep the "one to two days" it takes to perform IBD down to a small value, rather than letting it continue to increase over time.
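A rough sketch of that bootstrap, assuming purely for illustration that the UTXO set is serialized in a canonical sorted order and the checkpoint commits to a single double-SHA256 of it (the encoding details here are made up):

Code:
import hashlib

def utxo_set_digest(utxos: dict) -> bytes:
    """Hash the whole UTXO set in a canonical (sorted) order.  `utxos` maps
    (txid, vout) -> serialized output; the exact encoding is an assumption."""
    h = hashlib.sha256()
    for (txid, vout) in sorted(utxos):
        h.update(txid + vout.to_bytes(4, "little") + utxos[(txid, vout)])
    return hashlib.sha256(h.digest()).digest()

def bootstrap_from_checkpoint(snapshot: dict, committed_digest: bytes) -> dict:
    """Accept a downloaded UTXO snapshot only if it matches the digest
    committed at the checkpoint; full validation then resumes from there."""
    if utxo_set_digest(snapshot) != committed_digest:
        raise ValueError("UTXO snapshot does not match the checkpoint commitment")
    return snapshot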


Title: Re: Blockchain Compression
Post by: piotr_n on July 08, 2013, 05:16:06 PM
I agree.
If there are these checkpoints anyway, you might as well distribute a new version of the client with the content of the last checkpoint's UTXO set, instead of only a block hash.
This would make bootstrapping so much faster, and would also require no storage for all the previous blocks, since any node would be able to download the last checkpoint's db content from its peers - with a checksum it would be just as "safe" as the checkpoint values are now, but so much more useful.
It's a bold move, but in fact not bolder than the idea of checkpoints itself.
And it would at least optimize something.
Though, again, who am I talking to + the idea of checkpoints is stupid and goes totally against the main purpose of the solution :)


Title: Re: Blockchain Compression
Post by: Peter Todd on July 08, 2013, 05:20:41 PM
Including a UTXO hash in the checkpoint makes for a much less secure model than not doing so.

With checkpoints the client only skips validating transactions; it still builds the UTXO set. You've still shown that a vast amount of hashing-power work has been spent on the blocks leading to the checkpoint, and validating that work, as well as the work after the checkpoint, is pretty easy. More importantly, if the checkpoint is invalid and the majority of hashing power isn't following it, you'll soon notice, because your client doesn't see as many blocks as it should.

On the other hand, a UTXO hash means trusting the developers to compute the UTXO set properly for you. They could easily leave UTXOs out and you would never know until your node tried to spend one of them and wound up forked from the main network - or, if the checkpoint had a majority of hashing power behind it, until someone found out their coins were now unspendable by developer fiat. The point being that fraud on the part of the developers isn't detected automatically or immediately.

UTXO hashes will be acceptable once UTXO proofs are a requirement for a block, and you'll be able to reason about how much work has been done for any given UTXO proof, but as you know we're a long way from finishing that.


Title: Re: Blockchain Compression
Post by: piotr_n on July 08, 2013, 05:23:38 PM
With checkpoints the client only skips validating transactions; it still builds the UTXO set. You've still shown that a vast amount of hashing-power work has been spent on the blocks leading to the checkpoint, and validating that work, as well as the work after the checkpoint, is pretty easy. More importantly, if the checkpoint is invalid and the majority of hashing power isn't following it, you'll soon notice, because your client doesn't see as many blocks as it should.
But if you are using a piece of software in which some devs have put a checkpoint, why would you trust them to have chosen an honest block hash, but not trust that they honestly verified the chain up to that point, difficulty checks included?
It just doesn't make any sense to use this software to verify from scratch a chain that can only be valid if it reaches a block with a specific hash - a value fixed inside the very same software that is (again) doing the block verification just to arrive at that hash - because only then can it work.
It's just a huge overhead.


Title: Re: Blockchain Compression
Post by: piotr_n on July 08, 2013, 05:32:03 PM
Maybe there should be something like a general solution, to compress it once and for all.
It's all about the most recent content of the UTXO set - which is not so big compared to the much faster-growing blockchain.
Maybe there should be an automatic checkpoint (a small, generic one) every 2016 blocks or so, with a hash of the UTXO db content included inside the block, from which nodes would be able to start their existence, using that hash as a trusted seed to quickly fetch the entire UTXO set instead of re-doing the chain up to that point.
If that works, there would be no more need to store, nor to share, the entire blockchain - it could be stored in a museum, though :)


Title: Re: Blockchain Compression
Post by: piotr_n on July 08, 2013, 06:31:45 PM
Maybe actually it should be simpler.
Each miner would have the option to mine in a hash of the current UTXO db - of course the set needs to be in some canonical (sorted) order, but a wrong hash would make the block invalid.
And then, by other means - not necessarily ones that need to be sponsored by the Bitcoin Foundation - people would find a way to fetch these snapshots of the UTXO set.
As long as you add to the protocol a way to mine in a UTXO hash and make sure that all the miners pick it up...
two years at least.
ten since it's my idea :)


Title: Re: Blockchain Compression
Post by: justusranvier on July 08, 2013, 07:50:03 PM
Though, again, who am I talking to + the idea of checkpoints is stupid and goes totally against the main purpose of the solution :)
I don't think there ultimately is a nice, clean, mathematically and cryptographically pure solution to the problem that will work in practice. We'll end up with a messy set of overlapping partial solutions that provide sufficient security when balanced against usability.

The developers on GitHub could commit malicious checkpoints, but due to the nature of Git there's no way to hide what they've done. Anyone who's been running a full node can notice the discrepancy and spread the word via any number of communication platforms and bring enough scrutiny to the problem that everybody can manually figure out what's going on and apply a fix if necessary.

In the pure theoretical realm of solving the Byzantine Generals' Problem one doesn't normally include the existence of IRC, Twitter, Facebook, and Reddit as part of the solution, but in the real world relying on the ability of users to communicate out of band can sometimes be an acceptable way to resolve an attack. I'd like to see checkpoints distributed via as many mediums as possible, including print media, by as many different entities as possible to make it as difficult as possible for someone to distribute incorrect ones without being detected. We really want everything to work automatically, but it's good to have backup plans to fix things manually when something breaks.


Title: Re: Blockchain Compression
Post by: piotr_n on July 08, 2013, 08:02:34 PM
OK, agreed.
But since you already have checkpoints, why don't you take advantage of them?


Title: Re: Blockchain Compression
Post by: Mike Hearn on July 08, 2013, 08:03:49 PM
I'd actually be OK with including a database copy in the downloads; it's a much simpler and easier thing to do than complicated shared UTXO commitment schemes. You can't verify the database by hand, but you can't verify the binaries either, and we obviously provide binaries. The same techniques could work fine - a bunch of different people all collaborate on a threshold signing of the included database, just like they sign the binaries. Then anyone can tell their node to audit the developers just by letting it recalculate the database, the same way anyone can audit the binaries using gitian.

However, it seems I'm in the minority on that one. Whenever I've suggested including a database copy in the downloadable binaries, lots of other people have jumped up to disagree with me. So - SPV it is.


Title: Re: Blockchain Compression
Post by: piotr_n on July 09, 2013, 08:54:27 AM
I'd actually be OK with including a database copy in the downloads, it's a much simpler and easier thing to do than complicated shared UTXO commitment schemes.
This would only improve the bootstrapping time, but it is in no way a "blockchain compression" solution.

Calculating a reliable checksum of a UTXO snapshot might be a bit time-consuming, but the process itself is not rocket science and the algorithm should be very simple to implement.
Even if you hash all the records outright, the operation should take well under a minute on a modern PC.
But if you choose a smarter checksum method, you can speed it up a lot by doing some kind of differential UTXO checksum adjustment while processing each block, which would only take fractions of a second.
A node would already have the expected checksum value, because the UTXO checksum in a block would actually refer to the UTXO state after the previous block.
Moreover, if you still find calculating the UTXO checksum too resource-unfriendly, you could allow such checksums to be mined (and thus verified) only every few thousand blocks.
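Just to illustrate the kind of differential adjustment I mean (a toy sketch, not a vetted construction - a plain XOR set-hash like this has known weaknesses, so a real design would use something stronger; all names and encodings are made up):

Code:
import hashlib

def _utxo_hash(txid: bytes, vout: int, value: int, script: bytes) -> int:
    data = txid + vout.to_bytes(4, "little") + value.to_bytes(8, "little") + script
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

class UtxoChecksum:
    """Running set digest maintained by XOR-ing per-output hashes in and out,
    so each block only costs work proportional to the outputs it touches."""
    def __init__(self):
        self.acc = 0

    def add(self, txid, vout, value, script):
        """A block created a new unspent output."""
        self.acc ^= _utxo_hash(txid, vout, value, script)

    def remove(self, txid, vout, value, script):
        """A block spent an existing output."""
        self.acc ^= _utxo_hash(txid, vout, value, script)

    def digest(self) -> bytes:
        return self.acc.to_bytes(32, "big")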