Bitcoin Forum

Bitcoin => Project Development => Topic started by: maaku on May 13, 2013, 08:28:39 PM



Title: [Fundraising] Finish “Ultimate blockchain compression”
Post by: maaku on May 13, 2013, 08:28:39 PM
Update: Please read the latest news here:

https://bitcointalk.org/index.php?topic=204283.msg3117454#msg3117454




TL;DR: I plan to implement the Ultimate blockchain compression w/ trust-free lite nodes (https://bitcointalk.org/index.php?topic=88208.0) proposal by @etotheipi. Please help me acquire funding to work full-time on it by pledging coins to 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP.

After a startup misadventure, I have recently found myself back on the job market. Rather than take a soul-crushing 8-5 job, I have taken inspiration from @etotheipi (https://bitcointalk.org/index.php?topic=141786.msg1549715#msg1549715) and would like to try my hand at full-time open-source development.

In my mind the most critical barrier to the widespread adoption of Bitcoin is the initial block download. Requiring the user to spend days downloading and processing gigabytes of data before getting started is unacceptable, and SPV clients do not provide enough security to be an acceptable alternative for many purposes. We can have our cake and eat it too (https://bitcointalk.org/index.php?topic=88208.0), but it will require a heck of a lot of grunt-work.

I reject bounties as being inappropriate for a task of this scale. The likely length of the project prevents me from doing this as pay-on-delivery as I would be receiving no income for living expenses between now and then, and I have a wife and daughter to support. It would also force me to develop in secret and take on a hostile stance with possible competing developers who may “scoop” my ideas or code to claim the bounty before me. The rules of the bounty may also prevent me from having the flexibility to alter the scope or structure of the project due to unforeseen considerations and unavoidable complications. I do not believe such an outcome would be good for the community.

Rather, I promise to develop in an open fashion. I will implement the UBC proposal directly in the Satoshi bitcoind daemon/Bitcoin-Qt client, with my changes periodically pushed to a github repository. I will make weekly status reports on a blog, cross-posted to this forum. I will engage with the developer community to get peer review on design choices and address concerns as they come up. The general plan for development would be:

* Implement the PATRICIA trie index of the UTXO set, using LevelDB as the storage backend. Implement p2p messages and rpc commands for querying this index.
* Add support for tracking multiple chains to bitcoind, and implement the merged-mined UTXO meta-chain.
* Restructure bitcoind to allow retrieval of wallet transactions from a UTXO peer, and construction of the pruned block/transaction database from the UTXO set.
* Implement bittorrent distribution of blockchain data, offloading IBD and full UTXO set retrieval from the bitcoin protocol.
* Switch the default Bitcoin-Qt behavior to perform IBD in the following manner:

1. Download bitcoin block headers and verify proof-of-work.
2. Download coinbase transactions for most recent X bitcoin blocks.
3. Determine head of UTXO meta-chain, retrieve meta-chain block header, and extract UTXO root hash.
4. Fetch wallet transactions from peer, with UTXO proof.
5. Download UTXO set (fast) and construct full transaction validation database.
6. Begin operation as a validating transaction processing node (total time elapsed: minutes).
7. Download entire block chain history (slow) and verify blocks.
8. (Potentially) begin operation as a mining node (total time elapsed: hours/days).
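To make step 1 of that list concrete, here is a minimal Python sketch (an illustration only, not project code) of the header proof-of-work check: double-SHA-256 the 80-byte serialized header and compare it, read as a little-endian integer, against the target expanded from the header's nBits field. A full implementation would also check that nBits itself follows the difficulty-retargeting rules.

import hashlib

def dsha256(data):
    # Bitcoin's double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bits_to_target(nbits):
    # Expand the compact nBits encoding into the full 256-bit target.
    # Mainnet exponents are always greater than 3, so a left shift suffices.
    exponent = nbits >> 24
    mantissa = nbits & 0x007fffff
    return mantissa << (8 * (exponent - 3))

def header_meets_pow(header):
    # True if an 80-byte serialized block header satisfies its own target.
    # Layout: version(4) | prev(32) | merkle(32) | time(4) | bits(4) | nonce(4)
    assert len(header) == 80
    nbits = int.from_bytes(header[72:76], "little")
    block_hash = int.from_bytes(dsha256(header), "little")
    return block_hash <= bits_to_target(nbits)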

(Update: please read this blog post (http://utxo.tumblr.com/post/55033837913/new-priorities-for-a-new-wallet) for my current views regarding SPV+ support in the Satoshi client, in view of the fact that the Satoshi client is no longer the officially recommended wallet app. Implementation of SPV+ mode in lightweight clients such as MultiBit or Electrum has taken priority, although recent changes to the Satoshi client make SPV+ mode more plausible.)

I expect that full implementation of this proposal would take 4-6 months. It might take longer. I am requesting 150 BTC (or equivalent fiat) to cover rent, food, insurance, medical bills, etc. for the first 3 months in lieu of taking a job to support my family. Obviously this will not be sufficient time to complete the project, but I expect to make significant progress and to have a much better idea of how much follow-on work will be required. You may then evaluate whether a further donation is a worthwhile investment for you.

I have started a blockchain.info wallet for this purpose.

13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

If it does not reach 150BTC, I will return all contributions. Please reply in this thread or PM me if you have any questions, and I will be happy to answer them. I will also be a panelist at Bitcoin 2013 and would be happy to meet with anyone in person.

CREDENTIALS

I implemented Freicoin, which adds interest/demurrage to the bitcoin blockchain. This required considerably more design and simulation than one might naïvely expect, as interest obviously depends on timing, transactions lack timestamps, and block time is untrustworthy. The solution developed for Freicoin is original and works well in practice.

More recently I have implemented an improved difficulty adjustment algorithm for bitcoin protocol currencies, replacing the 2016-block average with an optimal Parks-McClellan filter co-designed by @galambo and tweaked to give very quick response times (adjustments every 9 blocks) while maintaining accuracy and avoiding instability.

I have also co-designed with @jtimon a colored coin proposal that adds interest-bearing assets and a distributed, peer-to-peer exchange. I will shortly be posting this as a separate bounty / work proposal.

Outside of bitcoin, I have worked for several years at NASA-Ames Research Center, doing various web development & data analysis & visualization tasks for the planetary science and astrobiology institutes.

EDIT: After less than a day, the funding goal has been achieved. Thank you! As there are still contributions coming in, I will treat it as a rolling fundraiser for continued development after the 3 months have elapsed. This is a project that will take at least 4-6 months, and possibly more. The project will officially start June 1st, after I have wrapped up some other obligations.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: etotheipi on May 13, 2013, 09:12:02 PM
@maaku

Thank you.  I think you are a good match for this development activity, as you clearly have the self-motivation and technical expertise to pull it off.  And you have the proper attitude as well -- sober but encouraging. 

One thing we might consider is renaming this proposal.  The thread was started a year ago with the phrase "Ultimate Blockchain Compression" because of a really silly oversight on my part: that the address-indexed DB would have to be in addition to the txHash-indexed DB.  I originally thought it would be a replacement for the tx-indexed DB, but you clearly need both.  Therefore, this necessarily adds resource usage on top of an optimally-pruning miner.  I think the recent discussion, where I realized that you can eliminate all the pointers, makes the overhead much more manageable, but it's still non-negligible.  I don't know your real name, but if you make this work, perhaps we could just name it the Reiner-Maaku tree :)  But I think the "UBC" part should be adjusted since it's not "ultimate compression" (though it does fit on top of an ultimately-compressed blockchain).

I'm happy to contribute to this design where I can.  As you pointed out, I'm quite busy, but I also think this is quite a valuable addition to the protocol.  Even if it somehow turns out to be infeasible, we'll learn a lot from the exercise that will help us do it right on the next try.  However, I'm confident that this will bear fruit with someone capable devoting themselves to it. 

You might consider/research using something other than LevelDB for the full nodes.  LevelDB is very fast and efficient for some things, but I fear it may not be reliable enough for the heavy lifting that will be done by the mining nodes in the [far] future.  For partial/lite nodes, LevelDB makes perfect sense, especially because it's easily bundle-able into end-user software.  But the miners of the future will probably be running dedicated hardware anyway, so they won't be terribly inconvenienced having to have a hardc0re DB engine available to run something like this, with all the robustness/ACID properties that are desired.   Therefore, it would be really nice to have a generic interface (or two interfaces, one for SQL and one for noSQL-key-value-store), so that something like LevelDB can be bundled with the software for "regular" users, and something more concrete can be swapped in for full-mining nodes. 

Also, I should mention that socrates/amiller had some interesting ideas that would be worth discussing.  He had an idea for lite-storage-but-full-verification nodes, which I'm still not confident is possible, but he insists it is and we agreed to debate it in the future when I had time.  Maybe you can discuss it here before embarking on this journey, to make sure the design is extensible enough to accommodate it if it is feasible.




Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: lunarboy on May 13, 2013, 10:34:22 PM
just a little 0.1 to show my support.

good luck  :)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: evoorhees on May 14, 2013, 12:53:32 AM
Maaku, instead of whining and complaining about the size of the blockchain, you have stepped up like a man to solve the situation.

You've just been sent a grant of 150 Bitcoins by myself, on behalf of SatoshiDICE, for the ongoing growth and development of this amazing technology we call Bitcoin. (Note to SD shareholders, this is a personal contribution, not coming out of SD earnings)

Further, if after three months, Alan (whom I respect very highly) recommends further development, I will commit to funding another three months (we'll revisit the btc amount based on the exchange rate then).

Please contact me on skype (evoorhees) and we can discuss further. And I agree with Alan that this needs a better name :)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Deafboy on May 14, 2013, 01:03:52 AM
Wow... that escalated quickly!
Sending a few cents anyway :)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: oakpacific on May 14, 2013, 05:28:46 AM
Thank you maaku. My 0.5 BTC because we all want to see it happen.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: jtimon on May 14, 2013, 09:11:33 AM
Great maaku! I've sent 1 btc.

This is much needed, especially for certain use cases such as having colors in the chain.

About funding alternatives to bounties, I want to tell the concrete story of the initial Freicoin development bounty, because I think it's an interesting example of how things can change even when everything seems clearly and simply defined (at least it was in my head; you may disagree that it was well defined).
The bounty (https://bitcointalk.org/index.php?topic=36190.0) asked for the implementation of the new currency based on my simple design draft, plus adding merged mining from namecoin's code.
First, my solution to the "exact payment problem" didn't guarantee that a transaction would survive a reorg. This is an important problem he identified, and after some discussion (https://github.com/freicoin/freicoin/issues/3), he came up with the elegant solution of using a reference height set by the payer to calculate demurrage. Although the solution was reasonably simple in design terms, what I initially thought would be an "easy task" was starting to get substantially more complicated.
There were some potential rounding problems I missed too. He also implemented some gitian building scripts and adapted p2pool to work with demurrage. He implemented the modification to send the budgets to the foundation, which I didn't want at first but the community was asking for.
He didn't take on the merged mining code because it wasn't (and still isn't) a priority.

Well, I was more than happy with his work, and the other donor didn't complain when I asked, so I just gave the bounty to him (after the fact, so not as useful to him as the crowdfunding campaign). But if there had been many more donors, maybe some of them would have blocked the bounty release.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Sukrim on May 14, 2013, 10:36:56 AM
As far as I understand it, it might not be possible to do complete block chain analysis after implementing this change, so the bitcoin-qt developers might have some reservations about it (as you delete parts of history).

Maybe consider a full fork of Bitcoin-qt ("Bitcoinquick-qt"?) in case this doesn't get accepted into the main bitcoin-qt client. Other than that, congratulations on reaching your funding goal so quickly, and I'm looking forward to the first updates regarding implementation! :)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: CIYAM on May 14, 2013, 02:26:07 PM
Actually a Project had been created in CIYAM Open for this very thing a while back (it was never actually "opened" though) - if this could be useful (in terms of creating Project Areas and then specific Project Tasks) then CIYAM Open would be happy to help manage this free of charge (if the current Project Owner is not interested in doing that then I will volunteer my own time to help out).


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on May 14, 2013, 06:39:06 PM
Thank you everyone.

Let's take it from the top:

The topic/proposal should definitely be renamed, as it does not allow for any pruning beyond what is already possible (in theory) with the ultraprune branch that made it into 0.8. I kept the name in the OP only because that is what people know the proposal as, but now that the project is funded we should correct details like this. Suggestions are welcome; perhaps “unspent-TxOut queries for trust-free lite nodes and quick synchronization”? A bit clunky...

Quote from: etotheipi
I think the recent discussion, where I realized that you can eliminate all the pointers, makes the overhead much more manageable, but it's still non-negligible.  I don't know your real name, but if you make this work, perhaps we could just name it the Reiner-Maaku tree :)

Ah, but you forget I was the proponent of a 2-3-4 BST ;) But no matter, you won me over. One of my first tasks would be finalizing the serialized form of the hybrid trie structure, while making sure that it is efficiently retrievable from an indexed key-value store and inexpensive to update.

I am confident that LevelDB will be a reasonable default choice and prove reliable even under high load, given the performance numbers I've seen and its heritage from Google's BigTable. However yes, the proper abstraction is “indexed key-value store”, for which there are many NoSQL (and SQL) solutions. I may not go to the extent of actually writing or using an intermediary database wrapper, but I will make sure that there is nothing LevelDB-specific about the solution.

Quote from: etotheipi
Also, I should mention that socrates/amiller had some interesting ideas that would be worth discussing.  He had an idea for lite-storage-but-full-verification nodes, which I'm still not confident is possible, but he insists it is and we agreed to debate it in the future when I had time.  Maybe you can discuss it here before embarking on this journey, to make sure the design is extensible enough to accommodate it if it is feasible.

Yes, I do recall that discussion, and in fact amiller's description of trust-free lite-node operation is I think the primary long-term benefit. Actually, I confess I thought it was your original purpose the first time I read your proposal. Of course “full-verification” is I think a misuse of terms, as in such a case there is in fact no transaction validation but rather trust that the miners on the meta-chain are themselves properly validating, and then construction of a “balance-proof” from the unspent-TxOut trie.

To be clear, UBC (or whatever we call it) allows for two separate “lite” modes of operation: one where the entire unspent-TxOut set is downloaded from peers and new transactions/blocks are validated against it. This is the same as the pruned mode 0.8 is theoretically capable of, except that the unspent-TxOut set need not be built from scratch.

The second, lighter mode of operation is what amiller describes, where the lite client requests other nodes to provide self-validating proofs of address balances by providing paths through the unspent-TxOut Merkle tree. The client can then proceed to operate in an enhanced SPV mode while downloading first the unspent-TxOut set and then the full blockchain in the background.

I thought the controversy was over balanced BST vs trie structures, but I may be misremembering that discussion. I think I have a professional obligation now to take another look through the whole thread and evaluate all ideas contained within.

@evoorhees, I sent you a Skype invite and I'd be happy to talk with you that way or at the conference. My timezone is U.S. Pacific Daylight Time.

@jtimon, thank you for the reference and kind words. Yes, our “assets” colored-coin proposal is what drew me back to this proposal. If people think SatoshiDice brings a lot of traffic and bloats the block chain, wait until they see what a distributed exchange and transitive credit will do, especially once the block-size limit is relaxed.

Quote from: Sukrim
As far as I understand it, it might not be possible to do complete block chain analysis after implementing this change, so the bitcoin-qt developers might have some reservations about it (as you delete parts of history).

This proposal in no way impacts block chain analysis / validation. It is strictly adding functionality to a fully-validating node. However that functionality would allow for non-validating clients to operate with significantly better security than current lite clients such as non-validating bitcoinj or electrum. Further, it becomes possible that the Satoshi client could start operation in one of these enhanced lite security modes (with a sync time of seconds or a few minutes), and then transition to full-validation once the initial block download is complete.


@CIYAM Open, thanks I do appreciate it, although I have my own project management tools configured the way I like them ;)


For everyone who has contributed, thank you. As there are still contributions coming in, I will treat it as a rolling fundraiser for continued development after the 3 months have elapsed. Between the conference and some other obligations I am winding up, my time will be split for the next couple of weeks. Therefore the 3 months will officially start June 1st.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: etotheipi on May 14, 2013, 07:03:26 PM
Quote from: maaku
Ah, but you forget I was the proponent of a 2-3-4 BST ;) But no matter, you won me over. One of my first tasks would be finalizing the serialized form of the hybrid trie structure, while making sure that it is efficiently retrievable from an indexed key-value store and inexpensive to update.

The beauty of one of the last revelations in that thread was that you can define it in terms of that hybrid tree, but implement it with a regular iteration over the natural sorting of the database.  In other words, if you iterate over the database and are careful about tracking where you were and where you are, you can perfectly simulate the hybrid PATRICIA tree without actually implementing all its complexity, while at the same time saving all the space you would've spent storing pointers.  You simply define it in terms of the hybrid tree (skip strings, omitting NULL pointers), but the actual implementation will be much simpler.
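(A tiny sketch of the fact being used here, as an illustration rather than the actual python-bitcoin or Armory code: the branch points of a node-compressed, PATRICIA-style trie are exactly the longest common prefixes of adjacent keys in sorted order, so one pass over the database's natural key ordering recovers the tree shape without storing any node objects or child pointers.)

def common_prefix(a, b):
    # Longest common prefix of two key strings.
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def branch_prefixes(sorted_keys):
    # Prefixes at which a node-compressed trie over sorted_keys branches,
    # recovered from a single iteration over the sorted keys.
    return sorted({common_prefix(a, b) for a, b in zip(sorted_keys, sorted_keys[1:])})

keys = sorted(["00ff", "013a", "01c2", "9b00", "9b07"])   # toy hex-encoded keys
print(branch_prefixes(keys))   # ['', '0', '01', '9b0']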

Quote from: maaku
I am confident that LevelDB will be a reasonable default choice and prove reliable even under high load, given the performance numbers I've seen and its heritage from Google's BigTable. However yes, the proper abstraction is “indexed key-value store”, for which there are many NoSQL (and SQL) solutions. I may not go to the extent of actually writing or using an intermediary database wrapper, but I will make sure that there is nothing LevelDB-specific about the solution.

I was just thinking that in the future and/or when developers decide they don't like LevelDB, we may need to switch.  For instance, I read that LevelDB had some problems above 200 million entries.  I don't know the exact parameters (if it's always 200 million, or if it's DB size, etc), but apparently the scalability and robustness may not be there for the long haul.  But it will work perfectly for a proof-of-concept.

Quote from: etotheipi
Also, I should mention that socrates/amiller had some interesting ideas that would be worth discussing.  He had an idea for lite-storage-but-full-verification nodes, which I'm still not confident is possible, but he insists it is and we agreed to debate it in the future when I had time.  Maybe you can discuss it here before embarking on this journey, to make sure the design is extensible enough to accommodate it if it is feasible.

Quote from: maaku
Yes, I do recall that discussion, and in fact amiller's description of trust-free lite-node operation is I think the primary long-term benefit. Actually, I confess I thought it was your original purpose the first time I read your proposal. Of course “full-verification” is I think a misuse of terms, as in such a case there is in fact no transaction validation but rather trust that the miners on the meta-chain are themselves properly validating, and then construction of a “balance-proof” from the unspent-TxOut trie.

To be clear, UBC (or whatever we call it) allows for two separate “lite” modes of operation: one where the entire unspent-TxOut set is downloaded from peers and new transactions/blocks are validated against it. This is the same as the pruned mode 0.8 is theoretically capable of, except that the unspent-TxOut set need not be built from scratch.

The second, lighter mode of operation is what amiller describes, where the lite client requests other nodes to provide self-validating proofs of address balances by providing paths through the unspent-TxOut Merkle tree. The client can then proceed to operate in an enhanced SPV mode while downloading first the unspent-TxOut set and then the full blockchain in the background.

I was actually talking about some discussions with socrates outside the thread, on IRC.  You said "I think it's a misuse of terms", but no, actually it's not.  Your recap is accurate concerning what I originally proposed.  But socrates took it one step further, saying that you could build on top of this to create a fully-validating node without storing the whole database.  I believe it involves downloading and verifying parts of the chain in real-time, as you are verifying new blocks.  But it can be done, because if you are confident in the state of the blockchain at block N, you can verify block N+1 efficiently given the availability of this new datastructure we are building.  The node wouldn't be very useful for seeding other nodes, but it would at least enforce the rules on new blocks, and theoretically could even solo mine without holding everything.  

To me, it didn't seem very possible that you could have a bunch of low-storage nodes doing full verification, but I also didn't have the patience at the time to flesh out the idea with him.  Perhaps given a system with a high density of full-storage nodes that can help the lite node do its verification, but it certainly wouldn't be feasible for the network to be made up of them, if no one is serving the full data set.

If he's right, that dramatically loosens the specs needed by future miners, when everyone is predicting that only super-computers/clusters will be able to verify-and-mine.  I have my doubts, but we should definitely invite him to expand on that, and even if it's not completely feasible, there will probably still be useful ideas in there.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: jtimon on May 15, 2013, 02:08:43 PM
I don't know all the details of the proposals, but let me explain it the way I see it and then you people tell me what I'm missing.
I propose "trust levels" as the name. For simplicity, I'm assuming the root of the UTXO tree is part of the headers, not just merged mined, but we can argue about that later.

Trust level 2: The current Simplified payment verification (https://en.bitcoin.it/wiki/Scalability#Simplified_payment_verification).

Trust level 1: We define a distance in blocks into the past which is secure for the node to assume won't suffer a reorg, say 1000 blocks ago, and call it the "pseudo-checkpoint". He downloads the full UTXO tree that was in block (currentHeight-1000), the UTXO tree at the pseudo-checkpoint. He downloads 1000 blocks and reproduces the current UTXO tree from the old one. For a given unspent output, he can provide its full Merkle branch back to the coinbase, OR back to the pseudo-checkpoint if the coinbase is older.

Trust level 0: The node downloads the rest of the chain, being able to verify that the UTXO tree from the pseudo-checkpoint he used was correct, and he will always be able to provide full output Merkle branches to their coinbases from now on.

You could download the whole chain from the top to the bottom and optimistically start by assuming the last UTXO tree was legit, then validate it with the previous one and the transactions of the last block, and so on back to the genesis. This effectively decreases the pseudo-checkpoint height from the last block created down to the genesis.
Once you've achieved trust 0, nothing forces you to store the whole chain. You can then set another pseudo-checkpoint, only being afraid of reorgs and/or not being able to provide long enough output chains.
In fact, you can go from trust level 1 or even trust level 0 back to operating at level 2 if you want. You can download the whole chain but then only keep the branches of your own outputs.

Nodes would have more room for specialization. I think there will always be "librarian nodes" à la block explorers, distributing the whole history. But I think their potential lack is what worries @Sukrim.

After this I think you can only improve the storage of the UTXOs (and possibly their full Merkle branches) using caches.

Is this basically what we're talking about or am I lost and the name "trust levels" is awful?


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Sukrim on May 15, 2013, 02:41:15 PM
Quote from: jtimon
Nodes would have more room for specialization. I think there will always be "librarian nodes" à la block explorers, distributing the whole history. But I think their potential lack is what worries @Sukrim.

True; also, there is no incentive to actually operate such a node, and if they get rarer, they have to serve even more data, which puts stronger pressure on them to shut down.

In the OP however there is also some BitTorrent distribution mechanism mentioned... even though torrent files (actually info hashes) for a given file are not 100% deterministic, they can be made deterministic if there is an agreement on block size. My suggestion would be to include the info hash of a bootstrap.dat file that contains all blocks from (#currentblock-120-#currentblock mod 2016) - meaning that 120 blocks after every difficulty change a new info hash gets posted to the UTXO chain. That way the torrent file stays static enough to be properly seeded but changes quickly enough for new clients to catch up.
As BitTorrent downloads are properly hashed, checked etc. and the blockchain can also be verified again upon importing, this should be safe enough, if one also trusts this UTXO chain.
That way full archival nodes would not be queried for older blocks that much, as the bootstrap procedure for a "Ultimate compression" client would be:

* Get current block height
* Get block height of the highest UTXO block X blocks below current height (I would stay with 120 as this is the magic number for spending coinbase transactions, not over 1000... but this could even be user-configurable)
* Download this UTXO block and all UTXO blocks after it, as well as all mainline blocks since then
* Verify that the following UTXO blocks were correctly created and also that the mainline blocks are OK.
* Get all mainline block headers since genesis (probably a big part of this data can already come with the installer) and verify that the recent mainline blocks are part of this chain.
### Here you can already start mining your own blocks ###
* Get the full block torrent info hash from the header (?) of the UTXO block(s)
* Download the torrent via a magnet link over DHT (all you need is the info hash after all)
* Verify all full blocks since genesis
### Now you are at the same stage as current bitcoin-qt ###

Now you can either continue to seed the torrent (and maybe even do this on a dedicated machine!) or be happy that the UTXO chain was actually not lying to you, discard most older block contents (maybe here it might make sense to keep the most recent 1000 or so in case there are longer reorgs somewhere in the future?) and just work on UTXO/mainline chains.

With BitTorrent it would be possible to help distributing the block chain without running bitcoind, a downside is that any client that does not have the info hash of the most current check point will seed a relatively dead torrent...
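(A minimal sketch of one possible reading of the snapshot rule Sukrim describes above; the constants and the exact rule are his suggestion, not a settled design. The idea: the bootstrap torrent covers everything up to the most recent difficulty-retarget boundary, and a new info hash is only published once that boundary is 120 blocks deep.)

RETARGET_INTERVAL = 2016
PUBLISH_DEPTH = 120   # same magic number as coinbase maturity

def snapshot_end_height(current_height):
    # Last block covered by the bootstrap torrent a freshly started client
    # should look for, under the reading described above.
    boundary = current_height - (current_height % RETARGET_INTERVAL)
    if current_height - boundary < PUBLISH_DEPTH:
        boundary -= RETARGET_INTERVAL   # latest boundary is not yet deep enough
    return boundary

print(snapshot_end_height(239000))   # 237888: latest retarget boundary at least 120 blocks deep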


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: The Bitcoin Catalog on May 15, 2013, 02:48:10 PM
This will be an incredible step forward for bitcoin! I will also contribute and try to raise some funds for this project.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on May 15, 2013, 06:49:33 PM
@Sukrim, that's essentially my thinking with regards to bittorrent. It's just about moving synchronization of new clients or clients which have been offline for a while off of the p2p bitcoin network, and making it easier for people to offer high-bandwidth “seed nodes” just for this purpose. If done correctly there should be no security implications as blockchain data is self-validating.

@jtimon, you are essentially correct although what you describe is only part of the story. I think “trust level” is an appropriate term. Here's how I would lay them out:

Level 4: Electrum-like client/server. Keys are stored on the client, but the client trusts the server for information about the unspent-TxOut set. This is only marginally better than sticking your coins in a server-side wallet. I would never make a general recommendation to operate at this level unless you own both the server and the client and have a secure, authenticated connection between the two.

Level 3: Simplified payment verification. Client trusts the “longest” (most work) chain, but reads and processes every block. The client must scan the contents of every block more recent than the oldest address in its wallet in order to be sure that its record of unspent wallet outputs is correct. BitcoinJ has some optimizations not mentioned, but only because they make simplifying assumptions about the circumstances under which wallet transactions might be generated. It remains true that you must touch every transaction of every block that might have a wallet input or output within it.

Both of the above levels will be completely obsoleted by this proposal.

Level 2: The client downloads the “longest” (most work) meta-chain, and retrieves the block of data associated with the current head of the meta-chain. This data includes the Merkle hash of the unspent-TxOut trie and its associated block height. The client then queries any node exposing the correct service bits about its wallet addresses, retrieving either the associated transaction outputs with proof-of-inclusion, or a negative proof showing that no such output exists in the trie. I call this enhanced simplified payment verification, or SPV+; it operates trust-free at an economic security level equal to the hash power of the merged-mined meta-chain.

Level 1: The meta-chain data block probably will include other information, such as a deterministic bittorrent infohash of the serialized unspent-TxOut trie and blockchain checkpoint data. The client downloads the unspent-TxOut trie torrent and verifies that its root hash matches what was in the meta-chain. It then reconstructs the information necessary to do full-block validation from that point forward. The initial synchronization is at the meta-chain level of economic security, but after that it would take a 51% attack on bitcoin itself to subvert a client at this level.

Level 0: The client either verifies the entire chain forwards from the genesis block, or backwards up to the most recent checkpoint as @jtimon described. It is now a fully-validated client operating exactly as the Satoshi client does today, and with the same level of security.

There is also a “0.5” mode that might be good enough for most people: only verify backwards long enough to satisfy your own security requirements (a configurable number of blocks, perhaps as a command line parameter).
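(To make the level-2 proof checking concrete: whatever the final encoding turns out to be, verifying an inclusion proof amounts to folding a Merkle branch from a leaf up to the committed root. A minimal sketch, assuming a plain binary tree with Bitcoin-style double SHA-256 and an explicit left/right flag per step; the real UTXO-trie proof format was still being finalized at this point.)

import hashlib

def dsha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_inclusion(leaf_hash, branch, root_hash):
    # branch: list of (sibling_hash, sibling_is_left) pairs ordered from the
    # leaf toward the root. Returns True if the path reproduces root_hash.
    h = leaf_hash
    for sibling, sibling_is_left in branch:
        h = dsha256(sibling + h) if sibling_is_left else dsha256(h + sibling)
    return h == root_hash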

I've cross-posted most of this to the original proposal thread, since technical discussion should probably occur there.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on May 21, 2013, 10:36:25 PM
Due to constructive feedback received at the conference, I have created a ReDonate campaign as an optional method for adding to the "Ultimate blockchain compression w/ trust-free lite nodes" campaign. ReDonate is a service for contributing a one-time donation, while receiving periodic reminders in the future to re-donate if you are happy with the progress being made.

http://redonate.net/dashboard/bitcoin-developer-maaku (http://redonate.net/dashboard/bitcoin-developer-maaku)

As noted in the OP, the 3-month funding goal has been achieved, but it is expected that it will take more than 3 months to implement this feature. Any further contributions now will go toward extending the amount of time I have to work on it.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: justusranvier on May 21, 2013, 11:07:28 PM
Quote from: maaku
Due to constructive feedback received at the conference, I have created a ReDonate campaign as an optional method for adding to the "Ultimate blockchain compression w/ trust-free lite nodes" campaign. ReDonate is a service for contributing a one-time donation, while receiving periodic reminders in the future to re-donate if you are happy with the progress being made.

http://redonate.net/dashboard/bitcoin-developer-maaku
That's the link for your personal dashboard. The rest of us need to use this link:

http://redonate.net/campaign/bitcoin-developer-maaku


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Bicknellski on May 24, 2013, 10:59:19 AM
Quote
Status: 0/unconfirmed, broadcast through 7 nodes
Date: 24/05/2013 17:59
To: Makku 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
Debit: -0.099 BTC
Transaction fee: -0.0005 BTC
Net amount: -0.0995 BTC
Transaction ID: eae93c9d529c486cb679bf0e9f0a72f43bb4db92580ba21cc6a2d7a66847941e

COMPRESS IT!


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: justusranvier on May 24, 2013, 11:53:50 PM
Quote
Occasionally though other coins will create something that can be ported back to BTC.  Over at FRC we have set the goal for our next hard-fork to be adding a block-chain compression feature originally proposed by etotheipi that will solve the notorious huge-download-barrier-for-new-user problem.  Our lead programmer has secured 3 months funding generously provided by evoorhees on behalf of SatoshiDICE on the promise the code will be back compatible with BTC.  Our motivation at FRC is that while our block-chain is small now we expect to be around a long time and don't want to have that download problem in the future so it's in our interest to solve this problem.
maaku, would you mind clarifying the scope of this project?


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on May 25, 2013, 12:29:02 AM
Impaler can get excited sometimes. I am preparing a blog post explaining in detail what I propose to accomplish. Here is a listing of the major points of order:

* The bitcoin Satoshi client (bitcoind and Bitcoin-Qt) will be updated to support two separate Merkle-tree indices: txid:n -> TxOut, and script -> TxOut.
* JSON-RPC commands will be added to query these indices, retrieving (1) the root hashes of the indices for a block; (2) proof of (non-)existence of an unspent transaction output; or (3) the unspent outputs associated with a script, with proof.
* New P2P messages will be created for making queries (2) and (3) against nodes exposing suitable service bits.
* Implement a merged-mined “meta-chain” which contains, instead of transactions, checkpoints of the root hashes of the above indices.
* Modify the Satoshi client to start in security-level 2 (see above), and work its way to security-level 0 as a fully-validating node, allowing for quick startup times regardless of blockchain size.

There are other related things which may be valuable to accomplish, such as pruned operation, and many, many small details left out. However any work on alternative currencies such as Freicoin is outside of scope, and will not be paid for by this stipend.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: justusranvier on May 25, 2013, 12:36:11 AM
Thank you.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on May 30, 2013, 07:03:19 PM
I have started a blog where I will post periodic status updates and technical explanations as I begin work on this proposal. Please check it out:

http://utxo.tumblr.com/ (http://utxo.tumblr.com/)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: solex on June 03, 2013, 10:16:47 PM
Quote from: evoorhees
Maaku, instead of whining and complaining about the size of the blockchain, you have stepped up like a man to solve the situation.

You've just been sent a grant of 150 Bitcoins by myself, on behalf of SatoshiDICE, for the ongoing growth and development of this amazing technology we call Bitcoin. (Note to SD shareholders, this is a personal contribution, not coming out of SD earnings)

Further, if after three months, Alan (whom I respect very highly) recommends further development, I will commit to funding another three months (we'll revisit the btc amount based on the exchange rate then).

Please contact me on skype (evoorhees) and we can discuss further. And I agree with Alan that this needs a better name :)

Just saw this post, and want to say that this is an awesome expression of support for the future success of Bitcoin!


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: freedomno1 on June 07, 2013, 05:07:04 AM
This is quite high-level stuff, and I can see how significant these changes would be to the overall usability of bitcoin; downloading the client is a significant barrier to its usage as the blockchain grows.
Not to mention the amount of gruntwork needed to make such a system work.
Since this is a technical discussion thread I will refrain from further posting, but I wanted to express my appreciation to the developers building up bitcoin and ensuring its ongoing growth, and to commend your transparency.
Thank you, and I look forward to seeing the new implementation.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on June 10, 2013, 05:59:29 PM
I've posted a status update to the implementation blog, including a brief overview of the current development pathway:

http://utxo.tumblr.com/post/52638982528/pathway-to-freedom


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: K1773R on June 14, 2013, 12:54:32 PM
Great to see you're not abandoning it, keep it up :)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Arcurus on June 30, 2013, 04:00:57 PM
Hi Maaku,
Do you also have a Freicoin donation address for this project?

I would love to donate some FRCs for this Project ;)

I wish you the best for your project,
Arcurus

UPDATE:

Oh, I just saw on your blog that Freicoins are also accepted at this address, so I can just donate Freicoins to the same address?

"13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP (Bitcoin or Freicoin accepted)."


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: dansmith on June 30, 2013, 05:37:25 PM
maaku, are you aware that the RSS feed isn't working on your website?
I can't conveniently keep abreast of your progress.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on June 30, 2013, 07:53:13 PM
Hi Arcurus, that address should work just fine for Freicoin as well. Thanks! I'll see what the RSS problem is about... it's a tumblr blog so I'm not sure what control I have over that, but I will try.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: Arcurus on June 30, 2013, 08:14:39 PM
Quote from: maaku
Hi Arcurus, that address should work just fine for Freicoin as well. Thanks! I'll see what the RSS problem is about... it's a tumblr blog so I'm not sure what control I have over that, but I will try.

nice!

Freicoins are coming :)

And good luck for your coding effords!
Arcurus


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: cozz on July 01, 2013, 10:25:54 PM
Does the UTXO root hash have to be recalculated after every block, and maybe other tree recalculations?

If so, I am a little bit concerned about the performance impact. Let's say bitcoin grows; then the recalculation of this thing could take minutes.

So do you think it is possible to do all the necessary recalculations after every block in, let's say, 1 second, even if bitcoin gets really big and there is a huge database?

(other than this, many thanks for working on the project, I have donated 1BTC)


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on July 02, 2013, 03:01:19 PM
Validation of a full block currently takes more than 1 second on most commodity hardware, and that's with the 1MB blocksize limit. There's a reason Satoshi chose 10-minute interblock times.

But it's a fair question to ask, and we should be concerned about performance. The number of hash operations required to update the UTXO index is approximately:

2 * (I + O) log (N)
I: # inputs in new block
O: # outputs in new block
N: # outputs in UTXO set

So it scales linearly with the size of the block, and logarithmically with the size of the UTXO set. The constant 2 is because there will be two indices maintained, not the 1 index currently used for block validation, although one should expect the constant factor to be different anyway.

Besides the constant-factor difference, the added cost is that log(N) factor. That's not a super-significant change to scalability, and in my own opinion is definitely a worthwhile cost for secure lightweight clients and quicker sync times.
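(Plugging illustrative, not measured, numbers into that estimate gives a feel for the magnitude:)

from math import log2

def index_update_hash_ops(num_inputs, num_outputs, utxo_set_size):
    # Hash operations to update both UTXO indices for one block,
    # per the 2 * (I + O) * log(N) estimate above.
    return 2 * (num_inputs + num_outputs) * log2(utxo_set_size)

# A busy block touching a few thousand outputs, against a UTXO set of
# several million outputs -- example figures only.
print(round(index_update_hash_ops(2000, 3000, 7000000)))   # roughly 227,000 hashes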

However when it comes to performance and scalability, the elephant in the room is ECDSA, not hashing. A modern consumer PC can hash about 100 MB/s, but only do about 100 ECDSA signature verifications per second. The number of signature operations is related to the number of inputs, and therefore the size of the block. That's a speed difference of 3 to 5 orders of magnitude, depending on natural variation in the size of outputs and the number of signature operations per input.
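(For anyone who wants to sanity-check the relative gap on their own machine, a rough micro-benchmark sketch follows. It uses the third-party pure-Python ecdsa package, so the absolute verification rate will be far below what OpenSSL or libsecp256k1 achieve; only the orders-of-magnitude difference versus hashing is the point.)

import hashlib, os, time
from ecdsa import SigningKey, SECP256k1   # pip install ecdsa (pure Python, slow)

def ops_per_sec(fn, seconds=2.0):
    # Crude operations-per-second measurement of fn().
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < seconds:
        fn()
        count += 1
    return count / (time.perf_counter() - start)

msg = os.urandom(32)
sk = SigningKey.generate(curve=SECP256k1)
vk = sk.get_verifying_key()
sig = sk.sign(msg)

print("SHA-256 hashes of 32-byte messages per second:", round(ops_per_sec(lambda: hashlib.sha256(msg).digest())))
print("ECDSA secp256k1 verifications per second:     ", round(ops_per_sec(lambda: vk.verify(sig, msg))))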

EDIT: And thank you for your contribution!


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: cozz on July 03, 2013, 08:24:47 PM
Quote from: maaku
A modern consumer PC can hash about 100 MB/s, but only do about 100 ECDSA signature verifications per second.

Just wanted to add that this number seems a little bit low to me. I have run bitcoin with the -benchmark flag, and it tells me that I can verify over 10000 signatures per second on a 4x4GHz CPU.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on July 03, 2013, 09:55:33 PM
fBenchmark (-benchmark) doesn't measure ECDSA performance. Are you sure you're reading the output correctly?


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: etotheipi on July 03, 2013, 10:15:03 PM
I know sipa was working on an optimized ECDSA verifier that would outperform SSL about 4x.  I don't know if he finished it (and got it into Bitcoin-Qt/bitcoind yet).

But I do remember that a single core of my i5-2500K did something like 200-500 ECDSA verifications per sec.  So with 4 cores that would be like 1000-2000/s.  And probably something like 5,000/sec when using sipa's optimization. 

However, it still doesn't change the fact that it's a couple orders of magnitude slower than hashing, which can execute about 15,000,000/s on the same hardware.  In the grand scheme of things, the address-indexed DB will not consume much time when implemented properly.  Each block only updates a couple MB of DB values which should take far less than a second, even with synchronous writes.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: cozz on July 03, 2013, 10:24:44 PM
cat debug.log | grep 'Verify '
- Verify 130 txins: 29.40ms (0.226ms/txin)
- Verify 345 txins: 76.33ms (0.221ms/txin)
- Verify 171 txins: 38.20ms (0.223ms/txin)
- Verify 201 txins: 44.65ms (0.222ms/txin)
- Verify 212 txins: 51.27ms (0.242ms/txin)
- Verify 385 txins: 83.74ms (0.218ms/txin)

ok, so this is like 4000-5000 /s


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on September 04, 2013, 11:44:50 PM
I have pushed three related code updates, representing work done during the month of August towards implementation of the UTXO index structures for bitcoin. As I've mentioned earlier on my blog, I'm doing the preliminary exploration work in Python, and so far that's paid off. I've gone through three iterations of the prefix-tree design, at each step uncovering weaknesses that required various modifications to the original design. The design, implement, test, benchmark, redesign cycle is averaging about a week, much faster than developing directly in C++ would have been. Python is, however, much slower, chiefly because of the time spent marshaling in and out of the block store or constructing hash preimages, and the resulting dynamic string manipulations. Now that the design is starting to solidify, I will probably do one more iteration before moving to C++ and beginning implementation for the Satoshi client.

The code updates include a new release of python-bitcoin, which contains the UTXO index data structures. Of particular interest to other developers would be the modules patricia.py, ledger.py, and merkle.py. The patricia.py module contains the PatriciaNode class, which implements the node-compressed, level-compressed trie structure and whose API looks very similar to an ordered dictionary. ledger.py contains the supporting structures necessary for TxIdIndex and ContractIndex - the two authenticated indices underlying the “Ultimate blockchain compression” proposal. The merkle.py module contains a similarly constructed, albeit unfinished, implementation of prunable Merkle trees. You can access this code here:

https://github.com/monetizeio/python-bitcoin/
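(For a sense of what "looks very similar to an ordered dictionary" means, here is a purely hypothetical usage sketch; the import path, method names, and the root-hash attribute are assumptions for illustration and may not match the actual python-bitcoin API.)

# Hypothetical sketch only -- names below are assumptions, not the real API.
from bitcoin.patricia import PatriciaNode   # assumed import path

index = PatriciaNode()
index[bytes(32)] = b"serialized TxOut #1"             # dict-style insertion
index[b"\xff" + bytes(31)] = b"serialized TxOut #2"

assert bytes(32) in index                             # dict-style membership test
for key, value in index.items():                      # ordered iteration over keys
    print(key.hex(), value)

print(index.hash)   # assumed: authenticated (Merkle) root hash of the trie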

The second project, which I'm releasing now for the first time, is a SQL blockchain storage backend using the absolutely wonderful SQLAlchemy object relational mapper. For this project, using SQL for storage has been beneficial as many profiling tools exist, and it allows arbitrary queries to be constructed after-the-fact to help diagnose bottlenecks. There are, however, obvious performance limitations and nobody should ever consider using a SQL backend for a fully validating node. Some people may find it useful for specialized applications requiring join queries, or perhaps a block explorer-like website. The code is available on github, and interfaces with python-bitcoin:

https://github.com/monetizeio/sqlalchemy-bitcoin/

Finally, in case anyone wants to play around with what I've been working on, there's a tool for synchronizing the SQL storage backend with the block chain via bitcoind's JSON-RPC interface. I *only* recommend using this for testnet data, and preferably the smaller testnet-in-a-box chain. Performance is abysmal, but for known reasons explained above. Back-of-the-envelope calculations based on profiling data show that a compiled implementation with smart memory management and a fast key/value data store should operate at least 100x faster, which would be enough to handle the live bitcoin block chain. Code is available here:

https://github.com/maaku/utxo-ledger/

I will be posting more updates soon.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on September 09, 2013, 09:57:50 PM
Final revision of the index structure has been committed to python-bitcoin. It is now a level-compressed and node-compressed binary prefix tree. This is a pretty big change from the original proposal, but clearly the right decision to make in light of the performance numbers. The purpose of a 256-way radix tree was to reduce the number of hash operations, but that came at the price of greatly increased node sizes. This made index traversal more complex, and resulted in enormous proof sizes. Alternatives such as having a Merkle-ized internal node structure would have resulted in the exact same performance as a binary tree, but with much greater code complexity.

There may be one more minor change (eliminating the no longer needed coinbase flags from the ultraprune structure), but after that hash values and proof structure should remain unchanged from then until the deployment to the Satoshi client. The design phase is now over.

I also made some slight but important adjustments to how the utxosync.py program constructs the root hash. Insertion of coinbase transactions into the ledger is delayed by 99 blocks, meaning that an output shows up in the ledger only once it is eligible to be spent in the next block, according to the network rules. The primary reason is that the index of a block cannot contain its own coinbase, if the hash of the index is itself going to be committed to the coinbase. Otherwise you'd have a circular hash dependency (input depends on output), which is obviously impossible to resolve.

The final performance characteristics: average-case proof size of 64 * log2(utxo transactions) for the txid index, which is currently about 1.35kB. Average case for the address index is: 65 * log2(utxo outputs), currently 1.45kB. The worst case size for a txid proof is 16.25kB, although that will never happen because setting it up would require a SHA-256 pre-image attack. Worst case for the address index is 65 bytes * number of bits in the scriptPubKey, which doesn't require a pre-image but can be protected by not reusing addresses.
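(Back-solving those average-case figures, with illustrative rather than measured UTXO-set sizes:)

from math import log2

def txid_proof_size(utxo_transactions):
    # Average-case txid-index proof size, per the 64 * log2(#txs) figure above.
    return 64 * log2(utxo_transactions)

def address_proof_size(utxo_outputs):
    # Average-case address-index proof size, per the 65 * log2(#outputs) figure above.
    return 65 * log2(utxo_outputs)

# About 2.2 million transactions with unspent outputs and about 5 million
# unspent outputs reproduce the quoted ~1.35 kB and ~1.45 kB averages.
print(round(txid_proof_size(2200000)))     # ~1348 bytes
print(round(address_proof_size(5000000)))  # ~1446 bytes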

Syncing with the block chain takes a few seconds per block, or a few dozen for the really big blocks, with 95% of that being spent in the SQL database, and 4 of the remaining 5% spent marshaling data. As mentioned earlier, moving to LevelDB and a compiled language should eliminate most of that, hopefully resulting in two orders of magnitude speedup. So performance is looking good, so far.



I have approx two weeks left on the first round of funding the members of this forum have so generously provided. I will use that time to finish and push the pruning and proof generation code, and, time permitting, expand the utxosync.py script into a small Django web API for querying the UTXO indices. This could be used for prototyping wallet software that uses the new proofs. However, this will provide no increase in security until bitcoin is soft-forked to include UTXO index commitments, or a merged-mined meta-chain is implemented.

This brings me to the next round of funding. I am requesting a final 3 months of time to implement the final revision of the index structure in C++ using LevelDB, optimize it to achieve sub-second update times (or as close to that performance goal as possible), and either (1) submit a documented pull request adding coinbase UTXO hash commitments to the Satoshi client, or (2) write a proxy server for miners to do their own commitments on a merged-mined meta-chain. Which outcome we pursue depends on whether acceptance of a coinbase commitment pull request is likely, which in turn depends on achievable performance metrics that we do not know for certain at this time.

Because of unexpected costs (insurance) and volatility, I'm forced to increase my monthly cost to 65btc, and will fully cash it out on receipt. I am therefore requesting a total of 195btc to finish the implementation as described above. If this is received by the end of the month, then I can promise the deliverable by the end of the year. Otherwise I will continue work, but at a rate in proportion to funds received, as funds are received ("billing" my time hourly). Please send donations to the same address:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Donation address for implementing phase 2
of "Ultimate blockchain compression"

13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJSLkT7AAoJECsa1YSJj/UDJ0IP/26oyR1COsLs/f+rEvz33kFP
0HtGvsjbEoF+7cetJpIV0eyFomngOWpafr0uhy+uO6mGguPHbXPJlmcgywFdKDwB
pQrRVYcT2DQx+Hfwnhn51QNIJoB6LhnykUi9KrDar8FwbNfYOgLaSUDGqKShtDOC
lc/qVkP56cCvalcqs6a6Q8O0D69BLO+TwifMPJmtdzdnn/2Fs9ONdgXpnnNLGwpJ
g/LWPy9Zdjspq7qoH/p3kFWo2S8TmX5EShsMDM8C4oUTnMjXbBvodJQwm6AzC0KY
XWdg+/W82YpMpmAmhSxqw43/VzUrODw9WAn7cXrCA86/IwhihZnNhLsELYP7Cd77
kgBWR9HE+NILWTRn+x8CONfi6+gk8ZqYsKmcj+PcYbf1u2BAknKb1EVpTwNp2ehD
8y6oNFj99vkDfZz8hSmt8HLn7YbU9jnmJcFqXwCwDFZD+nvWi1dHFeqnJppH3lWX
HaIF3V+psYQuMpglk+fFVrbmF+omCzsd7dArCXk4n+txA8rGIVBD2nWz4MUUxB9e
TLIeVr4EkH+yjvEK00VzjryfINE6mG58hetm1/y4Y/1EvoDATOyZhR91UFB6x/21
+pCagBDSVquc7DVYk7e275PnKSxjM4miAcf88jkO6TvcdiUaiYnYGxZQRCBY89/6
WgWf1x6CQvknPrYT6sZv
=hg1Y
-----END PGP SIGNATURE-----

As always, I welcome questions and comments regarding this proposal and my work towards implementing it, particularly from funders or developers of lightweight clients that might use it.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: justusranvier on September 09, 2013, 11:10:05 PM
I'll continue (https://bitcointalk.org/index.php?topic=93606.msg2325555#msg2325555) donating monthly through your ReDonate campaign for an extra 3 months.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 11, 2013, 08:23:09 PM
Thank you justusranvier. Your generous support over the last few months is what has kept me going to hit this milestone after the funding would have run out at the end of August. In case people missed it, here's the ReDonate campaign:

http://redonate.net/dashboard/bitcoin-developer-maaku


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: etotheipi on September 23, 2013, 05:33:02 AM
As you all have probably heard by now, Armory just got a huge injection of funds (http://www.coindesk.com/bitcoin-wallet-armory-raises-600k-seed-funding/). Obviously, we didn't get the money to just "give it away", but I see that what Mark is doing is a huge benefit to Bitcoin, even if it ultimately doesn't work. And if it does work, Armory would use it pretty much immediately. As such, I think it makes sense for Armory to commit some funding to this project. I really want to see this idea explored, for the benefit of Bitcoin (not just because I originally came up with it :)).

Armory Technologies, Inc. is pledging 50 BTC for this.  The only catch is that we don't actually have 50 BTC available right now (only USD), but we should have it available some time next week and I will send it then.  Please email/PM me to pester me about it in a week if you don't see it.  

Looking forward to seeing what you can do with another few months!


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: greBit on September 24, 2013, 01:06:50 PM
This is looking extremely promising :)

Is there some document somewhere that explains the key contributions that have been made? I am basically looking for a TL;DR

i.e.
  • what level of compression can you achieve
  • and at what performance cost?

And perhaps a doc explaining the solution?  Pseudocode of the algorithms / discussion of your data structures would be great

thanks


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 25, 2013, 07:10:41 PM
Re: compression ratios, the naming of this project is unfortunate. When this project was originally proposed, @sipa's ultraprune branch was still being implemented. Ultraprune does basically the same "compression," in the form of maintaining a validation index that enables blockchain data to be pruned. This proposal adds no further compression beyond that, and in fact as proposed doubles disk requirements (from about 250MB for ultraprune, to 500MB for this proposal, currently) because it maintains two authenticated UTXO indices: one for validation, one for wallet operations. (Only the former is in ultraprune, in unauthenticated form.)

I have suggested the title “Authenticated index structures for secure lightweight operation of non-mining bitcoin nodes”, which describes the benefits much better. I much prefer this to “Ultimate blockchain compression,” but unfortunately the latter is the name the project is known by in this community.

Re: performance cost, I have extrapolations. The current code was written for the purpose of evaluating alternative designs and testing asymptotic performance. Through profiling and extrapolation I have a good idea of what final performance numbers will be, so please don't be shocked when you see the current numbers.

The current Python implementation (available here (https://github.com/monetizeio/python-bitcoin)) can take up to 10^3 seconds to index a full 1MB block, using PostgreSQL as a storage backend. A full 95% of that time is spent in the SQL database performing between 10^4 - 10^5 uncached query operations. Switching to LevelDB or similar nosql datastore and adding caching should reduce time spent in the database by nearly two orders of magnitude. Furthermore, 80% of the remaining time is spent marshaling blockchain and index data into Python objects. Even within Python this could be optimized by using buffer objects or a C extension. Reasonable extrapolation shows that updates on the order of 10 seconds should be possible for a straightforward implementation of the current design in C++ using LevelDB as a storage backend.
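
For illustration, here is roughly what batched LevelDB writes might look like from Python using the plyvel bindings. This is not part of utxosync.py; it only shows why a key-value store with batched writes avoids most of the per-query overhead seen with the SQL backend.

Code:
import plyvel

db = plyvel.DB('/tmp/utxo-index', create_if_missing=True)

def apply_updates(updates):
    """updates: iterable of (key, value) pairs; value None means delete."""
    # One batched write per block instead of 10^4 - 10^5 individual queries.
    with db.write_batch() as batch:
        for key, value in updates:
            if value is None:
                batch.delete(key)
            else:
                batch.put(key, value)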

You can use asymptotic analysis to look at the bare minimum work that must be done to update an index structure, and you end up with a further order of magnitude reduction: If you assume about 10,000 inputs+outputs for a full block - the far upper end of what is possible - then that is about 200,000 updates to the index structure in the worst case. LevelDB could handle this in under a second on a beefy machine. The real performance limiter is the SHA-2 hashes required for each node update, but hardware SHA-256 -- such as the new Intel CPU extensions, or even a GPU accelerated implementation -- should make the database the bottleneck again.

Sub-second updates for a 1MB block is then the long-term performance goal. Reaching it might require some hardware assumptions or follow-on work. But if this project can at least reach the achievable near-term milestone of less than 10 seconds (for a full block, worst case scenario), then performance would probably be adequate for inclusion in the reference client.

(Note that a very simple optimization prevents this from being an onerous requirement for miners: cache the indices for last generated block. There is likely to be a lot of similarity between two consecutively generated blocks (being the same high priority transactions from the same mempool), and all the miner would have to calculate is the differences. Nonce-rolling is not a problem as inclusion of the coinbase in the index is delayed by a number of blocks anyway.)
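
A sketch of that caching optimization, to make it concrete. The index API used here (copy, add_tx, remove_tx, root_hash) is hypothetical; the point is only that successive templates share most of their transactions, so only the delta needs recomputing.

Code:
_last = {'txids': None, 'index': None}

def index_for_template(mempool_txs, build_index):
    txids = set(tx.hash for tx in mempool_txs)
    if _last['index'] is None:
        index = build_index(mempool_txs)            # cold start: full build
    else:
        index = _last['index'].copy()
        for txid in _last['txids'] - txids:          # dropped since last call
            index.remove_tx(txid)
        for tx in mempool_txs:
            if tx.hash not in _last['txids']:        # newly arrived
                index.add_tx(tx)
    _last['txids'], _last['index'] = txids, index
    return index.root_hash()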

Re: a technical document / whitepaper, the folks at Invictus Innovations have generously offered to support the writeup of a series of BIPs explaining the data structure and a few example use cases - they are particularly interested in using it for merged mining of BitShares, using a scheme that I developed for Freicoin's eventual merged mining support. This is still being sorted out, but hopefully there will be a technical document soon, and the associated discussion from other Bitcoin developers.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Peter Todd on September 25, 2013, 08:41:15 PM
(Note that a very simple optimization prevents this from being an onerous requirement for miners: cache the indices for last generated block. There is likely to be a lot of similarity between two consecutively generated blocks (being the same high priority transactions from the same mempool), and all the miner would have to calculate is the differences. Nonce-rolling is not a problem as inclusion of the coinbase in the index is delayed by a number of blocks anyway.)

What's the worst case vs. best case vs. avg case performance of the UTXO updating process?

Remember that we can't make it possible for a miner to intentionally make a block whose UTXO set modifications take an especially long amount of time to process to get an edge on the competition.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 25, 2013, 10:53:21 PM
To be clear, I'm not sure that example would have the effect you're imagining. Creating a purposefully time-wasting block to validate would hurt, not help, the miner. But presumably you meant broadcasting DoS transactions which, if they got into other miners' blocks, would slow down their block propagation. That's possible but not economical, for reasons that I'll explain in a minute.

But stepping back from that example, block propagation is why sub-second index construction/validation would be ideal. It places index operations in the same ballpark as the other verification tasks, meaning that it won't significantly affect the time it takes for blocks to propagate through the network. If index construction is done in parallel (using GPU acceleration, for example), it might not have any effect at all.

Rather than just quote numbers I'll explain where they come from:

There are two indices: the validating index containing unspent transactions keyed by <txid>, and the wallet index containing unspent outputs keyed by <scriptPubKey, txid, index>.

Let T be the number of transactions with at least one unspent output, N the total number of unspent outputs, X the number of transactions in a new block, I the number of inputs, and O the number of outputs.

The number of hash operations required to perform an insert, delete, or update to the indices is 1, log(T) or log(N), and len(key) for the best, average, and worst case respectively. Since the keys of both indices involve hashed values (txid, and typically also the content of scriptPubKey), creating artificially high pathways through the index structure is a variant of the hash pre-image problem - you are essentially constructing transactions whose hash and output scriptPubKeys have certain values so as to occupy and branch specific nodes in the index structure, for the entire length of a given key. This is computationally expensive to do, and actually growing the pathway requires getting the transactions into the UTXO set, which requires some level of burnt BTC to avoid anti-dust protections. So it is not economically viable to construct anything more than a marginally worse pathway; the average case really is the only case worth considering.

For the average case, the number of index update operations are:

validating: (X+I) * log(T)
wallet:     (I+O) * log(N)

Currently, these parameters have the following values:

T = 2,000,000
N = 7,000,000
X =     7,000 (max)
I =    10,000 (max)
O =    15,000 (max)

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.
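
For anyone who wants to check the arithmetic, plugging the numbers above into the two formulas gives roughly the one-million figure (a back-of-the-envelope sketch, nothing more):

Code:
from math import log

T, N = 2000000, 7000000        # UTXO transactions / UTXO outputs
X, I, O = 7000, 10000, 15000   # simultaneous maxima a real block can't reach

validating = (X + I) * log(T, 2)   # ~17,000 * ~21   = ~356,000
wallet     = (I + O) * log(N, 2)   # ~25,000 * ~22.7 = ~568,000

print(int(validating + wallet))    # ~924,000 -- the "about 1,000,000" above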

EDIT: And again, even if a call to 'getblocktemplate' takes multiple seconds to complete, the *next* call is likely to include mostly the same transactions again, with just a few additions or deletions. So successive getblocktemplate calls should take just milliseconds to compute by basing the index on the last returned tree.

Conceivably, when receiving and validating a block it's also possible to start with a pre-generated index based off the last block + mempool contents, and simply undo transactions that were not included. Assuming, of course, that the mempool is not getting backlogged, in which case it would make more sense to start with the last block. If you collect partial proof-of-works, then you could pre-compute updates from transactions which appear in many of the partial blocks. There's a large design space here to explore, in terms of optimization.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: greBit on September 25, 2013, 11:09:15 PM
Thanks. This is the kind of information I was after.

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.

How big is the data structure that is being manipulated? How does it grow with block count, based upon your best/worst/avg cases?

I guess I could infer that it's probably way too big to fit in RAM, since you suggest performance depends on disk speed


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 26, 2013, 12:32:07 AM
This structure is a variant of the current ultraprune index used for block validation in the 0.8 client, which consumes about 250MB of data (you can check this with the 'gettxoutsetinfo' RPC). My work uses these data as the leaf nodes of a binary tree, adding about 2 x 10^6 internal nodes to the validation index, for example. This adds about 120MB, currently. The wallet index is a little bit larger, so I would expect the total to be somewhere around 1GB as of the current block height. This is small enough to fit in RAM, but needs to be persisted to disk as it is expensive to recreate, hence the dependence on disk speed.
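
The rough arithmetic behind the 120MB figure, assuming on the order of 64 bytes per internal node (that per-node size is my assumption for illustration, not a measured number):

Code:
internal_nodes = 2 * 10**6   # added to the validation index, per the post
bytes_per_node = 64          # assumption: 32-byte hash plus structure overhead

print(internal_nodes * bytes_per_node / 1e6)   # ~128 MB, i.e. "about 120MB"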


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Peter Todd on September 26, 2013, 11:42:54 AM
To be clear, I'm not sure that example would have the effect you're imagining. Creating a purposefully time-wasting block to validate would hurt, not help the miner. But presumably you meant broadcasting DoS transactions which, if they got in other miner's blocks, would slow down their block propogations. That's possible but not economical for reasons that I'll explain in a minute.

You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Secondly if block propagation does become an issue miners will switch to mining empty or near empty blocks building on top of the most recent blockheader that they know about until they finally get around to receiving the block in full as a way to preserve their income. Unfortunately even with UTXO commits they can still do that - just re-use the still valid UTXO commitment from the previous block.(1) Obviously this has serious issues with regard to consensus... which is why I say over and over again that any UTXO commitment scheme needs to also have UTXO possession proofs as a part of it to make sure miners can't do that.

1) Modulo the fact that the UTXO set must change because a previously unspendable coinbase output became spendable; if we don't do explicit UTXO possession proofs we should think very carefully about whether or not a miner (or group of miners) can extract the info required to calculate the delta to the UTXO set that the coinbase created more efficiently than just verifying the block honestly. I'm not sure yet one way or another if this is in fact true.

But stepping back from that example, block propagation is why sub-second index construction/validation would be ideal. It places index operations in the same ballpark as the other verification tasks, meaning that it won't significantly affect the time it takes for blocks to propagate through the network. If index construction is done in parallel (using GPU acceleration, for example), it might not have any effect at all.

Rather than just quote numbers I'll explain where they come from:

There are two indices: the validating index containing unspent transactions keyed by <txid>, and the wallet index containing unspent outputs keyed by <scriptPubKey, txid, index>.

Let T be the number of transactions with at least one unspent output, N the total number of unspent outputs, X the number of transactions in a new block, I the number of inputs, and O the number of outputs.

The number of hash operations required to perform an insert, delete, or update to the indices is 1, log(T) or log(N), and len(key) for the best, average, and worst case respectively. Since the keys of both indices involve hashed values (txid, and typically also the content of scriptPubKey), creating artificially high pathways through the index structure is a variant of the hash pre-image problem - you are essentially constructing transactions whose hash and output scriptPubKeys have certain values so as to occupy and branch specific nodes in the index structure, for the entire length of a given key. This is computationally expensive to do, and actually growing the pathway requires getting the transactions into the UTXO set, which requires some level of burnt BTC to avoid anti-dust protections. So it is not economically viable to construct anything more than a marginally worse pathway; the average case really is the only case worth considering.

re: content of the scriptPubKey, change your algorithm to index by H(scriptPubKey) please - a major part of the value of UTXO commitments is that they allow for compact fraud proofs. If your tree is structured by scriptPubKey the average case and worst case size of a UTXO proof differs by 454x; it's bad engineering that will bite us when someone decides to attack it. In any case the majority of core developers are of the mindset that in the future we should push everything to P2SH addresses anyway; making the UTXO set indexable by crap like "tags" - the rationale for using the scriptPubKey directly - just invites people to come up with more ways of stuffing data into the blockchain/UTXO set.

For the average case, the number of index update operations are:

validating: (X+I) * log(T)
wallet:     (I+O) * log(N)

Currently, these parameters have the following values:

T = 2,000,000
N = 7,000,000
X =     7,000 (max)
I =    10,000 (max)
O =    15,000 (max)

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.

Worst case would be if a miner creates a block filled with transactions spending previously created 10,000 byte scriptPubKeys and creating no new ones. A minimal CTxIn is 37 bytes, so you'd get roughly 1,000,000/37=27,027 txouts spent. Each txout requires on the order of 10,000 hash operations to update it, so you've actually got an upper limit of ~270,270,000 updates, or 22 minutes by your metric... There's a good chance something else would break first, somewhere, which just goes to show that indexing by scriptPubKey directly is a bad idea. Sure it'd be tricky and expensive to create such a monster block, but it is possible and it leads to results literally orders of magnitude different than what is normally expected.

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

EDIT: And again, even if a call to 'getblocktemplate' takes multiple seconds to complete, the *next* call is likely to include mostly the same transactions again, with just a few additions or deletions. So successive getblocktemplate calls should take just milliseconds to compute by basing the index on the last returned tree.

Conceivably, when receiving and validating a block it's also possible to start with a pre-generated index based off the last block + mempool contents, and simply undo transactions that were not included. Assuming, of course, that the mempool is not getting backlogged, in which case it would make more sense to start with the last block. If you collect partial proof-of-works, then you could pre-compute updates from transactions which appear in many of the partial blocks. There's a large design space here to explore, in terms of optimization.

Lots of optimizations, sure. But given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure, if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: matt608 on September 26, 2013, 01:57:14 PM

@jtimon, thank you for the reference and kind words. Yes, our “assets” colored-coin proposal is what drew me back to this proposal. If people think SatoshiDice brings a lot of traffic and bloats the block chain, wait until they see what a distributed exchange and transitive credit will do, especially once the block-size limit is relaxed.


I was under the impression coloured coins don't cause blockchain bloat, unlike Mastercoin, and that is one of its main benefits.  Am I wrong? 


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 26, 2013, 02:13:44 PM
You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Still, you're playing a game that is not in your favor. In this case you are trading a few extra seconds of alone time hashing the next block against the likelihood that someone else will find a competing (and faster validating) block in the mean time. Unless you are approaching 50% hash power, it's not worth it. The faster your blocks get validated, the better off you are.

Secondly if block propagation does become an issue miners will switch to mining empty or near empty blocks building on top of the most recent blockheader that they know about until they finally get around to receiving the block in full as a way to preserve their income. Unfortunately even with UTXO commits they can still do that - just re-use the still valid UTXO commitment from the previous block.(1) Obviously this has serious issues with regard to consensus... which is why I say over and over again that any UTXO commitment scheme needs to also have UTXO possession proofs as a part of it to make sure miners can't do that.

1) Modulo the fact that the UTXO set must change because a previously unspendable coinbase output became spendable; if we don't do explicit UTXO possession proofs we should think very carefully about whether or not a miner (or group of miners) can extract the info required to calculate the delta to the UTXO set that the coinbase created more efficiently than just verifying the block honestly. I'm not sure yet one way or another if this is in fact true.

As you say, they would have to make at the very least one set of adjustments for the coinbase which becomes newly active, requiring access to part of the datastructure. This could be facilitated by miners including a proof with the block header that provides just the paths necessary for making the coinbase update. Miners know that any full node will reject a header with improperly constructed commitment hashes, so I fail to see what the issue is.

re: content of the scriptPubKey, change your algorithm to index by H(scriptPubKey) please - a major part of the value of UTXO commitments is that they allow for compact fraud proofs. If your tree is structured by scriptPubKey the average case and worst case size of a UTXO proof differs by 454x; it's bad engineering that will bite us when someone decides to attack it. In any case the majority of core developers are of the mindset that in the future we should push everything to P2SH addresses anyway; making the UTXO set indexable by crap like "tags" - the rationale for using the scriptPubKey directly - just invites people to come up with more ways of stuffing data into the blockchain/UTXO set.

The full scriptPubKey needs to be stored, or else two lookups would be required - one from the wallet index to check existence, and one from the validation index to actually get the scriptPubKey. That is wasteful. The alternative is key by hash(scriptPubKey) and put scriptPubKey in resulting value: that nearly doubles the size of the index on disk. You'd have to have a really strong reason for doing so, which I don't think the one given is.

BTW, the structure is actually keyed by compress(scriptPubKey), using the same compression mechanism as ultraprune. A standard transaction (pubkey, pubkeyhash, or p2sh) compresses to the minimal number of bytes of either pubkey or hash data. "Attacking" pathways to pubkeyhash or p2sh keys requires multiple hash preimages. I put it in quotes because honestly, what exactly is the purpose of the attack?
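
For readers unfamiliar with the ultraprune compressor being referred to, here is a simplified sketch of how the standard cases compress (a paraphrase of the reference client's script compression, covering only P2PKH, P2SH, and compressed-key P2PK; the exact on-disk encoding differs in detail):

Code:
def compress_script(spk):
    """Simplified ultraprune-style scriptPubKey compression (illustrative)."""
    # pay-to-pubkey-hash: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if len(spk) == 25 and spk[:3] == b'\x76\xa9\x14' and spk[23:] == b'\x88\xac':
        return b'\x00' + spk[3:23]    # 21 bytes
    # pay-to-script-hash: OP_HASH160 <20 bytes> OP_EQUAL
    if len(spk) == 23 and spk[:2] == b'\xa9\x14' and spk[22:] == b'\x87':
        return b'\x01' + spk[2:22]    # 21 bytes
    # pay-to-pubkey with a compressed key: <33-byte key> OP_CHECKSIG
    if len(spk) == 35 and spk[:1] == b'\x21' and spk[1:2] in (b'\x02', b'\x03') and spk[34:] == b'\xac':
        return spk[1:34]              # 33 bytes; the key's own prefix byte is the tag
    return spk                        # non-standard scripts are kept as-is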

Worst case would be if a miner creates a block filled with transactions spending previously created 10,000 byte scriptPubKeys and creating no new ones. A minimal CTxIn is 37 bytes, so you'd get roughly 1,000,000/37=27,027 txouts spent. Each txout requires on the order of 10,000 hash operations to update it, so you've actually got an upper limit of ~270,270,000 updates, or 22 minutes by your metric... There's a good chance something else would break first, somewhere, which just goes to show that indexing by scriptPubKey directly is a bad idea. Sure it'd be tricky and expensive to create such a monster block, but it is possible and it leads to results literally orders of magnitude different than what is normally expected.

"Tricky and expensive" to create such a block? You'd have to create ~270,270,000 outputs, the majority of which are probably non-spendable, requiring *at minimum* 14675.661btc. For an "attack" which at worst requires more than the entire existing block chain history to construct, and adds only 22 minutes to people's resync times. I fail to see why this should be anything but an academic concern.

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

A performance factor of 2 is worth a little upfront analysis work. Padding prevents you from being able to construct length-extension attacks - append padding bytes and you no longer have a valid node structure. Length-extension is not an issue, but performance considerations are.

Lots of optimizations, sure. But given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure, if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.

I think you mean it could lead to a reorg. "Failure to update sufficiently fast" does not in itself lead to consensus errors. Examples please?


Title: Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”
Post by: maaku on September 26, 2013, 03:06:31 PM
@jtimon, thank you for the reference and kind words. Yes, our “assets” colored-coin proposal is what drew me back to this proposal. If people think SatoshiDice brings a lot of traffic and bloats the block chain, wait until they see what a distributed exchange and transitive credit will do, especially once the block-size limit is relaxed.

I was under the impression coloured coins don't cause blockchain bloat, unlike Mastercoin, and that is one of its main benefits.  Am I wrong? 

I was looser with terminology than I should have been. I meant legitimate bitcoin traffic, not mastercoin-like data and dust bloat. Bitcoin revolutionized payments. Colored coins applies all that same innovation to every other kind of financial contract or property ownership. Of course switching from one specific application to infinitely many general applications will result in more traffic ... "lots more traffic" would be a bit of an understatement. Just the distributed exchange traffic alone would probably consume more blockchain space than payments, and that's just one application.

When colored coin technology becomes widespread, bitcoin will have to scale. Right now that would push out mobile or embedded devices that already can't keep up with the block chain in a secure manner. This proposal of authenticated indices would make miners and other full nodes do a little extra upfront work for the benefit of lightweight clients that desire to securely work with only the parts of the block chain that concern them.

BTW, we have since published our "user-issued assets" colored coin proposal, under the name Freimarkets:

https://bitcointalk.org/index.php?topic=278671.0


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Peter Todd on September 26, 2013, 06:05:34 PM
You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Still, you're playing a game that is not in your favor. In this case you are trading a few extra seconds of alone time hashing the next block against the likelihood that someone else will find a competing (and faster validating) block in the mean time. Unless you are approaching 50% hash power, it's not worth it. The faster your blocks get validated, the better off you are.

Read this before making assumptions: https://bitcointalk.org/index.php?topic=144895.0

Under the right circumstances those perverse incentives do exist.

As you say, they would have to make at the very least one set of adjustments for the coinbase which becomes newly active, requiring access to part of the datastructure. This could be facilitated by miners including a proof with the block header that provides just the paths necessary for making the coinbase update. Miners know that any full node will reject a header with improperly constructed commitment hashes, so I fail to see what the issue is.

Shit... your comment just made me realize that with the variant of the perverse block propagation incentives where you simply want to keep other miners from being able to collect transaction fees profitably it's actually in your incentive to propagate whatever information is required to quickly update the UTXO set for the coinbase transaction that becomes spendable in the next block. Other miners will be able to build upon your block - making it less likely to be orphaned - but they won't have the information required to safely add transactions to it until they finish processing the rest of the UTXO updates some time later. Write the code to do this and miners will start using this to keep their profitability up regardless of the consequences down the road.

The full scriptPubKey needs to be stored, or else two lookups would be required - one from the wallet index to check existence, and one from the validation index to actually get the scriptPubKey. That is wasteful. The alternative is key by hash(scriptPubKey) and put scriptPubKey in resulting value: that nearly doubles the size of the index on disk. You'd have to have a really strong reason for doing so, which I don't think the one given is.

BTW, the structure is actually keyed by compress(scriptPubKey), using the same compression mechanism as ultraprune. A standard transaction (pubkey, pubkeyhash, or p2sh) compresses to the minimal number of bytes of either pubkey or hash data. "Attacking" pathways to pubkeyhash or p2sh keys requires multiple hash preimages. I put it in quotes because honestly, what exactly is the purpose of the attack?

Have some imagination: if the attack can cause technical problems with Bitcoin for whatever reason, there will be people and organizations with strong incentives to carry it out. For instance note how you could create a block that's fraudulent, but in a way where proving that it's fraudulent requires a proof that most implementations can't process because of the ridiculously large UTXO path length involved - an excellent way to screw up inflation fraud proofs and get some fake coins created.

Out of curiosity, have you ever worked on any safety critical systems and/or real-time systems? Or even just software where the consequences of it failing had a high financial cost? (prior to your work on Bitcoin that is...)

"Tricky and expensive" to create such a block? You'd have to create ~270,270,000 outputs, the majority of which are probably non-spendable, requiring *at minimum* 14675.661btc. For an "attack" which at worst requires more than the entire existing block chain history to construct, and adds only 22 minutes to people's resync times. I fail to see why this should be anything but an academic concern.

Creating those outputs requires 0 BTC, not 14,675BTC - zero-satoshi outputs are legal. With one satoshi outputs it requires 27BTC. That you used the dust value suggests you don't understand the attack.

As I say, such extreme differences make it very likely that you'll run into implementations - or just different versions - that can't handle such blocks, while others can, leading to dangerous forks. Look at how the March fork happened because of a previously unknown limit on how many database operations could happen in one go in BDB.

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

A performance factor of 2 is worth a little upfront analysis work. Padding prevents you from being able to construct length-extension attacks - append padding bytes and you no longer have a valid node structure. Length-extension is not an issue, but performance considerations are.

The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Lots of optimizations, sure. But given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure, if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.

I think you mean it could lead to a reorg. "Failure to update sufficiently fast" does not in itself lead to consensus errors. Examples please?

A re-org is a failure of consensus, though one that healed itself. Currently re-orgs don't happen too often because the difference in performance between the worst, average, and best performing Bitcoin nodes is small, but if that's not true serious problems develop. As an example when I was testing out the Bloom filter IO attack vulnerability fixed in 0.8.4 I found I was able to get nodes I attacked to fall behind the rest of the network by more than six blocks; an attacker could use such an attack against miners to construct a long-fork to perform double-spend attacks.


FWIW, thinking about it further, I realized that your assertion that the UTXO tree can be updated in parallel is incorrect, because parallel updates don't allow for succinct fraud proofs of what modifications a given transaction made to the UTXO set; blocks will need to commit to what the top-level UTXO state hash was for every individual transaction processed, because without such fine-grained commitments you can't prove that a given UTXO should not have been added to the set without providing all the transactions in the block.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on September 26, 2013, 08:07:48 PM
Read this before making assumptions: https://bitcointalk.org/index.php?topic=144895.0

I have read that, and I don't agree. I'm far more inclined to side with Gavin on the issue of block size. Bitcoin will either scale up or become irrelevant. These are the only two viable outcomes I see.

However this is now offtopic, so I'd rather keep discussions of your "keep bitcoin free" campaign outside of this thread.

Have some imagination: if the attack can cause technical problems with Bitcoin for whatever reason, there will be people and organizations with strong incentives to carry it out. For instance note how you could create a block that's fraudulent, but in a way where proving that it's fraudulent requires a proof that most implementations can't process because of the ridiculously large UTXO path length involved - an excellent way to screw up inflation fraud proofs and get some fake coins created.

You are arguing strawmen: You assume that nodes will validate blocks probabilistically, and then show how this will result in possible fraud. The problem lies in the security model you are proposing, namely reliance on fraud proofs instead of full validation.

Out of curiosity, have you ever worked on any safety critical systems and/or real-time systems? Or even just software where the consequences of it failing had a high financial cost? (prior to your work on Bitcoin that is...)

Yes.

Creating those outputs requires 0 BTC, not 14,675BTC - zero-satoshi outputs are legal. With one satoshi outputs it requires 27BTC. That you used the dust value suggests you don't understand the attack.

So now you're assuming that an attacker will be able to successfully mine 10,000 blocks in order to set up a one-off 20-minute DoS? I'm pretty sure it'd be cheaper to just buy the 14,675BTC. If someone who wants to destroy or defraud the bitcoin network is able to mine that many blocks in preparation, it's either a *really* long con or Bitcoin is hosed for so many other reasons.

I'm not arguing that it is impossible. I'm arguing that it is not even remotely close to economically plausible, and so far beyond other attack vectors that it's not worth worrying about.

But even if that weren't the case, there are fundamentally simple ways to deal with the issue: introduce a hard limit on the number of index node updates per-block, just as the number of signature operations is currently limited.

As I say, such extreme differences make it very likely that you'll run into implementations - or just different versions - that can't handle such blocks, while others can, leading to dangerous forks. Look at how the March fork happened because of a previously unknown limit on how many database operations could happen in one go in BDB.

That's a wonderfully general argument you can use against anything you don't like. Implementation bugs can exist anywhere. There's no reason to suppose that we're more or less likely to run into them here than somewhere else.

The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Enlighten me, please.

EDIT: After thinking about this for some time, the only conclusion I can come to is that you are assuming multiple blocks on the first pass. This is only true of about half of the nodes (most of which are two blocks), the other half are small enough to fit in a single block. While not a performance factor of 2, it's definitely >= 1.5.

A re-org is a failure of consensus, though one that healed itself.

That's not a very useful definition of consensus. Reorgs happen even without malicious intent, and the network is eventually able to reach agreement about older blocks. A real error is when one node rejects a block that other validating nodes accept - no amount of time or work built on the other chain will convince that node to switch. If this proposal is susceptible to that kind of validation error, I'd be interested to know.

Currently re-orgs don't happen too often because the difference in performance between the worst, average, and best performing Bitcoin nodes is small, but if that's not true serious problems develop. As an example when I was testing out the Bloom filter IO attack vulnerability fixed in 0.8.4 I found I was able to get nodes I attacked to fall behind the rest of the network by more than six blocks; an attacker could use such an attack against miners to construct a long-fork to perform double-spend attacks.

If you know a way to do the same with this proposal, which isn't easily solved by processing blocks in parallel, and which doesn't require an additional network DoS vulnerability, please let me know. Until then there are simply too many "what if"s attached.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Peter Todd on September 27, 2013, 08:29:29 AM
The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Enlighten me, please.

EDIT: After thinking about this for some time, the only conclusion I can come to is that you are assuming multiple blocks on the first pass. This is only true of about half of the nodes (most of which are two blocks), the other half are small enough to fit in a single block. While not a performance factor of 2, it's definitely >= 1.5.

Where do you think the first SHA256() gets the data it's working on? What about the second?
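
A quick way to check the point empirically from Python (a sketch; the 96-byte node size is an assumption): the second SHA256 always runs over a fixed 32-byte digest - a single extra compression-function call - so its relative cost shrinks as the node serialization grows, and in an interpreted implementation the call overhead dominates anyway.

Code:
import hashlib, timeit

node = b'\x00' * 96   # assumed size of a serialized tree node

sha256  = lambda: hashlib.sha256(node).digest()
sha256d = lambda: hashlib.sha256(hashlib.sha256(node).digest()).digest()

print(timeit.timeit(sha256,  number=100000))
print(timeit.timeit(sha256d, number=100000))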


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Kevlar on October 10, 2013, 05:36:36 PM
What's the latest on the development of this?


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: QuantumQrack on October 26, 2013, 05:47:47 AM
Interested in the status of this as well.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on October 27, 2013, 05:47:15 PM
There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

https://s3.amazonaws.com/tmp-6GnLx6CN/1378083_10151977651558738_875717211_s.jpg

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interest in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: bg002h on October 28, 2013, 12:40:14 AM
There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

https://s3.amazonaws.com/tmp-6GnLx6CN/1378083_10151977651558738_875717211_s.jpg

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interest in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.

Congrats!


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Peter Todd on October 28, 2013, 01:20:58 AM
There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

https://s3.amazonaws.com/tmp-6GnLx6CN/1378083_10151977651558738_875717211_s.jpg

Congrats!

Sent you a donation for his or her college fund. :)

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interest in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.

I should write up a similar document explaining TXO commitments.


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on October 28, 2013, 08:36:34 AM
Wow, that was very kind and unexpected. Thank you! I forwarded those coins into a cold-storage wallet that will be the start of her education endowment. Hopefully 20 years of deflation will turn it into a sizable chunk of her college education, and maybe even the start of a nest egg.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Education endowment for maaku's daughter:
1Naginiq19q9fdeFLipqFr4TZjjsSdhT3B
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJSbiBOAAoJECsa1YSJj/UD6z4P/3jawSFnpMNt89elZmsm3OaJ
q2GMvC++X8+Y7n/RI5mN7Ubg4E2YLyGS6Xrf5f7mHBMq5FjW5UPL2GV0JDHEN+ni
ILQMvvNhI28Dj1rlZY2KdAYj6WUf50gVFY3Tv3hq3JZHEnoS2r2YPZ4dPH9Ls4zM
i6iQ/LJNs7hZFEiO+EDsPThjepFmRzpaBqi2mpW4BCnFl1gqYJiFUgSKrY2G8YYI
Eqh9Xq1rQft+utYY2gNuc8XtqrscmFmr5dHrWnqPe0cFPWrpG0HqY5QftaqfjMXg
WWZHuTPwul+QVwXOeimoPjdJ9gCi7sXyCZTa9MxvJ5n1QS/ubuHWKx5jUzbSx6m+
iFb+IX+tLYxcOecxfmftM645LQ8WlCJeY0p+EoskNpkdH4q63Sk1Nyq9dJMuQK+2
+MaFbB0Pv3AMk/Qoi/qrCF9F1BVD/CYK72uI8t+7r8iDvQ97iK5lGfLvCxR89Ev3
X3uCiJOpUctMozELjI7w3ga3xRah6mufKU2HgNRqkZAaphp7Cf5v34NwnSeaf9LM
A2tdBo8N86NVpjeyLz5U8TRnSrUcPDQPR4snz+IYF1sCKsRqYz9rNy1WhaIO7j6x
2A6nTnzvLUnenYLusu4hAIJ5XAlQMiGDL6sdALL2Z3or7mlx3IPz5elvn2FRd1fh
xkvdfNXeOkmrFd4x6m6h
=fzcm
-----END PGP SIGNATURE-----


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: Lauda on November 30, 2013, 02:20:36 PM
So how far is this? Still under development?


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: tubbyjr on December 11, 2013, 09:08:57 PM
Any news on this yet or you got enough BTC already?


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: dansmith on January 17, 2014, 01:34:20 PM
I am therefore requesting a total of 195btc to finish the implementation as described above. If this is received by the end of the month, then I can promise the deliverable by the end of the year.

maaku, was there a deliverable as per your promise?


Title: Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”
Post by: maaku on January 18, 2014, 01:01:17 AM
Oy, there is something wrong as I have not been receiving email updates about this thread. My apologies to everyone.

So an update on the current status:

A BIP describing the authenticated prefix tree resulting from my research has been published to the developer mailing list, and is available for view online (https://github.com/maaku/bips/blob/master/drafts/auth-trie.mediawiki). The feedback I got from that is being worked into a 2nd version of the proposal, which is nearing completion (I'm currently adding descriptions of the peer-to-peer messages and JSON-RPC APIs, after which it will be finished).

I also have a BIP describing the validation index about 80% written, the outline of one describing the wallet index, and an assortment of notes towards two future BIPs related to document timestamping and merged mining improvements. These are the first deliverables, as once developer consensus is achieved other people can start working on implementations of these various features.

Besides the python implementation hosted on Github, there's a C++ implementation suitable for inclusion in the reference client that is about half finished. It could use a lot more unit tests.

However since funding mostly dried up after Armory's very generous donation some months back, I have been working on this proposal only part time (1-2 days per week). The rest of my time is spent on various other funded projects including CoinJoin and preparation for blockchain pruning. Work has not stopped, just slowed down.

@dansmith, unfortunately I think there's some confusion. I did not switch addresses between the first and the second funding rounds, so most of the money received at that address was for the first round. (And, while it sounds like a lot, bitcoin was worth an order of magnitude less back then and it was mostly cashed out upon receipt. Too bad I don't have a knack for predicting markets :\ ) I was unable to finish out the round, so I've been forced to take on other (thankfully related) work as well which has unfortunately slowed things down.


Title: Re: [Fundraising] Finish “Ultimate blockchain compression”
Post by: escrow.ms on June 18, 2014, 04:05:16 PM
Any update on this?


Title: Re: [Fundraising] Finish “Ultimate blockchain compression”
Post by: maaku on June 18, 2014, 09:41:38 PM
Here is something I just posted on a related pull request last night:

Quote
The UTXO commitments I'm working on are currently developer-time limited. There's a working Python implementation and complete test vectors, as well as an optimized commitment mechanism (#3977, which needs a rebase), and C++ serialization code for the proofs. All we need is the C++ code for updating proofs, and the LevelDB backend. So if there are other bitcoind hackers out there interested in doing this The Right Way, contact me. However it requires a soft-fork, so rolling it out necessitates some degree of developer consensus, community education, and miner voting process (or the beneficence of ghash.io...), all of which together requires as much as a year if past experience is a judge.


Title: Re: [Fundraising] Finish “Ultimate blockchain compression”
Post by: Sukrim on June 18, 2014, 10:05:45 PM
So you moved away from using available fields, e.g. in the coinbase, and are going for a forking solution?

What about the new open data fields for transactions? Any way to fit a UTXO hash in there and come to some kind of consensus e.g. via Proof of Stake?


Title: Re: [Fundraising] Finish “Ultimate blockchain compression”
Post by: maaku on June 18, 2014, 10:16:13 PM
UTXO commitments have always required a soft-fork, whether they are added to the coinbase or separate commitment transactions.

What open data fields in transactions? Do you mean OP_RETURN? If so, see my pull request for a soft-fork change to enable this kind of miner commitment with compact proofs:

https://github.com/bitcoin/bitcoin/pull/3977

I would not put proof of stake anywhere near a consensus system. Not with current designs at least.