[Fundraising] Finish “Ultimate blockchain compression”

etotheipi

Legendary

Offline

Activity: 1428
Merit: 1093

Core Armory Developer

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 23, 2013, 05:33:02 AM
Last edit: September 23, 2013, 04:00:16 PM by etotheipi

#41

As you all have probably heard by now, Armory just got a huge injection of funds. Obviously, we didn't get the money to just "give it away", but I see that what Mark is doing is a huge benefit to Bitcoin, even if it ultimately doesn't work. And if it does work, Armory would use it pretty much immediately. As such, I think it make sense for Armory to commit some funding to this project. I really want to see this idea explored, for the benefit of Bitcoin (not just because I originally came up with it

).

Armory Technologies, Inc. is pledging 50 BTC for this. The only catch is that we don't actually have 50 BTC available right now (only USD), but we should have it available some time next week and I will send it then. Please email/PM me to pester me about it in a week if you don't see it.

Looking forward to seeing what you can do with another few months!

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here! (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)

greBit

Hero Member

Offline

Activity: 714
Merit: 500

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 24, 2013, 01:06:50 PM
Last edit: September 24, 2013, 01:31:24 PM by greBit

#42

This is looking extremely promising

Is there some document somewhere that explains the key contributions that have been made? I am basically looking for a TL;DR

i.e.

what level of compression can you achieve
and at what performance cost?

And perhaps a doc explaining the solution? Pseudocode of the algorithms / discussion of your data structures would be great

thanks

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 25, 2013, 07:10:41 PM

#43

Re: compression ratios, the naming of this project is unfortunate. When originally proposed, @sipa's ultraprune branch was still being implemented. Ultraprune does basically the same "compression," in the form of maintaining a validation index that enables blockchain data to be pruned. This proposal adds no further compression beyond that, and in fact as proposed doubles disk requirements (from about 250MB for ultraprune, to 500MB for this proposal, currently) because it maintains two authenticated UTXO indices: one for validation, one for wallet operations. (Only the former is in ultraprune, in unauthenticated form.)

I have suggested the title “Authenticated index structures for secure lightweight operation of non-mining bitcoin nodes” which much better describes the benefits. I much prefer this to “Ultimate blockchain compression,” but unfortunately the latter is the name it is known by in this community.

Re: performance cost, I have extrapolations. The current code was written for the purpose of evaluating alternative designs and testing asymptotic performance. Through profiling and extrapolation I have a good idea of what final performance numbers will be, so please don't be shocked when you see the current numbers.

The current Python implementation (available here) can take up to 10^3 seconds to index a full 1MB block, using PostgreSQL as a storage backend. A full 95% of that time is spent in the SQL database performing between 10^4 - 10^5 uncached query operations. Switching to LevelDB or similar nosql datastore and adding caching should reduce time spent in the database by nearly two orders of magnitude. Furthermore, 80% of the remaining time is spent marshaling blockchain and index data into Python objects. Even within Python this could be optimized by using buffer objects or a C extension. Reasonable extrapolation shows that updates on the order of 10 seconds should be possible for a straightforward implementation of the current design in C++ using LevelDB as a storage backend.

You can use asymptotic analysis to look at the bare minimum work that must be done to update an index structure, and you end up with a further order of magnitude reduction: If you assume about 10,000 inputs+outputs for a full block - the far upper end of what is possible - then that is about 200,000 updates to the index structure in the worst case. LevelDB could handle this in under a second on a beefy machine. The real performance limiter is the SHA-2 hashes required for each node update, but hardware SHA-256 -- such as the new Intel CPU extensions, or even a GPU accelerated implementation -- should make the database the bottleneck again.

Sub-second updates for a 1MB block is then the long-term performance goal. Reaching it might require some hardware assumptions or follow-on work. But if this project can at least reach the achievable near-term milestone of less than 10 seconds (for a full block, worst case scenario), then performance would probably be adequate for inclusion in the reference client.

(Note that a very simple optimization prevents this from being an onerous requirement for miners: cache the indices for last generated block. There is likely to be a lot of similarity between two consecutively generated blocks (being the same high priority transactions from the same mempool), and all the miner would have to calculate is the differences. Nonce-rolling is not a problem as inclusion of the coinbase in the index is delayed by a number of blocks anyway.)

Re: a technical document / whitepaper, the folks at Invictus Innovations have generously offered to support the writeup of a series of BIPs explaining the data structure and a few example use cases - they are particularly interested in using it for merged mining of BitShares, using a scheme that I developed for Freicoin's eventual merged mining support. This is still being sorted out, but hopefully there will be technical document soon, and the associated discussion from other Bitcoin developers.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

Peter Todd

Legendary

Offline

Activity: 1120
Merit: 1150

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 25, 2013, 08:41:15 PM

#44

Quote from: maaku on September 25, 2013, 07:10:41 PM

(Note that a very simple optimization prevents this from being an onerous requirement for miners: cache the indices for last generated block. There is likely to be a lot of similarity between two consecutively generated blocks (being the same high priority transactions from the same mempool), and all the miner would have to calculate is the differences. Nonce-rolling is not a problem as inclusion of the coinbase in the index is delayed by a number of blocks anyway.)

What's the worst case vs. best case vs. avg case performance of the UTXO updating process?

Remember that we can't make it possible for a miner to intentionally make a block whose UTXO set modifications take an especially long amount of time to process to get an edge on the competition.

BTC: 1FCYd7j4CThTMzts78rh6iQJLBRGPW9fWv PGP: 7FAB114267E4FA04

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 25, 2013, 10:53:21 PM

#45

To be clear, I'm not sure that example would have the effect you're imagining. Creating a purposefully time-wasting block to validate would hurt, not help the miner. But presumably you meant broadcasting DoS transactions which, if they got in other miner's blocks, would slow down their block propogations. That's possible but not economical for reasons that I'll explain in a minute.

But stepping back from that example, block propagation is why sub-second index construction/validation would be ideal. It places index operations in the same ballpark as the other verification tasks, meaning that it won't significantly affect the time it takes for blocks to propagate through the network. If index construction is done in parallel (using GPU acceleration, for example), it might not have any effect at all.

Rather than just quote numbers I'll explain where they come from:

There are two indices: the validating index containing unspent transactions keyed by <txid>, and the wallet index containing unspent outputs keyed by <scriptPubKey, txid, index>.

Let T be the number of transactions with at least one unspent output, N the total number of unspent outputs, X the number of transactions in a new block, I the number of inputs, and O the number of outputs.

The number of hash operations required to perform an insert, delete, or update to the indices is 1, log(T) or log(N), and len(key) for the best, average, and worst case respectfully. Since the keys of both indices involve hashed values (txid, and typically also the content of scriptPubKey), creating artificially high pathways through the index structure is a variant of the hash pre-image problem - you are essentially constructing transactions whose hash and output scriptPubKeys have certain values so as to occupy and branch specific nodes in the index structure, for the entire length of a given key. This is computationally expensive to do, and actually growing the pathway requires getting the transactions into the UTXO set, which requires some level of burnt BTC to avoid anti-dust protections. So it is not economically viable to construct anything more than a marginally worse pathway; the average case really is the only case worth considering.

For the average case, the number of index update operations are:

validating: (X+I) * log(T)
wallet: (I+O) * log(N)

Currently, these parameters have the following values:

T = 2,000,000
N = 7,000,000
X = 7,000 (max)
I = 10,000 (max)
O = 15,000 (max)

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.

EDIT: And again, even if a call to 'getblocktemplate' takes multiple seconds to complete, the *next* call is likely to include mostly the same transactions again, with just a few additions or deletions. So successive getblocktemplate calls should take just milliseconds to compute by basing the index on the last returned tree.

Conceivably, when receiving and validating a block it's also possible to start with a pre-generated index based off the last block + mempool contents, and simply undo transactions that were not included. Assuming, of course, that the mempool is not getting backlogged, in which case it would make more sense to start with the last block. If you collect partial proof-of-works, then you could pre-compute updates from transactions which appear in many of the partial blocks. There's a large design space here to explore, in terms of optimization.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

greBit

Hero Member

Offline

Activity: 714
Merit: 500

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 25, 2013, 11:09:15 PM

#46

Thanks. This is the kind of information I was after.

Quote from: maaku on September 25, 2013, 10:53:21 PM

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.

How big is the data structure that is being manipulated? How does it grow with block count, based upon your best/worst/avg cases?

I guess I could infer that its probably waay too big to fit in RAM since you suggest performance depends on disk speed

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 26, 2013, 12:32:07 AM

#47

This structure is a variant of the current ultraprune index used for block validation in the 0.8 client, which consumes about 250MB of data (you can check this with the 'gettxoutsetinfo' RPC). My work uses these data as the leaf nodes of a binary tree, adding about 2 x 10^6 internal nodes to the validation index, for example. This adds about 120MB, currently. The wallet index is a little bit larger, so I would expect the total to be somewhere around 1GB, as of the current block height. This is small enough to fit in RAM, but needs to be persisted to disk as it is an expensive to recreate, hence the dependence on disk speed.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

Peter Todd

Legendary

Offline

Activity: 1120
Merit: 1150

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 26, 2013, 11:42:54 AM

#48

Quote from: maaku on September 25, 2013, 10:53:21 PM

To be clear, I'm not sure that example would have the effect you're imagining. Creating a purposefully time-wasting block to validate would hurt, not help the miner. But presumably you meant broadcasting DoS transactions which, if they got in other miner's blocks, would slow down their block propogations. That's possible but not economical for reasons that I'll explain in a minute.

You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Secondly if block propagation does become an issue miners will switch to mining empty or near empty blocks building on top of the most recent blockheader that they know about until they finally get around to receiving the block in full as a way to preserve their income. Unfortunately even with UTXO commits they can still do that - just re-use the still valid UTXO commitment from the previous block.(1) Obviously this has serious issues with regard to consensus... which is why I say over and over again that any UTXO commitment scheme needs to also have UTXO possession proofs as a part of it to make sure miners can't do that.

1) Modulo the fact that the UTXO set must change because a previously unspendable coinbase output became spendable; if we don't do explicit UTXO possession proofs we should think very carefully about whether or not a miner (or group of miners) can extract the info required to calculate the delta to the UTXO set that the coinbase created more efficiently than just verifying the block honestly. I'm not sure yet one way or another if this is in fact true.

Quote from: maaku on September 25, 2013, 10:53:21 PM

But stepping back from that example, block propagation is why sub-second index construction/validation would be ideal. It places index operations in the same ballpark as the other verification tasks, meaning that it won't significantly affect the time it takes for blocks to propagate through the network. If index construction is done in parallel (using GPU acceleration, for example), it might not have any effect at all.

Rather than just quote numbers I'll explain where they come from:

There are two indices: the validating index containing unspent transactions keyed by <txid>, and the wallet index containing unspent outputs keyed by <scriptPubKey, txid, index>.

Let T be the number of transactions with at least one unspent output, N the total number of unspent outputs, X the number of transactions in a new block, I the number of inputs, and O the number of outputs.

The number of hash operations required to perform an insert, delete, or update to the indices is 1, log(T) or log(N), and len(key) for the best, average, and worst case respectfully. Since the keys of both indices involve hashed values (txid, and typically also the content of scriptPubKey), creating artificially high pathways through the index structure is a variant of the hash pre-image problem - you are essentially constructing transactions whose hash and output scriptPubKeys have certain values so as to occupy and branch specific nodes in the index structure, for the entire length of a given key. This is computationally expensive to do, and actually growing the pathway requires getting the transactions into the UTXO set, which requires some level of burnt BTC to avoid anti-dust protections. So it is not economically viable to construct anything more than a marginally worse pathway; the average case really is the only case worth considering.

re: content of the scriptPubKey, change your algorithm to index by H(scriptPubKey) please - a major part of the value of UTXO commitments is that they allow for compact fraud proofs. If your tree is structured by scriptPubKey the average case and worst case size of a UTXO proof differs by 454x; it's bad engineering that will bite us when someone decides to attack it. In any case the majority of core developers are of the mindset that in the future we should push everything to P2SH addresses anyway; making the UTXO set indexable by crap like "tags" - the rational for using the scriptPubKey directly - just invites people to come up with more ways of stuffing data into the blockchain/UTXO set.

Quote from: maaku on September 25, 2013, 10:53:21 PM

For the average case, the number of index update operations are:

validating: (X+I) * log(T)
wallet: (I+O) * log(N)

Currently, these parameters have the following values:

T = 2,000,000
N = 7,000,000
X = 7,000 (max)
I = 10,000 (max)
O = 15,000 (max)

Those are approximate maximum, worst case values for X, I, and O given a maximum block size of 1MB. You certainly couldn't reach all three maximums simultaneously. If you did though, it'd be about 1,000,000 hash operations (half a "megahash," since we're talking single SHA-2), and more importantly 1,000,000 updates to the leveldb structure, which is about 4-5 seconds on a fast disk. My earlier calculation of a max 200,000 updates (computable in about 0.8s) was based on actual Bitcoin block data, and is somewhat more realistic as a worst case.

Worst case would be if a miner creates a block filled with transactions spending previously created 10,000 byte scriptPubKeys and creating no new ones. A minimal CTxIn is 37 bytes, so you'd get roughly 1,000,000/37=27,027 txouts spent. Each txout requires on the order of 10,000 hash operations to update it, so you've actually got an upper limit of ~270,270,000 updates, or 22 minutes by your metric... There's a good chance something else would break first, somewhere, which just goes to show that indexing by scriptPubKey directly is a bad idea. Sure it'd be tricky and expensive to create such a monster block, but it is possible and it leads to results literally orders of magnitude different than what is normally expected.

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

Quote from: maaku on September 25, 2013, 10:53:21 PM

EDIT: And again, even if a call to 'getblocktemplate' takes multiple seconds to complete, the *next* call is likely to include mostly the same transactions again, with just a few additions or deletions. So successive getblocktemplate calls should take just milliseconds to compute by basing the index on the last returned tree.

Conceivably, when receiving and validating a block it's also possible to start with a pre-generated index based off the last block + mempool contents, and simply undo transactions that were not included. Assuming, of course, that the mempool is not getting backlogged, in which case it would make more sense to start with the last block. If you collect partial proof-of-works, then you could pre-compute updates from transactions which appear in many of the partial blocks. There's a large design space here to explore, in terms of optimization.

Lots of optimizations sure, but given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.

BTC: 1FCYd7j4CThTMzts78rh6iQJLBRGPW9fWv PGP: 7FAB114267E4FA04

matt608

Hero Member

Offline

Activity: 882
Merit: 1000

Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”

September 26, 2013, 01:57:14 PM

#49

Quote from: maaku on May 14, 2013, 06:39:06 PM

@jtimon, thank you for the reference and kind words. Yes, our “assets” colored-coin proposal is what drew me back to this proposal. If people think SatoshiDice brings a lot of traffic and bloats the block chain, wait until they see what a distributed exchange and transitive credit will do, especially once the block-size limit is relaxed.

I was under the impression coloured coins don't cause blockchain bloat, unlike Mastercoin, and that is one of its main benefits. Am I wrong?

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 26, 2013, 02:13:44 PM

#50

Quote from: retep on September 26, 2013, 11:42:54 AM

You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Still, you're playing a game that is not in your favor. In this case you are trading a few extra seconds of alone time hashing the next block against the likelihood that someone else will find a competing (and faster validating) block in the mean time. Unless you are approaching 50% hash power, it's not worth it. The faster your blocks get validated, the better off you are.

Quote from: retep on September 26, 2013, 11:42:54 AM

Secondly if block propagation does become an issue miners will switch to mining empty or near empty blocks building on top of the most recent blockheader that they know about until they finally get around to receiving the block in full as a way to preserve their income. Unfortunately even with UTXO commits they can still do that - just re-use the still valid UTXO commitment from the previous block.(1) Obviously this has serious issues with regard to consensus... which is why I say over and over again that any UTXO commitment scheme needs to also have UTXO possession proofs as a part of it to make sure miners can't do that.

1) Modulo the fact that the UTXO set must change because a previously unspendable coinbase output became spendable; if we don't do explicit UTXO possession proofs we should think very carefully about whether or not a miner (or group of miners) can extract the info required to calculate the delta to the UTXO set that the coinbase created more efficiently than just verifying the block honestly. I'm not sure yet one way or another if this is in fact true.

As you say, they would have to make at the very least one set of adjustments for the coinbase which becomes newly active, requiring access to part of the datastructure. This could be facilitated by miners including a proof with the block header that provides just the paths necessary for making the coinbase update. Miners know that any full node will reject a header with improperly constructed commitment hashes, so I fail to see what the issue is.

Quote from: retep on September 26, 2013, 11:42:54 AM

re: content of the scriptPubKey, change your algorithm to index by H(scriptPubKey) please - a major part of the value of UTXO commitments is that they allow for compact fraud proofs. If your tree is structured by scriptPubKey the average case and worst case size of a UTXO proof differs by 454x; it's bad engineering that will bite us when someone decides to attack it. In any case the majority of core developers are of the mindset that in the future we should push everything to P2SH addresses anyway; making the UTXO set indexable by crap like "tags" - the rational for using the scriptPubKey directly - just invites people to come up with more ways of stuffing data into the blockchain/UTXO set.

The full scriptPubKey needs to be stored, or else two lookups would be required - one from the wallet index to check existence, and one from the validation index to actually get the scriptPubKey. That is wasteful. The alternative is key by hash(scriptPubKey) and put scriptPubKey in resulting value: that nearly doubles the size of the index on disk. You'd have to have a really strong reason for doing so, which I don't think the one given is.

BTW, the structure is actually keyed by compress(scriptPubKey), using the same compression mechanism as ultraprune. A standard transaction (pubkey, pubkeyhash, or p2sh) compresses to the minimal number of bytes of either pubkey or hash data. "Attacking" pathways to pubkeyhash or p2sh keys requires multiple hash preimages. I put it in quotes because honestly, what exactly is the purpose of the attack?

Quote from: retep on September 26, 2013, 11:42:54 AM

Worst case would be if a miner creates a block filled with transactions spending previously created 10,000 byte scriptPubKeys and creating no new ones. A minimal CTxIn is 37 bytes, so you'd get roughly 1,000,000/37=27,027 txouts spent. Each txout requires on the order of 10,000 hash operations to update it, so you've actually got an upper limit of ~270,270,000 updates, or 22 minutes by your metric... There's a good chance something else would break first, somewhere, which just goes to show that indexing by scriptPubKey directly is a bad idea. Sure it'd be tricky and expensive to create such a monster block, but it is possible and it leads to results literally orders of magnitude different than what is normally expected.

"Tricky and expensive" to create such a block? You'd have to create ~270,270,000 outputs, the majority of which are probably non-spendable, requiring *at minimum* 14675.661btc. For an "attack" which at worst requires more than the entire existing block chain history to construct, and adds only 22 minutes to people's resync times. I fail to see why this should be anything but an academic concern.

Quote from: retep on September 26, 2013, 11:42:54 AM

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

A performance factor of 2 is worth a little upfront analysis work. Padding prevents you from being able to construct length-extension attacks - append padding bytes and you no longer have a valid node structure. Length-extension is not an issue, but performance considerations are.

Quote from: retep on September 26, 2013, 11:42:54 AM

Lots of optimizations sure, but given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.

I think you mean it could lead to a reorg. "Failure to update sufficiently fast" does not in itself lead to consensus errors. Examples please?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: Implementing “Ultimate blockchain compression & trust-free lite nodes”

September 26, 2013, 03:06:31 PM

#51

Quote from: matt608 on September 26, 2013, 01:57:14 PM

Quote from: maaku on May 14, 2013, 06:39:06 PM

@jtimon, thank you for the reference and kind words. Yes, our “assets” colored-coin proposal is what drew me back to this proposal. If people think SatoshiDice brings a lot of traffic and bloats the block chain, wait until they see what a distributed exchange and transitive credit will do, especially once the block-size limit is relaxed.

I was under the impression coloured coins don't cause blockchain bloat, unlike Mastercoin, and that is one of its main benefits. Am I wrong?

I was looser with terminology than I should have been. I meant legitimate bitcoin traffic, not mastercoin-like data and dust bloat. Bitcoin revolutionized payments. Colored coins applies all that same innovation to every other kind of financial contract or property ownership. Of course switching from 1 specific application to infinity general applications will result in more traffic ... lots more traffic would be a bit of an understatement. Just the distributed exchange traffic alone would probably consume more blockchain space than payments, and that's just one application.

When colored coin technology becomes widespread, bitcoin will have to scale. Right now that would push out mobile or embedded devices that already can't keep up with the block chain in a secure manor. This proposal of authenticated indices would make miners and other full nodes do a little extra upfront work for the benefit of lightweight clients that desire to securely work with only the parts of the block chain that concern them.

BTW, we have since published our "user-issued assets" colored coin proposal, under the name Freimarkets:

https://bitcointalk.org/index.php?topic=278671.0

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

Peter Todd

Legendary

Offline

Activity: 1120
Merit: 1150

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 26, 2013, 06:05:34 PM

#52

Quote from: maaku on September 26, 2013, 02:13:44 PM

Quote from: retep on September 26, 2013, 11:42:54 AM

You can't assume that slowing down block propagation is a bad thing for a miner. First of all block propagation is uneven - it'd be quite possible for large pools to have much faster propagation than the more decentralized solo/p2pool miners we want in the network. Remember, if you can reliably get your blocks to just over 50% of the hashing power, you're better off than if they get to more than that because there are more fee paying transactions for you to pick from, and because in the long run you drive difficulty down due to the work your competitors are wasting. Of course reliably getting to 50% isn't easy, but reliably getting to, say, no more than 80% is probably quite doable, especially if efficient UTXO set handling requires semi-custom setups with GPUs to parallelize it that most full-node operators don't have.

Still, you're playing a game that is not in your favor. In this case you are trading a few extra seconds of alone time hashing the next block against the likelihood that someone else will find a competing (and faster validating) block in the mean time. Unless you are approaching 50% hash power, it's not worth it. The faster your blocks get validated, the better off you are.

Read this before making assumptions: https://bitcointalk.org/index.php?topic=144895.0

Under the right circumstances those perverse incentives do exist.

Quote from: maaku on September 26, 2013, 02:13:44 PM

As you say, they would have to make at the very least one set of adjustments for the coinbase which becomes newly active, requiring access to part of the datastructure. This could be facilitated by miners including a proof with the block header that provides just the paths necessary for making the coinbase update. Miners know that any full node will reject a header with improperly constructed commitment hashes, so I fail to see what the issue is.

Shit... your comment just made me realize that with the variant of the perverse block propagation incentives where you simply want to keep other miners from being able to collect transaction fees profitably it's actually in your incentive to propagate whatever information is required to quickly update the UTXO set for the coinbase transaction that becomes spendable in the next block. Other miners will be able to build upon your block - making it less likely to be orphaned - but they won't have the information required to safely add transactions too it until they finish processing the rest of the UTXO updates some time later. Write the code to do this and miners will start using this to keep their profitability up regardless of the consequences down the road.

Quote from: maaku on September 26, 2013, 02:13:44 PM

The full scriptPubKey needs to be stored, or else two lookups would be required - one from the wallet index to check existence, and one from the validation index to actually get the scriptPubKey. That is wasteful. The alternative is key by hash(scriptPubKey) and put scriptPubKey in resulting value: that nearly doubles the size of the index on disk. You'd have to have a really strong reason for doing so, which I don't think the one given is.

BTW, the structure is actually keyed by compress(scriptPubKey), using the same compression mechanism as ultraprune. A standard transaction (pubkey, pubkeyhash, or p2sh) compresses to the minimal number of bytes of either pubkey or hash data. "Attacking" pathways to pubkeyhash or p2sh keys requires multiple hash preimages. I put it in quotes because honestly, what exactly is the purpose of the attack?

Have some imagination: if the attack can cause technical problems with Bitcoin for whatever reason, there will be people and organizations with strong incentives to carry it out. For instance note how you could create a block that's fraudulent, but in a way where proving that it's fraudulent requires a proof that most implementations can't process because of the ridiculously large UTXO path length involved - an excellent way to screw up inflation fraud proofs and get some fake coins created.

Out of curiosity, have you ever worked on any safety critical systems and/or real-time systems? Or even just software where the consequences of it failing had a high financial cost? (prior to your work on Bitcoin that is...)

Quote from: maaku on September 26, 2013, 02:13:44 PM

"Tricky and expensive" to create such a block? You'd have to create ~270,270,000 outputs, the majority of which are probably non-spendable, requiring *at minimum* 14675.661btc. For an "attack" which at worst requires more than the entire existing block chain history to construct, and adds only 22 minutes to people's resync times. I fail to see why this should be anything but an academic concern.

Creating those outputs requires 0 BTC, not 14,675BTC - zero-satoshi outputs are legal. With one satoshi outputs it requires 27BTC. That you used the dust value suggests you don't understand the attack.

As I say, such extreme differences make it very likely that you'll run into implementations - or just different versions - that can't handle such blocks, while others can, leading to dangerous forks. Look at how the March fork happened because of a previously unknown limit on how many database operations could happen in one go in BDB.

Quote from: maaku on September 26, 2013, 02:13:44 PM

Quote from: retep on September 26, 2013, 11:42:54 AM

re: single SHA256, I'd recommend SHA256^2 on the basis of length-extension attacks. Who cares if they matter; why bother analyzing it? Doubling the SHA256 primitive is a well accepted belt-and-suspenders countermeasure, and done already elsewhere in Bitcoin.

A performance factor of 2 is worth a little upfront analysis work. Padding prevents you from being able to construct length-extension attacks - append padding bytes and you no longer have a valid node structure. Length-extension is not an issue, but performance considerations are.

The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Quote from: maaku on September 26, 2013, 02:13:44 PM

Quote from: retep on September 26, 2013, 11:42:54 AM

Lots of optimizations sure, but given that Bitcoin is a system where failure to update sufficiently fast is a consensus failure if any of those optimizations have significantly different average case and worst case performance it could lead to a fork; that such optimizations exist isn't a good thing.

I think you mean it could lead to a reorg. "Failure to update sufficiently fast" does not in itself lead to consensus errors. Examples please?

A re-org is a failure of consensus, though one that healed itself. Currently re-orgs don't happen too often because the difference in performance between the worst, average, and best performing Bitcoin nodes is small, but if that's not true serious problems develop. As an example when I was testing out the Bloom filter IO attack vulnerability fixed in 0.8.4 I found I was able to get nodes I attacked to fall behind the rest of the network by more than six blocks; an attacker could use such an attack against miners to construct a long-fork to perform double-spend attacks.

FWIW thinking about it further, I realized that your assertion that the UTXO tree can be updated in parallel is incorrect because parallel updates don't allow for succinct fraud proofs of what modifications a given transaction made to the UTXO set; blocks will need to commit to what the top-level UTXO state hash was for every individual transaction processed because without such fine-grained commitments you can't prove that a given UTXO should not have been added to the set without providing all the transactions in the block.

BTC: 1FCYd7j4CThTMzts78rh6iQJLBRGPW9fWv PGP: 7FAB114267E4FA04

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 26, 2013, 08:07:48 PM
Last edit: September 27, 2013, 12:13:54 AM by maaku

#53

Quote from: retep on September 26, 2013, 06:05:34 PM

Read this before making assumptions: https://bitcointalk.org/index.php?topic=144895.0

I have read that, and I don't agree. I'm far more inclined to side with Gavin on the issue of block size. Bitcoin will either scale up or become irrelevant. These are the only two viable outcomes I see.

However this is now offtopic, so I'd rather keep discussions of your "keep bitcoin free" campaign outside of this thread.

Quote from: retep on September 26, 2013, 06:05:34 PM

Have some imagination: if the attack can cause technical problems with Bitcoin for whatever reason, there will be people and organizations with strong incentives to carry it out. For instance note how you could create a block that's fraudulent, but in a way where proving that it's fraudulent requires a proof that most implementations can't process because of the ridiculously large UTXO path length involved - an excellent way to screw up inflation fraud proofs and get some fake coins created.

You are arguing strawmen: You assume that nodes will validate blocks probabilistically, and then show how this will result in possible fraud. The problem lies in the security model you are proposing, namely reliance on fraud proofs instead of full validation.

Quote from: retep on September 26, 2013, 06:05:34 PM

Out of curiosity, have you ever worked on any safety critical systems and/or real-time systems? Or even just software where the consequences of it failing had a high financial cost? (prior to your work on Bitcoin that is...)

Yes.

Quote from: retep on September 26, 2013, 06:05:34 PM

Creating those outputs requires 0 BTC, not 14,675BTC - zero-satoshi outputs are legal. With one satoshi outputs it requires 27BTC. That you used the dust value suggests you don't understand the attack.

So now you're assuming that an attacker will be able to successfully mine 10,000 blocks in order setup a one-off 20-minute DoS? I'm pretty sure it'd be cheaper to just buy the 14,675BTC. If someone who wants to destroy or defraud the bitcoin network is able to mine that many blocks in preparation, it's either a *really* long con or Bitcoin is hosed for so many other reasons.

I'm not arguing that it is impossible. I'm arguing that it is not even remotely close to economically plausible, and so far beyond other attack vectors that it's not worth worrying about.

But even if that weren't the case, there are fundamentally simple ways to deal with the issue: introduce a hard limit on the number of index node updates per-block, just as the number of signature operations is currently limited.

Quote from: retep on September 26, 2013, 06:05:34 PM

As I say, such extreme differences make it very likely that you'll run into implementations - or just different versions - that can't handle such blocks, while others can, leading to dangerous forks. Look at how the March fork happened because of a previously unknown limit on how many database operations could happen in one go in BDB.

That's a wonderfully general argument you can use against anything you don't like. Implementation bugs can exist anywhere. There's no reason to suppose that we're more or less likely to run into them here than somewhere else.

Quote from: retep on September 26, 2013, 06:05:34 PM

The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Enlighten me, please.

EDIT: After thinking about this for some time, the only conclusion I can come to is that you are assuming multiple blocks on the first pass. This is only true of about half of the nodes (most of which are two blocks), the other half are small enough to fit in a single block. While not a performance factor of 2, it's definitely >= 1.5.

Quote from: retep on September 26, 2013, 06:05:34 PM

A re-org is a failure of consensus, though one that healed itself.

That's not a very useful definition of consensus. Reorgs happen even without malicious intent, and the network is eventually able to reach agreement about older blocks. A real error is when one node rejects a block that other validating nodes accept - no amount of time or work built on the other chain will convince that node to switch. If this proposal is susceptible to that kind of validation error, I'd be interested to know.

Quote from: retep on September 26, 2013, 06:05:34 PM

Currently re-orgs don't happen too often because the difference in performance between the worst, average, and best performing Bitcoin nodes is small, but if that's not true serious problems develop. As an example when I was testing out the Bloom filter IO attack vulnerability fixed in 0.8.4 I found I was able to get nodes I attacked to fall behind the rest of the network by more than six blocks; an attacker could use such an attack against miners to construct a long-fork to perform double-spend attacks.

If you know a way to do the same with this proposal, which isn't easily solved by processing blocks in parallel, and which doesn't require an additional network DoS vulnerability, please let me know. Until then there are simply too many "what if"s attached.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

Peter Todd

Legendary

Offline

Activity: 1120
Merit: 1150

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

September 27, 2013, 08:29:29 AM

#54

Quote from: maaku on September 26, 2013, 08:07:48 PM

Quote from: retep on September 26, 2013, 06:05:34 PM

The performance cost of SHA256^2 compared to SHA256 is a lot less than 2. Heck, I'd bet you 1BTC that even in your existing Python implementation changing from SHA256 to SHA256^2 makes less than a 10% performance difference, and the difference in an optimized C++ version is likely unmeasurable. I really hope you understand why...

Enlighten me, please.

EDIT: After thinking about this for some time, the only conclusion I can come to is that you are assuming multiple blocks on the first pass. This is only true of about half of the nodes (most of which are two blocks), the other half are small enough to fit in a single block. While not a performance factor of 2, it's definitely >= 1.5.

Where do you think the first SHA256() gets the data it's working on? What about the second?

BTC: 1FCYd7j4CThTMzts78rh6iQJLBRGPW9fWv PGP: 7FAB114267E4FA04

Kevlar

Sr. Member

Offline

Activity: 602
Merit: 254

🔰FERRUM NETWORK🔰

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 10, 2013, 05:36:36 PM

#55

What's the latest on the development of this?

Ferrum Network • Interoperability Network for Financial Applications

QuantumQrack

Sr. Member

Offline

Activity: 337
Merit: 250

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 26, 2013, 05:47:47 AM

#56

Interested in the status of this as well.

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 27, 2013, 05:47:15 PM

#57

There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interested in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

bg002h

Donator
Legendary

Offline

Activity: 1463
Merit: 1047

I outlived my lifetime membership:)

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 28, 2013, 12:40:14 AM

#58

Quote from: maaku on October 27, 2013, 05:47:15 PM

There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interested in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.

Congrats!

Hardforks aren't that hard. It’s getting others to use them that's hard.
1GCDzqmX2Cf513E8NeThNHxiYEivU1Chhe

Peter Todd

Legendary

Offline

Activity: 1120
Merit: 1150

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 28, 2013, 01:20:58 AM

#59

Quote from: maaku on October 27, 2013, 05:47:15 PM

There was a gap in funding in late September, and then I went incommunicado for a few weeks, because of this:

Congrats!

Sent you a donation for his or her college fund.

Quote from: maaku on October 27, 2013, 05:47:15 PM

Now that we're back home and settled, I'm back to working on authentication trees. Thanks in large part to the generous donation of Armory Technologies, Inc. and a few others, and the recent rise in price, I have the ability to focus on this project for some more months.

Right now I am translating the Python code into a C++ libauthtree implementation, and working on a series of BIPs that describe the trie structure, various generic operations on it, and the specific application to txid indices, address indices, and merged mining. Invictus Innovations is helping with the BIP process due to their interested in using the structure for merged mining headers.

I will be consolidating the recent news into a blog post update soon, hopefully by the end of the week.

I should write up a similar document explaining TXO commitments.

BTC: 1FCYd7j4CThTMzts78rh6iQJLBRGPW9fWv PGP: 7FAB114267E4FA04

maaku (OP)

Legendary

Offline

Activity: 905
Merit: 1011

Re: [Fundraising 195btc] Finish “Ultimate blockchain compression”

October 28, 2013, 08:36:34 AM

#60

Wow, that was very kind and unexpected. Thank you! I forwarded those coins into a cold-storage wallet that will be the start of her education endowment. Hopefully 20 years of deflation will turn it into a sizable chunk of her college education, and maybe even the start of a nest egg.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Education endowment for maaku's daughter:
1Naginiq19q9fdeFLipqFr4TZjjsSdhT3B
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJSbiBOAAoJECsa1YSJj/UD6z4P/3jawSFnpMNt89elZmsm3OaJ
q2GMvC++X8+Y7n/RI5mN7Ubg4E2YLyGS6Xrf5f7mHBMq5FjW5UPL2GV0JDHEN+ni
ILQMvvNhI28Dj1rlZY2KdAYj6WUf50gVFY3Tv3hq3JZHEnoS2r2YPZ4dPH9Ls4zM
i6iQ/LJNs7hZFEiO+EDsPThjepFmRzpaBqi2mpW4BCnFl1gqYJiFUgSKrY2G8YYI
Eqh9Xq1rQft+utYY2gNuc8XtqrscmFmr5dHrWnqPe0cFPWrpG0HqY5QftaqfjMXg
WWZHuTPwul+QVwXOeimoPjdJ9gCi7sXyCZTa9MxvJ5n1QS/ubuHWKx5jUzbSx6m+
iFb+IX+tLYxcOecxfmftM645LQ8WlCJeY0p+EoskNpkdH4q63Sk1Nyq9dJMuQK+2
+MaFbB0Pv3AMk/Qoi/qrCF9F1BVD/CYK72uI8t+7r8iDvQ97iK5lGfLvCxR89Ev3
X3uCiJOpUctMozELjI7w3ga3xRah6mufKU2HgNRqkZAaphp7Cf5v34NwnSeaf9LM
A2tdBo8N86NVpjeyLz5U8TRnSrUcPDQPR4snz+IYF1sCKsRqYz9rNy1WhaIO7j6x
2A6nTnzvLUnenYLusu4hAIJ5XAlQMiGDL6sdALL2Z3or7mlx3IPz5elvn2FRd1fh
xkvdfNXeOkmrFd4x6m6h
=fzcm
-----END PGP SIGNATURE-----

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP