etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 17, 2012, 06:33:40 PM Last edit: January 20, 2014, 04:54:23 PM by etotheipi Merited by ABCbits (49), suchmoon (4), Husna QA (3) |
|
This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures. I believe this relieves two major problems with the network at once -- compression/pruning, and lightweight node security -- and does so in a non-disruptive way. I am not positive that this is the right way to go, but it definitely warrants discussion. Summary: [SEE ILLUSTRATIONS BELOW] Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes. The leaves of this tree actually correspond to addresses/scripts, and the data at the leaf is actually a root of the unspent-TxOut list for that address/script. To maintain security of the tree signatures, it will be included in the header of an alternate blockchain, which will be secured by merged mining. This provides the same compression as the simpler unspent-TxOut merkle tree, but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders. Therefore, even lightweight nodes can get full address information, from any untrusted peer, and with only a tiny amount of downloaded data (a few kB). (NOTE: I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section: a variant of the merkle-tree will have to be to used that guarantees efficient updating of the tree.) (1) Major Benefits:(1a) Near-optimal blockchain compression: theoretically, the size of the pruned blockchain would be proportional to the transaction volume (thus could go up or down), instead of the entire global history which always increases in size. In practice, it wouldn't be so clean, but you really won't do any better than this. Whoops! Before this idea was fully developed, I had overlooked the fact that full nodes will still have to maintain the transaction-indexed database. This address-indexed DB is not a replacement, but would have to be in addition to the that. Therefore, it necessarily increases the amount of work and data storage of a full node. But it can simply be an "add-on" to an existing "ultraprune" implementation. (Either way, this should actually be a downside).- (1b) Trustless lightweight-node support: New nodes entering the network for the first time, will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads). A single honest peer out of thousands guarantees you get, and recognize, good data.
- (1c) Perfectly non-disruptive: There is no main-network protocol or blockchain changes at all. All the balance-tree information is maintained and verified in a separate blockchain through merged mining. In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement)
- (1d) Efficient tree querying&updating: The full-but-pruned nodes of the network will be able to maintain this data structure efficiently. New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is)
- (1e) No user-setup or options: Unlike overlay networks, achieving full trust does not require finding a trusted node, or subscribing to a service. Just like the main blockchain -- you find a bunch of random peers and get the longest chain. This could be bootstrapped in a similar fashion as the main network.
(2) Downsides and Uncertainties:- (1a) See revised (1a) above
- (2a) Complexity of concept: This is not simple. It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height). This is entirely feasible, but it could be a bear to implement it.
- (2b) Uncertainties about lite-node bootstrap data: Depending on how the data is structured, there may still be a bit of a data for a lite node to download to get the full security of a full node. It will, undoubtedly, be much less than downloading the entire chain. But, there is obviously implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4GB, as of this writing). UPDATE: My initial estimate based on the "Hybrid PATRICIA/Brandais Tree" (aka Reiner-Tree), is that a wallet with 100 addresses could verify its own balance with about 250 kB.
- (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree. But it should be possible to create an alternative with the following properties:
- Commutative computation: a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
- O(log(N)) updating: removing or adding a single leaf node should be able to be done in O(log(N)) time. With a vanilla merkle tree, this is true only if you remove a node and add a node to the same leaf location.
(3) Assumptions::- (3a) Need verifiable tree roots: I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network. If there's enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable. I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
- (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.
- (3c) Efficient sorting: Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer. However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.
Alt-Chain Merkle Tree construction:-- For each address/script, collect all unspent-TxOuts -- Compute merkle root of each TxOut tree -- Sort roots, use as leaf nodes for a master-merkle-tree. -- Include merkle-root of master tree in alternate chain header. https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression.png Getting your balance:-- Download headers of both chains -- Request unspent-TxOut-hash list. -- Compute sub-merkle root for this address -- Request secondary-branch nodes (O(log(N)) -- Compute master root; compare to block header -- Request the full TxOuts for each unspent-TxOut-hash above https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression2.png Alternate Chain:All data is included on the alternate blockchain, which is maintained through merged mining on the main chain. This is only one extra tx per block on the main chain. That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain. https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinerchain.png
Yes, this is a huge undertaking. Yes, there's a lot of uncertainties. Yes, I need a new merkle tree structure. But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?) Alright, tear it apart!
UPDATE:After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure. Other's have expressed interest in a binary-search tree (BST). Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc. My preference is the creme-de-la-creme of tries -- a hybrid PATRICIA tree (level-compressed trie) and de-la-Braindais tree (node-compressed). It looks something like this: https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/DataStructures_Values.pngThe structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it. The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying all transactions, in order, to have a well-defined internal structure). For education on trie structures, see my pretty pictures in this post.
|
|
|
|
hazek
Legendary
Offline
Activity: 1078
Merit: 1003
|
|
June 17, 2012, 06:43:53 PM |
|
Before I read this I just want to quickly post that I personally, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future and I really hope the core team has planed an order of priorities accordingly.
|
My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)
If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 17, 2012, 06:47:49 PM |
|
Before I read this I just want to quickly post that I personally, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future and I really hope the core team has an order of priorities planed accordingly.
I too believe this is a critical issue for Bitcoin, as a whole. I had floated the idea that handling blockchain size was critical, in the past, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet. One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on-board. It can be done completely indepedently and operate completely non-disruptively, even with only the support of other devs who believe in it. I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add. But one of the biggest upsides here is that it can be done completely indepdently.
|
|
|
|
socrates1024
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
June 17, 2012, 07:22:01 PM Last edit: June 17, 2012, 07:46:30 PM by socrates1024 |
|
Let me try to explain a solution to the 'alternate Merkle tree' you require. The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N). There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkleI'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this! P.S. Most of these structures only require collision resistant hash functions. However, if you want to use a fancy hash functions with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4]. [1] Certificate Revocation and Certificate Update Noar and Nissim, 1998. USENIX https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf[2] Authenticated Data Structures Roberto Tamassia, 2003. http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf[3] Persistent Authenticated Data Structures and their applications Anagnostopoulos, Goodrich, and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking Papamanthao and Tamassia, 2011 http://www.cse.msstate.edu/~ramkumar/gw-102.pdf
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 17, 2012, 07:33:39 PM |
|
Let me try to explain a solution to the 'alternate Merkle tree' you require. The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N). There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkleI'm hoping spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this! P.S. Most of these structures only require collision resistant hash functions. However, if you want to use a fancy hash functions with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4]. [1] Certificate Revocation and Certificate Update Noar and Nissim, 1998. USENIX https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf[2] Authenticated Data Structures Roberto Tamassia, 2003. http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf[3] Persistent Authenticated Data Structures and their applications Anagnostopoulos, Goodrich, and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking Papamanthao and Tamassia, 2011 http://www.cse.msstate.edu/~ramkumar/gw-102.pdfWow, thanks socrates! That's an excellent place to start. Of course, I should've thought about that from the start, since I am adept with data structures, and especially trees/tries of sorts. I have even coded a red-black tree before... My brain was stuck on how to modify the base merkle-tree concept into what I wanted, and didn't consider going back to a different (though related) data structure. There is one problem, though it may be part of the materials you already referenced: the tree must be constructed identically by all parties, and from any state. And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie. Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them. But I wouldn't want to go to a basic trie, due to the space inefficiency. Something like a Patricia tree/trie would probably work, but it's very difficult to implement (correctly) and there's not a lot of existing, trusted implementations out there. I'll dig into your materials a bit. Thanks!
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
June 17, 2012, 07:50:25 PM |
|
(2c) Merkle-tree Alternative Needed
I think this is the crucial observation. Bitcoin doesn't really have a single contiguous bitstream that would need a protection of a hash tree. It has a collection of transactions that are only weakly ordered in a lattice fashion. The better data structure would be probably some classic database representation like B-tree with cryptographically signed transaction log. To allow integrity verification with a trucated log the log blocks should contain something like a hash of inorder traversal of the database content before the update. This will allow for a quick verification of the new log blocks received from the network. To allow backward compatibility with forward-delta block-chain the Bitcoin protocol would need additional constraint on the ordering of the transactions in the blocks.
|
|
|
|
socrates1024
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
June 17, 2012, 08:12:00 PM |
|
There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right? Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree: 1) TxOuts are identified by the transaction hash and an index within that transaction. So we need to search by (txhash,idx) in order to see if an output has been spent. When outputs are inserted into this tree, they're stored in sorted order according to (txhash,idx). 2) It's also desirable to find all the available txouts for a particular address. Let a second Merkle tree contain keys of the form (scriptpubkey). Now, given a root hash, you can ask for a verifiable list of all your spendable coins. Alternately, instead of thinking of it as a different tree for each index, you can think of it as a composite structure. The general form is a Search DAG (Directed Acyclic Graph), but the idea is exactly the same [5]. (This includes B-Trees). [5] A General Model for Authenticated Data Structures Martel, Nuckolls, Devanbu, Gertz, Kwong, Stubblebine, 2004. http://truthsayer.cs.ucdavis.edu/algorithmica.pdf
|
|
|
|
proudhon
Legendary
Offline
Activity: 2198
Merit: 1311
|
|
June 17, 2012, 08:41:15 PM |
|
No clue about any of this stuff, but thank you guys for working on this. I think it's important to have this sorted out before adoption and use gets really heavy.
|
Bitcoin Fact: the price of bitcoin will not be greater than $70k for more than 25 consecutive days at any point in the rest of recorded human history.
|
|
|
unclescrooge
|
|
June 17, 2012, 09:32:27 PM |
|
I agree with others, this is a hot issue. And as much as I dislike gambles, I'm thankful Satoshidice put pressure on the blockchain like this.
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 17, 2012, 09:35:40 PM Last edit: June 17, 2012, 09:57:38 PM by etotheipi |
|
There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?
Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:
Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}. You and I are both constructing the same binary/red-back tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree. I have to add {3,4,7,8} and remove {13} from my tree. You just add all 10 elements in a specified order. We're going to get different roots. I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours. That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer. As I said before, a trie would work, but is generally very space inefficient, and that matters here.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
June 17, 2012, 10:04:31 PM |
|
Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded? (imagine p2pool).
I would have to imagine it was the second.
If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as all the blocks representing diffs to the alt chain would be the same as (in proportion) to the diffs on the main chain. That is, at least the way I understood it.
If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree. Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created. Anything older than the superblock would be simply unnecessary.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
June 17, 2012, 10:07:29 PM |
|
I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it. One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block. Get a few people to do this, and to GPG-sign the hash of the resulting output. Voila.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 17, 2012, 10:35:28 PM |
|
Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded? (imagine p2pool).
I would have to imagine it was the second.
If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as all the blocks representing diffs to the alt chain would be the same as (in proportion) to the diffs on the main chain. That is, at least the way I understood it.
If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree. Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created. Anything older than the superblock would be simply unnecessary.
The second blockchain actually would have no substance to it. It would solely consist of headers which would contain the master-merkle roots. The merkle-roots are created from the main chain data, so you go get the data from that network. There would be no block reward -- your reward is on the main chain which you mining at the same time. So yes, everyone downloads the entire alt-chain. But the entirety of this extra chain is the headers themselves: so it's only about 4-5 MB per year. I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it. One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block. Get a few people to do this, and to GPG-sign the hash of the resulting output. Voila. See (1e) in the original post: the point of this exercise is to not have to trust specific nodes, GPG/PGP signatures, add centralization, etc. You trust the proof-of-work. The merkle-root with the most work behind it on the alt-chain is the merkle-root that you trust to be correct, just like you do with the main-network headers. We don't want users to have to setup a list of trusted authorities. Then you have revocation lists. And keep the list updated. And maintain GPG keys of such authorities. And politics about who should be trusted authorities. And of course, centralization. Or you setup a alternate blockchain, and trust the data that has the most work behind it. Voila.
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 17, 2012, 11:32:21 PM Last edit: June 18, 2012, 12:03:45 AM by maaku |
|
As others have mentioned, a self-balancing binary search tree would solve the only real technical issue here. Red-black trees would work fine, or their generalized parent structure the 2-3-4 tree (B+tree of order 4), which would provide a conceptually cleaner implementation and serialization format. Overall, great work. I can assist you in implementing it. There is one problem, though it may be part of the materials you already referenced: the tree must be constructed identically by all parties, and from any state. And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie. Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them. But I wouldn't want to go to a basic trie, due to the space inefficiency. Something like a Patricia tree/trie would probably work, but it's very difficult to implement (correctly) and there's not a lot of existing, trusted implementations out there. (EDIT: Sorry, this is basically what socrates above. I should have read the whole thread first:) This is a non-issue; simply specify the order of insertion/deletion. For example: “Process blocks in order; for each block process transactions in order; and for each transaction first delete all inputs (in order) from, then insert all outputs (in order) into the alt-chain Merkle tree”. You might have to throw a special case in there for transactions that have as input the output of a transaction that occurs later in the same block (is that even allowed?). Why (and how) would you be creating a tree from scratch without either access to the blockchain or the tree in the last alt-chain checkpoint? Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.
You and I are both constructing the same binary/red-back tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.
I have to add {3,4,7,8} and remove {13} from my tree. You just add all 10 elements in a specified order.
We're going to get different roots. I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.
That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer. No, that's going in the wrong direction. Updates to the blockchain occur in atomic operations: blocks. Simply mandate that trees are constructed/updated according to the canonical ordering provided by the blockchain. If you insist on creating the search tree from scratch, simply replay the blockchain inserting and removing in the order specified therein. Or you can start from the tree of the last found alt-chain checkpoint, and replay insertions & deletions from that point forward. Yes, order of operations matters, so standardize an order of operations.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 18, 2012, 12:03:41 AM |
|
Two big issues brought up so far, in outside discussion:
(1): I am currently in respectful disagreement with Socrates about the ability to construct stateless unspent-txout-trees. Recap: If one were to use a red-black tree to store this info, then the exact structure of that tree will depend on the entire, exact, incremental history of additions and deletions to the tree starting from the genesis block. To construct the tree from scratch, I must replay the entire history in the same order every time.
I am of the opinion that you should be able to start with the current list of unspent-TxOuts, however it is that you got them, and should be able to construct the correct tree without knowing anything about the history of how elements were added and deleted. If using the vanilla red-black trees, if I start with just the current unspent TxOuts list, I have every node in the tree -- but I need the entire history of already-spent-TxOuts just to be able to replay that history to construct the tree correctly. This seems inelegant and dangerous. But I can't figure out why, other than gut instinct.
The counter argument is that you will never find yourself in this position: you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates. Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.
(2): I misunderstood DiThi's original proposal, as a much simpler tree structure that could not provide the trustless lite-node behavior that my trees do. However, as I re-read it, I'm realizing that I think he's right -- a simpler tree of unspent TxOuts does actually give you that capability. Is this right?
A couple months ago when I first theorized this idea, I had a very good good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script. If you didn't do it, bad things would happen. Unfortunately, I don't remember what those bad things were! It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecesssarily-complex data structure to achieve security.
So, am I wrong? I'm not sure. What are the weaknesses and advantages of DiThi's tree structure vs. mine (his leaf nodes are UTXOs, mine are roots of UTXO sub trees). One thing I can think of is: how do you know that a peer gave you the complete list of UTXOs and is not hiding the rest? Though, I've already made an assumption that you have at least one honest peer, so that probably not an issue... is it?
|
|
|
|
BrightAnarchist
Donator
Legendary
Offline
Activity: 853
Merit: 1000
|
|
June 18, 2012, 12:09:42 AM |
|
listening... very good ideas here, similar to my balance chain concept but much more fleshed out
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 18, 2012, 12:17:43 AM |
|
The counter argument is that you will never find yourself in this position: you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates. Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort. Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway. A couple months ago when I first theorized this idea, I had a very good good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script. If you didn't do it, bad things would happen. Unfortunately, I don't remember what those bad things were! It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecesssarily-complex data structure to achieve security. If nothing else, it is certainly convenient to pull in all the unspent outputs for an address without having to go all over the tree, as you can under your approach. That's a very common use case, so it would make sense to optimize for it.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 18, 2012, 12:27:37 AM |
|
The counter argument is that you will never find yourself in this position: you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates. Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort. Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway. I didn't point that out, because I don't think I agree with it. It's a counter-argument that, while I can't dispute it directly at the moment, makes me extremely uncomfortable. What future use case haven't we considered that would be stunted by this decision? I'm only agreeing that I don't have a direct counter the argument for it (and thus cannot fully defend my position). But I'd hardly believe I'm the only one who would be bothered by it: I think it's extremely inelegant that if I have every node in the tree, that I still have to download GB more data just to know how to organize it (or rather, I might as well just re-download the tree directly, at that point).
|
|
|
|
socrates1024
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
June 18, 2012, 12:51:42 AM |
|
I had a very good good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script. If you didn't do it, bad things would happen. Unfortunately, I don't remember what those bad things were!
One of the neat things about a Merkle search structure, rather than just an arbitrarily-ordered Merkle tree, is that you can prove that a key is not in the database. Even with a typical Merkle tree, like the current blockchain, it would require a linear effort to prove that a transaction doesn't exist - assuming all you have is an O(1) root hash, and you don't trust anyone! Even more generally, you can do a verified 'range query' for only O(M log N) effort (where M is the number of results, N is the size of the tree). If you store each unspent-coin in a binary search tree, ordered by the address, then you can ask an untrusted server to give you a snapshot of all the spendable coins associated with that address. There's no way for them to omit any. Let me try to describe the scenario how I prefer, since it's hard to keep track of the terms otherwise. There are two parties, the Lite-Client (Client) and the Helper (Server). The goal is for the Client not to have to trust the Server, but for the Server to store all the data. The Client only ever needs to store (like, on disk) a constant O(1) amount of state - the root hash. In order to decide what root hash to use, the Client will have to rely on the proof-of-work to recognize the most recent block. If the Client asks the Server for the list of unspent-coins matching a target address, he receives two or more O(log N) paths through the Merkle tree. The first one is path is to the element with the largest address (lexical ordering) that is smaller than the target address. The last one is a path to the smallest element with a larger address. If there were no transactions matching the target, then these two elements will be adjacent. In any case, the Client iterates through the paths he receives, at each step checking that the paths are adjacent in the tree (and, of course, that the hashes are consistent and lead to the root). [edit]I hope this turns into a Merkle trees megathread![/edit]
|
|
|
|
|