etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
November 04, 2012, 05:47:44 PM Last edit: November 04, 2012, 06:08:25 PM by etotheipi |
|
I know it's been a while since this topic has been visited, but I'd like to make a proposal here:
Rather than settle on the "best" way to implement this tree, how about settle on the "simplest", so that way the community can catch the bug and start cranking their brains on the best way to implement this tree with the right balance of features and complexity when the time comes to consider making it part of the protocol standard.
By "simplest", I mean implemented as follows:
1. A simple binary tree that starts out balanced, but maintaining the balancing from block to block is optional. A miner has the choice to rebalance the tree for the next block, but doesn't have to. The lack of a requirement to keep the tree balanced is meant as an effort to discourage mining empty blocks because a miner doesn't want the CPU burden or any delay associated with rebuilding the whole tree with each incoming transaction.
2. No ability to roll back. Rolling back must be accomplished either by rebuilding the tree from scratch, or by starting with a known good backup and rolling it forward. Backups are systematically deleted such that the backup burden grows O(log n) relative to the total block height. More specifically, the backup of any given block's tree should have a lifetime of 2^n blocks where n is the number of contiguous zero bits at the end of the block height. Block 0x7890's tree backup should last sixteen blocks because 0x7890 ends in four zero bits. The backup for the tree of block 0x10000 should last 0x10000 blocks.
Now, why would I suggest a methodology that clearly avoids taking advantage of features that would make a "better mousetrap" so to speak? Here are the most prominent reasons:
1. At some point, the Bitcoin community may come to a consensus that we should redefine a valid Bitcoin block to incorporate the root hash of a valid meta-tree rather than having it be optional coinbase clutter. Until then, this is merely an optional performance-enhancing and usability-enhancing feature without any hard commitment to a standard. We should help people understand the base case for what it is, and then move on to derivative cases that do the job better.
2. There is serious value in simplicity. The more things are needlessly complex, the higher the barrier to entry for new developers of bitcoin software. We are at a point where we need more developers on board than we need the disk space saved by what would be (for the current block height and all block heights for the foreseeable future) about 20 backups of the meta tree on each user's disk. Besides being much more difficult for the average developer to understand, requiring a tree that must do a moonwalk during a rare edge case which is very difficult for a developer to reproduce and test makes for an exploitable danger that implementations may fail to do the right thing when the right thing is needed the most.
3. The Bitcoin community could use the lessons learned in a basic "proof of concept" implementation of this without being committed to any specific methodology for optimizing it. This will help the community at large understand which use cases evolve from the availability of the tree, and then come to an intelligent consensus as to what features and attributes of a meta tree are the most valuable and which bargains of complexity versus power are worth making.
I generally approve of the idea of prototyping the meta-chain CONOPs, and let people/devs start thrashing out how to use it, how to improve it etc. However, if you're arguing for simplicity, then you must use the Trie/Patricia/De la Brandais tree. There is no need for snapshotting/backups. Put the data in. If a block has to be rolled back, remove them from the tree. For a given snapshot in time, all Trie-based implementations will agree. It's part of their design. This just won't be possible using BST's, though. It's not a matter of preference, it's a matter of standardization. If you use BST, you might be inclined to use STL map<a,b> in C++, or similar implementation in Java, etc. But the map<> data structure will be designed/optimized different for each architecture, compiler, and OS. There's no guarantee that a BST in Linux using gcc 4.3 will even match the BST implementation in Linux gcc 4.8 -- they might've changed the BST implementation under-the-hood, optimizing the rebalancing operations differently. And you'd never know, because underlying tree structure is not specified in the C++ standard for map<>. Only the expected run times of insert/delete/query/etc. So, miners won't be able to agree on the root hash unless they all build the BST exactly the same way, so they must agree on the BST algorithm to use. Who's writing that implementation? Will they create an implementation of the exact same algorithm in C, Java, C++, haskell, etc? This is why I keep pushing Trie-based algorithms -- any implementation of the Trie (or whatever variant is agreed upon) will work. A trie that is implemented in C++ by someone in China can be used to produce the same root hash as a Java implementation written by some kid in his basement in Idaho (assuming the implementations are correct). Yes, it's possible to BSTs, but it's a lot of work. And it's not simple. To design this data structure around BSTs requires every implementer of the meta-chain to use this specific BST algorithm. There is no such ambiguity with Trie structures -- you could look them up in any textbook. So, I agree that we should do this. It needs to be done and I think something like this is ultimately the future of BTC (especially lite nodes). If only I could get all my other priorities in Armory finished, then I would be focusing on this.
|
|
|
|
socrates1024
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
November 04, 2012, 06:08:41 PM |
|
Yes, it's possible to BSTs, but it's a lot of work. And it's not simple. To design this data structure around BSTs requires every implementer of the meta-chain to use this specific BST algorithm. There is no such ambiguity with Trie structures -- you could look them up in any textbook.
It's true that every implementer will need to use the same algorithm, otherwise the root hashes will be incompatible. And of course you can't just use a standard library map because those do not support computing hashes! But there are plenty of ambiguities involved in using tries. Here's an earlier quote from you where you mentioned an optimization using skip strings: In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings". So a leaf node's hash value is just the root hash of the subtree. And a branch node's value is the hash of the concatenated skip string and its non-null child node values.
Surely if you use skip strings, that will change the root hash, so everyone would have to agree on the particular algorithm to use? Let's make several implementations, and evaluate which one is the best. By way of an update, I am still making gradual progress on a BST implementation. Previously I had implemented a red-black balanced merkle tree in python and described the complete protocol so that any independent implementation should arrive at the same hash. Unfortunately the implementation was too slow/memory-intensive to get past 130k or so of the blockchain. Since then, I have made a C++ implementation that's thousands of times faster (it runs within a factor of 3 as fast as the linux kernel LLRB tree, and although it doesn't compute hashes, the structure of the tree will be the same). https://github.com/amiller/redblackmerkle/blob/llrb/c_redblack.hpp I have also sketched out a solution for I/O efficient validation of a batch of updates involving a priority queue. But I'm not prepared to make a full post on this yet. I'm pointing all this out so that you can't say no progress is being made! Until someone from the 'trie' camp catches up, the simplest solution is a BST since some code for this already exists.
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
November 04, 2012, 06:25:28 PM |
|
I generally approve of the idea of prototyping the meta-chain CONOPs, and let people/devs start thrashing out how to use it, how to improve it etc.
However, if you're arguing for simplicity, then you must use the Trie/Patricia/De la Brandais tree. There is no need for snapshotting/backups. Put the data in. If a block has to be rolled back, remove them from the tree. For a given snapshot in time, all Trie-based implementations will agree. It's part of their design.
This just won't be possible using BST's, though. It's not a matter of preference, it's a matter of standardization. If you use BST, you might be inclined to use STL map<a,b> in C++, or similar implementation in Java, etc. But the map<> data structure will be designed/optimized different for each architecture, compiler, and OS. There's no guarantee that a BST in Linux using gcc 4.3 will even match the BST implementation in Linux gcc 4.8 -- they might've changed the BST implementation under-the-hood, optimizing the rebalancing operations differently. And you'd never know, because underlying tree structure is not specified in the C++ standard for map<>. Only the expected run times of insert/delete/query/etc.
So, miners won't be able to agree on the root hash unless they all build the BST exactly the same way, so they must agree on the BST algorithm to use. Who's writing that implementation? Will they create an implementation of the exact same algorithm in C, Java, C++, haskell, etc? This is why I keep pushing Trie-based algorithms -- any implementation of the Trie (or whatever variant is agreed upon) will work. A trie that is implemented in C++ by someone in China can be used to produce the same root hash as a Java implementation written by some kid in his basement in Idaho (assuming the implementations are correct).
Yes, it's possible to BSTs, but it's a lot of work. And it's not simple. To design this data structure around BSTs requires every implementer of the meta-chain to use this specific BST algorithm. There is no such ambiguity with Trie structures -- you could look them up in any textbook.
So, I agree that we should do this. It needs to be done and I think something like this is ultimately the future of BTC (especially lite nodes). If only I could get all my other priorities in Armory finished, then I would be focusing on this.
The claim "This just won't be possible using BST's, though." is plain false. It confuses the data structure and algorithm with their implementation. This gotta be some sort of miscommunication, or maybe the author had too much fun at a party yesterday.
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
November 04, 2012, 06:31:28 PM |
|
Yes, it's possible to BSTs, but it's a lot of work. And it's not simple. To design this data structure around BSTs requires every implementer of the meta-chain to use this specific BST algorithm. There is no such ambiguity with Trie structures -- you could look them up in any textbook.
It's true that every implementer will need to use the same algorithm, otherwise the root hashes will be incompatible. And of course you can't just use a standard library map because those do not support computing hashes! But there are plenty of ambiguities involved in using tries. Here's an earlier quote from you where you mentioned an optimization using skip strings: In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings". So a leaf node's hash value is just the root hash of the subtree. And a branch node's value is the hash of the concatenated skip string and its non-null child node values.
Surely if you use skip strings, that will change the root hash, so everyone would have to agree on the particular algorithm to use? Let's make several implementations, and evaluate which one is the best. "Skip strings" are part of the standard Patricia tree implementation. It just may be called something different in different texts. Either you use a Trie, Patricia tree, a De la Brandais tree, or a Hybrid tree. Once the correct one is agreed upon, the ambiguities in implementation shrink to basically nothing. It becomes a question of how to traverse and aggregate tree data for Bitcoin purposes, not how to implement the data structure. That's something that will have to be done for any data structure that is used. On the other hand, a red-black tree that is optimized differently, and thus produce different root hash, will still be called a red-black tree. To describe to someone what that optimization is, well, requires a description of the algorithm (and probably code samples). I know it's possible, I'm just pointing out the difficulties that could arise out of different people unknowingly producing different tree structures. Most likely, under bizarre conditions with complicated rebalance operations, and it would be remarkably frustrating to debug. I'm pointing all this out so that you can't say no progress is being made! Until someone from the 'trie' camp catches up, the simplest solution is a BST since some code for this already exists.
I do appreciate that you are doing this. I wish I had time for it. Perhaps your C++ implementation is sufficient for porting to other languages, so that such a uniform implementation can be achieved. Clearly, I'm still opposed to it for other reasons (like necessitating backups/snapshots for re-orgs), but you can still accomplish what is needed. And I hope that we can hit all the snags and start figuring them out sooner than later. The claim "This just won't be possible using BST's, though." is plain false. It confuses the data structure and algorithm with their implementation. This gotta be some sort of miscommunication, or maybe the author had too much fun at a party yesterday.
Perhaps you misunderstood me. I was saying it won't be possible for everyone to agree on the root hash unless they use the exact same implementation. Standard implementations (such as STL map<>) are standardized only in run-time, not underlying structure. Thus, a widespread "experiment" using BSTs won't be simple without a uniform implementation across all languages, etc. This may be a lot of work. However, if it was tries, I consider it quite simple that anyone can download any [correct] trie implementation from anywhere, and know that they can get the same answer. Because the structure is guaranteed.
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
November 04, 2012, 06:41:57 PM Last edit: November 04, 2012, 07:01:56 PM by 2112 |
|
Because the structure is guaranteed.
I am not buying this guarantee. In this Bitcoin milieu I've soo much confused-endian (or accidental-endian) code that I will not take anything for granted, least of which that there will be a sensible agreement on how to represent the bignums. Edit: Or maybe I will state it like this: The structure will be guaranteed so long as any implementor is confused the same way as the original authors of the Satoshi client and finds the implementation errors made in the original client obvious and natural.
|
|
|
|
Pieter Wuille
|
|
November 04, 2012, 06:44:46 PM |
|
Obviously a fully-specified algorithm will need to be decided upon. That is no different than the definition of the current Merkle tree implementation (together with its weird odd-number-of-hashes-on-level behaviour) currently in use by Bitcoin.
Obviously implementations will be necessary for this, and one will not be able to reuse standard libraries that have not fully specified behaviour (or don't have authentication built in). It will almost certainly require a specific implementation in every language clients need. This has dangers, but it's not impossible. Extensive test sets will be needed that have 100% coverage in a reference implementation, to be sure every edge case or weird rebalancing rule is tested.
My gut feeling makes me prefer trie/patricia based solutions, because their structure is independent from the history that lead to their contents. This means a simply diff of the set represented by two tries is enough to specify the modification from one to the other. Authenticated balanced trees on the other hand do expose their history through the merkle root, and differences need to contain structural information. This is not a good reason, just a preference, and the one who implements the first usable and acceptable solutions will probably get to specify what data structure is chosen. Fully deterministic set representation may make an implementation easier to test, though.
|
I do Bitcoin stuff.
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 06:45:19 PM |
|
By recommending a binary tree, or rather, a Merkle tree in the form most resembling a binary tree, I am suggesting that from block to block, the miner has the option of either just updating the tree (in a well-defined deterministic manner but leaving it unbalanced) or updating and rebalancing the tree such that all of the leaf nodes are the same distance (or distance-1) from the root, again in a well-defined deterministic manner.
I am not suggesting leaving the implementation up to the STL or any other form of library.
I don't believe Patricia trees are "simpler" when measured in the amount of human learning and neurons one must dedicate to understanding the concept. That doesn't mean I think it's too hard to learn, but rather, I doubt the cost (measured in terms of complexity of the specification) is worth the benefit.
If you tell a developer, "Now you've got to learn what a Patricia tree is to write a Bitcoin client", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".
... not to mention implementing a chain reorg strategy consisting of "now talk to your peers and start asking them for now-orphaned blocks (hopefully they have them still), preferably delivered in reverse order, so you can roll your tree back intact" rather than starting with a copy of the tree at or before the point in time of the split and rolling it forward.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
jojkaart
Member
Offline
Activity: 97
Merit: 10
|
|
November 04, 2012, 06:52:22 PM |
|
If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".
This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
November 04, 2012, 06:55:47 PM |
|
It will almost certainly require a specific implementation in every language clients need. This has dangers, but it's not impossible. Extensive test sets will be needed that have 100% coverage in a reference implementation, to be sure every edge case or weird rebalancing rule is tested.
I think the requirement for "every language" is a vast overkill. I would say from my past experience is sufficient to have a clean portable C implemenation (or C++ implementation in a C-style, without the reliance on the unstable C++ language features like std::* or boost::*). Once that's done in my experience the code can be transcribed to just about anything that is Turing-equivalent. But such C (or subset-C++) implementation will have to be correctly deal with endianness and alignment issues. I'm not sure if the core development team is willing to commit to an endian-correct (or endian-neutral) implementation of Bitcoin.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 06:56:15 PM |
|
If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".
This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away. Which sort of developer? The one who revels in complexity, as though complexity breeds integrity? This guy is surely already busy on his first implementation of Bitcoin, in assembly language. He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it. Or do you mean the one who walks away? And this benefits bitcoin because the fewer clients, the better?
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
November 04, 2012, 07:00:58 PM |
|
By recommending a binary tree, or rather, a Merkle tree in the form most resembling a binary tree, I am suggesting that from block to block, the miner has the option of either just updating the tree (in a well-defined deterministic manner but leaving it unbalanced) or updating and rebalancing the tree such that all of the leaf nodes are the same distance (or distance-1) from the root, again in a well-defined deterministic manner.
I am not suggesting leaving the implementation up to the STL or any other form of library.
I don't believe Patricia trees are "simpler" when measured in the amount of human learning and neurons one must dedicate to understanding the concept. That doesn't mean I think it's too hard to learn, but rather, I doubt the cost (measured in terms of complexity of the specification) is worth the benefit.
If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".
... not to mention implementing a chain reorg strategy consisting of "now talk to your peers and start asking them for now-orphaned blocks (hopefully they have them still) so you can roll your tree back intact" rather than starting with a copy of the tree at or before the point in time of the split and rolling it forward.
Either way, the developer has to get into the implementation details of the data structure. They have to understand it. And really, neither structure is particularly complicated. Perhaps some devs are more familiar with BSTs. But to say that a miner "has the option" to rebalance -- that doesn't make sense. Any rebalancing operation on a BST will change the root hash. It must be determined from the start exactly when and how rebalance ops will happen. Or else everyone gets different answers. And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations. It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB). On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback. And how many copies do you need to store? You can keep the last state of the tree, but what if there's a 2-block reorg? Well, now you need two copies. To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 07:06:04 PM |
|
Either way, the developer has to get into the implementation details of the data structure. They have to understand it. And really, neither structure is particularly complicated. Perhaps some devs are more familiar with BSTs. But to say that a miner "has the option" to rebalance -- that doesn't make sense. Any rebalancing operation on a BST will change the root hash. It must be determined from the start exactly when and how rebalance ops will happen. Or else everyone gets different answers.
I will clarify. For every block, given the set of transactions contained in that block, there are 2 potential hash values that are acceptable as the root hash. One of them represents the tree with the transactions applied to them. This case is checked first, because it's the least expensive for a client to do so. The second one represents the tree after it has been completely rebalanced. A client should have no problem determining which way it went simply by trying the first case, and then the second. And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations. It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB). On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback. And how many copies do you need to store? You can keep the last state of the tree, but what if there's a 2-block reorg? Well, now you need two copies. To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.
How do you envision rolling the tree back in the case where you have just determined that all of the blocks you have on hand are now invalid, and getting the now-correct state of the Patricia meta tree requires you to ask peers for orphaned blocks you don't have? Must future Bitcoin implementations be required to keep orphan blocks on hand and serve them to peers to support the ability of others to roll their tree backwards? In perspective, it's not the idea of Patricia or any other kind of tree I am having a problem with, it's the added complexity of supporting this sort of roll back that goes well beyond understanding a new kind of tree.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
jojkaart
Member
Offline
Activity: 97
Merit: 10
|
|
November 04, 2012, 07:08:50 PM |
|
This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer? The one who revels in complexity, as though complexity breeds integrity? This guy is surely already busy on his first implementation of Bitcoin, in assembly language. He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it. Or do you mean the one who walks away? And this benefits bitcoin because the fewer clients, the better? No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 07:15:11 PM Last edit: November 04, 2012, 07:32:31 PM by casascius |
|
And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations. It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB). On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback. And how many copies do you need to store? You can keep the last state of the tree, but what if there's a 2-block reorg? Well, now you need two copies. To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.
I don't believe the snapshots of the tree would be gigabytes. I mentioned previously a simple scheme where each snapshot of the tree has a lifetime of 2^n blocks, where n is the number of binary zeroes the block height ends with. So if the current block height is 0x12345, then you can expect to be storing the trees for 0x12344, 0x12340, 0x12300, 0x12200, 0x12000, 0x10000, and 0x0. So you have a way to restore and roll forward any rollback simply by requesting blocks from peers as is already supported. To address the specific case of "what about a 2-block reorg", at 0x12345, you'd use the backup at 0x12340 and move forward. EDIT: Using 2^(n+1) might be better so that (for example) upon reaching block 0x20000, a two-block reorg does not require a fetch all the way back from 0x10000. So, at 0x12345, using 2^(n+1) you would have: 0x12344, 0x12342, 0x12340, 0x12338, 0x12300, 0x12280, 0x12200, 0x12100, 0x12000, 0x11000, 0x10000, 0x8000, and 0x0. At 0x20000 instead of just 0x10000 and 0x0, you would have: 0x1ffff, 0x1fffe, 0x1fffc, 0x1fff8, 0x1fff0, 0x1ffe0, 0x1ffc0, 0x1ff80, 0x1ff00, 0x1fe00, 0x1fc00, 0x1f800, 0x1f000, 0x1e000, 0x1c000, 0x18000, 0x10000, and 0x0. This is sort of the scheme I had in mind when I originally scribbled down 2^n but before actually calculating it out made me realize I'd be storing far less than I was shooting for.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
November 04, 2012, 07:17:03 PM |
|
This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer? The one who revels in complexity, as though complexity breeds integrity? This guy is surely already busy on his first implementation of Bitcoin, in assembly language. He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it. Or do you mean the one who walks away? And this benefits bitcoin because the fewer clients, the better? No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers. I'm not sure what you guys are talking about. If you disagree with meta-chain at all, then state it as such. Otherwise, this discussion is about two different mechanisms to achieve the same end result. My core argument is the the trie-based solution is much less complex overall, easier to implement and get right, and has numerous other benefits -- such as dead-simple rollbacks and the whole thing is parallelizable (different threads/CPUs/servers can maintain different sub-branches of a patricia tree, and report their results can easily be accumulated at the end -- this is not possible with BSTs). If you want to contribute to this discussion, then please do.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 07:21:15 PM |
|
This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer? The one who revels in complexity, as though complexity breeds integrity? This guy is surely already busy on his first implementation of Bitcoin, in assembly language. He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it. Or do you mean the one who walks away? And this benefits bitcoin because the fewer clients, the better? No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers. The best piece of code is the one that implements a specification whose complexity and edge cases require no testing because they don't exist. We're better off with such specifications.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
jojkaart
Member
Offline
Activity: 97
Merit: 10
|
|
November 04, 2012, 07:30:26 PM |
|
I will clarify. For every block, given the set of transactions contained in that block, there are 2 potential hash values that are acceptable as the root hash. One of them represents the tree with the transactions applied to them. This case is checked first, because it's the least expensive for a client to do so. The second one represents the tree after it has been completely rebalanced. A client should have no problem determining which way it went simply by trying the first case, and then the second.
The problem here is that the full rebalancing operation requires everyone to run the rebalancing algorithm to even verify it was done correctly. This means it has to be optimized so that even weaker systems are able to do it. Otherwise, there's no point in including the simpler algorithm. However, if you do optimize it that way, then the point of having the simpler algorithm vanishes completely and the whole design ends up simpler by just having the full rebalance algorithm.
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
November 04, 2012, 07:33:02 PM |
|
Otherwise, this discussion is about two different mechanisms to achieve the same end result. My core argument is the the trie-based solution is much less complex overall, easier to implement and get right, and has numerous other benefits -- such as dead-simple rollbacks and the whole thing is parallelizable (different threads/CPUs/servers can maintain different sub-branches of a patricia tree, and report their results can easily be accumulated at the end -- this is not possible with BSTs). If you want to contribute to this discussion, then please do.
Again, the claim "this is not possible with BSTs" about impossibility of parallelism in b-trees is false. I wonder what is going on here?
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
November 04, 2012, 07:36:30 PM |
|
The problem here is that the full rebalancing operation requires everyone to run the rebalancing algorithm to even verify it was done correctly. This means it has to be optimized so that even weaker systems are able to do it. Otherwise, there's no point in including the simpler algorithm. However, if you do optimize it that way, then the point of having the simpler algorithm vanishes completely and the whole design ends up simpler by just having the full rebalance algorithm.
I can't imagine that the rebalancing algorithm is going to be costlier in CPU time than validating sets of ECDSA signatures on incoming blocks as is already required. The most expensive operation in rebuilding the tree is SHA256 hashing. We're doing quadrillions of these hashes network-wide every 10 minutes. What's a few million more per node, once every 10 minutes? I can see wanting to avoid making every miner node rebalance the tree in response to every incoming transaction (i.e. every roll of Satoshi Dice) - this would motivate miners to mine empty blocks, and the reason for having the simpler option. But a possibility of doing a rebalance capped with a maximum of once per block sounds plenty realistic.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
Pieter Wuille
|
|
November 04, 2012, 07:38:31 PM Last edit: November 04, 2012, 07:49:18 PM by Pieter Wuille |
|
Not sure how up-to-date you guys are with development of the reference client, but in the 0.8 validation engine ("ultraprune"), a first step towards ideas like proposed in this thread was taken. It may be interesting in this discussion.
We now do keep an explicit set of unspent transaction outputs, but 1) indexed by txid (as that is necessary for block validation) instead of by address, and 2) without explicit tree structure or authentication. Still, some of it is relevant.
As casascius proposed, it only keeps the currently-active UTXO state, and no effort is done to keep older trees available by sharing subtrees. Keeping full snapshots of older trees is a bad idea in my opinion (and re-rolling the entire history even more so), but on-disk files with data to 'undo' the applications of blocks to the UTXO set are an easy solution. If you see blocks as patches to the UTXO structure, these undo files are the reverse patches. They are only necessary for reorganisations, and as that happens far less frequently than accessing the currently active tree state, they don't need super-fast access anyway (simple rule in optimizing for cache effects: if two data sets have different access patterns, don't mix them).
If the block headers commit to (the hash of) these undo files as well as the merkle root of the current UTXO state, clients could ask their peers for backward movement data, which is enough to release a client stuck in a side chain without transferring the entire UTXO state. I see no reason for not doing this.
So, progressing to the full idea, the first step is adding an authentication/merkle structure on top of the UTXO state. Only a hash in the coinbase is necessary that commits to the current state and the undo data hash to move back to the previous block, is needed. In case of not fully deterministic data structures (like balanced trees, as opposed to tries/patricia trees), the undo data perhaps needs to be extended with structural undo data, to have enough information to roll back to the exact same previous structure.
The last step is extending this to an address-to-txid index, or something equivalent. I don't think this will be hard, if you already have everything above.
PS: giving miners the choice to either rebalance or not is a bad idea, imho, as it leads to increased variation (and worse: variation under control of miners) in block validation. Especially since balancing after one addition is typically O(log n), and a full rebalancing of the entire tree is O(n).
|
I do Bitcoin stuff.
|
|
|
|