Peter Todd
Legendary
Offline
Activity: 1120
Merit: 1152
|
|
February 13, 2013, 01:11:04 AM |
|
This requires the tree to be completely regenerated for each block, since the hash of every transaction would change.
Read the rest of the message - that's just a toy example to illustrate the general concept. My worked example does not require the whole tree to be recalculated nor does it require the block number for lookup. The attack on the system I suggested would require having lots of transactions that has the same initial bits in their hash.
If you had 1 million of them and added them to the tree, then it would become quite deep. Each time you add one, the nodes would have to recalculate the entire path from root to that node and redo all the hashing. The effort per node would be proportional to N squared, where N is the number of collisions.
I think your misunderstanding stems from the idea that the datastructure and the authenticating hash are the same thing. They aren't, they're quite separate. The authenticating hash is an addition to the datastructure, and can be calculated independently. Take your example of adding a million transactions, which is unrealistic anyway as a 1MiB block can't have more than a few thousand. In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation. A real Bitcoin node would basically have a limit on how much CPU time it was willing to use on adding transactions to the UTXO set, and collect incoming transactions into batches small enough to stay under that limit. Given we're only talking tens of updates/second chances are this optimization wouldn't even be required anyway. Anyway, thinking about this a bit more, on reflection you are right and I think this unbalanced tree stuff is a total non-issue in any radix tree or similar structure where depth is a function of common-prefix length. For the 1.6million tx's in the UTXO set you would expect a tree 20 branches deep. On the other hand to find a transaction with n bits in common with another transaction requires n^2 work, so right there you aren't going to see depths more than 64 or so even with highly unrealistic amounts of effort thrown at the problem. (bitcoin has done a 66bit zero-prefix hash IIRC) The issue with long depths is purely proof size, and 64*n isn't much worse than 20*n, so as you suggest, why make things complicated? Too bad, it was a clever idea. If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.
You will probably have to do something like that anyway. The plan was to merge mine, so some bitcoin blocks won't have matching tree root node hashes. So, prove it wasn't spent up to 10 blocks previous and then check the last 10 blocks manually. Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient. Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set. This would turn the whole shebang into a proper chain and thus allow clients to figure out both which is the longest chain, and how much hashing power is being devoted to it. Equally if we think in terms of merge-mining a chain, adding support for additional UTXO indexes, such as known scriptPubKeys and so forth, is just a matter of adding additional chains to be merge mined, and UTXO-only and UTXO+scriptPubKey can co-exist just fine in this scenario. A disadvantage is we'll need to add some stuff to the P2P network, but I have another idea... So right now, merge-mined chains have the problem that the proof of work goes through the coinbase transaction. The issue here is that to prove a path to the proof of work, you need the whole coinbase transaction, and it can be quite large, for instance in the case of P2Pool or Eligius. So I'm proposing a new standard, where one transaction of the following form is used to include an additional merge-mined digest, and that transaction will always contain exactly one txin, and one txout of value zero using the following: scriptSig: <32-byte digest> scriptPubKey: Any digest matching this form will be assumed to represent the miner's hashing power, thus miners should not allow such transactions into their blocks blindly. They are currently non-standard, so this will not happen by default, and the scriptSig has no legitimate use. The scriptPubKey is spendable by the scriptSig, so for the txin miners would usually use the txout created by a previous block following this standard. If none are available the miner can insert an additional zero-fee transaction creating a suitable txout (of zero value!) in the block. The digest would of course represent the tip of a merkle tree. Every merge mined digest in that tree will have a zero byte appended, and then the digests will be hashed together. What would go in the alt-chain is then the merkle path, that is every leaf digest required to get to the transaction, and what side the leafs were on. Note how this is different from the merge-mining standard currently used by namecoin, and fixes the issues it has with conflicting slot numbers. Appending the zero byte is critical, because that action means to verify the an alt-chain block hash was legitimately merge-mined you simply check that every other digest in the path to the transaction has exactly 32 bytes, and that the transaction itself follows the above form. Note this also means the miner has the flexibility to use something other than a merkle tree to combine the alt-chain block hashes if they want; alt-chains should put reasonable limits of PoW size. Alt-chains should also still support the same idea in the coinbase, for miners that don't plan on making large coinbase transactions. (the majority) Now, the useful part, is we can easily add more than one of these transactions, and distinguish them by different sequence numbers in the txin. You would do that for the UTXO set, with one value of defined for just the UTXO set itself, another for a scriptPubKey index, and whatever else gets added as the system is improved. The previous block where this miner considered the merge-mined UTXO digest to be valid can be specified with nLockTime. The advantage of this idea is that because the final UTXO digests are completely deterministic we don't need to build a new P2P network to pass around the digests in the merkle path to the coinbase. This also shows the advantage of using a separate transaction, because it keeps the UTXO proof size to a minimum, important for applications like fidelity bonded banks where UTXO proofs would become part of fraud proofs; we need to keep proof size to a minimum. Equally being able to prove the hashing power devoted to the UTXO set merge-mine chain is useful, and the only way to do that is provide a bunch of recent UTXO block headers and associated PoW's. Again, keeping the size down for this is desirable as the SPV node examining those headers may be on a low-bandwidth connection.
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 13, 2013, 01:39:22 AM |
|
Take your example of adding a million transactions, which is unrealistic anyway as a 1MiB block can't have more than a few thousand. You could do them over multiple blocks. However, the max depth of the tree is only 256 levels, so you can't actually kill the receiving node and getting a hash match would be almost impossible anyway. In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation. O(NLog(N)) ? Each node may cause an update to a full path to the bottom of the tree. Anyway, thinking about this a bit more, on reflection you are right and I think this unbalanced tree stuff is a total non-issue in any radix tree or similar structure where depth is a function of common-prefix length. For the 1.6million tx's in the UTXO set you would expect a tree 20 branches deep. On the other hand to find a transaction with n bits in common with another transaction requires n^2 work, so right there you aren't going to see depths more than 64 or so even with highly unrealistic amounts of effort thrown at the problem. (bitcoin has done a 66bit zero-prefix hash IIRC)
It is 2^n work, not n^2, I assume a typo? The issue with long depths is purely proof size, and 64*n isn't much worse than 20*n, so as you suggest, why make things complicated?
Right, and it only affects that transaction anyway. Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient. I have to look up the details of merged mining, but I think you get the same protection as the main chain (at least for the blocks where it works)? Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set. Absolutely. I assumed that was the plan. Equally if we think in terms of merge-mining a chain, adding support for additional UTXO indexes, such as known scriptPubKeys and so forth, is just a matter of adding additional chains to be merge mined, and UTXO-only and UTXO+scriptPubKey can co-exist just fine in this scenario. A disadvantage is we'll need to add some stuff to the P2P network, but I have another idea...
Right, a general way to start new chains would be a good idea. There would need to be some way to "register" a genesis hash and then that chain would only be used for 1 purpose. The key is keeping agreement on the rules for the parallel chains. So right now, merge-mined chains have the problem that the proof of work goes through the coinbase transaction. Yeah, need to look up the process. The issue here is that to prove a path to the proof of work, you need the whole coinbase transaction, and it can be quite large, for instance in the case of P2Pool or Eligius. So I'm proposing a new standard, where one transaction of the following form is used to include an additional merge-mined digest, and that transaction will always contain exactly one txin, and one txout of value zero using the following:
scriptSig: <32-byte digest> scriptPubKey:
Are transactions with 0 in 0 out allowed under the spec? Any digest matching this form will be assumed to represent the miner's hashing power, thus miners should not allow such transactions into their blocks blindly. They are currently non-standard, so this will not happen by default, and the scriptSig has no legitimate use. The scriptPubKey is spendable by the scriptSig, so for the txin miners would usually use the txout created by a previous block following this standard. If none are available the miner can insert an additional zero-fee transaction creating a suitable txout (of zero value!) in the block. So, this basically creates a chain that has been stamped over and over with the output of one tx being the input to the next one. What happens if you have something like Root -> none -> none -> Valid1 (from root) -> Valid2 (from valid1) -> Invalid1 (from valid2) -> stamped (from Invalid2) -> .... There is no way to remove the stamped one. Better would be including <sub-chain-id> <previous valid root in sub-chain> <new valid root> - Anyway need to read up on merged mining - The digest would of course represent the tip of a merkle tree. Every merge mined digest in that tree will have a zero byte appended, and then the digests will be hashed together. What would go in the alt-chain is then the merkle path, that is every leaf digest required to get to the transaction, and what side the leafs were on. Note how this is different from the merge-mining standard currently used by namecoin, and fixes the issues it has with conflicting slot numbers.
Appending the zero byte is critical, because that action means to verify the an alt-chain block hash was legitimately merge-mined you simply check that every other digest in the path to the transaction has exactly 32 bytes, and that the transaction itself follows the above form. Note this also means the miner has the flexibility to use something other than a merkle tree to combine the alt-chain block hashes if they want; alt-chains should put reasonable limits of PoW size.
Alt-chains should also still support the same idea in the coinbase, for miners that don't plan on making large coinbase transactions. (the majority)
Now, the useful part, is we can easily add more than one of these transactions, and distinguish them by different sequence numbers in the txin. You would do that for the UTXO set, with one value of defined for just the UTXO set itself, another for a scriptPubKey index, and whatever else gets added as the system is improved. The previous block where this miner considered the merge-mined UTXO digest to be valid can be specified with nLockTime. The advantage of this idea is that because the final UTXO digests are completely deterministic we don't need to build a new P2P network to pass around the digests in the merkle path to the coinbase.
This also shows the advantage of using a separate transaction, because it keeps the UTXO proof size to a minimum, important for applications like fidelity bonded banks where UTXO proofs would become part of fraud proofs; we need to keep proof size to a minimum. Equally being able to prove the hashing power devoted to the UTXO set merge-mine chain is useful, and the only way to do that is provide a bunch of recent UTXO block headers and associated PoW's. Again, keeping the size down for this is desirable as the SPV node examining those headers may be on a low-bandwidth connection.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
Peter Todd
Legendary
Offline
Activity: 1120
Merit: 1152
|
|
February 13, 2013, 05:00:19 AM Last edit: February 13, 2013, 05:25:47 AM by retep |
|
In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation. O(NLog(N)) ? Each node may cause an update to a full path to the bottom of the tree. No actually. Every operation in a radix tree takes O(k) time, where k is the maximum length of the key in the set. Since Bitcoin transactions have a fixed key length of 256 bits, that's O(1) time. Additionally since the number of intermediate nodes created for a transaction in the tree can't be more than k, a radix tree is O(k*n) space; again O(n) space for Bitcoin. I mean, it's not so much that you are wrong, it's just that the log2(n) part is bounded by a fixed small number so it really is appropriate to just say O(n), and as I explained, updating a batch of n transactions, especially given that n << N (where N is the total size of the txout set) is an efficient operation. Note that the n in O(n) is the number of new transactions, not the number of existing ones. It is 2^n work, not n^2, I assume a typo?
Oops, good catch. Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient. I have to look up the details of merged mining, but I think you get the same protection as the main chain (at least for the blocks where it works)? For determining if a block in an alt-chain can be reversed you could be correct under some rules, but in this case each valid PoW is really more like a vote that the UTXO set mined by that PoW is valid. Thus the protection is only the hashing power that mined the blocks containing the PoW's. Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set. Absolutely. I assumed that was the plan. You'd hope so, but I'll have to admit I somehow didn't realize that until today. Right, a general way to start new chains would be a good idea. There would need to be some way to "register" a genesis hash and then that chain would only be used for 1 purpose.
The key is keeping agreement on the rules for the parallel chains.
It's tricky though, because the PoW can mean totally different things for different types of chains. Not to mention how for any application but timestamping you need to write a whole set of chain rules too. That said, following a single merge-mining standard is a good thing; I only proposed this new one because as far as I know namecoin is the only one using the existing multi-chain standard, and that standard sucks. However: scriptSig: <32-byte digest> scriptPubKey:
Are transactions with 0 in 0 out allowed under the spec? They sure are! The only thing a tx needs is one or more txin's and one or more txouts. Both scriptSigs and scriptPubKeys are allowed to be empty, and the value can be empty. (although you can't spend an empty scriptSig with an empty scriptPubKey; something needs to push a true value to the stack) So, this basically creates a chain that has been stamped over and over with the output of one tx being the input to the next one.
What happens if you have something like
Root -> none -> none -> Valid1 (from root) -> Valid2 (from valid1) -> Invalid1 (from valid2) -> stamped (from Invalid2) -> ....
Not quite. The txin is just there because a Bitcoin transaction is only valid if it has a txin; what is actually in the txin is totally irrelevant. The link to the previous "considered good" block has to be within the UTXO header and that nLockTime would best be used only as an auxillary bit of data to allow nodes to reproduce the UTXO chain block header after they deterministicly compute what the state of the UTXO tree would have been with that blocks transactions included. It's just a way of avoiding the pain of implementing the P2P network that really should be holding that data, and getting something working sooner. It's a solution that uniquely applies to a UTXO merge-mined alt-chain; no other type of chain would allow a trick like that.
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
February 13, 2013, 11:33:42 AM |
|
Not much time to respond now, but I wanted to point out the the PATRICIA tree concept has no balancing issues -- if it looks like it does, it's strictly an illusion. The reason is this: Take a standard, non-level-compressed trie, as shown in my previous post. There is no question that this is a O(1) data structure: if you use 256-way trie, it always takes exactly 20 hops to get from root to the leaf of a 20-byte address string. It has O(1) query, O(1) insertion, O(1) deletion. All operations on a PATRICIA tree are strictly less than the equivalent implementation of a trie. Basically, the upper bound of computation time for any PATRICIA tree operation is that of the non-level-compressed trie. It's just that the number of hops that you shortcut as a benefit of level-compression, is variable depending on the data in the tree/trie. The amount of operations you get to skip can be altered by an "attacker", but the worst they can do to you is require the performance of a trie -- which is O(1). Re: skip strings: there's no way you can use a basic trie for this -- the space overhead of all the intermediate (and unnecessary) nodes would overwhelm the data that is being stored. In fact, even with level compression and linked-list nodes, I'm still very concerned about the space overhead -- I suspect we may be storing something like 3x the size of the underlying UTXO set just to store this DB. I just pulled that number out of nowhere (3x), because I haven't rigorously explored it for various branching factors ... but it was one of the downsides of the trie-based structures compared to the BSTs.
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 13, 2013, 02:39:06 PM |
|
Not much time to respond now, but I wanted to point out the the PATRICIA tree concept has no balancing issues -- if it looks like it does, it's strictly an illusion.
Right. Also, since the keys are crypt hashes, it is even hard to push the tree much deeper than the expected depth, even if you want to. Re: skip strings: there's no way you can use a basic trie for this -- the space overhead of all the intermediate (and unnecessary) nodes would overwhelm the data that is being stored. The skip strings are over complex, since everything is "random". I was thinking of having a "leaf" type node. The skip system is just over complex. A parent node has a pointers to its children. A leaf node has the 256 bit hash. If you have a 256 pointer array for every node, then you have to many of them. The bottom level of parent nodes will likely have a small number of children. If you implement it as a 256 pointer array, then for each leaf, you are adding 128 unneeded pointers (128 to each of the 2). On a 32-bit machine, that is 128 * 4 = 512 bytes. You data is a 32 byte hash, so the overhead is massive. However, that assumes that the 2nd to bottom nodes only have 2 sub-nodes. Not sure what the actual odds are. An more efficient structure would be. TreeNode { private final int level; // the depth of this node private final byte key; // the key byte = hash[level]
public boolean matchesKey(byte[] hash) { return key[level] == this.key; } }
ParentNode extends TreeNode {
private TreeNode[] children; // power of 2 array of children
public TreeNode getMatchingChild(byte[] hash) { TreeNode child = children[hash[level + 1] % children.length]; if (child == null) { return null; } return child.matchesKey(hash) ? child : null; }
}
LeafNode extends TreeNode {
byte[] data = new byte[32];
}
To add a node you use root.getMatchingChild() until null is returned. Then you have to add the hash to the last non-null Node. Having said that, that is the data structure. The merkle tree structure would also need to be specified.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
February 13, 2013, 03:08:57 PM |
|
The benefit of using PATRICIA/Brandais hybrid is that it's a "standard" datastructure. And it's one that seems to fit this use case pretty perfectly. I just followed you pseudocode and I think you are basically proposing the same thing as the PATRICIA tree, but using implicit skip strings, instead of explicit ones. It can be reasonably argued that the skip strings are not necessary to authenticate the structure, but I think you are introducing extra complexity into the algorithms for inserting and deleting nodes, in exchange for simplifying authentication. It's very difficult to describe without a well-thought-out example, but you can probably see it yourself if you try to work out the logic for worst-case-logic-complexity of an insert or delete operation. It involves uncompressing, inserting/removing nodes, recompressing on both sides, and then rearranging a ton of pointers. If the skip strings are not attached to the nodes, you have to recompute it from the children which involves a lot of string compares, and probably a bit slower. I had to program the insert function for a patricia tree in as a homework assignment in college, once. It was a mess... and kinda fun As for the comment about node overhead: I couldnt' follow your math (looks like you were missing 256-way trienodes with binary trie nodes), but I think something you're overlooking is the linked-lists for the pointers at each node (assuming some power of 2 branching factor more than 2^1). This means that nodes with only one child, only store one pointer. It is wasteful for the top levels where you have linked-list overhead for super-dense nodes, but the majority of data is near the leaves where the optimization saves you a ton of space. Plus, because the top nodes are constantly changing, you can optimize them in code pretty easily to be more pleasant for both query-time and space-consumption. I need to think a little harder about the benefits of using a binary-PATRICIA tree... it pretty much removes the necessity for the Brandais aspect of it (compressing node children into linked lists), and certainly adds a bit of compute-efficiency to it at the expense and taking more space (I think the binary trie will be faster at updating the authentication data, but will have more intermediate/branch nodes to be stored). Unfortunately, I have too many super-high priority things on Armory in the next month or two, so there's no way I can get around to playing with this, yet. However, I may eventually be doing some kind of electrum-style version Armory, in which I will split Armory both ways -- into an Armory supernode and Armory litenode. The supernode will do something like we're discussing here, and the lite nodes will depend on having access to a trusted supernode. It sounds like a good time to prototype this idea...
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 13, 2013, 03:30:31 PM |
|
As for the comment about node overhead: I couldnt' follow your math (looks like you were missing 256-way trienodes with binary trie nodes),
I was just looking at the diagram . It shows that each node has an full sized array. With a binary tree, that is just 2 pointers so almost no cost. For a 256-way tree, this means that the 2nd to lowest nodes (which would be sparse) would be 256 element arrays and only have a small number of children. So, the cost per child is quite high. With the way I suggested it, you get the benefits of a 256 way tree, but still have small nodes. That is the different between 256 pointer lookups vs 32 pointer lookups. The size of a binary tree is approx double the number of children, which isn't that much larger than a 256 way one. Under a HashMap scheme, the toplevel nodes would have more than 128 children, so would all end up as a flat array. I need to think a little harder about the benefits of using a binary-PATRICIA tree... it pretty much removes the necessity for the Brandais aspect of it (compressing node children into linked lists), As I said, hashmaps should be used instead of the linked lists. Unfortunately, I have too many super-high priority things on Armory in the next month or two Yeah, days need more than 24 hours. I think the easiest is to define the merkle tree in a way that is easiest to visualize and leave implementation details until later. Ofc, that assumes the merkle structure doesn't cause implementation problems. If you defined the merkle hash of a node with only child as equal to that child's hash, then you can just define the tree as a full depth tree and leave pruning to the implementation. Hash(leaf-node) = leaf's key Hash(parent with one child) = Hash(child) Hash(parent with 2 children) = Merkle(Hash(child1), Hash(child2))
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 14, 2013, 10:33:03 AM |
|
I was thinking about proving that the alt chain is correct for the light weight clients.
There could be an additional "falsification" link where you show that a link was not valid.
A falsification node basically jumps the chain back to before that node.
A -> B -> C -> D -> False(C)
Chain becomes
A -> B -> C -> D -> False(C) -> C* -> D*
The false node wouldn't require any difficulty. Since C and D both met the difficulty requirements, this doesn't actually cause spam. Also, the false node should ideally propagate as fast as possible and not be mined in.
The false node could be of the format
Hash of last node Hash of false node Hash of proof
The proof could be an "attachment" but isn't used by C* when forming the "parent node" hash. That way light clients can download the headers only.
Light clients would just make sure that the alt chain forms a chain. False links don't have to be merge mined, since they can be directly proven. The link would incorporate all the data required to
Light nodes could do random checking as they download the chain. If each checker checks 0.1% of the updates, then with 1000 users, most historical errors would be detected. Nodes could also check 1% of all new updates.
Also, light nodes could check new links at random for the same reason.
The "expanded" data for the node would include all txs added and removed for that node. A light node could then check that this causes the update as claimed.
I think the tree should include the block in which the TXO was created as part of its "hash".
The "signature" of an UTXO would be {hash(tx),var_int(block number}, var_int(transaction number), var_int(output number)}. This would add 5 extra bytes to the 32 byte hash.
This allows the light node to query the main bitcoin chain to check the transaction inputs.
I wonder if an all light mode network would be possible under this system. Each node would only have to store some of the bitcoin blockchain data, as long as the total storage capacity was large enough to store it (with a large margin).
Random checking would pick up any double spend attempts very quickly.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
February 14, 2013, 12:38:44 PM |
|
If it looks like the sparse, lower-level nodes have 256 pointers, you're looking at the wrong picture (or I didn't explain it well enough). The Brandais part is where the list of pointers is compressed to a linked list and thus there is only as many pointers as there are nodes (well, a little more, for the linked-list forward-pointers). This does increase search time to have to forward-traverse the linked list at each node, but they will be ordered, which means that it can be further optimized, and those kinds of lookups should be fast (in fact, they can be easily stored as simple lists, replacing the list on every update, since it's probably just as fast to do that with disk-based DB accesses). The important part is that if a parent has 5 children, those 5 children are in lexicographical order, and only their 5 "fingerprints" will be used in the computation of the parent's "fingerprint."
I don't think a hashmap works for this. I guess it depends on the kind of hashmap you're talking about -- but if it's a "simple" one where there are lots of collisions, you end up with some non-determinism based on insert order, and removing elements is complicated. But maybe I don't totally understand what you're proposing.
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 14, 2013, 01:41:06 PM |
|
If it looks like the sparse, lower-level nodes have 256 pointers, you're looking at the wrong picture (or I didn't explain it well enough). The Brandais part is where the list of pointers is compressed to a linked list and thus there is only as many pointers as there are nodes (well, a little more, for the linked-list forward-pointers). I think we are discussing different trees. I was effectively suggesting an improvement on the brandais modification, use a hashmap instead of a linked list. Basically, have a power of 2 array of pointers. Then you can see if there is a match with Child c = arr[keyByte & (length-1)]
if (c.getKeyByte() == keyByte) { // child matches }
If not, then it is a hash collision and you need to double the size of the array and re-hash. The array could be reduced in size on removal, say when it drops below 25%. I don't think a hashmap works for this. I guess it depends on the kind of hashmap you're talking about -- but if it's a "simple" one where there are lots of collisions, you end up with some non-determinism based on insert order, and removing elements is complicated. But maybe I don't totally understand what you're proposing.
The only non-determinism is that some insertions will be instant (no collision) and some will require re-hashing. This could slow things down. If that is an issue you could spread the re-hash out over many insertion and deletions. So, when you insert a key, it might does 2 re-hash steps per level, but is bounded. Also, removal would need a rule for when to shrink the array. Maybe, increase when a collision occurs and decrease if < 25% full. However, that could oscillate.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 15, 2013, 10:04:18 PM |
|
Are transactions with 0 in 0 out allowed under the spec?
They sure are! The only thing a tx needs is one or more txin's and one or more txouts. Both scriptSigs and scriptPubKeys are allowed to be empty, and the value can be empty. (although you can't spend an empty scriptSig with an empty scriptPubKey; something needs to push a true value to the stack) So, either use a previous "open" one of these transactions, or create a new one. Are transactions handled sequentially in a block. Can you spend an output of transaction 1 in transaction 7? If so, then the miner could add a zero transaction as one of the coinbase outputs. Also, if the number of normal transactions was a power of 2, then the proof gets much less complex. Assuming 4 "real" transaction and the 5th one as M. Under the Merkle rules, you expand the number of transactions to a power of 2. A B C D M M M M A - D are normal M contains the merge minded signature. To prove M is in the block, you just need to provide Merkle-Hash(A, B, C, D) and M. The entire RHS of the tree is obtained by hashing M once for each level in the tree with the merkle rule. This can be performed pretty fast. I think the special transaction should have a magic number. This would make it slightly larger, but if you added a 32 byte random "official" magic number, then it is very unlikely that it would happen accidentally.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
February 16, 2013, 03:24:53 AM |
|
As an aside, roughly, what is the total number of unspent transaction outputs at the moment?
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
solex
Legendary
Offline
Activity: 1078
Merit: 1006
100 satoshis -> ISO code
|
|
March 08, 2013, 05:38:36 AM Last edit: March 09, 2013, 09:21:59 AM by solex |
|
As an aside, roughly, what is the total number of unspent transaction outputs at the moment?
It would also be interesting to know what percentage are <COIN_DUST.
|
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4242
Merit: 8684
|
|
March 08, 2013, 06:46:50 AM |
|
So some discussions from the inflationproofing thread provided two additional requirements for the UTXO tree:
It needs to be a sum-tree over txout value so that the utxo root also shows the currency hasn't been inflated and allows stochastic utxo checks. Thats easy enough, just an implementation detail.
The other thing is that we need some way of testing that randomly selected transactions in a block are spending an input that was in the uxto set (not spending coins from thin air), thats easy— except when they're spending txouts in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?
|
|
|
|
apetersson
|
|
March 08, 2013, 07:55:46 AM |
|
The other thing is that we need some way of testing that randomly selected transactions in a block are spending an input that was in the uxto set (not spending coins from thin air), thats easy— except when they're spending txouts in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?
i would think there is an obvious, straightforward way. lets say a node requests matching transactions from a different node who has already recieved the latest block, using bloom filtering. if it has dependencies outside of the the UTXO, all parent transactions are also automatically provided until there are all included, even if they do not match the bloom filter. if he would wait until the next block, the case becomes "trivial" again.
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
April 14, 2013, 09:49:06 AM Last edit: April 14, 2013, 12:23:31 PM by TierNolan |
|
It needs to be a sum-tree over txout value so that the utxo root also shows the currency hasn't been inflated and allows stochastic utxo checks. Thats easy enough, just an implementation detail.
So, the hash becomes 36 bytes instead of 32. The extra 4 bytes is the total in satoshis of UTXO. For a node with 1 childhash(node) = hash(child) For a node with 2 childrenhash().hash -> the 32 byte hash (sha256 squared) hash().value -> total value (unsigned int (max 21 million)) hash() -> hash().hash : hash().value hash(node).hash = sha256(sha256(hash(child1) : hash(child2))) hash(node).value = hash(child1).value + hash(child2).value The other thing is that we need some way of testing that randomly selected transactions in a block are spending an input that was in the uxto set (not spending coins from thin air), thats easy— except when they're spending txouts in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?
That actually seems to be a sub-problem of the main problem that we are trying to solve, but at the sub-block level. You can prove the TX out was created by transaction 10 in the current block, but if it is used in transaction 100, how do you know if it was spent between transactions 11 and 99? I think a merkle tree for all the intermediate states of the block would be required. This actually might not be that big a deal. The miner would already have to add and remove all the UTXOs from/to the the tree anyway. This would require adding them to an array and then computing the merkle root anyway. The block header (or alt chain header) would have 2 new fields Root of UTXO tree Merkle Root of roots of the UTXO tree for all transactions in this block
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
Stampbit
|
|
April 14, 2013, 05:42:47 PM |
|
This certainly sounds like a very well engineered solution, any idea if this will ever be implemented?
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
April 20, 2013, 06:24:48 PM |
|
I was thinking about this as a pure system that would allow dropping the blockchain storage (except for recent blocks) completely. Effectively, the data associated with coins would be moved to the owners of the coins.
You need to store the merkle path down to the transaction that created the coin. This won't change with time.
However, to spend it you also need live path from the latest root.
This tree could be stored in a distributed fashion. It only needs to be stored for a while, but the longer the better.
Your client could try to keep track of the tree down to your transactions. The amount of data depends on the number of transaction outputs. If there was 1 trillion outputs, then you would need to store around 40 levels until yours was the only transaction left on the path.
This works out at 40 * 32 bytes = 1280 bytes for per block per coin. This data rate could be decreased by combining all your coins into one and/or reducing the time between asking for the updated data.
The longer the network stores update data, the less often nodes need to update.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
April 20, 2013, 06:33:54 PM Last edit: April 20, 2013, 06:56:07 PM by etotheipi |
|
I was thinking about this as a pure system that would allow dropping the blockchain storage (except for recent blocks) completely. Effectively, the data associated with coins would be moved to the owners of the coins.
You need to store the merkle path down to the transaction that created the coin. This won't change with time.
However, to spend it you also need live path from the latest root.
This tree could be stored in a distributed fashion. It only needs to be stored for a while, but the longer the better.
Your client could try to keep track of the tree down to your transactions. The amount of data depends on the number of transaction outputs. If there was 1 trillion outputs, then you would need to store around 40 levels until yours was the only transaction left on the path.
This works out at 40 * 32 bytes = 1280 bytes for per block per coin. This data rate could be decreased by combining all your coins into one and/or reducing the time between asking for the updated data.
The longer the network stores update data, the less often nodes need to update.
That's a super interesting idea. I'll have to sleep on that one. Right now, everyone tracks everyone's coins, and you are responsible for maintaining your own private keys. In your system, you are maintaining your own private keys and the subtree information about the coins contained within. Nodes don't have to keep that information, because they only need it when the coins are being spent, and you are the only who ever spends the coins. Sure part of the UTXO set could be "lost" if your computer crashes, but those coins are lost, too, so it doesn't really matter...? If the coins will never be spent, the sub-tree never changes, and so nodes don't care what's inside that subtree, as they have its fingerprint. I have not had time to think through the implications (or complications) of it, but it's a super-interesting thought-experiment. P.S. -- This is yet another area where the trie-based structures win -- since there is branch independence in tries, peers don't have to store what's below a certain node as long what's below it is not changing (they only need the fingerprint of that node after the last update to it). If you use a BST, this doesn't work, because updates in neighboring branches can cause rebalances, forcing you to know what's in this branch so that nodes can be shuffled between branches.
|
|
|
|
TierNolan
Legendary
Offline
Activity: 1232
Merit: 1104
|
|
April 20, 2013, 06:40:30 PM |
|
This is yet another area where the trie-based structures win -- since there is branch independence in tries, peers don't have to store what's below a certain node as long what's below it is not changing (they only need the fingerprint of that node after the last update to it). Right. Just use the transaction hash directly as key and accept that there might be imbalances. However, the imbalances are not really going to happen because you are using a hash (effectively random) value as key. So the law of large numbers does tree balancing for you. The way I would see it is that the owner stores the historical path and the network stores the live state of the tree. However, when your client connects it boosts the local part of the tree near its coins. Telling nodes about the local area helps secure your own coins.
|
1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
|
|
|
|