socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 07:53:21 AM Last edit: August 19, 2012, 04:15:58 PM by socrates1024 |
|
Here's a simple proposal. Let me define the 'value' of a block as the number of leading zero bits in its hash. Every block already includes the hash of the previous block. I propose to also include the hash of the most recent block with higher value than the previous. This data structure could be considered a 'merkle skip list'. A skip list lets you traverse in either of two directions - 'backwards' or 'up'. This would let you quickly follow a chain of hashes up to the highest value we've ever found. This is a trivial modification to the current block chain, yet the significance of this data structure is absolutely staggering. I will just describe one application for now.

--- Proposal for "eventually zero-trust" clients ---

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick! Using the high-value-hash highway, you can skip 'up' to the largest hash found in O(log N) effort. That immediately gives you a rough estimate of how much work was performed. You can now branch-and-bound to gradually improve your estimate. From the largest block (i.e., the 'top'), skip backwards, and then up to the top again. You're now at the highest-value hash from the era that preceded the current highest. You can traverse the proof-of-work content of a chain directly, with the most informative hashes first, working your way down.

You could quickly select the largest out of a pool of potential block chains without having to iterate through them all. A lite client could easily use the high-value-hash highway to determine the largest chain very quickly and begin using it. The estimate improves as the process continues, and if the lite client eventually gets around to validating the whole thing, then that's great. It's important to check the work first, and only then to validate. This should apply to the entire block chain as well as to individual blocks.

--- Experiment ---

I can show you a neat trick.
What's the highest value for a hash in the current blockchain? Write that number down. Now, count how many blocks in the chain have that value or higher. You probably counted '1', or at least very close to it. This trick works regardless of how much time has passed, how fast our computers run, how connected our network is, how the difficulty has adjusted over time, etc. It is completely scale invariant. The overall shape of the hash highway would be roughly scale invariant as well. See also: https://en.wikipedia.org/wiki/Benford's_law

[EDIT] I finally got around to running this experiment, and it turned out I was correct. I also produced visualizations of the "overall shape of the hash highway," which I'm copying here because I think this illustration is very important. Read this post for an explanation.
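The structure above is easy to prototype. Here is a minimal sketch (toy code, not any real client's), using a slight simplification of the up-link rule: each block links to the most recent earlier block with strictly higher value than its own, so following up-links from the head reaches the chain's highest-value block in a logarithmic number of hops.

```python
from dataclasses import dataclass
from typing import Optional

def block_value(block_hash: bytes) -> int:
    """Number of leading zero bits in a block hash -- the block's 'value'."""
    v = 0
    for byte in block_hash:
        if byte == 0:
            v += 8
        else:
            v += 8 - byte.bit_length()  # leading zeros within this byte
            break
    return v

@dataclass
class Block:
    height: int
    value: int           # leading-zero bits of this block's hash
    prev: Optional[int]  # ordinary back-link: previous block's height
    up: Optional[int]    # up-link: most recent block with strictly higher value

def build_chain(values):
    """Wire up-links by a naive backwards scan (fine for a sketch; a real
    implementation would derive each up-link incrementally per block)."""
    chain = []
    for h, v in enumerate(values):
        up = next((j for j in range(h - 1, -1, -1) if values[j] > v), None)
        chain.append(Block(h, v, h - 1 if h else None, up))
    return chain

def skip_to_top(chain, h):
    """Follow up-links from block h to the highest-value block before it."""
    while chain[h].up is not None:
        h = chain[h].up
    return chain[h].height
```

For example, with block values [2, 0, 1, 5, 0, 3, 1, 0], skipping up from the head reaches the value-5 block in three hops instead of seven backward steps.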
|
|
|
|
Mike Hearn
Legendary
Offline
Activity: 1526
Merit: 1134
|
|
August 07, 2012, 10:06:41 AM |
|
Maybe I'm missing something obvious here, but lite/SPV clients need to process every block header anyway; it's fundamental to their operation. If a node presents you with a chain that is not the best chain, quickly calculating its difficulty is not particularly important:
1) The chain probably branches from your best chain not too far back anyway, so the practical effort saved is low
2) You need to filter and store any wallet-relevant transactions on the side chain even though it's not the best chain, so if/when it becomes the best chain you can update the wallet's data structures
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 04:52:55 PM |
|
Lite clients need to process every header of the correct chain. However, if a lite client is presented with several head nodes, only one of which is truly the largest, it would be nice to quickly skip iterating through the rest.
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 05:19:54 PM |
|
Here's the next cool thing you can do with a high-value-hash highway: quickly discover exactly where two blockchains meet at a fork, even if the chains are very large and have many difficulty changes.
Quickly skip up to the highest-value-hash in each chain. If the chains share the highest-value-hash, then the fork point is more recent than that block. If one chain contains a higher-value hash than the other, then the fork point is behind that block. You can use a bisection search to find the true fork point.
I suppose I'm not being too clear on what the overarching goal is with this, so let me just say what my motivating principles are:
- The blockchain should eventually converge to a single correct chain, regardless of the network behavior and changes in cpu power over time (except for a persistent 34% adversary, which is unavoidable)
- Global parameters (like 10 minutes) should not need to be hard coded, but instead should automatically adjust according to demand and availability.
I'm going to build the argument that this is all possible using just a single additional back-link per block.
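The bisection step can be sketched in a few lines. This toy version assumes random access to the header hash at each height of both chains (exactly what the highway's links make cheap to obtain); it finds the last common height in O(log N) comparisons.

```python
def find_fork_point(chain_a, chain_b):
    """Last height at which two chains (sharing a genesis block) agree,
    found by bisection on height. chain_a/chain_b are sequences of
    block hashes indexed by height -- a sketch, not real node code."""
    hi = min(len(chain_a), len(chain_b)) - 1
    if chain_a[hi] == chain_b[hi]:
        return hi  # one chain is a prefix of the other
    lo = 0  # invariant: the chains agree at lo and disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if chain_a[mid] == chain_b[mid]:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, two chains sharing a 6-block prefix fork at height 5, found after about log2(N) hash comparisons rather than a linear scan.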
|
|
|
|
TheBitcoinChemist
Member
Offline
Activity: 70
Merit: 10
|
|
August 07, 2012, 05:44:27 PM |
|
It's an interesting proposal, but there are only a few types of client that would benefit from this, so to cut this into the main blockchain isn't ideal. Would it still work if an integrated parallel chain were produced, say, for a Stratum-like overlay network? I can see how this would be good for part-time/disconnected clients and fresh installs trying to get into operation quickly, but I'm not sure that it really solves the bootstrapping problem that independent light clients have, which is mostly that they have to parse the blockchain at all, just to determine their own balance.
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 06:08:55 PM |
|
Sure, it would also work as an overlay. It's also something an alt-chain could experiment with. The problem you're talking about where clients need to validate the entire chain in order to see their transactions - that is already solved (conceptually) using merkle trees. Read etotheipi's proposal for that here: https://bitcointalk.org/index.php?topic=88208.0 Of course, this only works if the client is able to correctly choose the right 'head'. That's what my current proposal is about.
|
|
|
|
ben-abuya
|
|
August 07, 2012, 06:27:42 PM |
|
I love this stuff. I was contemplating something similar, where you'd store n=log(block_height) hashes in each header instead of just the last block hash. So in the current block, you'd store the hash for the previous block, the hash for the block before that, the hash 4 blocks back, 8 blocks back, 16 blocks back, all the way to the genesis block. That's about 18 hashes per block currently. This would allow some kind of SPV client to only store log n headers instead of n. I haven't really thought through how this works yet, but it sounds similar to what you proposed. Your idea is more elegant and has some advantages, though.
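ben-abuya's scheme is easy to write down. A sketch of which heights a block would link to under the powers-of-two rule he describes (1, 2, 4, 8, ... blocks back), which matches the "about 18 hashes per block" figure at the block heights of the time (around 194,000):

```python
def back_link_heights(height: int) -> list[int]:
    """Heights a block at `height` would link to: 1, 2, 4, 8, ...
    blocks back, until the step exceeds the height. O(log n) links
    per block instead of the single prev-hash."""
    links, step = [], 1
    while step <= height:
        links.append(height - step)
        step *= 2
    return links
```

For example, a block at height 8 would link to heights 7, 6, 4, and 0.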
|
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
August 07, 2012, 06:42:38 PM |
|
I am completely missing the point of this.
Block X includes a hash of block (X-1) so that you can be sure that block X follows block (X-1). What is the point of including a hash of block (X-y) where y varies (but in practice will usually be either that one insanely lucky block from a while back or a block about halfway through the last difficulty bump)?
And how does this magically let you jump around the chain?
|
17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8 I routinely ignore posters with paid advertising in their sigs. You should too.
|
|
|
TheBitcoinChemist
Member
Offline
Activity: 70
Merit: 10
|
|
August 07, 2012, 06:54:55 PM |
|
I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 06:58:16 PM |
|
kjj:
Suppose the back-link from the current head looks like 000xxx (value 3). We also include an 'up'-link to the parent of the most recent block that looks like 0000xxxx (value 4). If you skip to that parent block, then it will have an 'up'-link to the parent of a block that looks like 00000xxxxxx (value 5). And so on, until you get to the parent of the largest block for which there's no up-link.
Suppose you start at that largest block, which looks like 00000xxxxx (value 5). It contains an up-link, either to a previous block with value 00000xxx (value 5), or to one with lower value. You can skip your way to the highest-value hash in every sub-interval of history.
Ben-abuya: right, this pretty much solves the same problem except with O(1) additional storage per block rather than O(log N).
Since you are busy in another thread about variable-difficulty blocks, I'd like to point out my approach to that here, although it will take a few steps to show how it relies on the hash-value-highway to be viable.
The "difficulty" of a block could be set by each individual miner who "bids" on the difficulty they want to get. So a winning block at difficulty 4 would be H(difficulty: 4 zeros || nonce || header) -> 0000xxxxxxxxx. Instead of taking the largest chain, we would take the chain that has the largest sum-difficulty (according to the difficulty bids, not the actual hash value).
There should be a disincentive for miners to spam the network with low-difficulty blocks. Each miner should have stake in the game - the miner himself could place down a bet, effectively a fee for the storage of his block header. If his block is accepted, then he gets the money back. If his fork gets swept up by a higher-difficulty fork, then he loses his bet and the fees he would have earned.
The hash-value-highway makes it efficient to evaluate the sum-difficulty of a fork, and also to merge forks by cherry picking valid transactions. Long chains of low-difficulty blocks will tend to get preempted by higher-difficulty forks. If each transaction includes a 'minimum difficulty', then you'll only be included in the difficult blocks which are less likely to be reordered.
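The bid scheme can be made concrete with a toy fork-choice rule. The field names here (`bid_bits`, `value_bits`) are hypothetical, not real Bitcoin header fields; the point is only that chains are compared by the summed claimed difficulty of blocks whose hashes actually meet their bids, so spamming many low-difficulty blocks doesn't outweigh a few high-difficulty ones.

```python
from collections import namedtuple

# bid_bits: leading zero bits the miner committed to; value_bits: what
# the block's hash actually achieved (hypothetical field names).
Block = namedtuple("Block", ["bid_bits", "value_bits"])

def chain_weight(blocks) -> int:
    """Sum of claimed work: a bid of d bits is worth 2**d expected
    hashes. Blocks whose hash misses their own bid count for nothing."""
    return sum(2 ** b.bid_bits
               for b in blocks
               if b.value_bits >= b.bid_bits)

def best_chain(chains):
    """Fork choice under the bid scheme: maximum sum-difficulty,
    judged by the bids, not the hashes' actual values."""
    return max(chains, key=chain_weight)
```

Eight valid blocks bid at 1 bit weigh 2**1 * 8 = 16; a single valid block bid at 5 bits weighs 32 and wins.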
I don't expect this part of the explanation to be totally clear, especially since I haven't elaborated on how a fork is 'swept up' by the main chain, but I will get to that later. But since you've been working on similar ideas, I think you might have the right intuition for how this works.
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 06:59:18 PM |
|
I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.
Just to be clear, the additional data I am recommending is a single additional hash per block.
|
|
|
|
TheBitcoinChemist
Member
Offline
Activity: 70
Merit: 10
|
|
August 07, 2012, 07:10:34 PM |
|
I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.
Just to be clear, the additional data I am recommending is a single additional hash per block.

That's true, which I believe is only another 4 bytes added onto the existing 80 byte header; but as I said, it doesn't really solve the bootstrapping problem itself. The 'light' clients still have to download entire blocks with inputs matching their addresses, whenever the network is upgraded to allow specific block requests. And it's not like just downloading the 80 byte headers is a significant burden, as all the headers to date would be less than 16 Megs. So adding 4 bytes to every header wouldn't be a burden to any full clients, but nor would it be particularly useful for them, and it would increase the blockchain headers' total size by 5% for questionable gains, even for light clients that keep only headers. And if these blockchain-header type light clients have to trust full nodes to deliver them blocks matching their addresses anyway - and they do - why can't those same full nodes provide the data these light clients need to be certain they are using the longest chain outside of the blockchain? 4 bytes is still 4 more bytes, replicated onto every full or header-only blockchain in the world, forever & ever.
|
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 07:15:18 PM |
|
That's true, which I believe is only another 4 bytes added onto the existing 80 byte header; but as I said, it doesn't really solve the bootstrapping problem itself. The 'light' clients still have to download entire blocks with inputs matching their addresses, whenever the network is upgraded to allow specific block requests. And it's not like just downloading the 80 byte headers is a significant burden, as all the headers to date would be less than 16 Megs. So adding 4 bytes to every header wouldn't be a burden to any full clients, but nor would it be particularly useful for them, and it would increase the blockchain headers' total size by 5% for questionable gains, even for light clients that keep only headers. And if these blockchain-header type light clients have to trust full nodes to deliver them blocks matching their addresses anyway - and they do - why can't those same full nodes provide the data these light clients need to be certain they are using the longest chain outside of the blockchain? 4 bytes is still 4 more bytes, replicated onto every full or header-only blockchain in the world, forever & ever.
Light clients will not need to download blocks with inputs matching their address, let alone trust a full node to do it correctly. See etotheipi's proposal for zero-trust light clients using merkle trees. https://bitcointalk.org/index.php?topic=88208.0 The only remaining challenge for a lite client is (correctly) selecting the largest valid chain, which is what I am discussing.
|
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
August 07, 2012, 07:20:16 PM |
|
Ok, I get it now. It lets you jump backwards through the chain, leaping over portions until you get back to the first block that had this feature enabled.
But what's the point?
You can't trust this chain unless you've verified all of the blocks the hard way. And if you've done that, why do you care about following these links back?
Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.
|
17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8 I routinely ignore posters with paid advertising in their sigs. You should too.
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 07:26:45 PM |
|
Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.
Good question! That's exactly the concern that motivated me to work on this problem. If a chain is valid, you will eventually process it entirely. However if a chain is work-deficient, like you described, then you can reject it early. Think of it this way: it's a validation procedure that has zero false positives, but supports probabilistic false-negatives from early rejection.
|
|
|
|
nedbert9
Sr. Member
Offline
Activity: 252
Merit: 250
Inactive
|
|
August 07, 2012, 08:51:38 PM |
|
Interesting.
|
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
August 07, 2012, 08:56:12 PM |
|
Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.
Good question! That's exactly the concern that motivated me to work on this problem. If a chain is valid, you will eventually process it entirely. However if a chain is work-deficient, like you described, then you can reject it early. Think of it this way: it's a validation procedure that has zero false positives, but supports probabilistic false-negatives from early rejection.

It has neither false positives nor false negatives, because it simply doesn't do ANY verification whatsoever.

Consider this: you get a block, block Z. You do not have block (Z-1), so you check the skiphash and ask for block Y. You also don't have block (Y-1), so you check the skiphash again and ask for block X. Now block X is one that you do have, so you stop there. How much should you trust blocks Y and Z at this point? The answer is "not at all", because you don't have ANY context for them. You have NO idea what their difficulty should be, you have NO idea if they actually connect back to the real chain or not, and you have NO idea what their timestamps should be. In short, you have no idea if these are real blocks, or if they were prepared just to attack you, and you won't have any idea until you verify the chain the hard way.

The skiphash chain means that when I create my attack chain, the difficulty values must follow some rules, but following those rules is still billions and billions of times easier than following the full network rules, because you only need to make a couple of blocks.

Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.
|
17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8 I routinely ignore posters with paid advertising in their sigs. You should too.
|
|
|
BoardGameCoin
|
|
August 07, 2012, 09:14:19 PM |
|
Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.
I'm not sure that you're wrong, but I do think it's a stronger validation than you realize. When mining, we send solutions to lower-difficulty problems to the server to let it know that we're working. Most mining pools call these 'shares', and there are typically many shares per valid block because the difficulty is much higher. The rules socrates1024 is proposing have the net effect of superimposing a nested set of higher-difficulty blocks. Manufacturing a block for the next level would be roughly 16 times as difficult as manufacturing a single block at present difficulty. I'm curious about this overlay; not sure about its usefulness, but I'm at least intellectually interested in seeing it. It does seem to provide an interesting 'difficulty topology' on the blockchain...
|
I'm selling great Minion Games like The Manhattan Project, Kingdom of Solomon and Venture Forth at 4% off retail starting June 2012. PM me or go to my thread in the Marketplace if you're interested. For Settlers/Dominion/Carcassone etc., I do email gift cards on Amazon for a 5% fee. PM if you're interested.
|
|
|
socrates1024 (OP)
Full Member
Offline
Activity: 126
Merit: 110
Andrew Miller
|
|
August 07, 2012, 09:41:47 PM |
|
Consider this, you get a block, block Z. You do not have block (Z-1), so you check the skiphash and ask for block Y. You also don't have block (Y-1), so you check the skiphash again and ask for block X. Now block X is one that you do have, so you stop there. How much should you trust blocks Y and Z at this point?
...
Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.
Branch and Bound! You've now established that block X is at least a common point in both chains (assuming you already have a chain; it's the genesis block if you're starting from scratch). Block Y is the largest hash in between X and Z. You can now repeat the process on either sub-interval. Each time you go down a level, there are more blocks to check, and you get an increasingly precise estimate. If you keep going, eventually you will process every block.

I imagine that there would be a lite client that says "Estimated work: 1 bajillion megahashes. Likelihood: 76%". The likelihood number increases, very rapidly at first, and slowing down at 99%, 99.99% and so on. If it hits 100, then you have validated the entire chain like a full node.

This also allows you to quickly find the true fork point between two chains, even ones that have had difficulty changes along the way.
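The "most informative hashes first" order can be sketched directly. This toy version takes the full list of block values and yields indices in branch-and-bound order: the highest-value block of the whole chain first, then recursively the highest-value block of each sub-interval it splits off. (A real client would discover each interval's maximum via the up-links, not from a list it already holds.)

```python
def informative_order(values):
    """Yield block indices, highest-value hash of each interval first,
    recursing into the sub-intervals on either side of it."""
    def visit(lo, hi):
        if lo > hi:
            return
        m = max(range(lo, hi + 1), key=lambda i: values[i])
        yield m                      # most informative block of [lo, hi]
        yield from visit(lo, m - 1)  # earlier sub-interval
        yield from visit(m + 1, hi)  # later sub-interval
    yield from visit(0, len(values) - 1)
```

The first index out is always the chain's global maximum, and every block is eventually visited exactly once, matching the "check the work first, validate eventually" progression.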
|
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
August 07, 2012, 10:29:14 PM |
|
Consider this, you get a block, block Z. You do not have block (Z-1), so you check the skiphash and ask for block Y. You also don't have block (Y-1), so you check the skiphash again and ask for block X. Now block X is one that you do have, so you stop there. How much should you trust blocks Y and Z at this point?
...
Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.
Branch and Bound! You've now established that block X is at least a common point in both chains (assuming you already have a chain, it's the genesis block if you're starting from scratch). Block Y is the largest hash in between X and Z. You can now repeat the process on either sub interval. Each time you go down a level, there are more blocks to check, and you get an increasingly precise estimate. If you keep going, eventually you will process every block. I imagine that there would be a lite client that says "Estimated work: 1 bajillion megahashes. Likelihood: 76%". The likelihood number increases, very rapidly at first, and slowing down at 99%, 99.99% and so on. If it hits 100, then you have validated the entire chain like a full node. This also allows you to quickly find the true fork point between two chains, even ones that have had difficulty changes along the way.

What do you do with the subinterval exactly? You don't know any blocks between X and Y or Y and Z. You don't know how many there are, you don't know what their difficulty is, you don't know what their values are; you don't know anything at all about those intervals, except that if you get block Y-1 (or Z-1) and the value is higher than Y (or Z), one of the blocks is lying. But which one?

And since you don't know anything at all about these intervals, your estimated work is going to be W > X + (y*Y) + (z*Z), where y and z are probably greater than 2. And I just don't see any way, using this method, to improve our estimates.

Again, I might be missing something, but from what I see, it looks like you are using knowledge of the blockchain in your estimates, which is a problem, since this is intended to be a system for finding out information about the blockchain that we assume you do not actually possess.
|
17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8 I routinely ignore posters with paid advertising in their sigs. You should too.
|
|
|
|