Print Page - The High-Value-Hash Highway

Here's a simple proposal. Let me define the 'value' of a block as the number of leading zero bits in its hash. For every block, we already include the hash of the previous block. I propose to also include the hash of the most recent block with higher value than the previous.

This data structure could be considered a 'merkle skip list'. A skip list lets you traverse in either of two directions - 'backwards' or 'up'. This would let you quickly follow a chain of hashes up to the highest-value we've ever found.

This is a trivial modification to the current block chain, yet the significance of this data structure is absolutely staggering. I will just describe one application for now.

--- Proposal for "eventually zero-trust" clients ---

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick!

Using the high-value-hash highway, you can skip 'up' to the largest hash found in O(log N) effort. That immediately gives you a rough estimate of how much work was performed. You can now branch-and bound to gradually improve your estimate. From the largest block (i.e., the 'top'), skip backwards, and then up to the top again. You're now at the highest-value-hash from the era that preceded the current-highest. You can traverse the 'proof-of-work' content of a chain directly, with the most-informative hashes first and working your way down. You could quickly select the largest out of a pool of potential block chains without having to iterate through them all.

A lite client could easily use the high-value-hash highway to determine the largest chain very quickly and begin using it. The estimate improves as the process continues, and if the lite client eventually gets around to validating the whole thing, then that's great.

It's important to check the work first, and only then to validate. This should apply to the entire block chain as well as to individual blocks.

--- Experiment ---

I can show you a neat trick. What's the highest value for a hash in the current blockchain? Write that number down. Now, count how many blocks in the chain have that value or higher.

You probably counted '1', or at least very close to it.

This trick works regardless of how much time has passed, how fast our computers run, how connected our network is, how the difficulty has adjusted over time, etc. It is completely scale invariant. The overall shape of the hash highway would be roughly scale invariant as well. See also: https://en.wikipedia.org/wiki/Benford's_law

[EDIT] I finally got around to running this experiment, and it turned out I was correct. I also produced visualizations of the "overall shape of the hash highway," which I'm copying here because I think this illustration is very important. Read this post (https://bitcointalk.org/index.php?topic=98986.msg1109747#msg1109747) for an explanation.

https://i.imgur.com/5KXC4.png?1

Here's the next thing cool thing you can do with a high-value-hash highway: quickly discover exactly where two blockchains meet at a fork, even if the chains are very large and have many difficulty changes.

Quickly skip up to the highest-value-hash in each chain. If the chains share the highest-value-hash, then the fork point is more recent than that block. If one chain contains a higher-value hash than the other, then the fork point is behind that block. You can use a bisection search to find the true fork point.

I suppose I'm not being too clear on what the overarching goal is with this, so let me just say what my motivating principles are:
- The blockchain should eventually converge to a single correct chain, regardless of the network behavior and changes in cpu power over time (except for a persistent 34% adversary, which is unavoidable)
- Global parameters (like 10 minutes) should not need to be hard coded, but instead should automatically adjust according to demand and availability.

I'm going to build the argument that this is all possible using just a single additional back-link per block.

I love this stuff. I was contemplating something similar, where you'd store n=log(block_height) hashes in each header instead of just the last block hash. So in the current block, you'd store the hash for the previous block, the hash for the block before that, the hash 4 blocks back, 8 blocks back, 16 blocks back, all the way to the genesis block. That's about 18 hashes per block currently. This would allow some kind of SPV client to only store log n headers instead of n. I haven't really thought through how this works yet, but it sounds similar to what you proposed. Your idea is more elegant and has some advantages, though.

I question whether a blockchain, even a merged-mined alt-chain, is necessary to communicate this data. There are significant downsides to including more data than is minimally necessary into any blockchain style structure. It would be better for light clients, since they need open Internet access anyway, to be able to fetch this kind of data from one (or more) of several trusted webservers. Something like a podcast feed, only (perhaps encrypted) data collected by a specialized full node. Any client should be able to verify that the data fed to it is valid in a number of different ways to protect itself from trickery.

kjj:

Suppose the back-link from the current head looks like 000xxx (value 3). We also include an 'up'-link to the parent of the most recent block that looks like 0000xxxx (value 4). If you skip to that parent block, then it will have an 'up'-link to the parent of a block that looks like 00000xxxxxx (value 5). And so on, until you get to the parent of the largest block for which there's no up-link.

Suppose you start at that largest block, which looks like 00000xxxxx (value 5). It contains an up-link, either to a previous block with value 00000xxx (value 5), or to one with lower value. You can skip your way to the highest-value hash in every sub-interval of history.

Ben-abuya: right, this pretty much solves the same problem except with O(1) additional storage per block rather than O(log N).

Since you are busy in another thread about variable-difficulty blocks, I'd like to point out my approach to that here, although it will take a few steps to show how it relies on the hash-value-highway to be viable.

The "difficulty" of a block could be set by each individual miner who "bids" on the difficulty they want to get. So a winning block at difficulty 4 would be H(difficulty: 4 zeros || nonce || header) -> 0000xxxxxxxxx. Instead of taking the largest chain, we would take the chain that has the largest sum-difficulty (according to the difficulty bids, not the actual hash value).

There should be a disincentive for miners to spam the network with low-difficulty blocks. Each miner should have stake in the game - the miner himself could place down a bet, effectively a fee for the storage of his block header. If his block is accepted, then he gets the money back. If his fork gets swept up by a higher-difficulty fork, then he loses his bet and the fees he would have earned.

The hash-value-highway makes it efficient to evaluate the sum-difficulty of a fork, and also to merge forks by cherry picking valid transactions. Long chains of low-difficulty blocks will tend to get preempted by higher-difficulty forks. If each transaction includes a 'minimum difficulty', then you'll only be included in the difficult blocks which are less likely to be reordered.

I don't expect this part of the explanation to be totally clear, especially since I haven't elaborated on how a fork is 'swept up' by the main chain, but I will get to that later. But since you've been working on similar ideas, I think you might have the right intuition for how this works.

Quote from: TheBitcoinChemist on August 07, 2012, 06:54:55 PM

Just to be clear, the additional data I am recommending is a single additional hash per block.

Quote from: socrates1024 on August 07, 2012, 06:59:18 PM

Quote from: TheBitcoinChemist on August 07, 2012, 06:54:55 PM

Just to be clear, the additional data I am recommending is a single additional hash per block.

That's true, which I believe is only another 4 bytes added onto the existing 80 byte header; but as I said, it doesn't really solve the bootstrapping problem itself. The 'light' clients still have to download entire blocks with inputs matching their addresses, whenever the network is upgraded to allow specific block requests. And it's not like just downloading the 80 byte headers is a significant burden, as all the headers to date would be less than 16 Megs. So adding 4 bytes to every header wouldn't be a burden to any full clients, but nor would it be particularly useful for them, and it would increase the blockchain headers total size by 5% for qustionable gains even for light clients that keep only headers. And if these blockchain header type light clients have to trust full nodes to deliver them blocks matching their addresses anyway, and they do, why can't those same full nodes provide the data these light clients need to be certain they are using the longest chain outside of the blockchain? 4 bytes is still 4 more bytes, replicated onto every full or header-only blockchain in the world, forever & ever.

Quote from: TheBitcoinChemist on August 07, 2012, 07:10:34 PM

Light clients will not need to download blocks with inputs matching their address, let alone trust a full node to do it correctly. See etotheipi's proposal for zero-trust light clients using merkle trees. https://bitcointalk.org/index.php?topic=88208.0 The only remaining challenge for a lite client is (correctly) selecting the largest valid chain, which is what I am discussing.

Quote from: kjj on August 07, 2012, 07:20:16 PM

Notice that building a couple of phony blocks that look legit when viewed in this chain is trivial compared to building a couple of phony blocks to fool the real chain. And by trivial, I mean billions and billions of times easier.

Good question! That's exactly the concern that motivated me to work on this problem. If a chain is valid, you will eventually process it entirely. However if a chain is work-deficient, like you described, then you can reject it early. Think of it this way: it's a validation procedure that has zero false positives, but supports probabilistic false-negatives from early rejection.

Quote from: socrates1024 on August 07, 2012, 07:26:45 PM

Quote from: kjj on August 07, 2012, 07:20:16 PM

It has neither false positives nor false negatives, because it simply doesn't do ANY verification whatsoever.

Consider this, you get a block, block Z. You do not have block (Z-1), so you check the skiphash and ask for block Y. You also don't have block (Y-1), so you check the skiphash again and ask for block X. Now block X is one that you do have, so you stop there. How much should you trust blocks Y and Z at this point?

The answer is "not at all" because you don't have ANY context for them. You have NO idea what their difficulty should be, you have NO idea if they actually connect back to the real chain or not, and you have NO idea what their timestamps should be. In short, you have no idea if these are real blocks, or if they were prepared just to attack you, and you won't have any idea until you verify the chain the hard way.

The skiphash chain means that when I create my attack chain, the difficulty values must follow some rules, but following those rules is still billions and billions of times easier than following the full network rules because you only need to make a couple of blocks.

Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.

Quote from: kjj on August 07, 2012, 08:56:12 PM

Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.

I'm not sure that you're wrong, but I do think its a stronger validation than you realize. When mining, we send solutions to lower-difficulty problems to the server to let it know that we're working. Most mining pools call these 'shares', and there are typically many shares per valid block because the difficulty is much higher. The rules socrates1024 is proposing have the net effect of superimposing a nested set of higher difficulty blocks. Manufacturing a block for the next level would be roughly 16 times as difficult as manufacturing a single block at present difficulty.

I'm curious about this overlay, not sure about its usefulness but I'm at least intellectually interested in seeing it. It does seem to provide an interesting 'difficulty topology' on the blockchain...

Quote from: kjj on August 07, 2012, 08:56:12 PM

Consider this, you get a block, block Z. You do not have block (Z-1), so you check the skiphash and ask for block Y. You also don't have block (Y-1), so you check the skiphash again and ask for block X. Now block X is one that you do have, so you stop there. How much should you trust blocks Y and Z at this point?

...

Unless I'm wrong, of course. It could be that the skiphash chain rules add up to a much stronger context than is immediately apparent to me.

Branch and Bound! You've now established that block X is at least a common point in both chains (assuming you already have a chain, it's the genesis block if you're starting from scratch). Block Y is the largest hash in between X and Z. You can now repeat the process on either sub interval. Each time you go down a level, there are more blocks to check, and you get an increasingly precise estimate. If you keep going, eventually you will process every block.

I imagine that there would be a lite client that says "Estimated work: 1 bajillion megahashes. Likelihood: 76%". The likelihood number increases, very rapidly at first, and slowing down at 99%, 99.99% and so on. If it hits 100, then you have validated the entire chain like a full node.

This also allows you to quickly find the true fork point between two chains, even ones that have had difficulty changes along the way.

Quote from: socrates1024 on August 07, 2012, 09:41:47 PM

Quote from: kjj on August 07, 2012, 08:56:12 PM

What do you do with the subinterval exactly?

You don't know any blocks between X and Y or Y and Z. You don't know how many there are, you don't know what their difficulty is, you don't know what their values are, you don't know anything at all about those intervals, except that if you get block Y-1 (or Z-1) and the value is higher than Y (or Z), one of the blocks is lying. But which one?

And since you don't know anything at all about these intervals, your estimated work is going to be W>X+(y*Y)+(z*Z) where y and z are probably greater than 2. And I just don't see any way, using this method, to improve our estimates.

Again, I might be missing something, but from what I see, it looks like you are using knowledge of the blockchain in your estimates, which is a problem, since this is intended to be a system for finding out information about the blockchain that we assume that you do not actually possess.

Quote from: kjj on August 07, 2012, 10:29:14 PM

Quote from: socrates1024 on August 07, 2012, 09:41:47 PM

Quote from: kjj on August 07, 2012, 08:56:12 PM

What do you do with the subinterval exactly?

You don't know any blocks between X and Y or Y and Z. You don't know how many there are,

Actually, you would know how many there are, because the current block number is part of the header.

I'm starting to see value in this idea.

EDIT: but what happens if the difficulty drops or levels off ofr a long peroid of time, what prevents a uselessly large interval?

I imagine that in each block, you store a concise report of the sum of the difficulties so far. This way you are never trying to make an estimate from scratch, instead you are trying to evaluate whether the self-reported estimate is plausible.

Suppose I tell you that my chain has bajillion hashes with sum-difficulty of 5.

You say "Sweet! I bet you have some cool 9's and 10's in there. Show me those."

If I say "Nope actually my best is just an 8", you are immediately suspicious. "Well show me all your 8s and 7s then." If I actually did a fraction of a bajillion hashes, then the jig will be up quite quickly.

Quote from: TheBitcoinChemist on August 07, 2012, 10:38:14 PM

EDIT: but what happens if the difficulty drops or levels off ofr a long peroid of time, what prevents a uselessly large interval?

Good question! There's a difference between difficulty and hash-value. The difficulty is just the minimum hash-value needed for a block. The skip list indexes the work according to hash-value, which corresponds to the total number hashes drawn over time, regardless of what the minimum thresholds were. So if there's a period of time where the difficulty is only 1 (and perhaps there are tons of blocks like this), but most of the time the difficulty was 5, then you will be estimating mostly from the 7's 6's and 5's. You'd only get to the 1s if the total is plausible all the way down.

Quote from: kjj on August 07, 2012, 10:29:14 PM

What do you do with the subinterval exactly?

You don't know any blocks between X and Y or Y and Z. You don't know how many there are,

I'm sorry, I wasn't very clear on this part.

Remember that each skip list node allows you to travel either "backwards" or "up". If you traveled "up" several times from Z to get to to Y, then you can start at any of those nodes and travel "back" instead. This lets you subdivide/partition the entire history in a way that's roughly half-as-dense at each level. Since there are two links in each block, you can traverse to any point quickly.

A full chain iteration would consist of "back back back back back back ... back back [genesis]". An efficient way to check the work is to go "up up up up up up up [genesis]". then do "up up up up back up up up up up [genesis]". Mostly ups. At the beginning, you'll spend most of your time among the peaks, where the air is clearer and the signal is stronger.

Hmm, how about this:

Let's hypothetically assume current difficulty for all blocks for the sake of this illustration. Your client chooses a random number (k) from 0 to current block height (n). That number is a binary number with log(n) digits (zero-padded). A zero means "look low", a 1 means "look high". You send me that number and I now have to give you all the blocks I get from traversing the highway according to those instructions. If I wanted to feed you fake blocks, I'd have to either dream up all log(n) blocks on the path for you on the fly, or have all possible paths ready to go, which is basically the same as hashing the entire blockchain from scratch.

In order to hash the blocks on the path in a matter of seconds, I'd need 60 * log(n) more hashing power than the entire network.

The problem with this is that I can try to say that the difficulty was historically ridiculously low. But since I'm feeding you all the highest value blocks, if you're not happy with the overall difficulty you just won't deal with me, because I'm probably a crook. Since you can be off by at least two orders of magnitude and still feel safe dealing with me, there probably won't be a lot of false negatives.

Quote from: ben-abuya on August 08, 2012, 12:01:51 AM

A zero means "look low", a 1 means "look high". You send me that number and I now have to give you all the blocks I get from traversing the highway according to those instructions. If I wanted to feed you fake blocks, I'd have to either dream up all log(n) blocks on the path for you on the fly, or have all possible paths ready to go, which is basically the same as hashing the entire blockchain from scratch.

Yup, random traversals of this structure are very interesting. I suspect that at some later date, we will be able to make a non-gpu proof-of-work scheme based on traversals of the previous work-graph. It's basically a way of randomly sampling blocks from history, weighted according to difficulty.

Quote from: kjj on August 08, 2012, 01:39:47 AM

Blocks don't currently have a concept of height. If you add that too, you have more information, but not that much more.

And backing up doesn't give a new walk, ever. Every block from X+1 to Y has a skiphash value of X, every block from Y+1 to Z has a skiphash value of Y. If you back up one block to Z-1 you don't get a new walk, you get the exact same one again.

kjj, you seem to be confusing two concepts here, the skip hash (most recent block with higher value than the previous) and the highest value block in the chain. In your example, X, Y, Z are skip hashes, but for the traversal stuff, you'd be working with highest value blocks. If I find the highest value block in the chain, clearly there are two intervals: blocks before the hvb in the chain and blocks after the hvb in the chain. If we look at each of these intervals, excluding the hvb, we will find a different hvb in each interval. We can choose to go either up or down. That's the second block in the path.

Ok, so I read this thread again and figured out the part that I had been missing.

When creating a new block, the skiphash points to the most recent block with a value higher than the previous block, not the block with the previous highest value ever.

I'm working through the implications of this, but at first glance, it does not appear to produce a chain of links that leads to a high value block.

Quote from: socrates1024 on August 07, 2012, 06:58:16 PM

Suppose the back-link from the current head looks like 000xxx (value 3). We also include an 'up'-link to the parent of the most recent block that looks like 0000xxxx (value 4). If you skip to that parent block, then it will have an 'up'-link to the parent of a block that looks like 00000xxxxxx (value 5). And so on, until you get to the parent of the largest block for which there's no up-link.

Suppose you start at that largest block, which looks like 00000xxxxx (value 5). It contains an up-link, either to a previous block with value 00000xxx (value 5), or to one with lower value. You can skip your way to the highest-value hash in every sub-interval of history.

The problem is that when you follow that first "up" link, the block you find there does not have a link to a value higher than itself, it has a link to a value higher than its predecessor. If you want to find a block with a value higher than the block you are on, you have to go forward and then "up" from there, and forward links aren't possible.

Quote from: kjj on August 08, 2012, 01:22:47 PM

Quote from: socrates1024 on August 07, 2012, 06:58:16 PM

Of course they are, just +1 on your block count, then "up". Or just change the higher last hash to one that is higher than both the last hash & the current one. The idea of choosing to be higher than the last hash is to avoid duplicating the existing liner structure of the blockchain.

Quote from: TheBitcoinChemist on August 08, 2012, 03:33:26 PM

Quote from: kjj on August 08, 2012, 01:22:47 PM

Quote from: socrates1024 on August 07, 2012, 06:58:16 PM

You don't know the value of the current block's hash until after you hash it, and once you do that, you can't go back in and figure out which block you should have linked to, because that would change the hash and value.

Adding a block index to the block gives you the ability to move forward, but it also gives you full random access. Why bother with a second link when you can instantly seek by index?

With random access, the only job left for hashes is to preserve ordering, that is to prove that Z came after Y came after X. Someone creating a fake skipchain only has to make one trivial fake block per high value fake block, and make them in the right order, which is still billions and billions of times easier than creating a whole fake chain, at least over intervals long enough for this system to make any sense at all.

Quote from: socrates1024 on August 07, 2012, 07:53:21 AM

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick!

I stopped reading at this point. Ten years of headers is about 40 megabytes. One hundred years is 400 megabytes. A boring computer _today_ could validate 100 years of headers in a second or two. And even this could be skipped up to a fixed reference point. There are interesting challenges in Bitcoin, reading headers isn't one of them.

Also, from this discussion it sounds like some people are confused wrt difficulty. The difficulty of a block is its target, not the hash value.

Quote from: gmaxwell on August 08, 2012, 04:18:39 PM

Quote from: socrates1024 on August 07, 2012, 07:53:21 AM

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick!

If you stopped reading, how do you know? But seriously, I believe the idea here isn't to address practical problems with the current Bitcoin parameters, but to explore concepts that will be needed if Bitcoin morphs in interesting ways. For instance, if average block solving time is reduced 60-fold to ten seconds, can your smartphone still keep up with all those headers?

Quote from: gmaxwell on August 08, 2012, 04:18:39 PM

This proposal is primarily about ddos-resistance. A protocol's ddos-resistance is related to the largest plausible validation effort, not the expected (or actual) effort. So if I give you a 20GB set of headers and tell you it's the largest chain... you'd apparently reject it without even looking, since larger than 400MB is implausible.

Quote from: gmaxwell on August 08, 2012, 04:18:39 PM

There are interesting challenges in Bitcoin, reading headers isn't one of them.

My mission is to eliminate every last hard-coded global parameter in Bitcoin, so that it grows into an indisputably scalable and universal protocol. On the chopping block are "10 minutes," and "difficulty adjustment every 2016 blocks."

Two of the things I'm going to propose next (absorbing orphaned forks, and self-adjusted difficulty without timestamps) are going to potentially to create a much larger number of headers, so I wanted to explain my solution to that first, especially starting with efficient validation for lite-clients. If it's not interesting, then no need to read ahead - but save your place.

Quote from: ben-abuya on August 08, 2012, 05:13:50 PM

Quote from: gmaxwell on August 08, 2012, 04:18:39 PM

Quote from: socrates1024 on August 07, 2012, 07:53:21 AM

How does a lite client know which block chain is the largest? By iterating through the whole chain from front to back? Ick!

Average block solving time will never be reduced to anywhere near 10 seconds.

Quote from: socrates1024 on August 08, 2012, 05:43:39 PM

Quote from: gmaxwell on August 08, 2012, 04:18:39 PM

There are interesting challenges in Bitcoin, reading headers isn't one of them.

I assure you, that neither of those parameters are up for debate.

Quote

Two of the things I'm going to propose next (absorbing orphaned forks, and self-adjusted difficulty without timestamps) are going to potentially to create a much larger number of headers, so I wanted to explain my solution to that first, especially starting with efficient validation for lite-clients. If it's not interesting, then no need to read ahead - but save your place.

Orphanced forks don't need to be 'absorbed' into the main chain, and self-adjusting difficulty without timestamps is too complicated to impliment across a massive, distributed p2p network with connections that vary by orders of magnitude. This will not fly, and your current proposal isn't likely to be incorporated into the main chain, either; without some overriding gain. Incremental gains need not apply here, it's about two years too late for such things.

Quote from: kjj on August 08, 2012, 04:02:35 PM

Adding a block index to the block gives you the ability to move forward, but it also gives you full random access. Why bother with a second link when you can instantly seek by index?

Are you talking about an index within a block? No, too big times too many full clients. A better idea would be what I mentioned earlier, a web accessible block index on a regular webserver. It's not like any particular client needs to trust the index, as it just references blocks that can be downloaded and verified over the p2p network. Also, the external index could do the 'last harder hash' index after the block is produced, and this wouldn't require any changes in the codebase or protocol. For that matter, every full client already indexes everything, and that is how blockexplorer.com works. The trick is just setting up a machine readable standard for light clients to query the indexing server, and then hunt those blocks down over the p2p network.

Quote from: socrates1024 on August 08, 2012, 05:43:39 PM

In that case, good luck, but I don't think you'll get much traction with it. A few people are greatly concerned with the magic numbers, but most of us don't care. I've personally read dozens of threads on that subject, and so far no one has ever come up with a good argument why removing them would be beneficial.

This isn't physics where we kept getting 1/137 but didn't know why. We already know why we have 600 and 2016, and we are ok with it.

Quote from: socrates1024 on August 08, 2012, 05:43:39 PM

I'm looking forward to reading this.

Quote from: ben-abuya on August 08, 2012, 06:43:40 PM

Quote from: TheBitcoinChemist on August 08, 2012, 05:45:07 PM

Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

It's not impossible, just astronomically improbable. Those are arbitrary parameters decided by Satoshi, and in all the time since there has not been any proposal for altering them that could offer more than a marginal improvement, if that. For example, the 10 minute block interval could have been 6 minutes, 12 minutes, 15 minutes or any other arbitrary number of seconds; but it could not have been a small number of seconds because the interval was chosen as a balance between transaction confirmation time and the expected future network propogation latency. 10 seconds, or even 120, is too fast and would result in a great deal many more network splits & orphaned blocks; potentially causing a congestion feedback.

Also, the block interval time cannot be variable, due to the interlocking structure of the blockchain's design & purpose. If block intervals were variable; 1) the block reward would either have to be variable also, which would make the task of verifying that the block reward was a valid amount quite difficult for an individual node to perform or the distribution rate of newly issued bitcoins into circulation would no longer be predictable, and that would be bad; and 2) the relatively common blockchain splits would be significantly more difficult to resolve in any automatic fashion, which is the only way it could be done.

Quote from: TheBitcoinChemist on August 08, 2012, 07:49:14 PM

Quote from: ben-abuya on August 08, 2012, 06:43:40 PM

Quote from: TheBitcoinChemist on August 08, 2012, 05:45:07 PM

Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

Of course your points make perfect sense, but isn't this circular reasoning? You're saying that the target time can't be reduced because of network splits and orphaned blocks, but those are exactly the kind of things these proposals are trying to address. Satoshi is a genius, and it's uncanny how well his somewhat arbitrary constants worked right off the bat. It would have been unwise of him to add in extra complexity to reduce those numbers. But time has passed, the system has proven itself stable, and it's time to consider even more amazing things that can be done with these core concepts. Of course, pressing issues shouldn't be neglected, but some people are going to want to work on more experimental stuff, and I think that's great.

Quote from: ben-abuya on August 08, 2012, 09:59:12 PM

Quote from: TheBitcoinChemist on August 08, 2012, 07:49:14 PM

Quote from: ben-abuya on August 08, 2012, 06:43:40 PM

Quote from: TheBitcoinChemist on August 08, 2012, 05:45:07 PM

Average block solving time will never be reduced to anywhere near 10 seconds.

Care to elaborate? It'll be easier to do if we know why it's impossible.

But those aren't actual problems, the protocol is self-healing as it is. You're trying to debate a new solution to a non-issue.

Quote

Satoshi is a genius, and it's uncanny how well his somewhat arbitrary constants worked right off the bat. It would have been unwise of him to add in extra complexity to reduce those numbers. But time has passed, the system has proven itself stable, and it's time to consider even more amazing things that can be done with these core concepts. Of course, pressing issues shouldn't be neglected, but some people are going to want to work on more experimental stuff, and I think that's great.

It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here. The old addage, "don't fix it if it an't broke" applies here.

Quote from: TheBitcoinChemist on August 08, 2012, 10:43:39 PM

It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here. The old addage, "don't fix it if it an't broke" applies here.

This is pretty antithetical to the hi tech world. Are hard drives broken now? They work amazingly well. Yet, lots of people are out there making improvements to them. I just don't see how it makes sense to say that a piece of technology works well, so it shouldn't be improved. Moreover, bitcoin is broken when it comes to using it in a commercial setting. Ten to thirty minute waits just aren't acceptable. Sure there are ways to deal with that, but those are hacks.

Bitcoin takes care of money. What about the rest of the financial transactions that manage economies, like exchanges, stocks, bonds, bets, options, futures? You need higher resolution timing for those, and probably lots of other improvements as well. I don't think anybody is recommending baking this stuff into the Satoshi client. There will be alternative coins, and they'll compete on the market.

Quote from: ben-abuya on August 09, 2012, 05:02:10 AM

Quote from: TheBitcoinChemist on August 08, 2012, 10:43:39 PM

It's precisely because the system is so stable that modifications to that system are verboten in the majority of minds here. The old addage, "don't fix it if it an't broke" applies here.

All of those 'hacks' can be independently improved without affecting the bitcoin protocol, and the additional functions of an economy are intended to be seperate from the bitcoin protocol. It's simply not necessary to include those things into the protocol. Sure, hard drive tech is always improving, but techies have been trying to get the Internet to upgrade to IP6 for well over a decade, but that is not going to happen until the majority of the users have personaly evidence that IP4 broken. The same is true with the bitcoin protocol, I don't want you screwing with a good thing & my opinion actually matters because you can't change anything without the consent of the majority of node owners.

Quote from: TheBitcoinChemist on August 09, 2012, 05:11:56 AM

Again, I don't want to mess with the current Bitcoin implementation at all, and I'm completely on-board with the careful and conservative changes the core devs are making. But there are alternative blockchains which don't jeopardize bitcoin in any way. If one of them scratches an important itch, many people will adopt it, and it will compete for some resources. I think the bitcoin ecosystem will eventually be much more robust if there are multiple competing blockchains around instead of just one.

The reason some of these economic functions will be based on bitcoin ideas, is that they need distributed chronological ordering, and the bitcoin ideas are the best way we've currently got to do that.

Quote from: socrates1024 on August 07, 2012, 06:58:16 PM

The "difficulty" of a block could be set by each individual miner who "bids" on the difficulty they want to get. So a winning block at difficulty 4 would be H(difficulty: 4 zeros || nonce || header) -> 0000xxxxxxxxx. Instead of taking the largest chain, we would take the chain that has the largest sum-difficulty (according to the difficulty bids, not the actual hash value).

There should be a disincentive for miners to spam the network with low-difficulty blocks. Each miner should have stake in the game - the miner himself could place down a bet, effectively a fee for the storage of his block header. If his block is accepted, then he gets the money back. If his fork gets swept up by a higher-difficulty fork, then he loses his bet and the fees he would have earned.

Isn't wasted hashing power disincentive enough?

Quote from: socrates1024 on August 07, 2012, 06:58:16 PM

The hash-value-highway makes it efficient to evaluate the sum-difficulty of a fork, and also to merge forks by cherry picking valid transactions. Long chains of low-difficulty blocks will tend to get preempted by higher-difficulty forks. If each transaction includes a 'minimum difficulty', then you'll only be included in the difficult blocks which are less likely to be reordered.

I don't expect this part of the explanation to be totally clear, especially since I haven't elaborated on how a fork is 'swept up' by the main chain, but I will get to that later. But since you've been working on similar ideas, I think you might have the right intuition for how this works.

So this "sum-difficulty" is the total difficulty of a chain, including blocks that were merged in from other forks? If the block is merged into another chain, how would the original miner lose his transaction fees from that solved block?

In the OP, I suggested an experiment. What's the smallest hash (highest value hash) we've ever found? How many hashes have we found with that value? My prediction is that we've found exactly one hash with that highest value. I finally learned to parse blockchain data, and as it turns out, my prediction was spot on.

The smallest hash we've ever found (as of block #188529) is 00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d. (http://blockchain.info/block/00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d) This hash has 67 leading zero bits. The next smallest hash has only 66 zero bits.

I would have predicted there should be two hashes with 66 zero bits, but in fact, there have been five:
http://blockchain.info/block-height/137445
http://blockchain.info/block-height/140105
http://blockchain.info/block-height/148246
http://blockchain.info/block-height/156636
http://blockchain.info/block-height/156980

I mentioned that this way of looking at hash values leads to a scale-invariant way of measuring time. Let me illustrate this with a couple of graphs.

First, these are the hash values (256 - log2(hash)) as a function of time, the way we normally measure it - in blocks (each block corresponds to roughly 10 minutes). The red line at the bottom represents the difficulty, i.e., the minimum acceptable hash value. You can compare this red line with mndrix's (dated) chart of log-difficulty over time (https://bitcointalk.org/index.php?topic=2345.msg31405#msg31405).
https://i.imgur.com/3nm31.png?1

To visualize the scale-invariant properties I'm so excited about, look what happens when we plot the hash values against the cumulative difficulty, (i.e., total work). This is also equal to the expected total number of hashes computed.
https://i.imgur.com/5KXC4.png?1

The key things to notice about this graph are:
- The density of points decreases exponentially as you go higher. If you split the data into rows (https://i.imgur.com/DMPch.jpg), then each row contains half as many points as the row below it.
- The density of points is independent of the position along the x-axis.
- The density of points is independent of the shape of the red difficulty line (except for the fact there's no data below it).
- The x-coordinate of each point is uniform random within the entire interval.

Cumulative difficulty, not block-height, is the true way we measure time in Bitcoin. It's a lovely mechanism, and the fact it has these qualities gives me hope that a future incarnation won't need any hard-coded time parameters. A great way of partitioning/subdividing time is to sample according to the hash values. Right off the bat, there are two time periods - the time before the 67-bit block, and the time after the 67-bit block. Within each interval, you can subdivide according to the 66-bit blocks, and then the 65-bit blocks, and so on.

(Here's the code I wrote (https://github.com/amiller/byzbit/blob/ad025ecc37d7a12bd345d0c9fdfed5eb62ea961f/markingtime.py) to prepare these illustrations)

As a slight reformulation of this basic idea, you can also build a skip list data structure by committing to back links in the current block, and then selecting links whose distance back is less than the apparent work of the block once you've solved it. This allows you to construct compact SPV proofs with an accurate summary of the cumulative work that went into the actual blocks, but in a non-interactive way and without having to reveal more than log N block headers (where N is the length of the chain being proven). This was recently written up in a little more detail on the mailing list:

http://sourceforge.net/p/bitcoin/mailman/message/32111357/

Thanks for the writeup, I'm glad there's some renewed interest in this.
I can't figure out what the difference is specifically, though, with what you've written up. The proposal in this thread already includes commitments to back links. The way I think of it is that each block contains a claim about how much cumulative work went into creating it. The proof process is basically a hypothesis test, for whether the actual work matches the claim. In fact it's a recursive hypothesis test, because if you follow the "up" links you quickly (O(log n)) arrive at a block that should divide the total cumulative work into two segments. Then you do the hypothesis test on each of these.

I'm currently trying to figure out how to phrase this as a formal security game, which should make it a little more clear how to evaluate whether either of our proposals actually achieves what it's supposed to.

I wrote down a rough outline of a security game definition, and a clearer description of the algorithms with pseudocode.
https://docs.google.com/document/d/12xl5eRtoE0uISUX188jnmGSb-bwG4yXByRBbdr2r97A

I also wrote out an implementation in OCaml of the key operations (using the Lambda-auth (http://amiller.github.io/lambda-auth) language/compiler I made especially for this sort of thing), for building the hash-value highway and creating/verifying the compact SPV proofs. It compiles but I haven't tested it, I plan to use it to create some illustrative graphs.
https://gist.github.com/amiller/9635632

Any of the above could be totally broken, but hopefully this gets the conversation moving along more concretely!

Okay, I think I get it.

A hash with N initial zero bits corresponds to a probability that something close to 2^N total hashes have been performed, because, on average that's how many hashes it takes to get one with N initial zero bits.

Within the same set of hashes, you "expect" to find around two hashes with N-1 initial zero bits. Each of these corresponds to a probability that something close to 2^(N-1) hashes have been performed. If you have two of them, that adds up to 2^N total hashes (same as the work presented by the parent node). If you have only one, though, your estimate is lowered (by half) and if you have five your estimate is increased (multiply by 5/2).

Each of these hashes with N-1 initial zero bits is then expected to dominate around two hashes with N-2 initial zero bits. Meaning, if you had two at the N-1 level, you expect around four, and if you had five at the N-1 level, you expect around ten. And each of these N-2 hashes represents around 2^(N-2) total hashes. If you expected ten and you find seven, you lower your estimate (multiply by 7/10). If you expected four and you found seven, you'd raise your estimate (multiply by 7/4).

And so on, so the more of the "high nodes" on this hash tree you process, the better your estimate gets. The law of large numbers means each "generation" verified downward from the root of the tree will lead to smaller and smaller adjustments to the estimate, and you probably have a very good idea (within 1%) of the total work by the time you've found the ~50 best hashes.

This procedure can be carried down to the level of difficulty as of the time the block was mined, below which there's no information so you just have to treat your estimate on the individual blocks as the "truth" or as close as you can get.

So this gives you a way of getting a rapid estimate of the total amount of hashing work that went into a particular blockchain, or into the portion of it discovered since the blockchain fork.

Okay, I see the value there.

It's also interesting that you could do the same kind of linking on any metric; for example the amount of coin age destroyed by a given block, if that were your measure of "best chain" instead of the number of hashes.

Quote from: Cryddit on March 20, 2014, 03:57:24 AM

Okay, I think I get it.

Quote from: Cryddit on March 20, 2014, 03:57:24 AM

It's also interesting that you could do the same kind of linking on any metric; for example the amount of coin age destroyed by a given block, if that were your measure of "best chain" instead of the number of hashes.

I don't agree with this part. The coin-age destroyed among a set of blocks mined by an adversary does not necessarily follow any particular distribution. However, you are right that there are many possible choices for determining "value" other than distance from 0. You could, for example, interpret the "value" (beyond difficulty) as the *least-significant bit* zeros instead, since conditioned on a mining attempt beating the difficulty target, the distribution of the value in this case is still as it should be.

Anyway since that's a really minor point, here's some more useful technical content. One surprising thing the attacker can do is to create a really *large* proof. The attacker does this by throwing away every block *except* the value=1 blocks, which don't exceed the difficulty target at all. This only requires throwing away 50% of the work, but it results in a proof where the "up" links are always null, and you have to traverse every damn block to validate his proof.

This isn't a problem, for the following reason. You can make progress on validating the SPV proof for multiple chains concurrently. So if one peer tries to send you a really slow proof, you should request different proofs from a different nodes and process them in parallel.

The option of this strategy is what motivates the performance goal I mentioned as a caveat at the end of the writeup. The "honest prover" should, with high probability, produce a proof that is reasonably compact. It's okay if a dishonest prover produces a long proof. Regardless of length, the prover should be unable (except with low probability) to produce a convincing proof that the amount of work done to bury some transaction exceeds the actual amount of work (including the attacker's work plus any honest work) by some fraction.

*** edit ***
Actually, now that I've written this response out, I realize another attack that's modeled in my security definition, but isn't adequately addressed in my scheme. The problem is that an attacker can, by rejecting 50% of his work, cause an honest proof to contain less useful back links. It's important to set the verification parameters loose enough so that even if the attacker does this, he can't cause honest chains to fail verification. But, if the verification parameters are suitably widened, it must also not make the ease of forging a proof much higher. All of this relates to analysis i haven't finished in my writeup, so these are things I need to address. I don't think this should affect the choice of data structure though (only the guidelines for choosing parameters).

Heh. You're right that hashes are a uniform distribution, which makes it spectacularly simple to do this trick.

Coin age destroyed is a Poisson distribution multiplied by total money supply -- which is different but if you normalize by dividing out the money supply (conveniently, meaning divide by block height) you can similar statistical tricks using different formulas. There are at least two perturbing factors though, which are both disturbances to the pattern and sources of interesting information.

First, the money supply itself is actually less than proportional to block height, depending on the rate at which bitcoins get lost. The "coin age destroyed" graph, normalized by the theoretical money supply, would have an overall downward slope that would reveal approximately the rate per year at which bitcoins vanish due to lost keys (or just really persistent hoarders...).

Also the distributions would get narrower as the velocity of bitcoins increase, so the variance of the Poisson distribution would increase or decrease depending on the health of the bitcoin economy as of a particular period. And that would be another interesting thing to study.

Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: socrates1024 on August 07, 2012, 07:53:21 AM