Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: Forp on July 09, 2011, 11:09:14 PM



Title: What EXACTLY means "longest" chain ?
Post by: Forp on July 09, 2011, 11:09:14 PM
Trying to work my way through the code  :) according to Richard Feynman's principle of learning "What I can't produce - I do not understand".   ;)

So, I reached another place I do not understand.  :o

Every node always considers the longest chain the correct one. Yes?

Ok. What happens if there are two longest chains.

Well, this is not a problem for the future of the entire system since, eventually, there will be a longest chain again. Odds are exponentially decreasing for a "two equally long longest chains" to survive in the long run.

That said: A single node (and, similarly, blockexplorer) has at any given moment in time a concept of what is the longest chain (even if there are two equally long chains). I checked the code. In the -printblock functionality there is no clear answer to this and I did not yet manage to digest the chain reorganization algorithms to understand this aspect.

So, assume there is this situation:

               B-C
              /
GENESIS-A
              \
               X-Y

What does the individual node or the block explorer consider the longest chain ? GENESIS-A-B-C or GENESIS-A-X-Y ?

This question might be considered academic, since in the long run one of the chains will stabilize and become the longest.

But there might be one interesting aspect here. I think, for quick chain stabilization, it is important that the nodes pick the same chain as the correct chain. Assume, 1/2 of the nodes pick GENESIS-A-B-C and the other 1/2 of the nodes pick GENESIS-A-X-Y. In this case the "battle" for the right correct chain might take much longer. Now assume that all nodes pick GENESIS-A-B-C. In this case the "battle" will be over quite quickly (not necessarily in favour of GENESIS-A-B-C though, but the chances of oscillating back and forth should be smaller).

Any ideas on this?






Title: Re: What EXACTLY means "longest" chain ?
Post by: Martin P. Hellwig on July 09, 2011, 11:15:20 PM
I sincerely hope that the amount of transactions in the block is a factor in defining the longest chain, but I don't know.


Title: Re: What EXACTLY means "longest" chain ?
Post by: etotheipi on July 10, 2011, 01:23:46 AM
Nodes process which ever "longest chain" they hear about first.  If two nodes on opposite sides of the world solve the next block at exactly the same time, then half the world will get word of one block and start working on it, while the other half the world gets word of the other block first and starts working on that.  Once either of those chains is extended, the solution is broadcast and all nodes will switch to it, seeing that that chain is the longest.  Sure, both halves of the world could solve the next block at exactly the same time and both will be temporarily extended again.  But statistically speaking, eventually one chain will "lose", and it's entirely up to luck which one that is.

More often than not, you would get 90% of the world that sees one block, and 10% of that gets the other block first.  Unless the 10% side gets extremely lucky, the 90% will be the one to become the real chain.  The orphaned block becomes "invalid."  I believe nodes that who were working on that branch, will see what transactions need to be re-included in the new branch, but I'm not sure how that part works exactly.







Title: Re: What EXACTLY means "longest" chain ?
Post by: JoelKatz on July 10, 2011, 01:36:50 AM
Ok. What happens if there are two longest chains.
Then the choice of chain is arbitrary. Eventually one chain will become longer and the problem will solve itsef.

Quote
Well, this is not a problem for the future of the entire system since, eventually, there will be a longest chain again. Odds are exponentially decreasing for a "two equally long longest chains" to survive in the long run.
Exactly.

Quote
What does the individual node or the block explorer consider the longest chain ? GENESIS-A-B-C or GENESIS-A-X-Y ?
In practice, whichever chain it saw first. In theory, it doesn't matter.

Quote
But there might be one interesting aspect here. I think, for quick chain stabilization, it is important that the nodes pick the same chain as the correct chain. Assume, 1/2 of the nodes pick GENESIS-A-B-C and the other 1/2 of the nodes pick GENESIS-A-X-Y. In this case the "battle" for the right correct chain might take much longer. Now assume that all nodes pick GENESIS-A-B-C. In this case the "battle" will be over quite quickly (not necessarily in favour of GENESIS-A-B-C though, but the chances of oscillating back and forth should be smaller).
The chain will already stabilize quickly because it will take, on average, several minutes to extend either chain. If you were designing a protocol that tried to add blocks more often, maybe it would be worth the cost of extra reorganizations.


Title: Re: What EXACTLY means "longest" chain ?
Post by: WakiMiko on July 10, 2011, 01:38:55 AM
Assume, 1/2 of the nodes pick GENESIS-A-B-C and the other 1/2 of the nodes pick GENESIS-A-X-Y. In this case the "battle" for the right correct chain might take much longer.

Not really, the "battle" should be over pretty fast in any case, as the next block settles the issue.
There is of course the chance that the next two blocks are found at proximately the same time and the split continues, but the chances for this to happen are pretty slim.


Title: Re: What EXACTLY means "longest" chain ?
Post by: Forp on July 10, 2011, 12:53:16 PM
...

Thank you, Joel, for your help ! It clarified the issue.


Title: Re: What EXACTLY means "longest" chain ?
Post by: Forp on July 10, 2011, 12:58:12 PM
Nodes process which ever "longest chain" they hear about first. 

Yup. Of course.

I did not phrase the question clearly enough. I was thinking of the following situation: A node is booted, bitcoind is started...reads the disc files and finds a block index where there are TWO longest chains. The program will prefer one of them. Which one will it prefer? I was not able to find that answer. I now realize, that it is in fact not really important. And I realize that I will have to spend a day or two understanding block reorganization... :-)

Actually, the terminology of longest chain is ill defined. There can be several chains of maximal length.

And again: If this thing (bitcoind) should eventually takle over management of my money - I want to make sure, as computer scientist, I understand every single bit of what it does.  :)


Title: Re: What EXACTLY means "longest" chain ?
Post by: Pieter Wuille on July 10, 2011, 02:20:31 PM
The longest chain is defined as the one with the most proof-of-work.

Technically, the "length" of each chain is the sum of (2^256 divided by the block's target) for a given block plus all its ancestors.
This number is the statistically expected number of hashes to have been performed in the chain, and is currently equal to about 4.1e19 (41 billion billion).


Title: Re: What EXACTLY means "longest" chain ?
Post by: Forp on July 10, 2011, 08:45:21 PM
The longest chain is defined as the one with the most proof-of-work.

Technically, the "length" of each chain is the sum of (2^256 divided by the block's target) for a given block plus all its ancestors.

Ah. This makes again much more sense! Of course. Guess  I will have to read some more satoshi c++...

Just curious: How is the longest chain picked if there are two chains with an equal amount of proof-of-work?


Title: Re: What EXACTLY means "longest" chain ?
Post by: Maged on July 10, 2011, 08:55:18 PM
Just curious: How is the longest chain picked if there are two chains with an equal amount of proof-of-work?
Exactly as we've already described: the client picks the one it received first.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 10, 2011, 10:13:25 PM
The longest chain is defined as the one with the most proof-of-work.

I am not sure that that's true.  

It is how it should be done, but may not be how it is actually done.

You could even define it as the sum of 1/<hash value> for all blocks in the chain.

The difficulty value only changes every 2160 blocks, so in most cases 2 chains will have the same difficulty for the blocks after the split.

If sum (1/<hash value>) was used, then it would be very unlikely that a tie would happen.

Alternatively, the rule could be that if 2 valid blocks are received, then the one with the lowest hash value wins.

               B-C
              /
GENESIS-A
              \
               X-Y

If B had a hash of 0000000123 and X had a hash of 0000000122, then everyone who had received the B hash first would switch to A as soon as they received the A block.  C would likely never be found at all.

This would short circuit the tie-break rule.

It wouldn't even require a network rule change.  Miners could just agree to do it that way.


Title: Re: What EXACTLY means "longest" chain ?
Post by: kjj on July 10, 2011, 11:05:34 PM
The longest chain is defined as the one with the most proof-of-work.

I am not sure that that's true. 

It is how it should be done, but may not be how it is actually done.

You could even define it as the sum of 1/<hash value> for all blocks in the chain.

The difficulty value only changes every 2160 blocks, so in most cases 2 chains will have the same difficulty for the blocks after the split.

If sum (1/<hash value>) was used, then it would be very unlikely that a tie would happen.

Alternatively, the rule could be that if 2 valid blocks are received, then the one with the lowest hash value wins.

               B-C
              /
GENESIS-A
              \
               X-Y

If B had a hash of 0000000123 and X had a hash of 0000000122, then everyone who had received the B hash first would switch to A as soon as they received the A block.  C would likely never be found at all.

This would short circuit the tie-break rule.

It wouldn't even require a network rule change.  Miners could just agree to do it that way.

Yes, but they won't have the same difficulty after the next block is received.

In your example, A-B-C and A-X-Y are equally valid and each node considers the one it saw first to be true.  The matter will be settled across the entire network when either D or Z is found next.

In practice, it almost never even gets this far.  Most of the time, either C or Y would have decided the question between B and X.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 11, 2011, 11:10:31 AM
Yes, but they won't have the same difficulty after the next block is received.

The point is that if there was a defined tie break rule, then one or other could be discarded.

When B and X were found, all nodes would agree on which one was first, even if they received them in different order.



Title: Re: What EXACTLY means "longest" chain ?
Post by: JoelKatz on July 11, 2011, 02:08:15 PM
The point is that if there was a defined tie break rule, then one or other could be discarded.

When B and X were found, all nodes would agree on which one was first, even if they received them in different order.
That would destabilize the network. Assume a block has been found and propagated to most of the network. Right now, to force a reorganization, I have to find two blocks before the rest of the network finds one. With this change, I only have to find one block and I have a 50/50 chance of upsetting the network.

With a well-defined tie break rule, the number of reorganizations would be much higher.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 11, 2011, 02:34:37 PM
With a well-defined tie break rule, the number of reorganizations would be much higher.

However, they would last for a shorter period of time.

Currently, if there is a fork, it will last for on average 10 minutes until one or other half of the network finds the next block.

Assuming most of the network switches immediately to the new best, then it is less wasted effort.  The benefit is that there is no need to wait for the next block to trigger the tie break.


Title: Re: What EXACTLY means "longest" chain ?
Post by: kjj on July 11, 2011, 02:55:56 PM
What exactly do you mean by wasted effort?

Also, you are doubling the amount of time an attacker has to find a superior block.


Title: Re: What EXACTLY means "longest" chain ?
Post by: JoelKatz on July 11, 2011, 03:09:16 PM
Currently, if there is a fork, it will last for on average 10 minutes until one or other half of the network finds the next block.
Yes, but it will tend to be a small portion of the network that is forkeds.
Quote
Assuming most of the network switches immediately to the new best, then it is less wasted effort.  The benefit is that there is no need to wait for the next block to trigger the tie break.
I think this is an incoherent argument. Either you consider attempts to create a block that don't produce a block wasted effort or not. But even if the network is split 50/50, it makes no sense to say the nodes working on the losing chain wasted effort. They had the same chance to produce a block as the nodes on the winning chain.

I don't like the idea that 95% of the network can be on a particular chain and one node produces a new block and has a 50% chance of forcing the network to switch.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 11, 2011, 07:09:12 PM
Yes, but it will tend to be a small portion of the network that is forkeds.

With a tie break, even that small part is eliminated.

Quote
I think this is an incoherent argument. Either you consider attempts to create a block that don't produce a block wasted effort or not. But even if the network is split 50/50, it makes no sense to say the nodes working on the losing chain wasted effort. They had the same chance to produce a block as the nodes on the winning chain.

True, I guess my objection is that you have a 1 block fork for longer.  With a tie break, the fork is instantly eliminated.

In the extreme, with chain = sum of proof of work, people could submit blocks with any amount of difficulty.  The more difficulty, the more likely it is to end up in the final chain.

Quote
I don't like the idea that 95% of the network can be on a particular chain and one node produces a new block and has a 50% chance of forcing the network to switch.

That can also happen with the fork case.  The 5% fork might win, and the 95% will have to reorganise.


Title: Re: What EXACTLY means "longest" chain ?
Post by: kjj on July 11, 2011, 07:28:30 PM
I don't like the idea that 95% of the network can be on a particular chain and one node produces a new block and has a 50% chance of forcing the network to switch.

That can also happen with the fork case.  The 5% fork might win, and the 95% will have to reorganise.

But only if the minority block is transmitted within seconds of the majority block.  In your system, the reversal can happen several minutes later, and it can cause a reorg of not just 95% of the network, but 100% (minus the attacker's mining node).


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 12, 2011, 10:53:01 AM
In your system, the reversal can happen several minutes later, and it can cause a reorg of not just 95% of the network, but 100% (minus the attacker's mining node).

So, the issue is

Chain: ----> A

B is found

Chain: ----> A -> B

Transactions in B get "confirmed/1"

B' is found (with lower hash) before C is found.

Chain: ----> A -> B'

Transactions in B are reversed.

Under the current system, the attacker would need to find B' and C' before C is found, in order to clear B.

The trade-off is that under the current system, all of the nodes won't necessarily agree on what is the top node.  During a fork, some of the nodes will think a trade is still valid, while the majority of the network is working to reverse them.  With the tie break rule B all nodes will agree that B has been canceled, once it has been broadcast.


Title: Re: What EXACTLY means "longest" chain ?
Post by: Pieter Wuille on July 12, 2011, 11:13:15 AM
The weight of a block comes from its difficulty - namely the hash value it had to beat, not the actual value it had.

Since the difficulty is adjusted only every 2016 blocks, all blocks within a 2016-block section have the same difficulty.

If there is A->B, and someone finds a B', it will almost certainly (except on the 2016 block boundary) have the same difficulty as B, and not force anyone to switch. The switch only happens when a C would be found as successor to B' before a successor to B is found.

The point is simply this: if 95% of the network (measured by computing power) thinks B is the successor to A, B has 95% to be extended first, and thus has 95% chance to become part of the final chain.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 12, 2011, 12:39:34 PM
The weight of a block comes from its difficulty - namely the hash value it had to beat, not the actual value it had.

Well, the suggestion is that the value of the hash be used as a tie break, when there are 2 blocks that are the head of the chain.

The winning block in the case of a tie wouldn't be determined by network propagation delays.


Title: Re: What EXACTLY means "longest" chain ?
Post by: Pieter Wuille on July 12, 2011, 12:50:03 PM
The value of the block's hash is not a good measure for the amount of work that was necessary to produce it. The target is.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 12, 2011, 02:53:28 PM
The value of the block's hash is not a good measure for the amount of work that was necessary to produce it. The target is.

I don't agree.  The amount of work required to produce a hash is 1 hash worth of processing power, no matter what the value.

The question is how much processing power would be required to create a hash that is at least as good as the current one.

If you define "good" as lower hash value, then the value directly maps to how hard it is to create a hash that is at least as good.

However, it does mean that a miner can hit the jackpot and every 100 blocks, you will get a block that has an effective difficulty that is 100 times higher than the target.  That effectively would generate lock-in points on the chain naturally.

With the current system, you could in theory form an alternative chain based on forking back when the difficulty was very low and pretend that it was low for lots of blocks.  This is why the client does lock in. 


Title: Re: What EXACTLY means "longest" chain ?
Post by: Pieter Wuille on July 12, 2011, 03:23:21 PM
Each hash on itself is as hard as any other. The number of tries you need is proportional to the difficulty you had while hashing.



Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 12, 2011, 03:56:30 PM
Each hash on itself is as hard as any other. The number of tries you need is proportional to the difficulty you had while hashing.

There is no need for an in advance definition of difficulty. 

Work for a block = 2^256/(hash value)

Work for a chain = sum of the works for a block.

To produce a chain with a greater proof of work would require that you carry out the same number of hashes.


With a protocol change, difficulty would only be required to determine what the coin base for the block was.

Coinbase = 50*(target difficulty)/(hash value)    --- but capped at 50 per block.

With a variable coinbase, it would reduce the need for mining pools. 

Nodes could refuse to accept blocks with hashes greater than a threshold, in order to reduce spam.


I wasn't suggesting throwing away the rule that the longest chain is simply the one with the most blocks.  I was suggesting to use the hash value as a tie breaker when a fork happens.  This doesn't require a protocol change, just a change to how nodes forward blocks and how miners decide which block to mine.


Title: Re: What EXACTLY means "longest" chain ?
Post by: Pieter Wuille on July 12, 2011, 04:10:47 PM
I wasn't suggesting throwing away the rule that the longest chain is simply the one with the most blocks.  I was suggesting to use the hash value as a tie breaker when a fork happens.  This doesn't require a protocol change, just a change to how nodes forward blocks and how miners decide which block to mine.

There is no rule that the best chain is the one with the most blocks. It's the one with the most statistically expected number of hashes to have been performed by it.

The average number of hashes that have been performed for a given block is 2^48/65535*difficulty, or 2^256/target. Whether it is a hash much below the target or very close to it doesn't matter. Each attempt for that block had an independent chance of target/2^256 of being good enough, and good enough is all that counts.

Yes, you could move to another definition where you count 1/hash as value of a block, but the only thing this will cause is much more reorganisations, as a block that the majority had already agreed upon may be reverted.

Now there are two interpretations of your proposal: if you simply count the value of a chain as the sum of (1/hash) for each block in it, you could relatively easily create a block that reverts the past 100 blocks, causing much more reorganisations. The other interpretation is still letting the chain with the highest expected number of hashes win, but look at the 1/hash for the last block only for deciding which to pick. This is much more reasonable, but it doesn't gain you anything: instead of letting the majority of the network decide which side of the split wins, it becomes essentially random.


Title: Re: What EXACTLY means "longest" chain ?
Post by: TierNolan on July 12, 2011, 06:36:57 PM
There is no rule that the best chain is the one with the most blocks. It's the one with the most statistically expected number of hashes to have been performed by it.

I think the client takes it as most blocks.  In most cases, it's identical.

You make a good point about 1/hash.  If someone had 1% of the network and mined 5 blocks behind the head of the chain, then once every 500 blocks, they could reverse the 5 blocks ahead of them.

Quote
Now there are two interpretations of your proposal: if you simply count the value of a chain as the sum of (1/hash) for each block in it, you could relatively easily create a block that reverts the past 100 blocks, causing much more reorganisations. The other interpretation is still letting the chain with the highest expected number of hashes win, but look at the 1/hash for the last block only for deciding which to pick. This is much more reasonable, but it doesn't gain you anything: instead of letting the majority of the network decide which side of the split wins, it becomes essentially random.

The 2nd one was what I was suggesting.  The gain is that all honest nodes would (subject to network latency) all agree on what was the current head of the chain.  The disadvantage is that 1/confirmed would be weaker, since the attacker only needs to find 1 block instead of 2.