Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: jgarzik on March 14, 2013, 09:24:32 PM



Title: Block #225430 chain fork dataset available
Post by: jgarzik on March 14, 2013, 09:24:32 PM
For diagnostic purposes, here is a blockchain dataset built by a 0.7.2 bitcoind w/ db 4.8 + "-detachdb":

     http://gtf.org/garzik/bitcoin/chain-db48-h225429.tar.bz2

It contains the blockchain + index up to height 225429, making it easy to reproduce an injection of the too-large blocks (large enough to exhaust 0.7's Berkeley DB locks) at the precise juncture where the recent chain fork occurred.

Byte size: 5776366736 (5.3G)
MD5: f26deaaf05197bcbc73d33fed2443db3
SHA1: 743d1eaac3b590e996a22e707288fd9a21aa4c63
SHA256: 4dfd766c7cdfa346ad10e648900476dfc590605f78a78dff0c2608131c0f6c46
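To check a download against the byte size and SHA256 above, `sha256sum` does the job on most systems. For a self-contained illustration, here is a minimal C++17 sketch. The helper names (`Sha256`, `sha256File`, `matchesPublished`) are hypothetical, and the compact FIPS 180-4 SHA-256 is bundled only because the standard library has no hash of its own; a vetted library is the better choice in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Compact SHA-256 per FIPS 180-4. hexDigest() appends padding, so call it once.
class Sha256 {
public:
    void update(const unsigned char* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            buf_[len_++ % 64] = p[i];
            if (len_ % 64 == 0) compress();  // a full 64-byte block is ready
        }
    }
    std::string hexDigest() {
        const std::uint64_t bits = len_ * 8;  // message length before padding
        unsigned char b = 0x80;
        update(&b, 1);
        b = 0;
        while (len_ % 64 != 56) update(&b, 1);
        for (int i = 7; i >= 0; --i) {        // 64-bit big-endian bit length
            b = static_cast<unsigned char>(bits >> (8 * i));
            update(&b, 1);
        }
        std::string out;
        for (std::uint32_t w : h_)
            for (int s = 28; s >= 0; s -= 4) out += "0123456789abcdef"[(w >> s) & 0xf];
        return out;
    }

private:
    static std::uint32_t rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
    void compress() {
        static const std::uint32_t K[64] = {
            0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
            0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
            0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
            0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
            0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
            0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
            0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
            0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
        std::uint32_t w[64];
        for (int i = 0; i < 16; ++i)
            w[i] = std::uint32_t(buf_[4 * i]) << 24 | std::uint32_t(buf_[4 * i + 1]) << 16 |
                   std::uint32_t(buf_[4 * i + 2]) << 8 | std::uint32_t(buf_[4 * i + 3]);
        for (int i = 16; i < 64; ++i)
            w[i] = w[i - 16] + (rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)) +
                   w[i - 7] + (rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10));
        std::uint32_t a = h_[0], b = h_[1], c = h_[2], d = h_[3];
        std::uint32_t e = h_[4], f = h_[5], g = h_[6], h = h_[7];
        for (int i = 0; i < 64; ++i) {
            std::uint32_t t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
            std::uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
            h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
        }
        h_[0] += a; h_[1] += b; h_[2] += c; h_[3] += d;
        h_[4] += e; h_[5] += f; h_[6] += g; h_[7] += h;
    }
    unsigned char buf_[64] = {};
    std::uint64_t len_ = 0;
    std::uint32_t h_[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                           0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
};

// Hash a file in 64 KiB chunks; throws if the file cannot be opened.
std::string sha256File(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    Sha256 hasher;
    std::vector<char> chunk(1 << 16);
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got > 0) hasher.update(reinterpret_cast<const unsigned char*>(chunk.data()), static_cast<std::size_t>(got));
    }
    return hasher.hexDigest();
}

// Cheap byte-size check first, then the full hash.
bool matchesPublished(const std::string& path, std::uintmax_t bytes, const std::string& sha256Hex) {
    return std::filesystem::file_size(path) == bytes && sha256File(path) == sha256Hex;
}
```

Usage against the figures above would be `matchesPublished("chain-db48-h225429.tar.bz2", 5776366736, "4dfd766c7cdfa346ad10e648900476dfc590605f78a78dff0c2608131c0f6c46")`. The size comparison comes first so an obviously truncated download is rejected without hashing 5 GB.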



Title: Re: Block #225430 chain fork dataset available
Post by: deepceleron on March 14, 2013, 10:09:39 PM
Better would be the fork blockchain up to block 225453 (or another importable set of the bad blocks 225430-225453), so that it includes the bad block and can be fed to different client versions to replicate the BDB freakout. We all have the blockchain up to 225429, but the bad chain went "poof" upon reorg.
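Feeding one block sequence to different client versions is essentially differential replay: run the same input through two validators and look for the first block on which they disagree. A minimal C++ sketch of that loop follows; `RawBlock`, `Validator` and `firstDivergence` are hypothetical stand-ins, and a real harness would drive two client builds and read back their accept/reject verdicts instead of calling lambdas.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins: a block is an opaque byte string, and a validator is
// any callable (possibly stateful) that accepts or rejects the next block.
using RawBlock = std::vector<unsigned char>;
using Validator = std::function<bool(const RawBlock&)>;

// Returns the index of the first block where the two verdicts differ,
// or nullopt if no disagreement was observed.
std::optional<std::size_t> firstDivergence(const std::vector<RawBlock>& blocks,
                                           const Validator& a, const Validator& b) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const bool okA = a(blocks[i]);
        const bool okB = b(blocks[i]);
        if (okA != okB) return i;  // fork point: one side accepts, the other rejects
        if (!okA) break;           // both reject: later blocks would build on a rejected block
    }
    return std::nullopt;
}
```

For example, with a "roomy" validator that accepts blocks up to 100000 bytes and a "tight" one that stops at 1000, a chain whose third block is 5000 bytes diverges at index 2; the same validator on both sides never diverges.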


Title: Re: Block #225430 chain fork dataset available
Post by: Gavin Andresen on March 14, 2013, 11:20:15 PM
The first part of the chain that got orphaned, starting at block 225,430, is here:

  http://skypaint.com/bitcoin/fork08.dat

The first three blocks in the 0.7-compatible chain starting at block 225,430 are:

  http://skypaint.com/bitcoin/fork07.dat
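A small C++ sketch for poking at these files, assuming (not confirmed in this thread) that they use the same framing as bitcoind's blk*.dat files: 4 bytes of mainnet network magic (f9 be b4 d9), a 4-byte little-endian length, then the serialized block. The helper names are hypothetical. `prevBlockHex` prints the previous-block hash from each 80-byte header, which is one way to confirm that the first block in each file builds on block 225429.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

using RawBlock = std::vector<unsigned char>;

// Assumed blk*.dat-style framing: magic, 4-byte little-endian length, block.
// Stops at the first bad magic or truncated record.
std::vector<RawBlock> splitBlocks(const std::vector<unsigned char>& data) {
    static const unsigned char kMagic[4] = {0xf9, 0xbe, 0xb4, 0xd9};
    std::vector<RawBlock> blocks;
    std::size_t pos = 0;
    while (data.size() - pos >= 8 && std::equal(kMagic, kMagic + 4, data.begin() + pos)) {
        const std::uint32_t len = std::uint32_t(data[pos + 4]) | std::uint32_t(data[pos + 5]) << 8 |
                                  std::uint32_t(data[pos + 6]) << 16 | std::uint32_t(data[pos + 7]) << 24;
        pos += 8;
        if (data.size() - pos < len) break;  // truncated record
        blocks.emplace_back(data.begin() + pos, data.begin() + pos + len);
        pos += len;
    }
    return blocks;
}

// The header is a 4-byte version followed by the 32-byte previous block hash,
// stored in internal byte order; hashes are conventionally printed byte-reversed.
std::string prevBlockHex(const RawBlock& block) {
    std::string out;
    if (block.size() < 80) return out;  // not even a full header
    for (int i = 35; i >= 4; --i) {
        out += "0123456789abcdef"[block[i] >> 4];
        out += "0123456789abcdef"[block[i] & 0xf];
    }
    return out;
}

std::vector<unsigned char> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Typical use would be `splitBlocks(readFile("fork08.dat"))` followed by printing `prevBlockHex` of each entry. If the files turn out to use some other layout, the magic check fails and the sketch simply returns no blocks.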


Title: Re: Block #225430 chain fork dataset available
Post by: jgarzik on March 15, 2013, 06:42:22 PM
Hold off on using this dataset.  Due to a local linking mistake, it was built with BDB 5.x.

Rebuilding the dataset with BDB 4.8 will complete in a few hours.



Title: Re: Block #225430 chain fork dataset available
Post by: jgarzik on March 16, 2013, 12:43:02 AM

Issue fixed.  Dataset updated.  OP updated with new hashes and byte size.



Title: Re: Block #225430 chain fork dataset available
Post by: commonancestor on March 17, 2013, 10:14:38 AM
Devs,

1. Thanks for getting us over this glitch safely.

2. It is beyond belief that the validity of a block could be decided by such implementation-specific matters as Berkeley DB record locking.
And, of course, there is no specification of what a valid block is. The code is the specification? So make no changes to the code and we won't have forks?
Please take the code at some point and write a specification of what a valid block is. Then change the code as you like and test whether it still conforms to the spec.
You may also find that the block validity rules are too weird and could be refactored.
As Mike Hearn says, money could be lost here.
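To make the record-locking point concrete, here is a deliberately toy C++ model. Everything in it (the `Node` type, the one-lock-per-input-and-output cost, the numbers) is a hypothetical assumption, not how Berkeley DB or bitcoind actually count locks. It only shows how a per-node resource cap that no written rule mentions can make two nodes disagree about the same block.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model, not Bitcoin's validation code.
struct Tx { std::size_t inputs; std::size_t outputs; };
struct Block { std::size_t serializedBytes; std::vector<Tx> txs; };

constexpr std::size_t kMaxBlockBytes = 1000000;  // the written rule both nodes share

struct Node {
    // nullopt: the storage layer imposes no per-block lock cap (assumed).
    std::optional<std::size_t> maxLocks;

    bool accepts(const Block& b) const {
        if (b.serializedBytes > kMaxBlockBytes) return false;
        // Assumed cost model: one lock per input and per output record touched.
        std::size_t locks = 0;
        for (const Tx& tx : b.txs) locks += tx.inputs + tx.outputs;
        // Running out of locks aborts the update, so the block is treated as invalid.
        return !maxLocks || locks <= *maxLocks;
    }
};

// Hypothetical helper: nTx identical 2-in/2-out transactions.
Block makeBlock(std::size_t nTx, std::size_t bytes) {
    return Block{bytes, std::vector<Tx>(nTx, Tx{2, 2})};
}
```

A 1000-transaction block of 900000 bytes needs 4000 "locks" in this model: an uncapped node accepts it, a node capped at 3000 rejects it, and both agree on small blocks and on anything over the size limit. The block is equally valid under the written rule in both cases; only the unwritten cap differs.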


Title: Re: Block #225430 chain fork dataset available
Post by: Peter Todd on March 17, 2013, 08:15:09 PM


Specifications aren't magic; they're just words on paper. I can put anything into a specification, but it doesn't magically make code actually follow the spec. I can also take the specification and write tests, but again, the tests don't magically make the code follow the specification.

Before commenting further on the topic, you need to read the Bitcoin source code yourself. If you can't read it, you have no business commenting on software development anyway. If you can, you'll find that while it isn't perfect and could use some refactoring, all in all, understanding the intent of the different parts is fairly easy, and thus the code itself acts as a perfectly good specification.





Title: Re: Block #225430 chain fork dataset available
Post by: commonancestor on March 17, 2013, 10:46:51 PM

You are right, I should read the code indeed. A protocol defined in the code rather than in a specification - pros: no maintenance effort, no ambiguity (in theory); cons: difficult to read and understand, difficult to make other implementations (including new versions of the same program). The blockchain fork happened because devs forgot that Berkeley DB was part of the protocol. Without reading the code I find this bit messy.


Title: Re: Block #225430 chain fork dataset available
Post by: Peter Todd on March 17, 2013, 11:50:56 PM

"no ambiguity" <- that's exactly what failed. In v0.7, db.h, there is the following line:

Code:
class CTxDB : public CDB

That means: create a CTxDB class that extends the CDB class. What's CDB? It wraps Berkeley DB, an external library. Which version? What does it do? All this stuff is ambiguous. Yet just "include every external library" doesn't work either; how far back do you go? While not an issue now, with really large blocks even subtle stuff like performance differences between different hardware implementations can cause forks, even with identical software.


Believe me, the developers understand the importance of the problem very well. As an example, Pieter Wuille and others have been working to prevent OpenSSL differences from causing a fork with IsCanonicalSignature() and similar. I don't happen to agree with Gavin on everything, maybe not even on most things, but I can agree that he has been taking his role in pushing testing and stability very seriously since he was hired by the Bitcoin Foundation, and for that matter, even further back than that.
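The idea behind IsCanonicalSignature(), as I understand it, is to stop depending on how leniently OpenSSL parses signatures by checking for a strict DER encoding up front. Below is a simplified C++ sketch of such a check. The function names and exact rule set are my own reconstruction of the strict-DER idea, not a copy of the Bitcoin source: a 30 <len> 02 <lenR> <R> 02 <lenS> <S> layout plus a trailing hash-type byte, with R and S non-negative and minimally encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One DER INTEGER body (R or S): non-empty, not negative, and no leading
// 0x00 byte unless it is needed to keep the next byte from reading as negative.
bool derIntegerIsMinimal(const unsigned char* v, std::size_t len) {
    if (len == 0) return false;
    if (v[0] & 0x80) return false;
    if (len > 1 && v[0] == 0x00 && !(v[1] & 0x80)) return false;
    return true;
}

// Layout checked: 30 <len> 02 <lenR> <R> 02 <lenS> <S> <hashtype>
bool isStrictSignatureEncoding(const std::vector<unsigned char>& sig) {
    if (sig.size() < 9 || sig.size() > 73) return false;
    const unsigned char hashType = sig.back() & 0x7f;          // mask off the ANYONECANPAY bit (0x80)
    if (hashType < 1 || hashType > 3) return false;            // ALL, NONE, SINGLE
    if (sig[0] != 0x30) return false;                          // SEQUENCE marker
    if (std::size_t(sig[1]) != sig.size() - 3) return false;   // excludes 30, len and hashtype bytes
    const std::size_t lenR = sig[3];
    if (5 + lenR >= sig.size()) return false;                  // S length byte must exist
    const std::size_t lenS = sig[5 + lenR];
    if (lenR + lenS + 7 != sig.size()) return false;
    if (sig[2] != 0x02 || sig[4 + lenR] != 0x02) return false; // both must be INTEGERs
    return derIntegerIsMinimal(&sig[4], lenR) && derIntegerIsMinimal(&sig[6 + lenR], lenS);
}
```

Under these rules a single 0x00 pad in front of 0x81 is required and accepted, while a 0x00 pad in front of 0x01 is rejected as excess padding, so each (R, S) pair has exactly one acceptable byte encoding.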

There aren't easy solutions to the specification problem, and I really think that writing yet another specification in addition to the imperfect one we already have is currently a waste of limited manpower. It might always be a waste of manpower: Bitcoin is in uncharted computer science territory with its extremely strict requirement for consensus.