Bitcoin Forum
Author Topic: Formalised Bitcoin Protocol Standard  (Read 10508 times)
MatthewLM (OP)
January 02, 2013, 01:53:52 PM
 #1

I've thought about this and I'm surprised I've not seen (or been able to find) very much discussion alluding to this. At the moment, anyone who wants to understand the bitcoin protocol can use the bitcoin wiki somewhat, as well as forums and other websites, but ultimately has to look at the source code of bitcoin implementations, or rely on the knowledge of other people.

It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol in every detail, but in a way that is easy for anyone to follow. It would be a document that would be used as a reference for developers and would reflect all of the agreed (in majority use/majority mining power) protocol features. The protocol standards document would then be amended as the protocol is modified. A separate set of documents could describe other features which are not core to the protocol, such as wallet formats or whatever.

I had a hunch this would be something the Bitcoin Foundation was set up for, but it seems not. Do other people think this would be very useful to work upon? Otherwise the information will continue to be disorganised and a nightmare to piece together.
grantbdev
January 02, 2013, 03:24:24 PM
 #2

Quote
I've thought about this and I'm surprised I've not seen (or been able to find) very much discussion alluding to this. At the moment, anyone who wants to understand the bitcoin protocol can use the bitcoin wiki somewhat, as well as forums and other websites, but ultimately has to look at the source code of bitcoin implementations, or rely on the knowledge of other people.

It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol in every detail, but in a way that is easy for anyone to follow. It would be a document that would be used as a reference for developers and would reflect all of the agreed (in majority use/majority mining power) protocol features. The protocol standards document would then be amended as the protocol is modified. A separate set of documents could describe other features which are not core to the protocol, such as wallet formats or whatever.

I had a hunch this would be something the Bitcoin Foundation was set up for, but it seems not. Do other people think this would be very useful to work upon? Otherwise the information will continue to be disorganised and a nightmare to piece together.

What about Satoshi's paper on Bitcoin? Isn't that the official specification?

Don't use BIPS!
gmaxwell
January 02, 2013, 03:32:15 PM
 #3

Quote
What about Satoshi's paper on Bitcoin? Isn't that the official specification?

The paper is a design overview, not a specification. It presents the argument that something like bitcoin can work at all, but doesn't tell you the details of building something compatible with it.
2112
January 02, 2013, 04:24:54 PM
 #4

Quote
It would be very useful and wise, in my opinion, if there was a formalised document describing the protocol in every detail, but in a way that is easy for anyone to follow.

Perhaps if you post an example of a specification that is both "formal" and "easy for anyone" we could make better comments. The common way of thinking leans toward saying that those are polar opposites.

Anyway, the major points against are:

0) extremely expensive
1) a lot of work with comparatively little benefit
2) hard to prove internal consistency
3) hard to verify consistency with non-formal, but actual implementations

When asked for pitfalls of "formal modeling" I nowadays point towards the ARM Architecture Manual and the way a multi-million company with clearly clever and well motivated staff ended up with BE32 and BE8 (a.k.a. just plain BE): two largely incompatible ways to implement big-endianness.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
MatthewLM (OP)
January 02, 2013, 04:40:22 PM
 #5

Well, by formalised I meant put together professionally into a specification document (as opposed to now). I didn't mean much more than that. It doesn't necessarily have to go by any usual conventions, if that means the document can be both easy to follow and fully detailed.
Mike Hearn
January 02, 2013, 05:05:08 PM
 #6

The reality of how Bitcoin works means that Satoshi's code is the protocol definition.
DannyHamilton
January 02, 2013, 06:19:48 PM
 #7

Isn't most of the protocol right here?

https://en.bitcoin.it/wiki/Protocol_specification
Gavin Andresen
Chief Scientist
January 02, 2013, 09:37:44 PM
 #8

In my experience, developers are really good at either ignoring documentation or interpreting it in a way different than the way the author intended.

And spec authors are really good at getting details wrong, no matter how careful they are. And they're really bad at keeping track of changes.

That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

I may just be cynical because I spent so much time in 1997 working on the ISO/IEC-14772-1 Official, Formal Standard.

How often do you get the chance to work on a potentially world-changing project?
Mike Hearn
January 02, 2013, 11:09:08 PM
 #9

Flying 3D sharks from the past! :)
MatthewLM (OP)
January 04, 2013, 03:01:03 PM
 #10

Quote
Isn't most of the protocol right here?

https://en.bitcoin.it/wiki/Protocol_specification

You have the format of the messages on there but nothing about the network operation, validation, scripts etc. There are other wiki articles that have more information, but the information is incomplete and scattered around.

Quote
In my experience, developers are really good at either ignoring documentation or interpreting it in a way different than the way the author intended.

Yes maybe, but surely it's better than developers trying to decipher source code and learn bits here and there?

Quote
And spec authors are really good at getting details wrong, no matter how careful they are. And they're really bad at keeping track of changes.

That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

I may just be cynical because I spent so much time in 1997 working on the ISO/IEC-14772-1 Official, Formal Standard.

Well since bitcoin is an open protocol, there can be any number of people contributing to a bitcoin protocol specification, and anyone could spot mistakes and suggest improvements. It doesn't have to be bureaucratic or closed in nature.
bullioner
January 04, 2013, 08:59:35 PM
 #11

Quote
The reality of how Bitcoin works means that Satoshi's code is the protocol definition.

That sounds like a statement that might apply at a particular moment in time.  It isn't an argument against specifying the protocol separately from a particular implementation in future, though it is obviously something to take account of while writing the specification.

There was probably a time in 1990 when the reality of how the web worked meant that the CERN httpd implementation was the protocol definition for HTTP (I don't know for sure, but this is a pattern seen with many protocols and other interface elements: original PGP implementation -> OpenPGP protocol; this was more or less the way ssh went as well, from initial implementation to decent public protocol definition).  That doesn't mean it was a bad idea to create standardisation processes once the technology took off and there was interest in multiple compatible implementations, and in managing changes / extensions.

Quote
[...]
That's why I spent a lot of time over the past year developing test cases and tools that you can run your code against instead of writing specs.

That's good stuff too -- but is certainly not an argument against trying to get the protocol specified in some sort of, for example, IETF-RFC-like document.  Specifications and test suites go together really well, but are not alternatives for one another.  Test suites are sometimes good for clarifying intent where a spec's ambiguous, and as you say above they're also great for aiding implementors with completeness and correctness.

Bitcoin needs a protocol spec for the technology to mature.  One doesn't want to do it while the design's in flux, but Bitcoin's past that stage now.  Any incompatible design changes would be brought about as a new crypto currency rather than as changes to existing Bitcoin.

It is frankly pretty worrying to see Gavin and Mike be so dismissive of MatthewLM's suggestion.  Hopefully some others involved have more wisdom and experience in protocol engineering at Internet scale.
stevep
January 04, 2013, 09:31:06 PM
 #12

I'm also concerned about needing to refer to the reference client source code, but the reference client is called reference for a reason :)

My concern is that, as the reference client struggles to stay relevant for end users, the core developers will focus on performance rather than on its use as a reference.
Performance and readability do not tend to go hand in hand.

Is the creation of better protocol documentation a better solution?
I personally think so. As the core aspects of the protocol are effectively set in stone, we should be able to document them in an accessible/understandable manner.

As Gavin identified, however, the creation and maintenance of specs is time consuming.
The reference client developers are free to spend their time however they feel is best. There are always issues to be fixed and new features to be implemented. We wouldn't want to stop the reference client from moving forward.

I'd like to offer my help in updating/maintaining the documentation. I've made a few minor edits to the Bitcoin wiki for some of the underspecified or unclear areas that I've found.

Where do you feel the content of the Wiki currently falls short?

In my experience I've found that the status of some of the BIPs is out of date, and I've tracked a few of them down and updated their status.
Once a BIP is accepted I think we should aim to roll its implications into the base documentation.
This information is recoverable by comparing the reference implementation to the BIPs.

In what ways do you feel that the reference client falls short for use as a reference?

In my experience something the reference client does not capture well are the "gotchas" that have been solved over time that are relevant to all Bitcoin peer implementations.
When reading the reference code you might not realize that a piece of code evolved to its current state to solve a serious issue and that the naive implementation wouldn't be sufficient.
Again, this information isn't lost; we can recover it from the history and issue tracker and present it in a more accessible way.
Steve
January 05, 2013, 02:44:19 AM
 #13

I agree with Gavin's point of view.  Unit Tests are the final and most complete form of behavior specification and the implementation is the final and most complete form of design.  It's best when both are expressed in languages free of a lot of syntactic noise.  C++ is far from ideal in that regard, but you live with compromises born out of practicality.  In languages that are less encumbered by syntactic noise, this perspective is much more readily apparent.  The tests and the implementation are so easily comprehensible that other documentation isn't worth the effort to maintain (and can even be a detriment). 

Check out OMeta and some of the papers at vpri.org if you're really into this sort of thing...with OMeta, they managed to create a system that could almost directly execute TCP/IP from the RFCs.  It was a complete TCP/IP implementation in under 200 lines of code (including the parser specification for the RFC ascii art).  See this summary about it: http://www.moserware.com/2008/04/towards-moores-law-software-part-3-of-3.html  ...to me, this is a proof point that code really should be regarded as self documenting (with little more than annotations to accompany it)...if it's too challenging for people to easily comprehend, it points to a shortcoming of the language, not of the concept that the code is the documentation.

(gasteve on IRC) Does your website accept cash? https://bitpay.com
davout
January 05, 2013, 07:52:31 AM
 #14

I disagree with the folks that find tons of reasons not to document. I'm not really surprised though, this has come up a few times already and the answer was already pretty much along these lines.

As much as I understand that the core contributors don't really feel like doing it for various reasons (they already write tests and contribute code after all), I'm really surprised that no one really seems to encourage MatthewLM to go forward with it.

Yes, tests are good, but add a complete spec and it gets even better. Yes, it's a fact that the main implementation is currently both the specification and the implementation, nobody can argue that. However, arguing that it's a good thing, that it shouldn't change, that a full protocol documentation is unnecessary isn't quite the same thing IMHO.

Mike Hearn
January 05, 2013, 11:40:15 AM
 #15

It bothers me that this topic keeps coming up. The fact that Bitcoin is different to other technologies isn't intuitive but by the time you're writing an actual implementation, it should be obvious. Maybe it's worth reading the thread with grau about his reimplementation also?

The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe. The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

At some point, if you realize you have to match the behaviour of another codebase exactly, down to the tiniest detail, you realize that the only precise enough specification for that is the source code. Which means if you can't read C++ fluently you can't reimplement Bitcoin, yes, but who cares? If you can't keep up, don't step up.

Having detailed protocol documentation is something I'd agree with in any other project except this one. In this one, it will simply mislead people into thinking they can reimplement Bitcoin. Unless they are willing to make an absolutely massive effort and take serious risks, they can't.

Note that SPV nodes are much less risky. But Matthew isn't implementing an SPV client.
davout
January 05, 2013, 12:44:46 PM
 #16

Quote
The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe. The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

For one, that is true only if miners are to run alternative implementations. Secondly, I find your statements a little FUDdy, because even in the case of a chain split most transactions would make it to both chains until it's resolved.

Quote
When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height, leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Quote
At some point, if you realize you have to match the behaviour of another codebase exactly, down to the tiniest detail, you realize that the only precise enough specification for that is the source code. Which means if you can't read C++ fluently you can't reimplement Bitcoin, yes, but who cares? If you can't keep up, don't step up.

Maybe that's a sign that the specification-software is getting too convoluted, which will ultimately lead to unmaintainable, poor-quality software.

What I want to see are competing implementations of a clearly defined protocol, not a centralized black-box maintained by a few who know exactly which bugs should be treated as features.

Putting all your eggs in a single basket is never a good idea (especially when they're golden eggs). What happens the day a critical exploit is discovered in the reference implementation? Does everything collapse?

Oh, and there's a reason why Bitcoin is still not 1.0 ;)

MatthewLM (OP)
January 05, 2013, 01:14:34 PM
 #17

Quote
Unit Tests are the final and most complete form of behavior specification

The unit tests are just there to ensure things are working as expected. They aren't designed to provide a reference to how things are supposed to work.

Quote
It's best when both are expressed in languages free of a lot of syntactic noise.  C++ is far from ideal in that regard, but you live with compromises born out of practicality.  In languages that are less encumbered by syntactic noise, this perspective is much more readily apparent.  The tests and the implementation are so easily comprehensible that other documentation isn't worth the effort to maintain (and can even be a detriment).

The English language, combined with diagrams, tables etc. are designed for humans to understand and are thus the ideal format, as opposed to C++. And the Satoshi client is not very friendly to human eyes, so indeed another implementation could make it much easier to understand, but why waste time on doing that if you could write a human-friendly specification?

Quote
I'm not really surprised though, this has come up a few times already and the answer was already pretty much along these lines.

For some reason I couldn't find very much when I searched for it, except vague references.

Quote
The fact is that re-implementing Bitcoin exposes not only you, but all participants, to a class of "chain splitting" bugs that don't really exist in other network technologies, or at least are nowhere near as severe.

All the more reason to make the protocol as clear and easy to understand as possible.

Quote
The browser wars of the 90s were bad, but at least developers could check which browser the user ran and adapt to it on the fly. The Bitcoin equivalent is dramatically worse.

Things have gotten better when things have become more standardised.

Quote
When you reimplement Bitcoin, it's not enough to build things as you think they should work. You have to implement them exactly as Satoshi did, including all his bugs. And because some parts of the protocol are directly exposed to underlying libraries like OpenSSL, you have to match their behaviour exactly as well, including all their bugs. Failure to do so can lead to people losing money.

Once again, more reason to have clear documentation. Though you do not need to do everything the same way as the Satoshi client. You only need to conform to the protocol requirements.

Quote
In this one, it will simply mislead people into thinking they can reimplement Bitcoin

Sorry? People can re-implement bitcoin. You know that, obviously.

Quote
Note that SPV nodes are much less risky. But Matthew isn't implementing an SPV client.

I'm implementing code which can be used for full validation or headers-only validation. My plans for a client include a mixture of the two, offering the best of both worlds. The block-chain will thus be checked by headers-only validation against full validation done by a server.
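
For readers unfamiliar with the term, here is a minimal Python sketch of what "headers only" validation could look like: each 80-byte header must point at the hash of the previous one and must satisfy its own proof-of-work target. This is an illustration under stated assumptions, not MatthewLM's code. The function names (make_header, verify_header_chain and so on) are hypothetical, and the sketch deliberately skips checks a real client needs, such as difficulty retargeting and timestamp rules.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field into the full integer target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

def make_header(prev_hash: bytes, merkle_root: bytes, time: int,
                bits: int, nonce: int) -> bytes:
    """Serialize an 80-byte header: version, prev hash, merkle root, time, bits, nonce."""
    return struct.pack("<I32s32sIII", 1, prev_hash, merkle_root, time, bits, nonce)

def header_ok(header: bytes, expected_prev: bytes) -> bool:
    """Check the link to the previous header and the proof of work."""
    if len(header) != 80:
        return False
    prev_hash = header[4:36]
    (bits,) = struct.unpack_from("<I", header, 72)
    # The hash is interpreted as a little-endian integer for the comparison.
    pow_ok = int.from_bytes(double_sha256(header), "little") <= bits_to_target(bits)
    return prev_hash == expected_prev and pow_ok

def verify_header_chain(headers, first_prev: bytes = b"\x00" * 32) -> bool:
    """Return True if every header links to its predecessor and meets its target."""
    prev = first_prev
    for header in headers:
        if not header_ok(header, prev):
            return False
        prev = double_sha256(header)
    return True

def mine(prev_hash: bytes, bits: int = 0x207FFFFF) -> bytes:
    """Demo helper (assumption: a very easy target) that searches for a valid nonce."""
    nonce = 0
    while True:
        header = make_header(prev_hash, b"\x11" * 32, 0, bits, nonce)
        if int.from_bytes(double_sha256(header), "little") <= bits_to_target(bits):
            return header
        nonce += 1

def build_demo_chain(length: int = 3):
    """Build a short toy chain of headers, each linked to the previous one."""
    chain, prev = [], b"\x00" * 32
    for _ in range(length):
        header = mine(prev)
        chain.append(header)
        prev = double_sha256(header)
    return chain
```

Calling verify_header_chain(build_demo_chain()) accepts the toy chain, while reordering the headers breaks the previous-hash links and the check fails. Transaction validity is not examined at all, which is why such a client has to lean on fully validating nodes for everything inside the blocks.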
CIYAM
Ian Knowles - CIYAM Lead Developer
January 05, 2013, 01:20:11 PM
 #18

I really think that MatthewLM has a very valid point here - if the C++ standard was just "read GCC" then I think the language would never have even been used to write Bitcoin (or anything else of value just like languages such as D).

Although such documents are very hard to write there is a point in having such standards (or do you guys prefer Microsoft or Google to *make* standards in code rather than using ISO ones?).

Anyone thinking that "Bitcoin is an exception" is kidding themselves (or are wanting to become the next Microsoft or Google themselves most likely).

With CIYAM anyone can create 100% generated C++ web applications in literally minutes.

GPG Public Key | 1ciyam3htJit1feGa26p2wQ4aw6KFTejU
Mike Hearn
January 05, 2013, 02:00:54 PM
 #19

Quote
For one that is true only if miners are to run alternative implementations.

No, there are cases where other people can suffer from chain splitting bugs too. Let's say you're a high volume merchant or payment processor that runs an alternative implementation. I can make a transaction which it believes is invalid but the rest of the network believes is valid, for whatever reason. Once that gets included into a block, your business will grind to a halt because it'll split off onto a chain that no longer gets extended, or only gets extended very slowly, meaning you can't process payments anymore until the problem is noticed and you find and fix the conformance bug.
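
The scenario above can be illustrated with a toy Python model. Everything in it is hypothetical: the Node class, the validity rules and the integer "transactions" are stand-ins chosen for illustration, not real Bitcoin rules. It only shows the mechanism, namely that a node whose validity check is slightly stricter than the network's rejects one block and then cannot accept anything built on top of it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Block:
    height: int      # position in the chain, starting at 0
    txs: List[int]   # toy "transactions": just integer amounts

@dataclass
class Node:
    name: str
    is_valid_tx: Callable[[int], bool]   # this implementation's validity rule
    chain: List[Block] = field(default_factory=list)

    def receive_block(self, block: Block) -> bool:
        """Accept a block only if it extends our tip and every tx passes our rule."""
        extends_tip = block.height == len(self.chain)
        if extends_tip and all(self.is_valid_tx(tx) for tx in block.txs):
            self.chain.append(block)
            return True
        return False

# Hypothetical rules: the reference node allows zero-value amounts, while the
# alternative implementation was written from a description saying "positive".
reference = Node("reference", lambda amount: amount >= 0)
alternative = Node("alternative", lambda amount: amount > 0)

# The block at height 1 contains the edge-case transaction.
blocks = [Block(0, [5]), Block(1, [0]), Block(2, [7])]
for block in blocks:
    for node in (reference, alternative):
        node.receive_block(block)

print(len(reference.chain), len(alternative.chain))
```

The alternative node never gets past height 0: it rejects the block at height 1, and the block at height 2 does not extend its tip, so it is stuck until the conformance bug is found and fixed.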

If this causes you to lose X coins per hour of business, then I can try to anonymously extort you for a bit less than X by claiming I know of such a bug in your software. It's very hard to prove it's not there. You'd have to have a lot of confidence in the robustness of the testing of your implementation.

What about if you accept invalid transactions? If you're providing a good or service in return for unconfirmed transactions then this obviously can undermine your risk model because you'll receive a transaction that you believe is valid, not a double spend, you don't see any double spend alerts or conflicting transactions - but it'll never confirm and I can still spend the money. I don't have to wait and mine a block any more like I would if doing a Finney attack, so it's much cheaper.

Quote
Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Such a document would end up being nearly as long as the source code, and not much easier to read. I'm all for adding more detailed comments to the source, though.

Quote
What I want to see are competing implementations of a clearly defined protocol, not a centralized black-box maintained by a few who know exactly which bugs should be treated as features.

Unfortunately, what you've got is the latter and it's not really easy to fix. We keep discovering new odd edge cases where what the software does, isn't what you'd actually expect given a description of how it's meant to work.

Quote
The English language, combined with diagrams, tables etc. are designed for humans to understand and are thus the ideal format, as opposed to C++. And the Satoshi client is not very friendly to human eyes, so indeed another implementation could make it much easier to understand, but why waste time on doing that if you could write a human-friendly specification?

I think the Satoshi client is quite straightforward to read, for the most part. A few parts are somewhat inscrutable because they're written very tightly, but unfortunately there is no substitute for just puzzling it out - if you write a description of what you think the code does it may not match reality. We have seen this demonstrated several times, like with the merkle tree calculation. What people thought it did wasn't quite what it actually did. If you simply duplicated Satoshi's algorithm, you would duplicate his bug too, so no chain-split attack would have been present. If you re-implemented it based on an English description, you'd have introduced an exploit.
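
As an illustration of the kind of detail at stake, here is a minimal Python sketch of the merkle root calculation as it is commonly described: when a level has an odd number of hashes, the last hash is paired with itself. The post does not say which quirk it has in mind, so treating this duplication rule as the relevant one is an assumption. The function names are hypothetical and byte-order conventions are ignored. The point is that a list of three hashes and a list of four whose last entry is repeated give the same root, which an English description such as "hash the transactions into a tree" would not lead you to expect.

```python
import hashlib
from typing import List

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids: List[bytes]) -> bytes:
    """Merkle root with the 'duplicate the last hash on odd levels' rule."""
    if not txids:
        raise ValueError("a block has at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last one with itself.
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Three stand-in transaction hashes (not real transactions).
a, b, c = (double_sha256(bytes([i])) for i in range(3))

# Two different transaction lists, one root.
print(merkle_root([a, b, c]) == merkle_root([a, b, c, c]))
```

An implementation that copies this behaviour exactly agrees with the rest of the network on both lists; one that handles the odd case differently may disagree about which blocks are valid.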

Quote
Sorry? People can re-implement bitcoin. You know that, obviously.

I've implemented the SPV mode (not re-implemented, as for a long time there was no other implementation of this). Matt Corallo went ahead and extended my work to do full validation. He's done a TON of testing, very in-depth testing. Despite that, I would be very concerned if I heard that a big mining pool or BitPay or whoever was using it, at least in its current state. It's not clear to me how much work would be required until I felt comfortable with high value operations using bitcoinj in full mode, and the documentation when 0.7 is released will make that clear.

CIYAM
Ian Knowles - CIYAM Lead Developer
January 05, 2013, 02:11:06 PM
 #20

Quote
Maybe documenting the protocol could lead to fixing said bugs at an agreed-upon block height leading to a clearer and more consistent protocol. I really don't see the harm in documenting what happens under the hood, bugs included.

Such a document would end up being nearly as long as the source code, and not much easier to read. I'm all for adding more detailed comments to the source, though.

So you really think rather than using RFC 1939 one should read through someone's source code (and worse yet "comments" that the compiler ignores) to work out how to do POP3?

I am seriously now beginning to wonder whether anyone here has worked on large scale software projects at all (say > 100 devs and say > 100 million USD).
