Bitcoin Forum
Author Topic: Bitcoin Protocol Specification  (Read 12076 times)
RHorning
Full Member

Activity: 224
Merit: 141


November 24, 2010, 03:11:46 PM
 #21

[...] I think writing informal specifications documenting how Bitcoin works right now is a great idea, and it will be really helpful when it is time to go through some standardization process.

This is the most important thing to happen, IMHO; doing so would dramatically lower the barriers to entry for creating 2nd-generation Bitcoin clients independent of the reference implementation.

So if it would take many man-months of work to develop a formal specification, then how long would it take to develop a 'good enough' informal specification?

I think this is the wrong way to look at it, particularly given the mostly volunteer nature involved in the operation of Bitcoins at the moment.  There have been several attempts to start the documentation process, and the important thing to do now is to build upon those efforts and get what information anybody knows down into some usable form.  Documentation of Bitcoins all around is sort of weak, and even if you aren't a programmer it would still be useful to at least try to explain the concepts of Bitcoins in some way that perhaps even non-geeks can understand.

There is also a whole bunch of useful information which is now getting buried in these forum threads, so indexing these discussions would also be helpful in some way, although the specific details of the operation of Bitcoins ultimately fall back upon the source code of the reference implementation written by Satoshi.

Like trying to eat an elephant, it takes time and patience, and you can only take one bite at a time.  If you can read the source code and understand even a portion of it, get that knowledge recorded or simplified if you can.  At that point we can debate the merit or lack thereof of specific decisions in the current design.  My experience is also that once something is established and not challenged, it tends to become something permanent in nature even on an "open source" project.  Right now, most people don't even know what to start challenging because the details are buried in code.  I'm hoping that a "good enough" documentation effort can at least bring some of those issues to the front.
RHorning
Full Member

Activity: 224
Merit: 141


November 24, 2010, 10:09:31 PM
 #22

For those familiar with the network-level protocol, what is the difference between getblocks and getdata?  Both seem to carry a list of hashes representing blocks which need to be sent to the requesting node.

One difference I can see is that the "getblocks" command/packet type requests a range of blocks, while getdata requests individual blocks.  Is this the only difference, or is there something more significant that I'm missing here?  I'm trying to figure out when one packet type might be used instead of the other, or why there seems to be a duplication of block request methods seemingly doing the same thing.
theymos
Administrator
Legendary

Activity: 5166
Merit: 12865


November 24, 2010, 11:57:25 PM
 #23

For those familiar with the network-level protocol, what is the difference between getblocks and getdata?  Both seem to carry a list of hashes representing blocks which need to be sent to the requesting node.

One difference I can see is that the "getblocks" command/packet type requests a range of blocks, while getdata requests individual blocks.  Is this the only difference, or is there something more significant that I'm missing here?  I'm trying to figure out when one packet type might be used instead of the other, or why there seems to be a duplication of block request methods seemingly doing the same thing.

Have you seen this?
http://www.bitcoin.org/wiki/doku.php?id=network

Getdata requests a specific block or transaction by hash. You generally only send a getdata after you receive an inv listing a block/tx that you don't already have. Getblocks requests an inv containing the hashes of all blocks in a range (max 500 at a time). It's used for initial block download and re-syncing after some downtime.

Getblocks (client) -> inv (server) -> getdata (client) -> block (server)
Send one getblocks, get an inv with 500 entries, send 500 getdata messages, receive 500 block messages. This sounds inefficient, but the download is actually very fast (it's the verification that eats up most of the "download" time).
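
Roughly, in pseudocode (send_message, read_message and verify_and_store are just made-up helpers for illustration, not anything from the reference client):
Code:
# Illustrative only: initial block download as described above.
MSG_TX, MSG_BLOCK = 1, 2

def sync_blocks(peer, best_hash):
    while True:
        # Ask for an inv of up to 500 block hashes following our best block.
        send_message(peer, "getblocks", locator=[best_hash], hash_stop=b"\x00" * 32)
        inv = read_message(peer, "inv")                      # list of (type, hash) pairs
        block_hashes = [h for t, h in inv if t == MSG_BLOCK]
        if not block_hashes:
            break                                            # caught up with the peer
        for h in block_hashes:
            send_message(peer, "getdata", [(MSG_BLOCK, h)])  # request each block by hash
            block = read_message(peer, "block")
            best_hash = verify_and_store(block)              # verification dominates the time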

RHorning
Full Member

Activity: 224
Merit: 141


November 25, 2010, 03:07:14 AM
 #24

For those familiar with the network-level protocol, what is the difference between getblocks and getdata?

Have you seen this?
http://www.bitcoin.org/wiki/doku.php?id=network


As a matter of fact, I missed that page.  Thank you so much for putting the effort into writing that explanation.  It really does make a difference.

As a side note, we really need to put together some menus or something that links deep into the wiki, or at least put references to it on other pages.

I've been trying to collect content related to the protocol for some time, so every little bit helps.  Again, thanks!
Cdecker
Hero Member

Activity: 489
Merit: 504



November 28, 2010, 12:20:51 AM
 #25

Let's try to keep this thread alive and unbury it with new findings as we go along. One fact that I stumbled over (for several hours today, hurting myself as I went) is that the numbers in the protocol are not encoded in network byte order, but are little-endian. I guess that would be pretty important if we are to create documentation.

I think there are two ways to look at the protocol: a high-level one, where everything is expressed in nice words and comparisons, and another, dearly needed one that details the actual information and format on the wire.

One nice detail to add, for example, is that each message starts with a 4-byte magic value:
Code:
_magic = '\xf9\xbe\xb4\xd9'

Also in the original design a lot of attention went into how the size of a message is encoded:
Code:
    def getSize(self):
        # Variable-length size: the first byte decides how many more bytes follow
        first = self.getUByte()
        if first == 255: return self.getUInt64()    # 0xff: size in the next 8 bytes
        elif first == 254: return self.getUInt()    # 0xfe: size in the next 4 bytes
        elif first == 253: return self.getUShort()  # 0xfd: size in the next 2 bytes
        else: return first                          # values below 253 fit in the byte itself

But message types are simply encoded as a null-padded 12-byte string. So I'm starting to wonder about the design choices: why optimize the size field when the command field is always sent at full width? No offense intended, but these kinds of things just make it hard to implement.

Oh, and when using Java you should pay close attention to how you read unsigned data types (again, something I had to bang my head against before realizing my error  Roll Eyes)
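
For what it's worth, in a language with an explicit byte-packing facility the little-endian handling is a one-liner; a tiny illustration (Python, nothing from any actual client):
Code:
import struct

raw = b'\x01\x00\x00\x00'              # the 32-bit value 1, little-endian on the wire
(value,) = struct.unpack('<I', raw)    # '<' = little-endian, 'I' = unsigned 32-bit
assert value == 1

(wrong,) = struct.unpack('>I', raw)    # reading it as big-endian gives garbage
assert wrong == 16777216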

RHorning
Full Member

Activity: 224
Merit: 141


November 28, 2010, 12:49:47 AM
 #26

Let's try to keep this thread alive and unbury it with new findings as we go along. One fact that I stumbled over (for several hours today, hurting myself as I went) is that the numbers in the protocol are not encoded in network byte order, but are little-endian. I guess that would be pretty important if we are to create documentation.

I think there are two ways to look at the protocol: a high-level one, where everything is expressed in nice words and comparisons, and another, dearly needed one that details the actual information and format on the wire.

I hope you've looked at the "draft spec" that I've been writing, where I've put some of this information in; your input is very much appreciated.  I forgot to mention the byte order, as it is a huge detail, but it is something I've come to expect from projects like this.  About the only thing that is recorded in "network byte order" that I'm aware of at the moment is the timestamp structure, and that is in part because the structure is defined in a library not written by Satoshi.  Nothing personal against Satoshi here either; all that is going on is that he isn't re-ordering the bytes, since the vast majority of clients are running on Intel architecture.  It simply makes the software a whole lot easier to write as far as transmitting the data goes.

This is also a pet peeve of mine, as it opens up the whole little-endian vs. big-endian debate.  It is also where Intel going against the grain on this issue has sort of messed things up, and a tale of how architecture decisions made decades ago continue to come back and impact everybody in sometimes significant ways.  For the most part, other than as a potential bug when you are trying to read or write a shared data format used by multiple computer systems (e.g. on a CD-ROM or via the internet), it is rarely even a problem.

At the moment I'm trying to wrap my head around the transaction and block formats in the network data sharing protocol.  A whole bunch is buried in there and isn't very well documented in terms of what it is doing.  If you could help in that regard, let me know too!
RHorning
Full Member

Activity: 224
Merit: 141


December 02, 2010, 12:31:19 AM
 #27

Going over the transaction specs, I noticed a "lock time" attribute on each transaction.  With this, there is apparently some sort of protocol envisioned for being able to push transactions to various nodes but also require them to be included at some future block instead of being processed immediately.  In other words, it is a request to miners to include the transaction "no earlier than" some particular block number.  In addition, there is apparently the ability for details of the transaction to be modified after it has been broadcast.

My question is in a couple of parts:  Is this on the roadmap to be implemented in the future, or is it simply an idea that hasn't really been completely thought through?  What kind of security issues are there in terms of a 3rd party "changing" the transaction information and simply updating to a new transaction version?  Or is this a "no later than" type of notification where the transaction expires after a certain block number has been created?

It would be an interesting feature for Bitcoins if it could be pulled off.  Apparently most miners are not paying attention to this attribute either, and it may be something to reconsider.
theymos
Administrator
Legendary

Activity: 5166
Merit: 12865


December 02, 2010, 01:15:39 AM
 #28

My question is in a couple of parts:  Is this on the roadmap to be implemented in the future, or is it simply an idea that hasn't really been completely thought through?  What kind of security issues are there in terms of a 3rd party "changing" the transaction information and simply updating to a new transaction version?  Or is this a "no later than" type of notification where the transaction expires after a certain block number has been created?

It would be an interesting feature for Bitcoins if it could be pulled off.  Apparently most miners are not paying attention to this attribute either, and it may be something to reconsider.

A transaction can't be included in a block if its lock time is in the future. Even now blocks breaking this rule will be rejected.

The feature is designed to work with in-memory transaction replacement, which is currently disabled (it was enabled in older versions):
Code:
// Disable replacement feature for now
return false;

// Allow replacing with a newer version of the same transaction
if (i != 0)
    return false;
ptxOld = mapNextTx[outpoint].ptx;
if (!IsNewerThan(*ptxOld))
    return false;
for (int i = 0; i < vin.size(); i++)
{
    COutPoint outpoint = vin[i].prevout;
    if (!mapNextTx.count(outpoint) || mapNextTx[outpoint].ptx != ptxOld)
        return false;
}
break;
IsNewerThan() compares the sequence numbers on the inputs of the two versions; the version with the higher sequence number counts as newer.

This disabled feature is not network-enforced in any way, so it could be enabled at any time.

You can't replace a transaction unless you can sign it. So it should be safe. It might be unsafe if you're using inputs that can be redeemed by more than one person: the other person could make your transaction invalid (but not steal your other inputs).

It was probably disabled because it makes accepting transactions with 0 confirmations really unsafe. It could be safely re-enabled if transactions were only replaceable if they actually specify a non-zero lock time, and this was marked in the UI.

nTimeLock does the reverse.  It's an open transaction that can be replaced with new versions until the deadline.  It can't be recorded until it locks.  The highest version when the deadline hits gets recorded.  It could be used, for example, to write an escrow transaction that will automatically permanently lock and go through unless it is revoked before the deadline.  The feature isn't enabled or used yet, but the support is there so it could be implemented later.
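
Purely to illustrate the mechanics (the field names and helpers below are made up for the example; this is not reference-client code):
Code:
FINAL = 0xFFFFFFFF                      # an input with this sequence number is final

# Version 1 of an open transaction: not valid in a block before height 150000.
tx = make_transaction(inputs, outputs)
tx.lock_time = 150000
tx.inputs[0].sequence = 0               # low sequence number, replaceable
sign_and_broadcast(tx)

# Before the lock time is reached, the parties can agree on a newer version;
# a higher sequence number marks it as the replacement.
tx.outputs = revised_outputs
tx.inputs[0].sequence = 1
sign_and_broadcast(tx)

# Setting every input's sequence to FINAL makes the transaction final
# immediately, regardless of lock_time, so no further replacement is possible.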

Cdecker
Hero Member

Activity: 489
Merit: 504



December 04, 2010, 01:04:55 PM
 #29

Sometimes I can't resist questioning Satoshi's choices: a UINT64 size field? It's incredibly hard to implement in Java (well, not really; BigInteger helps), and do we really need messages larger than 4 GB (a 4-byte field)? A UINT64 would allow for messages of 18.45 exabytes. That's more than all the world's movies put together.

I think I'll simply drop messages requiring UINT64 sizes.

RHorning
Full Member

Activity: 224
Merit: 141


December 04, 2010, 02:21:16 PM
Last edit: December 04, 2010, 07:45:44 PM by RHorning
 #30

Sometimes I can't resist questioning Satoshi's choices: a UINT64 size field? It's incredibly hard to implement in Java (well, not really; BigInteger helps), and do we really need messages larger than 4 GB (a 4-byte field)? A UINT64 would allow for messages of 18.45 exabytes. That's more than all the world's movies put together.

I think I'll simply drop messages requiring UINT64 sizes.

Are you asking about the message header "size" field, indicating how large the message packet itself is?  I thought that was just a simple 4-byte int value followed by a 4-byte checksum.  That format information comes from main.cpp and is also implemented in net.h:

Code:
    //
    // Message format
    //  (4) message start
    //  (12) command
    //  (4) size
    //  (4) checksum
    //  (x) data
    //

On the whole, most messages are quite small, with the exception of the transaction messages themselves, which can grow to sizes on the order of thousands of bytes (10k is the limit for a single script per input or output).  In theory some of the other messages could get fairly large, but still on that order of magnitude, peaking at about 50k in extreme situations.  I can see where a short int is perhaps too small and that a complex transaction with dozens of inputs and outputs might need more than 64k bytes, but you are correct that there is no need to get past the gigabyte range for message sizes.
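
For what it's worth, here is a quick sketch of parsing that 24-byte header with Python's struct module (illustrative only; error handling omitted):
Code:
import struct

def parse_header(raw):
    # 4-byte magic, 12-byte null-padded command, 4-byte little-endian size, 4-byte checksum
    magic, command, size, checksum = struct.unpack('<4s12sI4s', raw[:24])
    return magic, command.rstrip(b'\x00').decode('ascii'), size, checksum

header = (b'\xf9\xbe\xb4\xd9'
          + b'block'.ljust(12, b'\x00')
          + struct.pack('<I', 285)
          + b'\x00' * 4)
print(parse_header(header)[1:3])   # ('block', 285)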

*Edit* I also found this little snippet of code relevant to this discussion:

Code:
static const unsigned int MAX_SIZE = 0x02000000;

(from serialize.h)

This is the current maximum size for any single message on the network (0x02000000 = 32 MiB); anything larger than this is simply rejected.
Cdecker
Hero Member

Activity: 489
Merit: 504



December 07, 2010, 11:45:52 AM
 #31

I wonder when they changed that one. It was incredibly hard to get the size out of the message, since we had to switch the length of the size field according to the first byte.

Cdecker
Hero Member

Activity: 489
Merit: 504



December 07, 2010, 12:04:07 PM
 #32

What exactly is this used for then: http://www.bitcoin.org/wiki/doku.php?id=bitcoins_draft_spec_0_0_1#variable_sized_data ?

RHorning
Full Member

Activity: 224
Merit: 141


December 07, 2010, 02:21:05 PM
 #33


The only place I currently see that being used is in scripts.  Thanks to theymos, the ideas behind scripting are less opaque, but it is still pretty arcane for those who really want to get into the gritty details of Bitcoin.

In theory it could be put into the protocol eventually as a way to save bandwidth, but so far I haven't seen it used in that way.  If that was a goal, it would seem that there would be some other concepts in place that would facilitate data compression more effectively and perhaps even be more extensible too.
Mike Hearn
Legendary

Activity: 1526
Merit: 1128


January 07, 2011, 03:59:22 PM
 #34

The best suggestion I've seen for saving bandwidth is that completed blocks should not contain full transaction bodies but only the Merkle tree. Nodes that somehow missed the original broadcast of a transaction in the tree would then just getdata it from peers in the usual manner.

This isn't an issue today, but it would make running nodes cheaper if Bitcoin goes to extreme scales, like thousands of transactions a second.
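
For reference, the Merkle tree in question is the one whose root already sits in every block header; roughly, the root is computed from the transaction hashes like this (a sketch that glosses over hash byte-order details):
Code:
import hashlib

def dhash(data):
    # Bitcoin-style double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes):
    # tx_hashes: non-empty list of 32-byte transaction hashes
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2:                      # odd count: duplicate the last hash
            level.append(level[-1])
        level = [dhash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]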
Hal
VIP
Sr. Member

Activity: 314
Merit: 3853



January 07, 2011, 07:48:25 PM
 #35

According to Gavin's https://github.com/gavinandresen/bitcointools/blob/master/NOTES.txt, the serialization of any vector object is preceded by a count of the number of elements in the vector, in the variable-length 1/3/5/9-byte format. I added this count field to the new wiki, e.g. to the addr messages. Also, block messages contain a vector of their transactions, so that part is also preceded by a variable-length count.
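
A sketch of that 1/3/5/9-byte count encoding (the mirror image of the getSize() routine quoted earlier in this thread), with the vector's serialized elements appended after it:
Code:
import struct

def encode_count(n):
    # 1, 3, 5, or 9 bytes depending on the magnitude of n
    if n < 253:
        return struct.pack('<B', n)
    elif n <= 0xFFFF:
        return b'\xfd' + struct.pack('<H', n)
    elif n <= 0xFFFFFFFF:
        return b'\xfe' + struct.pack('<I', n)
    else:
        return b'\xff' + struct.pack('<Q', n)

def encode_vector(serialized_elements):
    # e.g. the addresses in an addr message, or the transactions in a block
    return encode_count(len(serialized_elements)) + b''.join(serialized_elements)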

Hal Finney
Gyrsur
Legendary

Activity: 2856
Merit: 1518


January 14, 2016, 05:46:01 PM
 #36

Seven years in production and no official specification. It's a shame!
