Bitcoin Forum
Topic: Protocol Buffers for Bitcoin (Read 13230 times)
martin (OP)
Newbie
*

Activity: 8
Merit: 100



July 29, 2010, 11:29:31 PM
Last edit: August 19, 2025, 02:46:56 PM by martin
 #1

_
lachesis
Full Member
***

Activity: 210
Merit: 105


July 30, 2010, 01:04:06 AM
 #2

Quote from: martin
Some people have been suggesting that protocol buffers might be larger than the custom-written packet layout. I suspect that it would actually be *smaller* due to some of the clever encoding used in protocol buffers.
I agree that it could be smaller, not necessarily because of clever encoding, but because it would allow us to drop reserved bytes and the like.

Quote from: martin
To resolve this, I think a test is in order. I shall encode a wallet file/network packet using protocol buffers and compare its size to the packets in the current scheme. However, I have no idea what's in a packet: what data is stored in a packet, and in what format?
That would be the hard part, of course. If you want to test with the version packet (not really ideal, since it's only sent once per connection), I've decoded that fully:
http://bitcointalk.org/index.php?topic=231.msg6250#msg6250

Bitcoin Calculator | Scallion | GPG Key | WoT Rating | 1QGacAtYA7E8V3BAiM7sgvLg7PZHk5WnYc
Quantumplation
Sr. Member
****

Activity: 308
Merit: 251



July 30, 2010, 12:30:23 PM
 #3

Quote from: lachesis
Quote from: martin
Some people have been suggesting that protocol buffers might be larger than the custom-written packet layout. I suspect that it would actually be *smaller* due to some of the clever encoding used in protocol buffers.
I agree that it could be smaller, not necessarily because of clever encoding, but because it would allow us to drop reserved bytes and the like.

Not only does it allow dropping reserved fields, but it also uses varint and ZigZag encoding and some other tricks to keep integers and the like as small as possible. So yeah, it uses clever encoding. =P It's also blazingly fast to process!
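To make the "clever encoding" concrete, here is a minimal Python sketch of the two integer tricks in the protobuf wire format: base-128 varints (7 value bits per byte, high bit set on every byte except the last) and the ZigZag mapping used for signed `sint32`/`sint64` fields. The function names are hypothetical, not the protobuf library's API, and the sketch assumes values fit in 64 bits.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit int to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Python's >> is an arithmetic shift: n >> 63 is -1 for negatives, 0 otherwise.
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Base-128 varint: low 7 bits first, high bit means 'more bytes follow'."""
    if value < 0:
        raise ValueError("varints encode non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# A small value takes one byte, 300 takes two.
print(encode_varint(1).hex(), encode_varint(300).hex())  # 01 ac02
# -1 sent as a plain int64 is its 64-bit two's complement (10 bytes);
# ZigZag first maps it to 1, which fits in a single byte.
print(len(encode_varint((-1) & 0xFFFFFFFFFFFFFFFF)), len(encode_varint(zigzag_encode(-1))))  # 10 1
```

The saving only materializes when values are small; a 32-byte hash or a signature is carried as raw bytes either way.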

NOTE: This account was compromised from 2017 to 2021.  I'm in the process of deleting posts not made by me.
RHorning
Full Member
***

Activity: 224
Merit: 141


July 30, 2010, 02:13:27 PM
 #4

Quote from: martin
The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is half the size of the hand-built layout!

I realize that you are evangelizing for protocol buffers (and you seem to be doing a very good job of it, I might add), but I will challenge the claim that hand-built data layouts are always bad.

Still, I hope this gives some food for thought, and on a practical basis any improvement in the network protocol that shaves off a few bytes is always better. This doesn't seem to sacrifice too much in terms of overhead either. More significantly, you are calling attention to an area of efficiency that needs to be addressed, which is very helpful to the project. Thank you for doing that. I'm hoping to catch up to where you are now on this protocol business.
Gavin Andresen
Legendary
*

Activity: 1652
Merit: 2402


Chief Scientist


July 30, 2010, 02:28:23 PM
 #5

Speaking of the network...
... is there any really robust, generic, low-latency, open source p2p network "middleware" out there?

I think using protocol buffers as the serialization format is a good idea, but I don't think just switching to protocol buffers "buys" enough to be worth the effort (at least not now, when transaction volume is low).

I'd like to see some experimenting with running bitcoin on top of a different networking layer (and using protocol buffers, too). Is there a p2p network that is designed to be extremely reliable and difficult to infiltrate or attack with malicious nodes?

How often do you get the chance to work on a potentially world-changing project?
jgarzik
Legendary
*

Activity: 1596
Merit: 1142


July 30, 2010, 03:47:33 PM
 #6

FYI, it is pointless to make a packet smaller than 60 bytes, the minimum size of an Ethernet frame (not counting the 4-byte frame check sequence). Packets smaller than that are padded up to 60 bytes.
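A quick sketch of the padding arithmetic, assuming one TCP segment over IPv4 with no IP or TCP options (real connections often carry TCP options, which only make frames larger). The constant and function names are hypothetical. Under these assumptions the Ethernet, IPv4 and TCP headers already take 54 bytes, so padding only kicks in for application payloads of 6 bytes or fewer.

```python
ETH_MIN_FRAME = 60  # minimum Ethernet frame size, excluding the 4-byte FCS
ETH_HEADER = 14     # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_HEADER = 20    # assumption: no IP options
TCP_HEADER = 20     # assumption: no TCP options


def frame_size(app_payload: int) -> int:
    """Bytes on the wire (excluding FCS) for a TCP segment carrying app_payload bytes."""
    unpadded = ETH_HEADER + IPV4_HEADER + TCP_HEADER + app_payload
    return max(unpadded, ETH_MIN_FRAME)


for payload in (0, 6, 7, 55, 107):
    print(payload, frame_size(payload))
```

Under the same assumptions, a 55-byte payload ends up in a 109-byte frame, so the 60-byte floor never applies to it.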

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
lachesis
Full Member
***

Activity: 210
Merit: 105


July 31, 2010, 01:45:23 PM
 #7

Quote from: martin
The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is over half the size of the hand-built layout!
The "0x00" groups each represent one byte. The length of the standard version packet is 87 bytes, plus 20 for the header. The header could be massively optimized as well:
Code:
message start "magic bytes" - 0xF9 0xBE 0xB4 0xD9
command - name of command, 0 padded to 12 bytes "version\0\0\0\0\0"
size - 4 byte int
checksum (absent for messages without data and version messages) - 4 bytes
Obviously, using protocol buffers here, while absolutely a breaking change, would save a fair bit of space, especially because the "I've created a transaction" packet has the command name "tx", which is zero-padded to 12 bytes, so there are at least 10 bytes of padding overhead in every one of those packets.
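A minimal Python sketch of the header layout listed above, using the standard `struct` and `hashlib` modules. Two details are not spelled out in the post but follow the Bitcoin protocol's definition: the size field is a little-endian 32-bit integer, and the checksum is the first 4 bytes of SHA-256 applied twice to the payload. `pack_header` and its `with_checksum` flag are hypothetical; the flag simply mirrors the post's note that some messages omit the checksum.

```python
import hashlib
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # message start bytes from the post
COMMAND_LEN = 12                   # command names are zero-padded to 12 bytes


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of SHA-256(SHA-256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def pack_header(command: str, payload: bytes, with_checksum: bool = True) -> bytes:
    name = command.encode("ascii")
    if len(name) > COMMAND_LEN:
        raise ValueError("command name longer than 12 bytes")
    header = MAGIC + name.ljust(COMMAND_LEN, b"\x00") + struct.pack("<I", len(payload))
    if with_checksum:
        header += checksum(payload)
    return header


tx_header = pack_header("tx", b"\x01\x02\x03")
print(len(tx_header))                                          # 24
print(tx_header[4:16].count(b"\x00"))                          # 10 bytes of padding after "tx"
print(len(pack_header("version", b"", with_checksum=False)))   # 20
```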

andrew
Newbie
*

Activity: 10
Merit: 0


August 02, 2010, 02:43:43 AM
 #8

Why do you consider it a breaking change? There's no reason you couldn't first try the new protocol and then retry using the old bitcoin serialization technique. Also, I think this is a change that should be made sooner rather than later, while the Bitcoin community is still small. It's already been a major blocker in making new clients, and delaying it is going to hamper Bitcoin's adoption.
satoshi
Founder
Sr. Member
*

Activity: 364
Merit: 8096


August 02, 2010, 08:22:08 PM
 #9

The reason I didn't use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there's no way to form an input that would do something unexpected.

I hate reinventing the wheel and only resorted to writing my own serialization routines reluctantly.  The serialization format we have is as dead simple and flat as possible.  There is no extra freedom in the way the input stream is formed.  At each point, the next field in the data structure is expected.  The only choices given are those that the receiver is expecting.  There is versioning so upgrades are possible.
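The "next field is expected" style can be sketched as a reader that only ever consumes the field the caller asks for next and fails on short input. This is an illustration, not Bitcoin's actual code: `StrictReader` and the two-field record are hypothetical, and rejecting trailing bytes is a choice of this sketch rather than something stated in the post.

```python
import struct


class StrictReader:
    """Consume a byte string strictly field by field, in a fixed order."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read(self, n: int) -> bytes:
        if n < 0 or self.pos + n > len(self.data):
            raise ValueError("unexpected end of input")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u32(self) -> int:
        return struct.unpack("<I", self.read(4))[0]

    def u64(self) -> int:
        return struct.unpack("<Q", self.read(8))[0]

    def finish(self) -> None:
        if self.pos != len(self.data):
            raise ValueError("trailing bytes after the last expected field")


def parse_record(data: bytes) -> tuple[int, int]:
    """Hypothetical record: 4-byte version, then 8-byte timestamp, nothing else."""
    reader = StrictReader(data)
    version = reader.u32()
    timestamp = reader.u64()
    reader.finish()
    return version, timestamp


print(parse_record(struct.pack("<IQ", 300, 1280534400)))  # (300, 1280534400)
```

There are no tags or optional fields, so the only input the parser accepts is the one layout it expects; the price is that adding a field requires a version bump rather than being skipped automatically.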

CAddress is about the only object with significant reserved space in it (about 7 bytes for flags and 12 bytes for possible future IPv6 expansion).
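Where that reserved space sits can be seen by packing the 26-byte network address used inside the `version` message: an 8-byte little-endian services bitfield (only the NODE_NETWORK bit was defined, hence the roughly 7 spare bytes), 16 address bytes holding the IPv4 address in IPv4-mapped IPv6 form (the 12-byte prefix is the room reserved for IPv6), and a 2-byte big-endian port. A sketch only; `pack_net_addr` is a hypothetical helper, not a Bitcoin function.

```python
import socket
import struct

NODE_NETWORK = 1                                 # service bit 0: node serves full blocks
IPV4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"  # the 12 bytes reserved for IPv6


def pack_net_addr(ipv4: str, port: int, services: int = NODE_NETWORK) -> bytes:
    """26 bytes: services (8, little-endian) + 16-byte address + port (2, big-endian)."""
    address = IPV4_MAPPED_PREFIX + socket.inet_aton(ipv4)
    return struct.pack("<Q", services) + address + struct.pack(">H", port)


packed = pack_net_addr("10.0.0.1", 8333)
print(len(packed), packed.hex())
```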

The larger things we have like blocks and transactions can't be optimized much more for size.  The bulk of their data is hashes and keys and signatures, which are uncompressible.  The serialization overhead is very small, usually 1 byte for size fields.
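The 1-byte size fields are Bitcoin's variable-length size prefix, commonly called CompactSize. A minimal Python sketch of the encoding rule: values below 253 take a single byte, and larger values get a marker byte followed by a 2-, 4- or 8-byte little-endian integer.

```python
import struct


def encode_compact_size(n: int) -> bytes:
    """Bitcoin-style variable-length size prefix."""
    if n < 0:
        raise ValueError("sizes are non-negative")
    if n < 0xFD:
        return bytes([n])                      # 0..252: the value itself
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker + uint16
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker + uint32
    return b"\xff" + struct.pack("<Q", n)      # marker + uint64


for size in (2, 71, 252, 253, 70000):
    print(size, encode_compact_size(size).hex())
```

Input and output counts and typical script lengths stay below 253, which is why the prefix is usually a single byte in practice.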

On Gavin's idea about an existing P2P broadcast infrastructure, I doubt one exists. There are few P2P systems that only need broadcast. There are some libraries like Chord that try to provide a distributed hash table infrastructure, but that's a huge, difficult problem that we don't need or want. Those libraries are also much harder to install than Bitcoin itself.
lachesis
Full Member
***

Activity: 210
Merit: 105


August 03, 2010, 02:23:50 AM
 #10

Quote from: satoshi
The reason I didn't use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there's no way to form an input that would do something unexpected.
I hate to sound rude, but that sounds like the danger with the SCRIPT field in transactions. You're comfortable writing a whole evaluation language that lets the blocks suggest operations to the client, but you're not comfortable using a library like protocol buffers?

Quote from: martin
Would you consider including an option to write the wallet file out in protocol buffer format instead of the custom format? That way the default can be the custom format which you trust more, and users can export their wallet to protobuf format if they want to move to a new client.
Why not use XML for that case? The size of the wallet file on disk isn't exactly a big concern when it comes to export, and XML compresses pretty well. Plus, it's completely human-readable, which would help people understand what is actually stored.

BeeCee1
Member
**

Activity: 115
Merit: 10


August 03, 2010, 02:46:33 PM
 #11


Quote
Indeed, but the version packet is probably the smallest of all the packets sent, so we'll gain more elsewhere. Also, keep an eye on the main point: the fact that protocol buffers are smaller is a nice aside to the fact that they're forward-compatible and make bitcoin portable between languages.

Debugging is also easier with non-custom formats.  Instead of being the only one using it, you have many other people on different projects looking for and fixing bugs.  You also often get tools for decoding/displaying the packet to make it easier to see if something is wrong.  IMHO the size of the packet is the least important reason.
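The forward-compatibility point can be illustrated with the protobuf wire format: every field is prefixed by a key `(field_number << 3) | wire_type`, and the wire type tells an older reader how many bytes to skip for a field number it does not know. The sketch below is hand-written for illustration (it is not the protobuf library's API; `decode_known_fields` and `take` are hypothetical names) and handles the four basic wire types: 0 varint, 1 fixed 64-bit, 2 length-delimited, 5 fixed 32-bit.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def take(data: bytes, pos: int, n: int) -> tuple[bytes, int]:
    if pos + n > len(data):
        raise ValueError("truncated field")
    return data[pos:pos + n], pos + n


def decode_known_fields(data: bytes, known: set[int]) -> dict[int, object]:
    """Keep fields whose numbers are in `known`; skip everything else."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 1:
            value, pos = take(data, pos, 8)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value, pos = take(data, pos, length)
        elif wire_type == 5:
            value, pos = take(data, pos, 4)
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            fields[field_number] = value
        # Unknown field numbers (e.g. added by a newer client) are skipped.
    return fields


# Field 1 = 150 (varint), field 2 = "hi" (length-delimited), as a newer sender might write.
message = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
print(decode_known_fields(message, known={1}))     # {1: 150}
print(decode_known_fields(message, known={1, 2}))  # {1: 150, 2: b'hi'}
```

This skipping rule is exactly the extra freedom in the input stream that satoshi's flat format avoids, which is the trade-off the thread is debating.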