Hey everyone, I would love to hear feedback on whether this idea is worth pursuing, and, if so, how it might be improved.
Here's a link to a prettier version of this:
https://gist.github.com/paulkernfeld/7126c1307fd46561df9cPreface: Blockchain Storage Politics
---------------------------
The issue of whether the Bitcoin protocol should be used for data storage is [contentious](
https://github.com/bitcoin/bitcoin/pull/5286), and I apologize to anyone who thinks this post is in poor taste. If you don't mind, I'd like to limit this particular discussion to the technical merits of Vespucci, rather than the moral merits.
Vespucci
========
Vespucci is a proposed protocol that uses the Bitcoin blockchain for decentralized application discovery.
The Problem
===========
In order for an application to join a P2P network, the application must somehow find the IP address of one or more peers to connect to. This is called bootstrapping, and it can be difficult. BitTorrent and Bitcoin clients often include constant addresses of "bootstrap nodes," long-lived server nodes. This solution works only if these long-lived server nodes remain up, which creates a potential point of failure. So, this protocol is designed to provide discovery for applications with the following requirements:
1. The P2P application needs to download a global list of peers when it's first opened.
2. Anyone should be able to register bootstrapping peers for discovery.
3. Read performance, i.e. the time to read peers and join the network, is very important.
4. Write performance, i.e. the time to register a new peer, is very unimportant.
5. The ability to discover the Bitcoin network is assumed.
The Protocol
============
Addresses of peers will be stored using [OP_RETURN](
https://en.bitcoin.it/wiki/OP_RETURN) transaction outputs. In order to discover peers for the application, it will look through the blockchain, returning relevant transactions to the application.
What is stored?
---------------
The data pushed after OP_RETURN will consist of:
1. Two bytes, `V0` in ASCII, identifying the message as belonging to the Vespucci protocol.
2. A few additional bytes identifying the application.
3. A zero byte to mark the end of the application ID.
4. A list of compressed addresses.
Since the Blockchain is a shared resource, we want to be sure to use it wisely. We can use space efficiently by compressing addresses and allowing multiple addresses to be batched into the same transaction.
Addresses
---------
Uncompressed addresses will be zero-terminated generic URIs. This allows us to support IP addresses as well as hostnames.
`scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]`
Information to include or not include:
* We probably don't need to store `scheme` (e.g. `http` or `magnet`), because that information can probably be inferred by the application.
* It doesn't make much sense to include a `user` and `password` in Bitcoin, a publicly-viewable store!
* The `host` field will always be populated
* The `port`, `path`, `query`, and `fragment` fields may be populated
Compression
-----------
In order to compress URLs, we'll want to use a small-string compressing library specifically trained on URL-looking data. [shoco](
http://ed-von-schleck.github.io/shoco/) allows users to do [just this](
http://ed-von-schleck.github.io/shoco/#generating-compression-models).
Look order
==========
The Bitcoin blockchain is a log-structured data store, optimized for great write performance, but not designed for reading. The blockchain currently grows by about 25 GB/year, and is not indexed only by time. This presents a dilemma: a P2P application that's five years old would have to look through 125 GB of data if searching linearly, even in the unlikely event that the block size limit is not increased. That's a lot of data to look through just to get some addresses! So, how can we turn the write-optimized blockchain into a read-optimized discovery store?
To solve this, we look through blocks in an order that maximizes read performance while greatly sacrificing write performance. The algorithm will be as follows:
* WLOG, label the first block that we care about block 0. Label subsequent blocks in [height](
http://bitcoin.stackexchange.com/questions/18561/definition-of-blockchain-height) order: 1, 2, 3, ...
* Define the current maximum block height as `M`.
1. Set integer `K` such that `K := ceiling(log2(M))`.
2. Look at all non-looked-at blocks where block number `B` is a multiple of `2 ^ K`, in descending order.
3. If `K > 0`, set `K := K - 1`. Otherwise (`K = 0`), we have looked at all blocks.
This can also be thought of as writing block heights in binary and counting the number of zeros at the end. An example:
Blocks: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Binary: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101
End zeros: 4 0 1 0 2 0 1 0 3 0 1 0 2 0
Look order: 0 8 12 4 10 6 2 13 11 9 7 5 3 1
Just as when picking port numbers, applications should try to avoid synchronizing initial block heights, in order to avoid crowding at important blocks. To optimize further, applications could stop looking at a certain value of `K`. For example, if we only look until `K = 10`, the minimum write time is a week, and the application is guaranteed to never have to look through more than 1/1024 of the blockchain, which would be 25 MB/year currently.
Spam prevention
---------------
Issues
======
Protocol interference
---------------------
It's possible that non-Vespucci messages might begin with the Vespucci and application prefixes, either by chance or by malice. Given this, Vespucci clients should tolerate:
* Malformed addresses
* Addresses pointing to malicious bootstrap nodes
* [Decompression attacks](
https://en.wikipedia.org/wiki/Zip_bomb) taking advantage of the compression protocol
Related Work
============
[Blockname](
https://github.com/telehash/blockname) is a similar project, a Bitcoin blockchain DNS cache. Blockname is trying to solve a slightly harder problem, that of creating a 1:1 mapping from domain name to server address. In Vespucci, each application may return a set of addresses.