Latest posts of: paulkernfeld


Bitcoin / Project Development / Re: Using the Bitcoin blockchain for file sharing

on: May 01, 2016, 12:57:10 PM

Oops, sorry everybody, I didn't see that there were responses to this! Thanks everybody for taking the time to read and think about this idea!

Quote: Unfortunately it sounds like a way to bloat the blockchain with all that data. And I think the amount of data would be enormous.

I agree that this is a very important issue. In Exandria, files and metadata aren't stored in the blockchain; I'm being careful to keep the amount of data stored in the blockchain to an absolute minimum. The blockchain only stores identities, which are pointers to metadata streams that are stored off the blockchain. Those are in turn pointers to files.

Quote: can you explain how your system is better than Storj

The interesting thing that Exandria does is provide a global, searchable index of file metadata, which isn't something that STORJ is trying to do. So, STORJ is good for storing private files, and Exandria is for storing publicly-accessible files.

Quote: How it is different from torrent's ?

Exandria uses torrents as the way to exchange the actual files. It's more like a substitute for a torrent index (e.g. Pirate Bay).

Quote: What are some major use cases for a library like this?

I'm guessing file sharing, but it could also be used for advertising, exchanging academic papers, publicly sharing leaks as in Wikileaks, and probably a bunch of other things that I'm not specifically envisioning.

Quote: What you also need to keep an eye on is the possibility of illegal content (such as CP or whatever) being shared via the Blockchain and, as the Blockchain is downloaded automatically by Bitcoin clients, such content would spread rapidly and uncontrollably. That also applies to executable code, etc.

Yeah, this is an interesting issue. I have a couple things to say about this:

1. The community will have the ability to moderate the content, so they can decide that CP or viruses should be downvoted. That's what I'm hoping to achieve with the "Eigentrust" scheme.
2. The actual files aren't stored in the blockchain, so nothing is downloaded automatically.
3. You can already write illegal content into the Bitcoin blockchain.

Quote: Cool idea, but the part I dislike is this ---> " One writes to the library by burning bitcoins. " .... Is this intentional or are there any other way to keep the Bitcoins spend on this in circulation?

Yeah, this is intentional. In order to prevent spamming, the goal is to make people actually destroy their money to prove their commitment. Otherwise, anyone can write to the store however much they want. This is the same principle as requiring miners to waste electricity to mine Bitcoins. If the electricity could be reused for something else, then it wouldn't be a way to prove commitment.

Quote: Why do you refrain from talking about some people disagreeing with using Bitcoin as a data storage medium? This is one of the fundamental problems with your project and it should be discussed.

That's a pretty reasonable perspective. My point of view is that this discussion has been had many times in many places, and it ultimately boils down to what you believe Bitcoin is. Most people believe that Bitcoin is a system for sending money. I believe that Bitcoin is a shared world-writable ledger. To me this seems more like a matter of opinion than something that can really be argued.

Bitcoin / Project Development / Using the Bitcoin blockchain for file sharing

on: April 10, 2016, 12:33:25 PM

Hey everyone, I would love to hear your feedback on this idea for using Bitcoin as the basis for a file library. I put a pretty version of this document as a gist: https://gist.github.com/paulkernfeld/4278533bf83887f6f0ee67765c66d54d.

I am well aware that some people disagree with using Bitcoin as a data storage medium. I'm not particularly interested in having that discussion in this thread, though.

Exandria: A decentralized public file library

There are many great file repositories online: WikiLeaks and The Internet Archive are two prominent examples. Even these two notable sites are centralized, however. This means that censorship-happy governments can and do find ways to attack these sites by going after their founders, servers, or DNS records.

This article describes Exandria, a design for a file library with [decentralized ownership](paulkernfeld.com/2016/02/19/world-writable.html). Any owner can vote to add or remove content, making Exandria extremely difficult to censor.

Properties:

- Searching: users can search filenames
- Downloading: the ability to download files
- Censorship-resistant: it's hard to censor files
- Spam-resistant: it's hard to spam the system
- Zero-configuration: a new user can join the network instantly, without an invite

Many systems have some of these properties, but not all of them. If you know of another system that does or tries to do this, I would love to hear about it!

Overview
--------

* The Bitcoin blockchain is used as the canonical data source, containing links to file metadata.
* One writes to the library by burning bitcoins.
* Anyone can read from the library.
* All data is transferred via P2P networks.

Data Model
==========

The file library is represented as a mutable weighted set of files.

* Set: an unordered collection of files.
* Mutable: the library can change over time (i.e. files can be added and removed).
* Weighted: each file has a weight, to represent that some files are more important than others.
Each file has a title, an extension, and a reference to the content of the file. Here's an example of what this set might look like. Note that the "magnet" field contains a hash of the content of the file, which doubles as a way to locate the file.

```json
{
  "title": "Moby Dick",
  "ext": "epub",
  "magnet": "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C"
},
{
  "title": "Scott Joplin - The Entertainer",
  "ext": "mp3",
  "magnet": "magnet:?xt=urn:sha1:NAE52SJUQCZO5CYNCKHTQCWBTRNJIV4W"
},
...
```

Delta Encoding
--------------

Delta encoding is a way to represent a set using an append-only log. Changes to the set are represented as additions and removals.

A metadata stream "add" entry might look like this:

```json
{
  "op": "add",
  "value": {
    "ext": "epub",
    "magnet": "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
    "title": "Moby Dick"
  }
}
```

Each add entry will have a hash, which will be a hash of the serialized JSON of `value`. Multiple identical add entries will have the same hash, although it's possible for two add entries to be functionally identical but to have different hashes, e.g. if the JSON has different whitespace.

A metadata "remove" entry simply refers to the hash of an entry that has been added:

```json
{
  "op": "remove",
  "hash": "c3f0fe269c05c3438d57c40526bd016d"
}
```

Weights
-------

Since this is a weighted multi-set, we need some way to make weights work with delta encoding. We can say that each "add" operation increments the weight of an element and each "remove" operation decrements the weight of the element. An element is only a member of the final set if it has a positive weight.
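As a rough illustration of the add/remove rules above, here is a minimal Python sketch that replays a log into a weighted set. The names `entry_hash` and `replay` are hypothetical, and the hash function is an assumption: the post does not say which hash is used, so SHA-256 stands in here. Serializing with sorted keys and no extra whitespace is one possible way to avoid the "same content, different hash" problem mentioned above; it is not part of the proposal.

```python
import hashlib
import json


def entry_hash(value):
    # Assumption: SHA-256 over canonical JSON (sorted keys, compact separators).
    # The original post does not specify the hash function or serialization.
    serialized = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def replay(log):
    """Replay add/remove entries and return the members with positive weight."""
    values = {}   # hash -> value, in order of first appearance
    weights = {}  # hash -> net weight
    for entry in log:
        if entry["op"] == "add":
            h = entry_hash(entry["value"])
            values.setdefault(h, entry["value"])
            weights[h] = weights.get(h, 0) + entry["weight"]
        elif entry["op"] == "remove":
            h = entry["hash"]
            weights[h] = weights.get(h, 0) - entry["weight"]
    # A removal of something never added has no value to report, so skip it.
    return [
        {"value": values[h], "weight": weights[h]}
        for h in values
        if weights[h] > 0
    ]
```

Because removals only subtract weight, the order of entries in the log does not change the final set in this sketch.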
This delta encoding (where the hash in the "remove" entry is the hash of "hat"):

```json
{
  "op": "add",
  "value": "cat",
  "weight": 5
},
{
  "op": "add",
  "value": "cat",
  "weight": 8
},
{
  "op": "add",
  "value": "hat",
  "weight": 4
},
{
  "op": "remove",
  "hash": "80c7384a6339a053baee278cb13e578c",
  "weight": 6
}
```

...results in this set ("hat" has weight -2, so it's excluded):

```
{
  "value": "cat",
  "weight": 13
}
```

Architecture
============

[diagram 1]

The protocol consists of three layers:

1. A layer which stores identities and commitments in the Bitcoin blockchain
2. A layer for storing streams of file metadata
3. A layer that can retrieve a file given a magnet link, using BitTorrent

Identities and commitments
==========================

Unlike in many other systems, identities are not stored in a centralized database. Instead, they are stored on the Bitcoin blockchain. Each identity is just a cryptographic key pair. Identity public keys are recorded onto the Bitcoin blockchain. You only need an identity to write to Exandria; you can read from it without an identity. There are no moderators or administrators; all users are of the same type.

Identity weights
----------------

In order to become a contributor to the library, the user must [burn](https://en.bitcoin.it/wiki/Proof_of_burn) some bitcoins to prove their commitment. This prevents spamming.

Each identity has an associated "weight," which is the total amount of bitcoins burned by that identity. This is used to prioritize the contributions of each identity relative to the others; the files posted by an identity with a higher weight will be ranked higher in search results.

Writing
-------

In order to create a new identity or increase the commitment of an existing identity, the user burns some bitcoins.

Reading
-------

In order to download a list of all identities, a client can look through the Bitcoin blockchain. This can be optimized to work with [simplified payment verification](https://en.bitcoin.it/wiki/Thin_Client_Security).
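Once a client has scanned the blockchain for burns, turning them into identity weights is a simple aggregation. The sketch below assumes the scan has already produced `(public_key, satoshis)` pairs; that input shape and the function names are hypothetical and not part of the proposal.

```python
from collections import defaultdict


def identity_weights(burns):
    """Sum the burned amounts per identity.

    `burns` is an iterable of (public_key_hex, satoshis) pairs, assumed to
    have been extracted from burn transactions by some earlier scanning step.
    """
    weights = defaultdict(int)
    for public_key, satoshis in burns:
        weights[public_key] += satoshis
    return dict(weights)


def rank_identities(weights):
    """Order identities from highest to lowest weight (ties broken by key)."""
    return sorted(weights, key=lambda key: (-weights[key], key))
```

Repeated burns by the same key add up, which matches the idea that an existing identity can increase its commitment over time.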

Implementation
--------------

Burns will use a [burn stream](https://github.com/paulkernfeld/burn-stream). Each message will be a single commitment. It will include a version byte (`0x00`) and then a public key.

Metadata Streams
================

A file library requires storing a lot of data, but storing data on the Bitcoin blockchain is very expensive. Therefore, only identities are stored on the blockchain, and file metadata is stored separately, in metadata streams.

Each identity has exactly one metadata stream, which is a distributed append-only log. The stream is identified and discovered by the public key of the identity, and the contents of the log must be signed by the identity's private key. This paradigm is already used in [secure-scuttlebutt](https://github.com/ssbc/secure-scuttlebutt) and [ppspp](https://tools.ietf.org/html/rfc7574).

A metadata stream contains the add and remove operations allowing the user to make modifications to the global set of files.

The "weight" of an add or remove entry is measured in bitcoins. It is the weight of the identity divided by the total number of entries in the identity's metadata stream.

Searching
---------

In order to search, the client will need to first download and index all metadata entries from all identities. Then, the client can perform the search locally. As new entries come in, the client will need to update its search index.

Retrieving Files
================

Each metadata entry will contain a magnet link to a file, as well as the extension of the file. Given this, the file can be retrieved using BitTorrent and saved to the user's file system.

Issues
======

Incentives
----------

People with a legitimate interest in the store must be willing to burn Bitcoins, so this scheme relies on the goodwill of people.
Here are some parties who might participate and affect the content of the library:

- Individuals spending money on their hobbies
- Non-profits donating money to fight censorship
- Governments spending money to censor information
- Corporations or lobby groups spending money on advertising

Censorship is inevitable
------------------------

If we don't allow items to be removed from the file library, will we have a library without censorship? Not really. Even without removal, content can be effectively censored by creating spam entries with the correct metadata but incorrect magnet links. Therefore, instead of trying to eliminate censorship, the goal of this design is to create a library with a good signal:noise ratio.

Future upgrades
===============

Eigentrust
----------

An [Eigentrust](https://en.wikipedia.org/wiki/EigenTrust)-style trust graph would allow identities to testify for or against each other. The trust graph will make it so that control over content is given to a majority. Without a trust graph, a single bad actor can cause a lot of disruption by deleting popular content. With a trust graph, the community can vote to silence bad actors. Of course, this is a type of censorship, but censorship is inevitable and community censorship is probably the best kind of censorship.

Each node in the trust graph is an identity. Each edge is directed and has a weight between -1 and 1. The starting weight for each node is the node's weight from burning bitcoins. The end weight of each node is its entry in the principal eigenvector of the graph's weight matrix.

[diagram 2]

Subjective trust filters
------------------------

Users could add custom client-side trust filters to improve the signal:noise ratio. For example, I could make a filter that downweights content from known scammers, and upweights content from sources that I find most interesting. Since these filters will be subjective, it's possible for many such filters to exist. Trust filters would be shared out-of-band.

Search scalability
------------------

If each metadata JSON is limited to 1000 bytes, then we'll be able to store one million search records in 1 GB. This is reasonable for a personal computer, but at some point, the size of the metadata itself is going to become prohibitive.

Eventually, a client should be able to search the metadata without storing all metadata locally. This is challenging, because the client must quickly retrieve data from other computers and verify that that data is valid.

One potential solution would be to maintain a distributed and easy-to-verify trie of search terms, much like Bitcoin's [ultimate blockchain compression](https://bitcointalk.org/index.php?topic=88208.0) proposal. This would let nodes download information from the trie when performing a search.

Another potential solution would be to have nodes only download identities (not file metadata) on startup, giving them information about which identities are trustworthy. Then, nodes could perform searches by querying trusted identities, using an RPC-style setup.

Additional metadata
-------------------

Many types of metadata could be added to files, e.g. book author, film/song length, or image resolution. This could add a lot of complexity, since not all types of file have well-standardized metadata specifications.

Bitcoin / Project Development / Looking for feedback on a simple decentralized username registration design

on: February 24, 2016, 06:01:08 PM

If you want to see an updated and nicely formatted version of this, see https://gist.github.com/paulkernfeld/c1411466c53d4bc17f8c.

BurnName is a proposal for a simple and practical decentralized name registration system. In BurnName, users burn bitcoins to bid for names. The user who bids the most for a name owns that name, after a delay. BurnName uses the existing Bitcoin blockchain and network, and it can be used by various applications.

Example applications

* GitTorrent or a similar "decentralized GitHub"
* Domain name registration: your username doubles as a domain name
* Human-memorable Bitcoin addresses: map from your username to your Bitcoin address

Background

In early 2011, Aaron Swartz theorized that it is possible to build a decentralized system that provides secure, human-meaningful names. Since then, several decentralized naming systems have been built, but so far none have become widely used.

So, why another system? I believe that BurnName has a novel combination of efficiency, practicality, and careful mechanism design.

Properties

* Efficient: BurnName works with simplified payment verification clients, which require 10,000x less bandwidth than a full Bitcoin client.
* Simple: BurnName is a minimal layer on top of Bitcoin.
* Egalitarian: There is no minimum price for registering a username; anyone can afford a username in BurnName.
* Squat-resistant: You cannot make money by squatting a username.
* Slow seizures: Seizing a BurnName username is slow.
* Bitcoin-only: BurnName uses only the Bitcoin network. You can easily build a BurnName client using ordinary Bitcoin libraries.
* Upgradeable: BurnName is designed to allow protocol upgrades.
* Extensible: A BurnName name can be used for arbitrary applications.

Design

In BurnName, a name is a string of 1–16 characters, allowing only lower-case letters, numbers, and dashes. A dash may only occur between two alphanumeric characters. Names are designed to be used like usernames.
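The naming rule above is small enough to check with a regular expression. This is only a sketch: the function name is made up, and it assumes "lower-case letters" and "numbers" mean ASCII `a-z` and `0-9`, which the post does not state outright.

```python
import re

# One or more alphanumeric runs joined by single dashes, so a dash can never
# start or end the name, and two dashes can never be adjacent.
# Assumption: only ASCII lower-case letters and digits are allowed.
_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_name(name):
    """Return True if `name` satisfies the BurnName naming rule as read here."""
    return 1 <= len(name) <= 16 and _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_name("occupy-paul-st")` is accepted, while `"-paul"`, `"pa--ul"`, and `"Paul"` are rejected.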
To store data, BurnName uses burn-stream, which consists of a series of special Bitcoin transactions. There are two types of operations: a user can bid for a name and attach data to a Bitcoin address.

To bid for a name, a user burns bitcoins. When you bid for a name, you're bidding for that name to be owned by a Bitcoin private key (presumably a private key that you own). Anyone may bid for any name at any time. The owner of a name is the key that has bid the most in total for that name, accounting for a waiting period. Names do not expire.

To attach data to an address, the owner of that address publishes information to the Bitcoin blockchain. Only the owner of an address can attach data to that address. Any application can use these BurnName names by attaching its own data.

To read the data associated with a name, a BurnName client first looks up the address for a name, then looks up the data associated with that address.

Mechanism Design

In Hanoi, under French colonial rule, a program paying people a bounty for each rat tail handed in was intended to exterminate rats. Instead, it led to the farming of rats.

Designing incentives is difficult, and even a tiny detail can be the difference between a smoothly functioning system and chaos. BurnName takes mechanism design very seriously.

The key tradeoff in decentralized namespace design is the struggle between squatting and seizure. If names are easy to keep for a long time, squatting is easy. If names are hard to keep for a long time, they can be easily seized.

Burning

In BurnName, there is no domain authority who gets paid when a bid occurs. Instead, the bitcoins spent to bid for a name are destroyed by being sent to an unspendable address. This is equivalent to distributing the payment amongst all holders of bitcoins.

Name pricing

BurnName uses an auction model for determining ownership, instead of a fixed-fee model. In BurnName, the owner of a name is the key who has paid the most for it.
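To make the "most in total, accounting for a waiting period" rule concrete, here is a hedged Python sketch. It reads the rule as follows: the earliest bid on a name counts immediately, and every later bid counts only once 14 days have passed since its block timestamp (the delay given under "Ownership formula"). The `Bid` type, the `owner_at` function, and the tie-breaking behavior are all assumptions made for illustration; the post does not define them.

```python
from typing import NamedTuple, Optional

WAITING_PERIOD = 14 * 24 * 60 * 60  # 14 days, in seconds


class Bid(NamedTuple):
    name: str
    bidder: str   # stand-in for a Bitcoin key
    amount: int   # satoshis burned
    time: int     # block header timestamp, Unix seconds


def owner_at(bids, name, now) -> Optional[str]:
    """Return the key that owns `name` at time `now`, or None if unclaimed."""
    relevant = sorted(
        (b for b in bids if b.name == name and b.time <= now),
        key=lambda b: b.time,
    )
    totals = {}
    for index, bid in enumerate(relevant):
        # The first bid is valid at once; later bids must wait out the delay.
        if index == 0 or bid.time + WAITING_PERIOD <= now:
            totals[bid.bidder] = totals.get(bid.bidder, 0) + bid.amount
    if not totals:
        return None
    # Assumption: on an exact tie, the key whose valid bid came first wins.
    return max(totals, key=totals.get)
```

With this reading, a larger rival bid does not change ownership right away; the current owner has the full waiting period to respond with further bids.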
In fixed-fee models like DNS, Namecoin, and Blockchain ID, there are predetermined fees for registering and renewing names. Fixed-fee models attempt to prevent squatting by charging a registration price for names. By doing this, they price out poorer users from the system.

A fixed fee is strange since names are worth vastly different amounts in practice. At any conceivable fee, it would still be worth it to squat the name "apple."

In an auction model, students or users in countries with weak currencies can easily afford their own usernames. If I pick a username that I like but that no one else wants ("occupy-paul-st"), then I can get a great deal on it.

Section 6.2 of Kalodner et al. describes some very interesting ideas for pricing names based on text frequency, name length, and other factors. Of course, this bakes fixed assumptions into the pricing model, which may or may not suit the market. For example, if names were priced using English text frequency, then Chinese names would be underpriced and vulnerable to squatting.

This continuous auction scheme allows names to be taken over by higher bidders; BurnName's waiting period provides a chance to respond to this.

Ownership formula

In BurnName, the owner of a name is whoever has bid the most in total for that name, combined with a waiting period. Each bid has four important properties:

* the name itself
* the bidder, a Bitcoin key
* the amount bid
* the time of the bid (the block header timestamp)

The first bid on any name is always valid. Subsequent bids on a name become valid after a delay of 14 days, to prevent surprising seizures.

Simplicity

BurnName's mechanism for ownership is extremely simple and easy to analyze; you know what you're getting. Buying a name is an instance of a continuous all-pay auction.

Squatting

Squatting is common with DNS, Counterparty Assets, and Namecoin.
Squatting increases name costs and decreases availability; instead of the auction happening on the primary market, squatters create ad hoc secondary markets in which they can auction names.

BurnName is resistant to squatting with the intent to sell the name (squatting for other reasons is still possible). Here's the reasoning:

A squatter can buy the name at price y and sell it to me at price z. Since the squatter paid y, I can outbid the squatter by paying y + ε.

If y < z, then it doesn't make sense for me to buy the name from the squatter; I can outbid the squatter for cheaper (for the price y + ε).

If y > z, then the squatter would lose money by paying more for the name than they're selling it for.

This means that y = z, i.e. the squatter can, at best, break even. Of course this is only a break-even if the squatter is 100% sure that they can sell every squatted name; in practice, squatters would lose money due to unsold names.

Due to the waiting period, it's possible that squatters could offer an "expedited name service," but this would still be risky for them.

Seizure

Paradoxically, seizing a name can be good or bad. If I have reserved the name "satoshi," I think you would agree that Satoshi Nakamoto has every right to take it from me! On the other hand, if I have reserved the name "occupy-paul-st," you probably wouldn't want the North Korean government to take it from me. Of course, the moral merits of these two situations cannot be encoded into an algorithm.

Disallowing name seizures doesn't quite work: somewhat bizarrely, the Namecoin developers do not own "namecoin.bit" and they have no way of obtaining it, since Namecoin allows perpetual renewals.

In BurnName, the community will "vote with its wallet" on a seizure. If someone attempts to seize the username of a respected activist, the community will contribute money to let the activist keep the name.
On the other hand, if Satoshi Nakamoto takes the name "satoshi" from a squatter, no one will help the squatter.

BurnName's waiting period is based on two beliefs:

1. It is good to have advance notice before a name you own is seized.
2. Waiting is acceptable in the case of a legitimate name takeover, whereas waiting is unacceptable in the case of attacks.

Details

Burn stream IDs

To identify its burn stream, BurnName will use the address 1BurnNameXXXXXXXXXXXXXXXXXXXZtJEfN. The testnet address is mvBurnNameXXXXXXXXXXXXXXXXXXd6F2jp. BurnName's OP_RETURN prefix is 0xCDB6 (Ͷ in UTF-8).

Message types

Each BurnName OP_RETURN output script will include a message type after the burn stream prefix. This field will allow the BurnName protocol to be extended in the future with the addition of new message types.

Code:
    Value  Message type
    ---------------------
    00     Bid
    01     Data

Bidding for a name

A bid for a name is a special burn stream string that asserts that a particular name should belong to a particular Bitcoin key. The bidder is determined to be the key associated with the first input of the transaction. To bid for a name, we'll set the message type to 0x00.

Here's an example bidding output script that bids for the name paul.

Code:
    Hex  ASCII  Purpose
    -------------------------------------------
    6a          Bitcoin script: OP_RETURN
    07          Bitcoin script: push 7 bytes
    cd          BurnName prefix
    b6          BurnName prefix
    00          Message type: bid
    70   p      Name I'm bidding for
    61   a      Name I'm bidding for
    75   u      Name I'm bidding for
    6c   l      Name I'm bidding for

Attaching data to an address

You may attach data to a Bitcoin address. The data attached to an address is an append-only log of binary strings. Note that the data isn't associated directly with a name; instead, the address owns a name, and the data is attached to the address. The address to which to attach data is determined by the key of the first input to the transaction.

Each application should identify its own data by including a prefix.
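The byte layout used by BurnName output scripts (OP_RETURN, a push length, the two prefix bytes, a message type, then the payload) can be sketched in a few lines of Python. The function and constant names here are made up for illustration. The 80-byte cap reflects the relay limit mentioned in this post, and the switch to `OP_PUSHDATA1` for pushes longer than 75 bytes follows standard Bitcoin script encoding rather than anything stated in the proposal.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C            # standard opcode for pushes of 76..255 bytes
BURNNAME_PREFIX = bytes.fromhex("cdb6")
MSG_BID = 0x00
MSG_DATA = 0x01
MAX_PUSH_BYTES = 80            # assumption: the relay limit cited in the post


def build_output_script(message_type, payload):
    """Assemble OP_RETURN <prefix || message type || payload> as raw bytes."""
    body = BURNNAME_PREFIX + bytes([message_type]) + payload
    if len(body) > MAX_PUSH_BYTES:
        raise ValueError("payload too long for a relayable OP_RETURN output")
    if len(body) <= 75:
        push = bytes([len(body)])            # single-byte direct push
    else:
        push = bytes([OP_PUSHDATA1, len(body)])
    return bytes([OP_RETURN]) + push + body
```

Calling `build_output_script(MSG_BID, b"paul").hex()` reproduces the bid example byte for byte: `6a07cdb6007061756c`.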
Currently, the default Bitcoin client only relays OP_RETURN transactions of 80 bytes or less. After the 2-byte prefix and 1-byte message type, that leaves 77 bytes, so attaching strings of over 77 bytes is discouraged.

To attach data to an address, use the message type 0x01.

Here's an example data-attaching Bitcoin script that attaches the data hello to an address.

Code:
    Hex  ASCII  Purpose
    -------------------------------------------
    6a          Bitcoin script: OP_RETURN
    08          Bitcoin script: push 8 bytes
    cd          BurnName prefix
    b6          BurnName prefix
    01          Message type: data
    68   h      Data
    65   e      Data
    6c   l      Data
    6c   l      Data
    6f   o      Data

Comparison: Namecoin

Namecoin, created in 2011, was supposed to provide secure, decentralized, human-meaningful names. It built a strong community and became one of the most famous and valuable altcoins. But when measured in 2015, all but 745 of 196,023 names were owned by squatters [Kalodner et al. 2015].

To build a successful decentralized name registration system, it's important to understand where Namecoin went wrong. Here are the most likely possibilities, in my opinion:

1. Mechanism design
2. Implementation complexity
3. Ease of use
4. No one wants a decentralized naming system

Mechanism design: squatting vs. seizure

In Namecoin, the price for name registration is fixed, and renewals are free. Unfortunately, the fees were too low, and squatters took advantage of this.

BurnName uses a continuous auction system completely different from Namecoin's fixed-fee system. For more detail, see name pricing above.

Implementation complexity

Namecoin has its own independent blockchain with its own rules, using a fork of the Bitcoin client; it would be very difficult to build an independent Namecoin client. In contrast, BurnName is a layer on top of Bitcoin, so a new client could be built using standard Bitcoin libraries.

Ease of use

The BurnName reference implementation will use webcoin, meaning that it will work in the browser. This will make it easier for casual users to get started with BurnName.
It should be easy to get started with writing to BurnName since bitcoins are used to write. To write with Namecoin, you need Namecoins, which are harder to get.

Does anyone care?

In my opinion, this is still an open question. Perhaps no one really needs decentralized naming! This question can only be answered after the design and implementation challenges have been surmounted.

Comparison: Blockchain ID

BurnName is largely inspired by Blockstack's Blockchain ID. Here are a few of the differences:

* Efficiency: BurnName works with SPV clients, which use over 10,000x less bandwidth than full clients. This means that startup time will take a few minutes instead of a few days. That said, some people think that BurnName's SPV strategy is an abuse of the Bitcoin blockchain.
* Scope: The Blockchain ID standard encompasses authentication, verification of other IDs, and user profiles. BurnName is only about registering usernames.
* Pricing: Blockchain ID uses a fixed-fee username allocation model, whereas BurnName uses an auction.

Comparison: blockname

Blockname from telehash was the first system that proposed using the Bitcoin blockchain for storing name data. It also uses an auction mechanism for pricing. BurnName is very similar to blockname. Here's how it differs:

* Efficiency: BurnName works with SPV clients, whereas blockname does not.
* Scope: Blockname is designed as a full DNS system, while BurnName only associates names with data.
* Pricing: BurnName uses the sum of all bids from an address, whereas blockname uses the single highest bid.

Possible extensions

These are things that don't seem crucial as part of the MVP, but might be nice later.

Name transfer

It might be good if names could be transferred between owners. This would introduce some trickiness, since squatters could potentially expedite the process of name purchase.

More valid characters

Speakers of non-English languages, particularly those not using the Latin alphabet, would probably appreciate the availability of additional characters. This introduces security risks such as IDN homograph attacks. There are also payload size concerns, since Unicode characters often require multiple bytes and space on the Bitcoin blockchain is very expensive.

Name-at-time

It should also be possible to query who owned a name at a particular time. For example, the string apple:2016-04-01 could resolve to the owner of the name apple on April 1, 2016, rather than the current owner. This could be used to make links permanent while still human-memorable.

Support for other inputs

Theoretically, a name could be owned in other ways than with a Bitcoin public key. If pay to script hash inputs were allowed to bid for names, this would allow multisignature name ownership, among other things.

Front-running protection

If necessary, front-running protection like that of Namecoin could be implemented pretty easily, although I doubt that front-running will be a problem.

Trust graph

An EigenTrust-style trust graph could be constructed between names in order to allow name owners to reinforce each other's rights of ownership. This could prevent unpopular parties from seizing names. Of course, it would also allow mobs to more easily remove someone's rights to a name.

Thanks

BurnName steals countless ideas from Namecoin, blockname, and Blockchain ID.

Thanks to @xloem on github, Rafi Shamim, and Jordan Lewis for feedback.

Bitcoin / Development & Technical Discussion / Re: Ultimate blockchain compression w/ trust-free lite nodes

on: January 09, 2016, 02:09:03 PM

Is there a plan to deal with parties that want to store arbitrary data in the blockchain? Right now, OP_RETURN is used as a way to prevent non-financial data from bloating UTXO trees. However, if someone wants to store arbitrary data in the blockchain, ultimate blockchain compression might encourage them to store this data by using fake addresses, because that way they would get much faster lookups and they could basically use this to build an efficient key-value store. Is this just accepted as something that will inevitably happen?

Bitcoin / Development & Technical Discussion / Re: Using the Bitcoin blockchain for P2P discovery?

on: January 02, 2016, 10:40:35 AM

Quote: There is no way for the client to know which blocks contain the transactions with the ip addresses, so it would need to download all of the blocks. I suppose you could say that it would only need to download blocks from a certain point onward, but that would still grow to be very large after some time.

This is exactly the problem with previous proposals: downloading the entire blockchain would just take too long. The way I'm proposing to solve this is to predetermine which blocks to look at as a part of the protocol. E.g. instead of looking at blocks `0, 1, 2, 3, 4, 5, 6`, a client would look at blocks `0, 4, 6, 2`. This increases write time, because you have to wait for the right block height in order to write.

Thanks for the critiques! This is definitely helping me to improve the proposal. I have updated the Gist to include this stuff.

Bitcoin / Development & Technical Discussion / Re: Using the Bitcoin blockchain for P2P discovery?

on: January 01, 2016, 08:37:34 PM

Hmm, I was hoping to avoid downloading the entire blockchain. I agree with you that downloading the entire blockchain for discovery is impractical, because a new application would require hours or days to start up. With my design, I believe that the client only needs to download all block headers, not all blocks. The client then requests a subset of the actual blocks. In order to accomplish this, I'm making big sacrifices on write speed (e.g. you might wait months to write data).

Bitcoin / Development & Technical Discussion / Re: Using the Bitcoin blockchain for P2P discovery?

on: January 01, 2016, 08:23:21 PM

Ah! Sorry, I misunderstood your original question. I am assuming that the new node is able to connect to the Bitcoin network, and that it uses the Bitcoin network to find discovery information for another peer-to-peer network. Does that make any sense?

Bitcoin / Development & Technical Discussion / Re: Using the Bitcoin blockchain for P2P discovery?

on: January 01, 2016, 07:50:13 PM

Good catch! I definitely did not make that clear. Basically, the answer to that is that nodes would have to download all block headers first, in the manner of a Satoshi SPV client (https://en.bitcoin.it/wiki/Thin_Client_Security). Then, they would request full blocks in a specific order. Does that answer your question?

Bitcoin / Development & Technical Discussion / Using the Bitcoin blockchain for P2P discovery?

on: January 01, 2016, 01:18:59 PM

Hey everyone, I would love to hear feedback on whether this idea is worth pursuing, and, if so, how it might be improved. Here's a link to a prettier version of this: https://gist.github.com/paulkernfeld/7126c1307fd46561df9c

Preface: Blockchain Storage Politics
---------------------------

The issue of whether the Bitcoin protocol should be used for data storage is [contentious](https://github.com/bitcoin/bitcoin/pull/5286), and I apologize to anyone who thinks this post is in poor taste. If you don't mind, I'd like to limit this particular discussion to the technical merits of Vespucci, rather than the moral merits.

Vespucci
========

Vespucci is a proposed protocol that uses the Bitcoin blockchain for decentralized application discovery.

The Problem
===========

In order for an application to join a P2P network, the application must somehow find the IP address of one or more peers to connect to. This is called bootstrapping, and it can be difficult.

BitTorrent and Bitcoin clients often include constant addresses of "bootstrap nodes," long-lived server nodes. This solution works only if these long-lived server nodes remain up, which creates a potential point of failure.

So, this protocol is designed to provide discovery for applications with the following requirements:

1. The P2P application needs to download a global list of peers when it's first opened.
2. Anyone should be able to register bootstrapping peers for discovery.
3. Read performance, i.e. the time to read peers and join the network, is very important.
4. Write performance, i.e. the time to register a new peer, is very unimportant.
5. The ability to discover the Bitcoin network is assumed.

The Protocol
============

Addresses of peers will be stored using [OP_RETURN](https://en.bitcoin.it/wiki/OP_RETURN) transaction outputs. To discover peers, a client looks through the blockchain and returns the relevant transactions to the application.

What is stored?
---------------

The data pushed after OP_RETURN will consist of:

1. Two bytes, `V0` in ASCII, identifying the message as belonging to the Vespucci protocol.
2. A few additional bytes identifying the application.
3. A zero byte to mark the end of the application ID.
4. A list of compressed addresses.

Since the blockchain is a shared resource, we want to be sure to use it wisely. We can use space efficiently by compressing addresses and allowing multiple addresses to be batched into the same transaction.

Addresses
---------

Uncompressed addresses will be zero-terminated generic URIs. This allows us to support IP addresses as well as hostnames.

`scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]`

Information to include or not include:

* We probably don't need to store `scheme` (e.g. `http` or `magnet`), because that information can probably be inferred by the application.
* It doesn't make much sense to include a `user` and `password` in Bitcoin, a publicly-viewable store!
* The `host` field will always be populated
* The `port`, `path`, `query`, and `fragment` fields may be populated

Compression
-----------

In order to compress URLs, we'll want to use a small-string compressing library specifically trained on URL-looking data. [shoco](http://ed-von-schleck.github.io/shoco/) allows users to do [just this](http://ed-von-schleck.github.io/shoco/#generating-compression-models).

Look order
==========

The Bitcoin blockchain is a log-structured data store, optimized for great write performance, but not designed for reading. The blockchain currently grows by about 25 GB/year, and is indexed only by time. This presents a dilemma: a P2P application that's five years old would have to look through 125 GB of data if searching linearly, even in the unlikely event that the block size limit is not increased. That's a lot of data to look through just to get some addresses! So, how can we turn the write-optimized blockchain into a read-optimized discovery store?
To solve this, we look through blocks in an order that maximizes read performance while greatly sacrificing write performance. The algorithm will be as follows:

* WLOG, label the first block that we care about block 0. Label subsequent blocks in [height](http://bitcoin.stackexchange.com/questions/18561/definition-of-blockchain-height) order: 1, 2, 3, ...
* Define the current maximum block height as `M`.

1. Set integer `K` such that `K := ceiling(log2(M))`.
2. Look at all non-looked-at blocks where block number `B` is a multiple of `2 ^ K`, in descending order.
3. If `K > 0`, set `K := K - 1` and go back to step 2. Otherwise (`K = 0`), we have looked at all blocks.

This can also be thought of as writing block heights in binary and counting the number of zeros at the end: blocks with more trailing zeros are looked at first. An example:

Code:
Blocks:     0    1    2    3    4    5    6    7    8    9    10   11   12   13
Binary:     0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101
End zeros:  4    0    1    0    2    0    1    0    3    0    1    0    2    0
Look order: 0 8 12 4 10 6 2 13 11 9 7 5 3 1

Just as when picking port numbers, applications should try to avoid synchronizing initial block heights, in order to avoid crowding at important blocks.

To optimize further, applications could stop looking at a certain value of `K`. For example, if we only look until `K = 10`, the worst-case wait to write is about a week (1024 blocks at roughly ten minutes each), and the application is guaranteed to never have to look through more than 1/1024 of the blockchain, which would be about 25 MB/year currently.

Spam prevention
---------------

Issues
======

Protocol interference
---------------------

It's possible that non-Vespucci messages might begin with the Vespucci and application prefixes, either by chance or by malice.
Given this, Vespucci clients should tolerate:

* Malformed addresses
* Addresses pointing to malicious bootstrap nodes
* [Decompression attacks](https://en.wikipedia.org/wiki/Zip_bomb) taking advantage of the compression protocol

Related Work
============

[Blockname](https://github.com/telehash/blockname) is a similar project, a Bitcoin blockchain DNS cache. Blockname is trying to solve a slightly harder problem, that of creating a 1:1 mapping from domain name to server address. In Vespucci, each application may return a set of addresses.
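The payload layout from the "What is stored?" section of this post might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, addresses are left uncompressed (the shoco compression step is skipped entirely), and each address is assumed to be a zero-terminated ASCII string.

```python
from typing import List, Optional

MAGIC = b"V0"  # two-byte protocol tag described in the post


def encode_payload(app_id: bytes, addresses: List[str]) -> bytes:
    """Build the bytes pushed after OP_RETURN (no compression applied)."""
    if not app_id or b"\x00" in app_id:
        raise ValueError("app_id must be non-empty and contain no zero byte")
    body = b""
    for addr in addresses:
        raw = addr.encode("ascii")
        if not raw or b"\x00" in raw:
            raise ValueError("address must be non-empty, with no zero byte")
        body += raw + b"\x00"  # zero byte terminates each address
    return MAGIC + app_id + b"\x00" + body


def decode_payload(payload: bytes, app_id: bytes) -> Optional[List[str]]:
    """Return the addresses for `app_id`, or None for unrelated payloads."""
    prefix = MAGIC + app_id + b"\x00"
    if not payload.startswith(prefix):
        return None
    addresses = []
    for raw in payload[len(prefix):].split(b"\x00"):
        if not raw:
            continue  # trailing terminator or empty entry
        try:
            addresses.append(raw.decode("ascii"))
        except UnicodeDecodeError:
            continue  # skip malformed addresses instead of failing
    return addresses


if __name__ == "__main__":
    data = encode_payload(b"ex", ["1.2.3.4:8333", "node.example.org"])
    print(data)
    print(decode_payload(data, b"ex"))
```

The decoder skips entries it cannot read rather than raising, in the spirit of the tolerance list above. Addresses that decode cleanly could still point to malicious nodes, so the application would need its own checks.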

Bitcoin / Project Development / Bitcoin Script Explorer, a tool for learning about Bitcoin Script

on: May 11, 2014, 10:19:47 PM

I wrote this tool to help people learn about how Bitcoin Script works. It's a toy Bitcoin Script compiler/decompiler that runs in a sandbox in the browser. I used ClojureScript for the compiler logic, and normal JavaScript for everything else. Hopefully this can help some people learn about the bizarre world of Bitcoin Script.

The tool is here: http://paulkernfeld.com/bse/
The code is here: https://github.com/paulkernfeld/bitcoin-script-explorer

Bug reports and other feedback are welcome!
