Bitcoin Forum
December 14, 2024, 11:39:22 PM *
Author Topic: Using the Bitcoin blockchain for file sharing  (Read 1992 times)
paulkernfeld (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile WWW
April 10, 2016, 12:33:25 PM
 #1

Hey everyone,

I would love to hear your feedback on this idea for using Bitcoin as the basis for a file library.

I put a pretty version of this document as a gist: https://gist.github.com/paulkernfeld/4278533bf83887f6f0ee67765c66d54d.

I am well aware that some people disagree with using Bitcoin as a data storage medium. I'm not particularly interested in having that discussion in this thread, though.



Exandria: A decentralized public file library

There are many great file repositories online: WikiLeaks and The Internet Archive are two prominent examples. Even these two notable sites are centralized, however. This means that censorship-happy governments can and do find ways to attack these sites by going after their founders, servers, or DNS records.

This article describes Exandria, a design for a file library with [decentralized ownership](paulkernfeld.com/2016/02/19/world-writable.html). Any owner can vote to add or remove content, making Exandria extremely difficult to censor.

Properties:

- Searching: users can search filenames
- Downloading: the ability to download files
- Censorship-resistant: it's hard to censor files
- Spam-resistant: it's hard to spam the system
- Zero-configuration: a new user can join the network instantly, without an invite

Many systems have some of these properties, but not all of them. If you know of another system that does or tries to do this, I would love to hear about it!

Overview
--------
* The Bitcoin blockchain is used as the canonical data source, containing links to file metadata.
* One writes to the library by burning bitcoins.
* Anyone can read from the library.
* All data is transferred via P2P networks.

Data Model
==========
The file library is represented as a mutable weighted set of files.

* Set: an unordered collection of files.
* Mutable: the library can change over time (i.e. files can be added and removed).
* Weighted: each file has a weight, to represent that some files are more important than others.

Each file has a title, an extension, and a reference to the content of the file.

Here's an example of what this set might look like. Note that the "magnet" field contains a hash of the content of the file, which doubles as a way to locate the file.

```json
[
    {
        "title": "Moby Dick",
        "ext": "epub",
        "magnet": "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C"
    },
    {
        "title": "Scott Joplin - The Entertainer",
        "ext": "mp3",
        "magnet": "magnet:?xt=urn:sha1:NAE52SJUQCZO5CYNCKHTQCWBTRNJIV4W"
    }
]
```

Delta Encoding
--------------
Delta encoding is a way to represent a set using an append-only log. Changes to the set are represented as additions and removals.

A metadata stream "add" entry might look like this:

```json
{
    "op": "add",
    "value": {
        "ext": "epub",
        "magnet": "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
        "title": "Moby Dick"
    }
}   
```

Each add entry will have a hash, which will be a hash of the serialized JSON of `value`. Multiple identical add entries will have the same hash, although it's possible for two add entries to be functionally identical but to have different hashes, e.g. if the JSON has different whitespace.
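
One way to avoid the whitespace problem is to hash a canonical serialization instead of the raw text. The sketch below is only an illustration: the choice of SHA-256, sorted keys, and compact separators are assumptions, since this design does not fix a hash function or a serialization format. The `entry_hash` name is hypothetical.

```python
import hashlib
import json


def entry_hash(value):
    """Hash an add entry's value after canonical serialization."""
    # Assumption: sorted keys and compact separators give deterministic bytes.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two spellings of the same entry: different key order and whitespace.
a = json.loads('{"ext": "epub", "title": "Moby Dick"}')
b = json.loads('{ "title":"Moby Dick",   "ext":"epub" }')

# Both spellings hash identically once they are canonicalized.
same_after_canonicalizing = entry_hash(a) == entry_hash(b)
```

With this approach a "remove" entry can refer to an add entry regardless of how the original JSON was formatted.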

A metadata "remove" entry simply refers to the hash of an entry that has been added:

```json
{
    "op": "remove",
    "hash": "c3f0fe269c05c3438d57c40526bd016d"
}
```

Weights
-------
Since this is a weighted set, we need some way to make weights work with delta encoding. Each "add" operation increases the weight of an element by the entry's weight, and each "remove" operation decreases it by the entry's weight. An element is only a member of the final set if its total weight is positive.

This delta encoding (the hash in the "remove" entry is the hash of "hat"):

```json
[
    {
        "op": "add",
        "value": "cat",
        "weight": 5
    },
    {
        "op": "add",
        "value": "cat",
        "weight": 8
    },
    {
        "op": "add",
        "value": "hat",
        "weight": 4
    },
    {
        "op": "remove",
        "hash": "80c7384a6339a053baee278cb13e578c",
        "weight": 6
    }
]
```

...results in this set ("hat" has weight -2, so it's excluded):

```json
{
    "value": "cat",
    "weight": 13
}
```
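
The replay of a delta-encoded log into a weighted set can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `value_hash` is a hypothetical stand-in for whatever hash the protocol ends up using (here SHA-256 over canonical JSON), so its output will not match the example hash shown in the log.

```python
import hashlib
import json


def value_hash(value):
    # Hypothetical hash: SHA-256 over canonical JSON (an assumption).
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def replay(log):
    """Fold add/remove entries into a list of members with positive weight."""
    values = {}   # hash -> value, learned from "add" entries
    weights = {}  # hash -> running total weight
    for entry in log:
        if entry["op"] == "add":
            h = value_hash(entry["value"])
            values[h] = entry["value"]
            weights[h] = weights.get(h, 0) + entry["weight"]
        elif entry["op"] == "remove":
            h = entry["hash"]
            weights[h] = weights.get(h, 0) - entry["weight"]
    # Only elements with a known value and a positive total are members.
    return [
        {"value": values[h], "weight": w}
        for h, w in weights.items()
        if w > 0 and h in values
    ]


log = [
    {"op": "add", "value": "cat", "weight": 5},
    {"op": "add", "value": "cat", "weight": 8},
    {"op": "add", "value": "hat", "weight": 4},
    {"op": "remove", "hash": value_hash("hat"), "weight": 6},
]
library = replay(log)  # "cat" totals 13, "hat" totals -2 and is dropped
```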

Architecture
============
[diagram 1]

The protocol consists of three layers:

1. A layer which stores identities and commitments in the Bitcoin blockchain
2. A layer for storing streams of file metadata
3. A layer that can retrieve a file given a magnet link, using BitTorrent

Identities and commitments
==========================
Unlike in many other systems, identities are not stored in a centralized database. Instead, they are stored on the Bitcoin blockchain.

Each identity is just a cryptographic key pair. Identity public keys are recorded onto the Bitcoin blockchain.

You only need an identity to write to Exandria; you can read from it without an identity.

There are no moderators or administrators; all users are of the same type.

Identity weights
----------------
In order to become a contributor to the library, the user must [burn](https://en.bitcoin.it/wiki/Proof_of_burn) some bitcoins to prove their commitment. This prevents spamming.

Each identity has an associated "weight," which is the total amount of bitcoins burned by that identity. This is used to relatively prioritize the contributions of each identity; the files posted by an identity with a higher weight will be ranked higher in search results.
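
As a rough sketch of how a client might compute identity weights, assume it has already extracted `(public_key, satoshis)` pairs from the burn transactions. That input format and the names `identity_weights` and `rank_identities` are assumptions for illustration.

```python
from collections import defaultdict


def identity_weights(burns):
    """Sum burned satoshis per identity public key."""
    totals = defaultdict(int)
    for public_key, satoshis in burns:
        totals[public_key] += satoshis
    return dict(totals)


def rank_identities(weights):
    """Order identities from most to least burned."""
    return sorted(weights, key=weights.get, reverse=True)


# Hypothetical burns: alice burns twice, bob once.
burns = [("alice-key", 50_000), ("bob-key", 120_000), ("alice-key", 100_000)]
weights = identity_weights(burns)
ranking = rank_identities(weights)
```

Working in integer satoshis avoids floating-point rounding when burns are summed.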

Writing
-------
In order to create a new identity or increase the commitment of an existing identity, the user burns some bitcoins.

Reading
-------
In order to download a list of all identities, a client can look through the Bitcoin blockchain. This can be optimized to work with [simplified payment verification](https://en.bitcoin.it/wiki/Thin_Client_Security).

Implementation
--------------
Burns will use a [burn stream](https://github.com/paulkernfeld/burn-stream).

Each message will be a single commitment. It will include a version byte (`0x00`) and then a public key.
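
The message layout above (one version byte, then the public key) could be encoded and decoded as follows. This sketch does not use the burn-stream library; the key length and encoding are not specified here, so the 33-byte placeholder key is purely an assumption.

```python
VERSION = 0x00  # version byte from the message format above


def encode_commitment(public_key: bytes) -> bytes:
    """Prefix the public key with the version byte."""
    return bytes([VERSION]) + public_key


def decode_commitment(message: bytes) -> bytes:
    """Check the version byte and return the public key that follows it."""
    if len(message) < 2:
        raise ValueError("message too short")
    if message[0] != VERSION:
        raise ValueError(f"unsupported version byte: {message[0]:#04x}")
    return message[1:]


# Placeholder key bytes; a real client would use an actual public key.
example_key = bytes(range(33))
message = encode_commitment(example_key)
recovered = decode_commitment(message)
```

Rejecting unknown version bytes leaves room to change the message format later without confusing older clients.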

Metadata Streams
================
A file library requires storing a lot of data, but storing data on the Bitcoin blockchain is very expensive. Therefore, only identities are stored on the blockchain, and file metadata is stored separately, in metadata streams.

Each identity has exactly one metadata stream, which is a distributed append-only log. The stream is identified and discovered by the public key of the identity, and the contents of the log must be signed by the identity's private key. This paradigm is already used in [secure-scuttlebutt](https://github.com/ssbc/secure-scuttlebutt) and [ppspp](https://tools.ietf.org/html/rfc7574).

A metadata stream contains the add and remove operations allowing the user to make modifications to the global set of files.

The "weight" of an add or remove entry is measured in bitcoins. It is the weight of the identity divided by the total number of entries in the identity's metadata stream.
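
A small worked example of that rule, with weights expressed in satoshis. Using `Fraction` to keep the division exact is a choice made for this sketch, not something this design prescribes.

```python
from fractions import Fraction


def entry_weight(identity_weight_satoshis, entry_count):
    """Weight of one entry: identity weight split evenly over all its entries."""
    if entry_count <= 0:
        raise ValueError("an identity needs at least one entry")
    return Fraction(identity_weight_satoshis, entry_count)


# An identity that burned 1,000,000 satoshis and posted 4 entries.
per_entry = entry_weight(1_000_000, 4)
# Posting a fifth entry dilutes every entry of that identity.
diluted = entry_weight(1_000_000, 5)
```

One consequence of this rule is that an identity cannot gain influence by posting more entries; each new entry lowers the weight of all its existing ones.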

Searching
---------
In order to search, the client will need to first download and index all metadata entries from all identities. Then, the client can perform the search locally. As new entries come in, the client will need to update its search index.
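
A minimal sketch of such a local index, assuming each entry is a dict with `title` and `weight` fields and that the client assigns each entry some unique id. The `SearchIndex` class is hypothetical, and a real client would probably want stemming and ranking beyond exact word matches.

```python
from collections import defaultdict


class SearchIndex:
    """In-memory inverted index over entry titles."""

    def __init__(self):
        self._by_word = defaultdict(set)  # word -> ids of entries containing it
        self._entries = {}                # id -> entry

    def add(self, entry_id, entry):
        """Index a newly received entry; call this as entries stream in."""
        self._entries[entry_id] = entry
        for word in entry["title"].lower().split():
            self._by_word[word].add(entry_id)

    def search(self, query):
        """Return entries whose titles contain every query word, heaviest first."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self._by_word.get(w, set()) for w in words))
        return sorted(
            (self._entries[i] for i in ids),
            key=lambda e: e["weight"],
            reverse=True,
        )


index = SearchIndex()
index.add("e1", {"title": "Moby Dick", "ext": "epub", "weight": 13})
index.add("e2", {"title": "Scott Joplin - The Entertainer", "ext": "mp3", "weight": 4})
hits = index.search("the entertainer")
```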

Retrieving Files
================
Each metadata entry will contain a magnet link to a file, as well as the extension of the file. Given this, the file can be retrieved using BitTorrent and saved to the user's file system.
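
To illustrate the client side of this step, here is a sketch that extracts the content hash from a magnet link and picks a local filename. It only uses the Python standard library and does not perform the BitTorrent download itself. The function names and the filename scheme are assumptions.

```python
from urllib.parse import parse_qs, urlparse


def content_hash(magnet):
    """Return the hash from the magnet link's xt (exact topic) parameter."""
    parsed = urlparse(magnet)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    xt = parse_qs(parsed.query).get("xt", [])
    if not xt:
        raise ValueError("magnet link has no xt parameter")
    # xt looks like "urn:sha1:<hash>"; keep the part after the last colon.
    return xt[0].rsplit(":", 1)[-1]


def local_filename(entry):
    """Build a filename from title and extension, replacing path separators."""
    safe_title = entry["title"].replace("/", "_")
    return f"{safe_title}.{entry['ext']}"


entry = {
    "title": "Moby Dick",
    "ext": "epub",
    "magnet": "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
}
file_hash = content_hash(entry["magnet"])
filename = local_filename(entry)
```

After the download finishes, the client can compare the hash of the received bytes against `file_hash` before saving under `filename`.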

Issues
======

Incentives
----------
People with a legitimate interest in the store must be willing to burn bitcoins, so this scheme relies on their goodwill.

Here are some parties who might participate and affect the content of the library:

- Individuals spending money on their hobbies
- Non-profits donating money to fight censorship
- Governments spending money to censor information
- Corporations or lobby groups spending money on advertising

Censorship is inevitable
------------------------
If we don't allow items to be removed from the file library, will we have a library without censorship?

Not really. Even without removal, content can be effectively censored by creating spam entries with the correct metadata but incorrect magnet links.

Therefore, instead of trying to eliminate censorship, the goal of this design is to create a library with a good signal:noise ratio.

Future upgrades
===============

Eigentrust
----------
An [Eigentrust](https://en.wikipedia.org/wiki/EigenTrust)-style trust graph would allow identities to testify for or against each other. The trust graph will make it so that control over content is given to a majority.

Without a trust graph, a single bad actor can cause a lot of disruption by deleting popular content. With a trust graph, the community can vote to silence bad actors. Of course, this is a type of censorship, but censorship is inevitable and community censorship is probably the best kind of censorship.

Each node in the trust graph is an identity. Each edge is directed and has a weight between -1 and 1. The starting weight for each node is the node's weight from burning bitcoins. The end weight of each node is computed by taking the principal eigenvector of the graph's trust matrix, as in EigenTrust.
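
A rough sketch of how the end weights might be computed by power iteration. Several things here are assumptions rather than part of this design: negative edges are clamped to zero (as the original EigenTrust algorithm does with negative local trust), the burn weights are used as the pre-trusted distribution, and `alpha` and `iterations` are arbitrary illustrative values. How negative testimony should actually count is left open.

```python
def eigentrust(burn_weights, edges, alpha=0.15, iterations=50):
    """burn_weights: {identity: amount}; edges: {(src, dst): weight in [-1, 1]}."""
    nodes = list(burn_weights)
    total = sum(burn_weights.values())
    # Starting (pre-trust) distribution: each node's share of all burned coins.
    pretrust = {n: burn_weights[n] / total for n in nodes}

    # Keep positive edges only, then normalize each node's outgoing trust to 1.
    out = {n: {} for n in nodes}
    for (src, dst), w in edges.items():
        if w > 0:
            out[src][dst] = w
    for src in nodes:
        s = sum(out[src].values())
        # A node that trusts nobody falls back to the pre-trust distribution.
        out[src] = {d: w / s for d, w in out[src].items()} if s else dict(pretrust)

    # Power iteration toward the principal eigenvector, damped by pre-trust.
    trust = dict(pretrust)
    for _ in range(iterations):
        nxt = {n: alpha * pretrust[n] for n in nodes}
        for src in nodes:
            for dst, c in out[src].items():
                nxt[dst] += (1 - alpha) * c * trust[src]
        trust = nxt
    return trust


# Hypothetical graph: a and c vouch for b, b vouches for a, a distrusts c.
burned = {"a": 1, "b": 1, "c": 1}
edges = {("a", "b"): 1.0, ("c", "b"): 0.5, ("b", "a"): 1.0, ("a", "c"): -1.0}
trust = eigentrust(burned, edges)
```

In this example nobody vouches for `c`, so its end weight falls to its damped pre-trust share, while `a` and `b` reinforce each other.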

[diagram 2]

Subjective trust filters
------------------------
Users could add custom client-side trust filters to improve the signal:noise ratio. For example, I could make a filter that downweights content from known scammers, and upweights content from sources that I find most interesting.

Since these filters will be subjective, it's possible for many such filters to exist. Trust filters would be shared out-of-band.

Search scalability
------------------
If each metadata JSON is limited to 1000 bytes, then we'll be able to store one million search records in 1 GB. This is reasonable for a personal computer, but at some point, the size of the metadata itself is going to become prohibitive.

Eventually, a client should be able to search the metadata without storing all metadata locally. This is challenging, because the client must quickly retrieve data from other computers and verify that that data is valid.

One potential solution would be to maintain a distributed and easy-to-verify trie of search terms, much like Bitcoin's [ultimate blockchain compression](https://bitcointalk.org/index.php?topic=88208.0) proposal. This would let nodes download information from the trie when performing a search.

Another potential solution would be to have nodes only download identities (not file metadata) on startup, giving them information about which identities are trustworthy. Then, nodes could perform searches by querying trusted identities, using an RPC-style setup.

Additional metadata
-------------------
Many types of metadata could be added to files, e.g. book author, film/song length, or image resolution. This could add a lot of complexity, since not all types of file have well-standardized metadata specifications.
SebastianJu
Legendary
*
Offline Offline

Activity: 2674
Merit: 1083


Legendary Escrow Service - Tip Jar in Profile


View Profile WWW
April 11, 2016, 09:55:22 PM
 #2

Unfortunately it sounds like a way to bloat the blockchain with all that data. And I think the amount of data would be enormous. So I doubt there is much chance of that. There are free ways to download, and P2P over Bitcoin would cost something.

Please ALWAYS contact me through bitcointalk pm before sending someone coins.
popovicbit
Full Member
***
Offline Offline

Activity: 144
Merit: 101


View Profile
April 12, 2016, 05:40:41 PM
 #3

Interesting read.

So governments/people can pay to have files taken out... that's kind of cool.


What are some major use cases for a library like this?

Mondie
Newbie
*
Offline Offline

Activity: 1
Merit: 0


View Profile WWW
April 12, 2016, 10:21:14 PM
 #4

As the blockchain is already (has been some time since I looked that up, so I may have outdated information) 50+ GBs large, it would become pretty gigantic if used for file sharing purposes. What you also need to keep an eye on is the possibility of illegal content (such as CP or whatever) being shared via the Blockchain and, as the Blockchain is downloaded automatically by Bitcoin clients, such content would spread rapidly and uncontrollably. That also applies to executable code, etc.

It's a pretty bad idea, IMO.
Quartx
Hero Member
*****
Offline Offline

Activity: 1036
Merit: 504


Becoming legend, but I took merit to the knee :(


View Profile WWW
April 12, 2016, 10:30:49 PM
 #5

So if I want to store the block chain files in the blockchain

Imagine how big it will get.

Haha

Nice idea, but not feasible with the already pretty humongous blockchain.

btc_enigma
Hero Member
*****
Offline Offline

Activity: 692
Merit: 569


View Profile
April 13, 2016, 04:55:49 AM
 #6

Good idea. Have a look at Storj. Can you explain how your system is better than Storj?

RealBitcoin
Hero Member
*****
Offline Offline

Activity: 854
Merit: 1009


JAYCE DESIGNS - http://bit.ly/1tmgIwK


View Profile
April 13, 2016, 01:01:49 PM
 #7

I don't know why people want to cram everything into Bitcoin.

The coins are task-oriented; each one has a different task.

If you want file sharing use STORJ, or other similar ones!

Bitcoin is for wealth storage.

Kprawn
Legendary
*
Offline Offline

Activity: 1904
Merit: 1074


View Profile
April 13, 2016, 05:05:41 PM
 #8

Cool idea, but the part I dislike is this ---> "One writes to the library by burning bitcoins." .... Is this intentional, or is there any other way to keep the bitcoins spent on this in circulation? {Built-in mixer service}? You will find some resistance to your idea if people have to burn bitcoin to use it. Also, why do you refrain from talking about some people disagreeing with using Bitcoin as a data storage medium? This is one of the fundamental problems with your project and it should be discussed.

Patatas
Legendary
*
Offline Offline

Activity: 1750
Merit: 1115

Providing AI/ChatGpt Services - PM!


View Profile
April 13, 2016, 05:43:20 PM
 #9

How is it different from torrents? Doesn't BitTorrent already stick strongly to the idea of decentralization by establishing peer-to-peer networks? Moreover, the blockchain simply stores the metadata and not the actual files, so implementing this seems kind of unfeasible.
paulkernfeld (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile WWW
May 01, 2016, 12:57:10 PM
 #10

Oops, sorry everybody, I didn't see that there were responses to this! Thanks everybody for taking the time to read and think about this idea!


Quote
Unfortunately it sounds like a way to bloat the blockchain with all that data. And I think the amount of data would be enormous.

I agree that this is a very important issue. In Exandria, files and metadata aren't stored in the blockchain; I'm being careful to keep the amount of data stored in the blockchain to an absolute minimum. The blockchain only stores identities, which are pointers to metadata streams that are stored off the blockchain. Those are in turn pointers to files.

Quote
can you explain how your system is better than Storj

The interesting thing that Exandria does is provide a global, searchable index of file metadata, which isn't something that STORJ is trying to do. So, STORJ is good for storing private files, and Exandria is for storing publicly-accessible files.

Quote
How is it different from torrents?

Exandria uses torrents as the way to exchange the actual files. It's more like a substitute for a torrent index (e.g. Pirate Bay).

Quote
What are some major use cases for a library like this?

I'm guessing file sharing, but it could also be used for advertising, exchanging academic papers, publicly sharing leaks as in Wikileaks, and probably a bunch of other things that I'm not specifically envisioning.

Quote
What you also need to keep an eye on is the possibility of illegal content (such as CP or whatever) being shared via the Blockchain and, as the Blockchain is downloaded automatically by Bitcoin clients, such content would spread rapidly and uncontrollably. That also applies to executable code, etc.

Yeah, this is an interesting issue. I have a couple things to say about this:
1. The community will have the ability to moderate the content, so they can decide that CP or viruses should be downvoted. That's what I'm hoping to achieve with the "Eigentrust" scheme.
2. The actual files aren't stored in the blockchain, so nothing is downloaded automatically.
3. You can already write illegal content into the Bitcoin blockchain.

Quote
Cool idea, but the part I dislike is this ---> "One writes to the library by burning bitcoins." .... Is this intentional, or is there any other way to keep the bitcoins spent on this in circulation?

Yeah, this is intentional. In order to prevent spamming, the goal is to make people actually destroy their money to prove their commitment. Otherwise, anyone can write to the store however much they want. This is the same principle as requiring miners to waste electricity to mine Bitcoins. If the electricity could be reused for something else, then it wouldn't be a way to prove commitment.

Quote
Why do you refrain from talking about some people disagreeing with using Bitcoin as a data storage medium? This is one of the fundamental problems with your project and it should be discussed.

That's a pretty reasonable perspective. My point of view is that this discussion has been had many times in many places, and it ultimately boils down to what you believe Bitcoin is. Most people believe that Bitcoin is a system for sending money. I believe that Bitcoin is a shared world-writable ledger. To me this seems more like a matter of opinion than something that can really be argued.
HeroCat
Hero Member
*****
Offline Offline

Activity: 658
Merit: 500


View Profile
May 03, 2016, 08:26:51 AM
 #11

I think it is not a good idea. The blockchain can be used only for BTC transfers. There are a lot of sites which offer file sharing, and file sharing is not safe; you never know whether a file is safe or not.