I recommend this guide to anyone who wishes to understand more about this phenomenal technology. It is a 15-20 minute read, and by the end you will have the essence of MaidSafe.
Introduction to MaidSafe: what it is, how it works, and how it compares to Bitcoin
Content:
Preface
Bitcoin
What is MaidSafe?
MaidSafe Advantages
How does MaidSafe work?
Conclusion
Preface
Like many others, I was led down the rabbit hole of decentralized applications by the discovery of Bitcoin in early 2013. It got me thinking of a world that enables true freedom of speech, personal privacy, sound money, and much more, all powered by trustless apps, free from human manipulation and corruption. The only problem is that current decentralized applications, including Bitcoin, are built upon an insecure, centralized system: the internet as we know it. They are thus putting band-aids on a much bigger problem. MaidSafe is a project I came across recently whose lofty goal is to reverse this approach: build a decentralized, secure internet infrastructure upon which all apps will automatically be secure and decentralized.
I couldn't find any good easy-to-understand resource that explains the ins-and-outs of MaidSafe, and the information that does exist is dispersed on various websites, forums, and videos. I've seen a lot of misinformation about it floating around, so I hope to not only clarify how it works to others, but to also solidify my own understanding while writing this. Since many researching MaidSafe come from a Bitcoin background, I will bring it up throughout the post and will explain the major differences. Let's get started.
Bitcoin
Let's have a quick primer for how bitcoin works (skip this section if you're already well-versed). Bitcoin is a currency that solves difficult problems inherent to all digital currencies: how could it be that a digital currency is not controlled by any person or organization? If it's just software, what's to stop someone from hacking it to make it seem like they have more than they have, or making infinite copies of coins? The genius technology behind it is a public ledger (the blockchain) that everyone has a copy of. In other words, every node in the network is aware of every wallet balance and transaction. So if you attempt to send more money than your wallet (or address) has, not a single node in the Bitcoin network will approve the transaction or include it in the blockchain.
But what if your bitcoin address has 1 bitcoin in it, and you send it to two addresses at once? Which one will be approved by the network? It is the Bitcoin miners' job to approve the transactions and add them to the ledger. Each miner will take a list of recent transactions (and only one of yours) and continuously add random characters to it (a nonce) to compute a unique hash (digital fingerprint) that meets certain criteria. On average it takes miners around the world 10 minutes of generating literally quadrillions of hashes per second to find one that meets the minimum criteria (difficulty). The consequence is that the transactions that were used to calculate the correct hash are now approved by all the nodes of the network and added to the ledger, and the reward to the miner that discovered the hash is all the transaction fees, as well as freshly minted bitcoins. Because it takes time and hard work for computers to do this, there is enough time for all nodes to reach a consensus on the current state of the ledger, and it's practically impossible to maliciously reverse transactions that are in it.
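To make the nonce search concrete, here is a minimal sketch of the idea, not Bitcoin's actual implementation. Real mining hashes a structured block header with double SHA-256; the transaction string, the `mine` and `is_valid` names, and the tiny 16-bit difficulty are all assumptions made for illustration.

```python
import hashlib

def _digest(block_data: str, nonce: int) -> str:
    """SHA-256 fingerprint of the transaction data plus a nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()

def is_valid(block_data: str, nonce: int, difficulty_bits: int) -> bool:
    """A hash 'meets the criteria' when it is below the difficulty target."""
    target = 1 << (256 - difficulty_bits)  # more bits = smaller target = harder
    return int(_digest(block_data, nonce), 16) < target

def mine(block_data: str, difficulty_bits: int):
    """Try nonces one by one until a valid hash is found."""
    nonce = 0
    while not is_valid(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce, _digest(block_data, nonce)

# Toy difficulty: the hash must start with 16 zero bits (4 hex zeros).
nonce, digest = mine("alice->bob:1BTC", difficulty_bits=16)
print(nonce, digest)
```

Finding the nonce takes many attempts, but anyone can check the result with a single call to `is_valid`. That asymmetry is what lets every node cheaply confirm a miner's work.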
What is MaidSafe?
While Bitcoin's purpose is a currency, MaidSafe, according to its homepage, is "a fully decentralized platform on which application developers can build decentralized applications. The network is made up by individual users who contribute storage, computing power and bandwidth to form a world-wide autonomous system." Users will be able to store any kind of information in a decentralized manner on the network, whether files, text, websites, or applications. An example application is a basic file-storage service like Dropbox. When you upload a file using an app powered by MaidSafe, behind the scenes it will break up your file into small chunks, encrypt each chunk such that no one knows what they are, and send them to the network. At least 4 copies of each chunk will be stored on nodes on the network around the globe. If a node or chunk becomes unavailable, the nodes connected to it immediately detect this and make another copy from one of the others. This same process is used for all data stored on the network. This also means that there are not any centralized hosting providers, such as AWS, where your data will be stored. MaidSafe is your cloud provider.
MaidSafe will have a single app already built-in: a currency named Safecoin. Like Bitcoin, it can be used to make any kind of purchase, but more importantly, to facilitate payments for storage and services on the MaidSafe network. Unlike Bitcoin, however, there is no blockchain. It takes an entirely new approach that will be discussed in detail in the next section. There are absolutely no transaction fees, and transactions are confirmed and irreversible at network speed, usually less than a second. Any user can become a farmer who offers a portion of their hard drive as storage for the network and gets paid in Safecoin when files are added to and retrieved from their computer. The rank of a farmer's node(s) increases over time using the factors of availability, disk space, cpu, and speed (bandwidth). In this network, the farmer also acts as the miner of safecoins: the higher the rank of a farmer, the more freshly minted safecoins he will earn. Thus, similarly to Bitcoin, the network is bootstrapped by monetary incentives to provide value to the network. The total number of safecoins that will be created is capped at 4.3 billion.
Although mining plays an important role in Bitcoin, a major downside is that a) it requires expensive, specialized hardware, b) it continuously expends a lot of energy to compute hashes, and c) most importantly, it serves no other function than to validate Bitcoin transactions. On MaidSafe, anyone can join the network with existing computers and earn money for providing real value to the network by sharing their hard drives and bandwidth.
File storage is but a single use for the network. Most of what can be done with the current internet can be done with MaidSafe. Facebook, Twitter, and LinkedIn-type social websites can be built upon MaidSafe, as can real-time communication (chat/email), e-commerce stores (eBay/Amazon), media/streaming (YouTube/Twitch), news (CNN), mobile/desktop apps, and anything else you can think of. The MaidSafe network inherently provides the decentralized database, authentication system (logging in/out of apps), and security system (automatically encrypting data at rest and in transit).
Another major innovation is the incentive to create useful applications on the network. In addition to earning safecoins for providing storage and bandwidth, application developers will earn safecoin simply for creating applications used by others. In other words, it will now be possible to earn money for creating useful open-source applications. The more an application is utilized on the network, the more safecoin the developer is rewarded. Additionally, there are no ongoing hosting costs that the developer needs to worry about. All of that is already taken care of by the network, making it much more compelling to innovate on MaidSafe than the conventional internet. This opens the gates to an entirely new business model that rewards creators who love what they do.
MaidSafe Advantages
How do we benefit from all of this?
Privacy. Everything is automatically encrypted end-to-end. Developers are not burdened with encryption overhead inside their applications. It is already taken care of for them. The network is resistant to IP address identification.
Reliability. Due to the redundancy built into the network, the chance of your data being erased is near zero. This is currently not the case where your data is stored on centralized computers that are prone to malfunction, or where it can become corrupted or erased due to human intervention and error.
Easy authentication to services. It will no longer be necessary to sign up separately for every website you use; your authentication on the MaidSafe network itself will work for most services.
New business model that rewards application developers.
Sony-type hacks are not possible.
Your data is yours. The Facebook-like website you use on MaidSafe cannot use your private information to track you or sell your information to others.
You will not experience any down-time for popular websites and files, such as the famous "reddit hug of death" (no possibility of overloading centralized servers).
Due to the decentralized nature of the network, all apps are inherently censorship-free. Because there are no centralized servers and all information is encrypted before it even touches the network, it is impossible for an oppressive government to shut it down, let alone pinpoint the location of any data on the network. A major step-up for freedom of information and speech in the 21st century.
How does MaidSafe work?
Who owns what data? How is it possible to have instant-confirmation, zero-fee safecoin transactions? How is double-spending prevented without a blockchain? What if a file undergoes a DDoS attack? Let's dive in and look into how the network works in order to answer these questions.
When you join the MaidSafe network, a public key-pair is created for you. Each node you create will have its own public key-pair based off this one (so that they all belong to "you"), and your master key-pair can invalidate any of them at any time. For the average user, your personal computers and devices will be your only nodes. As soon as one of your nodes comes online, it will automatically be assigned a completely random ID in addition to the key-pair. The pool of available IDs is astronomically large: (2^512 - 1) to choose from! That's more than all the atoms in the universe combined! Your personal identity is not tied to this ID in any way, thus you remain anonymous.
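To get a feel for the size of that ID space, here is a small sketch. The figure of 10^80 atoms is a commonly quoted rough estimate and an assumption here, and `random_node_id` only illustrates drawing a 512-bit number locally; on the real network the ID is assigned with the help of the network, not picked by your own machine.

```python
import secrets

ID_BITS = 512
ID_SPACE = 2**ID_BITS - 1       # number of possible IDs quoted in the text
ATOMS_IN_UNIVERSE = 10**80      # rough, commonly quoted estimate (assumption)

def random_node_id() -> int:
    """Draw a uniformly random 512-bit integer to stand in for a node ID."""
    return secrets.randbits(ID_BITS)

print(len(str(ID_SPACE)))             # decimal digits in the ID space
print(ID_SPACE > ATOMS_IN_UNIVERSE)   # comparison with the atom estimate
print(hex(random_node_id()))
```

With that many possibilities, two nodes being handed the same random ID is not a practical concern.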
When you log into the network, a virtual hard drive will be mounted on your computer. All the files that you add to the network will display here. Although it will look like all your data is there, in reality it is all broken up, encrypted, and dispersed throughout the network, ready for you to call it up when you need it. All data will be shared at the file system level, which means there will be no need for HTTP, FTP, SMTP, etc.
You will have the option to specify how much of the network's data you're willing to store on your hard drive, thus becoming a farmer and turning your node into a vault. You will earn safecoin in two ways: 1) by responding to PUT (store) and GET (retrieve) requests on your hard drive, and 2) by creating mining requests to the network. Farmers will generate safecoins according to a set algorithm, whose speed will fluctuate according to the demands of the network. This algorithm is set to decrease the rate of mining as time goes by, eventually stopping at the 4.3 billion safecoins mark. As time goes by, your vault will be ranked higher by the network according to your uptime, cpu, disk space, and bandwidth (speed). The higher your rank, the more safecoins you will be paid. If you provide fewer resources to the network than you consume for your own data, you will need to pay for the excess using purchased safecoins. There are discussions under way to decide whether or not to give away some free space for new accounts.
When your node is ranked highly enough, it is considered to be validated as a trustworthy node. At this time, your vault can take on one or more other personas: a client manager, data manager, vault manager, or transaction manager. All personas manage each other in the network. Let's go over exactly what happens when you upload a file to the network:
The MaidSafe software on your computer will split up your file into chunks no larger than 1MB in size. Each one is hashed and encrypted. To further obfuscate each chunk, every chunk is passed through an xor function using the hashes of other chunks. Each chunk is then broken into 32 pieces in a smart way that requires any 28 pieces to recompile the chunk. Key -> value pairs are added to a table on your computer that serves as a data map, i.e. that describes the locations of each chunk on the network. The key is the original hash of a chunk, and the value is the xor'ed value, which I'll refer to as the chunk's ID (which also acts as its location, as you'll see). At this point, the files cannot be accessed by anyone except the holder of the private key (you). All of this happens before the file even leaves your computer.
All of your pieces are then passed to your 32 client manager nodes. These are 32 machines that are the closest to your node ID in xor distance. In layman's terms, if your node has an ID of, say, 100, the existing nodes closest to you may be nodes with IDs of 96, 98, 99, 101, 103, etc. (Note that when we talk about distance here, we mean it in the mathematical, not geographical, sense. Nodes 100 and 101 may actually be on opposite sides of the globe.) Their job is to take your chunks and send them out to the network.
A minimum of 28 out of 32 of these client manager nodes will then pass their chunk pieces to groups of 32 data managers whose IDs most closely match the chunk IDs (which is why the chunk ID also acts as its location, as mentioned above), using xor networking (described later). In this way the transfer of info can withstand small loss (up to 4 pieces) without retransmitting the whole data again (this is used in many places). This process is called the scatter <-> gather approach and uses Rabin's Information Dispersal Algorithm. The data managers' job is to distribute the chunk they received to nodes on the network (with vault ranks always being under consideration), and to continuously make sure there are always at least 4 copies available. At this point, each broken up chunk is now in its own data manager group. The data managers recompile the pieces into whole chunks at this point.
The data managers will choose four vaults to send their chunk to, but not before getting a 28-of-32 consensus from the group. Instead of communicating directly with the vaults, however, the data managers will communicate and send the chunk to each vault's group of 32 vault managers that are responsible for the chosen vault (again, they are the nodes closest to the vault in xor distance). All 32 data managers store the IDs of the four vaults holding their chunks. Only they know the locations of the chunks; not even you!
The 32 vault managers' jobs are to send the chunk to the vault for storage and continuously communicate with it to make sure it's online, and that the file has not become corrupt. They do this by asking for the hashes of random chunks, which are created using the chunk's hash + random string. The correct value can only be returned if the correct version of the file exists. As soon as the vault managers detect that the node or a chunk has gone offline, they immediately inform the chunk's data managers, who will proceed to duplicate one of the other copies to another vault.
The vault receives the chunk and gets paid in safecoin for every GET request on the chunk. There are now 4 copies of each chunk distributed throughout the network.
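The first upload step (split, hash, xor against the hashes of other chunks, record a data map) can be sketched as follows. This is a simplified illustration, not MaidSafe's actual self-encryption code: the real encryption and the 28-of-32 piece splitting are left out, the pad is built from the two neighbouring chunks' SHA-512 hashes, and the chunk ID is taken to be the hash of the scrambled chunk. All of those choices, and every function name, are assumptions.

```python
import hashlib
from itertools import cycle

CHUNK_SIZE = 1024 * 1024  # "no larger than 1MB"

def split(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def xor_with_pad(chunk: bytes, pad: bytes) -> bytes:
    """XOR every byte with a repeating pad; applying it twice undoes it."""
    return bytes(b ^ p for b, p in zip(chunk, cycle(pad)))

def neighbour_pad(hashes: list, i: int) -> bytes:
    """Pad for chunk i, built from the hashes of the chunks on either side."""
    n = len(hashes)
    return hashes[(i - 1) % n] + hashes[(i + 1) % n]

def self_obfuscate(data: bytes, size: int = CHUNK_SIZE):
    """Return (data_map, stored): the local map and what goes to the network."""
    chunks = split(data, size)
    hashes = [hashlib.sha512(c).digest() for c in chunks]
    data_map, stored = [], {}
    for i, chunk in enumerate(chunks):
        scrambled = xor_with_pad(chunk, neighbour_pad(hashes, i))
        chunk_id = hashlib.sha512(scrambled).hexdigest()
        data_map.append((hashes[i].hex(), chunk_id))  # original hash -> chunk ID
        stored[chunk_id] = scrambled
    return data_map, stored

def restore(data_map: list, stored: dict) -> bytes:
    """Rebuild the file; only someone holding the data map can do this."""
    hashes = [bytes.fromhex(h) for h, _ in data_map]
    return b"".join(
        xor_with_pad(stored[chunk_id], neighbour_pad(hashes, i))
        for i, (_, chunk_id) in enumerate(data_map)
    )

data = bytes(range(256)) * 10
data_map, stored = self_obfuscate(data, size=1000)
print(len(data_map), restore(data_map, stored) == data)
```

The point of the design is that a stored chunk is useless on its own: undoing the xor needs the hashes of other chunks, and those live only in the data map on your machine.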
All of the above happens seamlessly in the background. Retrieving the uploaded file will follow the same kind of route. To the average user, it'll look like a file is uploaded or retrieved in a matter of seconds, or less.
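The vault managers' integrity check from the upload steps can be pictured as a challenge-response audit. In this hedged sketch, each vault hashes its stored chunk together with a fresh random string, and the answers from the vaults holding the same chunk are compared. Comparing against the majority answer is my own assumption about how a wrong answer could be spotted; the function names and vault labels are hypothetical.

```python
import hashlib
import secrets
from collections import Counter

def respond(stored_chunk: bytes, challenge: bytes) -> str:
    """What a vault returns: a hash over its stored chunk plus the challenge."""
    return hashlib.sha512(stored_chunk + challenge).hexdigest()

def audit(replicas: dict) -> list:
    """Return the vaults whose answer differs from the majority answer.

    `replicas` maps a vault name to the bytes that vault currently holds.
    In a real network each vault would compute `respond` on its own machine.
    """
    challenge = secrets.token_bytes(32)  # fresh every time, so answers can't be replayed
    answers = {vault: respond(chunk, challenge) for vault, chunk in replicas.items()}
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return sorted(vault for vault, answer in answers.items() if answer != majority)

good = b"chunk contents"
replicas = {"vault-a": good, "vault-b": good, "vault-c": good, "vault-d": b"chunk c0ntents"}
print(audit(replicas))
```

Because the random string changes on every audit, a vault cannot keep an old answer around in place of the data; it has to hold the chunk to answer correctly.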
Quick summary: 1. Your machine (client) breaks a file into chunks (let's say 3 chunks) which are encrypted and broken up into pieces. 2. All pieces are passed to client managers (nodes closest to you). 3. They send the appropriate chunk pieces to their respective data managers (each chunk is sent to its own group of data managers). 4. Each data manager group compiles the pieces and chooses four vaults to store its chunk on. 5. They send the chunk to the chosen vaults' vault managers, who then forward the chunk for storage on the vaults. 6. From now on, the vault managers will be keeping an eye on the vault and the chunk. If it disappears, they tell the data managers to make another copy.
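Steps 2 and 3 of the summary both come down to "find the nodes closest to some ID in xor distance". Here is a minimal sketch using small integer IDs for readability; the function names are assumptions, and the group size is shrunk from 32 to 4 in the example.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: the bitwise XOR, read as a number."""
    return a ^ b

def closest_nodes(target: int, node_ids, group_size: int = 32) -> list:
    """The `group_size` nodes with the smallest xor distance to `target`."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:group_size]

nodes = [n for n in range(90, 110) if n != 100]
print(closest_nodes(100, nodes, group_size=4))  # [101, 102, 103, 96]
```

Note that xor closeness is not the same as numeric closeness: node 99 sits right next to 100 on the number line, yet its xor distance is 7, so 96 (xor distance 4) ranks ahead of it. The same function serves for picking client managers (target = your node ID) and data managers (target = chunk ID).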
As you've probably noticed, there is a pattern here: all communications are done through groups of 32 nodes. This prevents a rogue node from creating problems on its own. This is the foundation of security on the MaidSafe network. It is impossible to choose the ID of your own nodes, or to decide which data you store on them, as that is all decided with the help of the network. Every time a node disconnects from the network and reconnects, it is assigned a totally new, random ID. Again, a) it takes a 28-of-32 node consensus to do anything with data, and b) it's impossible to decide which roles and IDs your nodes take on. It is for this reason that you'd need to control roughly 88% of the network (28 of 32 is 87.5%) in order to reliably attack it (compared with Bitcoin's 51% attack). The larger the network, the stronger it becomes.
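The 28-of-32 rule is simple enough to write down directly. This is only a sketch of the counting, with hypothetical names; how votes are collected and signed on the real network is not covered here.

```python
GROUP_SIZE = 32
QUORUM = 28

def group_approves(votes) -> bool:
    """True when at least 28 of the 32 group members vote yes."""
    votes = list(votes)
    if len(votes) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} votes, got {len(votes)}")
    return sum(1 for v in votes if v) >= QUORUM

print(group_approves([True] * 28 + [False] * 4))   # quorum reached
print(group_approves([True] * 27 + [False] * 5))   # one vote short
print(QUORUM / GROUP_SIZE)                         # 0.875
```

The last line is where the "88%" figure comes from: an attacker needs 28 of the 32 seats in a group, i.e. 87.5% of it, and cannot choose which groups their nodes land in.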
By now, you may already have an understanding of how safecoin works without a blockchain. Instead of everyone having a copy of every transaction like in Bitcoin, every address will be handled by a group of 32 transaction managers. The only difference with transaction managers is that there is an additional layer of security: there is a 7-group chain; the first group of transaction managers must get permission from another group of 32 nodes, and so on. This means that, to get the balance of an address or send money, only a limited number of nodes needs to be involved for each step of the process. This methodology is extremely scalable, as scalable as the network itself. While bitcoin is currently limited to 7 transactions per second, safecoin is only limited by the number of nodes in the network, and can easily scale to the thousands, and eventually, hundreds of thousands per second.
Further enhancements are implemented in MaidSafe which make the network faster and increase overall security. Some include:
Network caching. Intermediate nodes continuously retrieving the same chunk (due to popularity) will cache the chunk themselves, bringing it closer to requesting nodes.
Flood prevention. A node that tries to overwhelm other nodes by sending too much information gets disconnected by the nodes closest to it. This helps prevent DDoS attacks.
Churn is an advantage. Nodes throughout the network are constantly going offline and coming online. This increases security as the IDs of nodes throughout the network are constantly changing.
Protocol rule enforcement. If any node, no matter the persona, breaks the rules by trying to do something that's not allowed, it is immediately de-ranked or disconnected.
Holes fill quickly. When any node becomes overwhelmed and unreachable, it is immediately replaced with another one. This also mitigates DDoS attacks. As you can see, the network is very capable of self-healing!
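Of the enhancements listed above, network caching is the easiest to picture in code. The sketch below shows an intermediate node that keeps recently relayed chunks and answers repeat requests itself. The least-recently-used eviction policy, the capacity, and all names are assumptions; the source does not say how MaidSafe decides what to keep.

```python
from collections import OrderedDict

class ChunkCache:
    """Small least-recently-used cache held by an intermediate node."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, chunk_id: str):
        if chunk_id not in self._items:
            return None
        self._items.move_to_end(chunk_id)      # mark as recently used
        return self._items[chunk_id]

    def put(self, chunk_id: str, data: bytes) -> None:
        self._items[chunk_id] = data
        self._items.move_to_end(chunk_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)    # drop the least recently used

def relay_get(cache: ChunkCache, chunk_id: str, fetch_upstream):
    """Serve from the local cache if possible, otherwise fetch and remember."""
    cached = cache.get(chunk_id)
    if cached is not None:
        return cached
    data = fetch_upstream(chunk_id)
    cache.put(chunk_id, data)
    return data

upstream_calls = []

def fetch_from_vault(chunk_id: str) -> bytes:
    """Stand-in for the slow path through data managers and vaults."""
    upstream_calls.append(chunk_id)
    return b"data-" + chunk_id.encode()

cache = ChunkCache(capacity=2)
for _ in range(3):
    relay_get(cache, "popular", fetch_from_vault)
print(upstream_calls)  # ['popular']: only the first request reached the vault
```

A popular chunk therefore ends up served by many nodes along the way, which is why heavy demand spreads the load instead of concentrating it.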
It may seem that sending and receiving files over the network would be a time consuming process with so much going on, but routing over the network is quite efficient using a Kademlia-like distributed hash table (see below). With millions of nodes, it seems impossible to find a node closest to a certain ID. However, the number of hops required to find a node closest to a particular address is O(log n), where n is the total number of nodes in the network. Put simply, even in the worst case it will take roughly 23 hops (log2 of 10 million is about 23.3) to locate a node with a particular ID in a network of 10 million nodes! Once located, they can communicate directly with each other.
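The hop figure is just a base-2 logarithm, which is easy to check. The function name is an assumption, and the estimate ignores real-world details such as parallel queries and wider routing tables, which usually make lookups shorter.

```python
import math

def estimated_hops(node_count: int) -> float:
    """Rough upper estimate of lookup hops: log2 of the network size."""
    return math.log2(node_count)

for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10} nodes -> about {estimated_hops(n):.1f} hops")
```

Growing the network a thousandfold, from ten thousand to ten million nodes, adds only about ten hops, which is what O(log n) means in practice.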
A brief explanation on how a node can find any other node in the network quickly, using Kademlia: every node has its own list of nodes at varying, increasing degrees of distance from it. For example, node #1 will have the information for nodes #2, #4, #8, #16, and #32. If he's looking for node #19, he'll contact the closest node in his list, #16, and ask him if he knows about #19. Node #16 may have #17 and #20 in his list (since he's closer to them), so although he doesn't have information on #19, he'll ask #20 for the info. #20 will have #19's information (due to being as close as possible to the node). Thus, in only a few hops #19's information is returned to #1.
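The lookup described above can be simulated in a few lines. This is a toy model under stated assumptions, not Kademlia itself: IDs are 16 bits, each node keeps a single contact per distance bucket (real Kademlia keeps several and queries in parallel), and the walk simply hops to whichever known contact is closest to the target. All names are hypothetical.

```python
import random

BITS = 16  # toy ID size; the article's IDs are 512 bits

def build_routing_table(node_id: int, all_ids, rng) -> list:
    """Keep one random contact for each xor-distance bucket [2^i, 2^(i+1))."""
    buckets = {}
    for other in all_ids:
        if other != node_id:
            bucket = (node_id ^ other).bit_length() - 1
            buckets.setdefault(bucket, []).append(other)
    return [rng.choice(members) for members in buckets.values()]

def lookup(start: int, target: int, tables: dict):
    """Hop to the known contact closest to the target until none is closer."""
    current, hops = start, 0
    while current != target:
        best = min(tables[current], key=lambda contact: contact ^ target)
        if best ^ target >= current ^ target:
            break  # nobody we know is closer; stop here
        current, hops = best, hops + 1
    return current, hops

rng = random.Random(7)
ids = rng.sample(range(2**BITS), 500)
tables = {n: build_routing_table(n, ids, rng) for n in ids}

found, hops = lookup(ids[0], ids[-1], tables)
print(found == ids[-1], hops)
```

Each hop lands on a node that shares at least one more leading bit with the target, so the walk can never take more than `BITS` hops even though each node knows only a handful of the 500 others.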
More Resources
MaidSafe.org
http://maidSafe.org - Official MaidSafe discussion forums. Feel free to join in and ask questions!
MaidSafe.net
http://maidSafe.net - Official MaidSafe website.
SystemDocs
http://maidsafe.net/SystemDocs/ - A book that explains the details and benefits of using the SAFE Network.
Conclusion
I hope this article gave you a good understanding of what MaidSafe is and how it functions, and at least one precious "aha" moment. Although we covered a lot, there is still a lot to learn before the network goes live in early 2015. Just for fun, here are a couple bullet points to think about for the (distant?) future:
When the network becomes large, it can be made to function similarly to AWS: you'll be able to buy a virtually unlimited supply of on-demand servers.
Connections currently rely on IPv4/6, but eventually, as mesh networks become larger and more usable, nodes will be able to communicate solely using MaidSafe IDs.
A big thank you to Nick Lambert and David Irvine from MaidSafe for proofreading this post and explaining many of the details, and a big thanks to the community for the ongoing support!
This post was taken from my blog at http://blanshey.com. I will try to keep both the blog and this post updated as time goes on.
Credits: eblanshey from Maidsafe.org