Author Topic: Proof of Storage to make distributed resource consumption costly.  (Read 24042 times)
gmaxwell (OP)
Moderator, Legendary
October 13, 2013, 11:28:40 PM
Last edit: October 24, 2013, 04:43:03 AM by gmaxwell
#1

Motivation:
Consider a mutually distrusting decentralized network where each node has some finite capacity, like Bitcoin's P2P network, with 100,000 nodes.

An attacker wants to soak up all the connection capacity of all the nodes in order to funnel new peers to malicious nodes the attacker controls.  Each node can perform actions to make resource usage costly. For example, each node could reject additional connections from single IPs or subnets, requiring the attacker to have access to more address space (a scarce resource) in order to use up that node's capacity. Or they could prioritize peers that present a costly Bitcoin bond (e.g. a Fidelity bond/SIN).

The problem there is that the attacker's effectiveness scales with the number of victims— e.g. each IP or bond or whatever lets them use 100k sockets— since the nodes are mutually distrusting and thus can't collaborate to reject an attacker who makes one connection per IP to every single node in the network.

The network could respond by requiring a scarce resource to connect— e.g. a POW or a Bitcoin fee. But an on-connect fee doesn't really map well to an ongoing resource utilization.  You could require a periodic fee (e.g. intermittent POW), but setting the rates so that they don't favor attackers is hard (e.g. an attacker might not mind spending half a bitcoin to attack the whole network, but regular users might not tolerate a 0.0001 BTC per-connection cost).

Also, most POW schemes are subject to huge speedups from more efficient implementations. E.g. an FPGA or even GPU implementation will be enormously faster than the Java implementations used in most clients.  A memory-hard function may narrow the gap, but common memory-hard functions are just as memory hard to validate as they are to produce (though it's possible to do otherwise), and increasing the server's resource utilization is not a great way to address a resource exhaustion attack.

What might be useful is a POW function which maps well to the kind of resource usage that holding a connection represents and is also less vulnerable to speedup for specialized attackers.

One idea that has been suggested for this would be to have a storage-hard function: some function that forces you to store a bunch of data and continue to prove you're still storing it while the connection is up.  Making one that is storage hard for the client but not the server seemed pretty tricky and eluded a number of people for some time. I had a prior idea, which required fully homomorphic encryption to create a sequential function with a trap-door property to allow random access... which might have worked, but it was so complicated that no one would ever bother with it. Fortunately I've since had a better idea:

A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. like Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g.  {Left, Right} = H(seed) and so on, to build up the target size worth of data.  The client takes the final leaves of this tree-structured pseudo-random function, appends to each leaf its index, and sorts the data (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply lookup the value for a given index,  but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).

(There are some possible optimizations in the client, like common prefix compression, but they're only small optimizations because the function is strongly pseudorandom)
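A minimal sketch of the scheme above (Python, with SHA-256 standing in for H(); the way each hash output is split into left/right children and the toy sizes are my own assumptions, not part of the proposal):

Code:
import hashlib, os

def H(data):
    return hashlib.sha256(data).digest()

def build_table(seed, depth):
    # Client side: expand the seed into 2**depth leaves by applying H() in a
    # binary tree; each 32-byte hash output is split into two 16-byte children.
    level = [seed]
    for _ in range(depth):
        nxt = []
        for node in level:
            d = H(node)
            nxt.append(d[:16])   # left child  (assumed: first half of H output)
            nxt.append(d[16:])   # right child (assumed: second half)
        level = nxt
    # Store the table keyed by value, i.e. a value -> index lookup.
    return {leaf: i for i, leaf in enumerate(level)}

def leaf_at(seed, depth, index):
    # Server side: recompute a single leaf with only `depth` hash operations
    # by walking down the tree along the bits of `index`.
    node = seed
    for bit in range(depth - 1, -1, -1):
        d = H(node)
        node = d[:16] if (index >> bit) & 1 == 0 else d[16:]
    return node

# Toy run: the server challenges a random index, the client answers from its table.
depth = 16                                      # 2**16 leaves; a real table would be far larger
seed = os.urandom(16)
table = build_table(seed, depth)                # client stores ~1 MiB here
challenge_index = 12345
value = leaf_at(seed, depth, challenge_index)   # server: O(depth) hashes, no storage
assert table[value] == challenge_index          # client: cheap lookup proves it kept the table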

A gigabyte table wouldn't be too insanely burdensome for many kinds of clients, but would require a terabyte for an attacker to use resources on 1000 nodes. Such a scheme could be offered optionally to clients (e.g. if you perform a storage proof you get a preferred connection slot) and so it might provide some value in making some attacks ineffective.
justusranvier
Legendary
October 13, 2013, 11:38:32 PM
#2

So each node requires a gigabyte for their own storage table, plus one gigabyte for each peer they connect to?

Would a node only require a storage proof for new nodes and drop the requirement after they have established a certain amount of good behaviour?
gmaxwell (OP)
Moderator, Legendary
October 14, 2013, 12:07:04 AM
#3

So each node requires a gigabyte for their own storage table, plus one gigabyte for each peer they connect to?
One gigabyte (assuming that security level; it was a random number) per peer that they are using a storage proof to get priority access to. So they don't necessarily need to have one per outbound peer.  The idea would be that your peers would prioritize inbound connections that were providing a storage proof; everyone else's connectivity would depend on other criteria.

There is no "their own" storage requirement the whole basis of this idea is to be memorless on the server.

Nodes could pick their favorite couple of peers to use storage proofs on, and if they get bumped off other peers during a DoS attack, it's not the end of the world.

Quote
Would a node only require a storage proof for new nodes and drop the requirement after they have established a certain amount of good behaviour?
The problem there is that an attacker could just slowly build up proof-free connections to the whole network and still saturate things.

Long term, I expect our behavior with inbound connections to look something like this:  when a new connection comes in and we're full, randomly drop a peer (including, possibly, the new one) based on a priority scheme.  The priority scheme could reserve a number of slots for each of several different kinds of priority.  For example, some slots should be reserved for the longest-connected nodes— since having a long-uptime connection is a kind of scarce resource and some attackers come in bursts.  Other slots should be reserved for peers which have the lowest minimum round-trip time (because being close to you on the network can't be faked), come from the same subnet as you, come from many distinct subnets, are often the first to relay you new blocks, have an IP that when hashed with a secret is a low value, etc.  You can list out a bunch of different groups to protect... then randomly (weighted by goodness or whatever) kick nodes that don't make the cut.
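A hypothetical sketch of that kind of eviction (Python; the criteria, field names, and number of reserved slots are made up for illustration and don't describe any actual implementation):

Code:
import random

class Peer:
    # Illustrative per-peer record; field names are invented for this sketch.
    def __init__(self, addr, connect_time, min_rtt, blocks_first):
        self.addr = addr
        self.connect_time = connect_time   # earlier = longer uptime
        self.min_rtt = min_rtt             # lowest observed round-trip time
        self.blocks_first = blocks_first   # times it relayed a new block first

def pick_peer_to_evict(peers, reserved=4):
    # Protect the best `reserved` peers in each category, then kick a random
    # peer from whoever is left unprotected.
    protected = set()
    protected |= {p.addr for p in sorted(peers, key=lambda p: p.connect_time)[:reserved]}
    protected |= {p.addr for p in sorted(peers, key=lambda p: p.min_rtt)[:reserved]}
    protected |= {p.addr for p in sorted(peers, key=lambda p: -p.blocks_first)[:reserved]}
    candidates = [p for p in peers if p.addr not in protected]
    if not candidates:
        return None                        # everyone made a protected class
    return random.choice(candidates)

peers = [Peer("10.0.0.%d" % i, connect_time=i, min_rtt=10 + i, blocks_first=i % 3)
         for i in range(20)]
victim = pick_peer_to_evict(peers)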

One problem with that kind of litany of criteria is that you'll still end up with clients that just don't meet any of the protected classes— they won't appear to be highly useful good nodes, because they're not highly useful (especially SPV clients). They may not be near any of their peers on the network, and they may be freshly started and so have low uptime.  So it would be good to have a way for nodes to opt into priority without creating an easy avenue for attackers... thus ideas like a proof of storage.
Sergio_Demian_Lerner
Hero Member
October 15, 2013, 01:36:36 PM
#4

Gmaxwell: Your idea is excellent, but needs some changes to prevent an obvious query forwarding attack.

Suppose client C wants to prove to server S that he keeps the data. When client C connects to server S, it also connects to some client D, with C acting as a server toward D. Then server S sends the seed to client C, and client C forwards the same seed to client D. Each query is forwarded as well.
S cannot detect that C is cheating, because it is in fact checking D's storage. So the seed must be chosen by a fair coin-flip or similar protocol.

New proposed Setup Protocol

C chooses a seed c and sends it to S.
S chooses a seed s and sends it to C.
Both agree that seed = H(c | s).
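A minimal sketch of that seed agreement (Python, with SHA-256 standing in for H; purely illustrative):

Code:
import hashlib, os

def H(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

c = os.urandom(16)    # chosen by the client, sent to the server
s = os.urandom(16)    # chosen by the server, sent to the client
seed = H(c, s)        # seed = H(c | s): neither party picked it alone, so C
                      # cannot simply relay a seed dictated by S to some victim D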

Also, it's important to note that C may not store the whole tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all of the tree or not, S would need to make many time measurements of C's responses, averaging the elapsed time.
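A trivial sketch of that measurement on the verifier's side (Python; the query count and cutoff are made-up values, and in practice network jitter would have to be accounted for):

Code:
import time, statistics

def looks_like_real_storage(send_challenge, n_queries=20, cutoff_ms=50.0):
    # Issue n challenges, time each round trip, and flag the peer if the
    # average latency suggests it is recomputing instead of looking up.
    samples = []
    for _ in range(n_queries):
        t0 = time.monotonic()
        send_challenge()                          # blocks until the answer arrives
        samples.append((time.monotonic() - t0) * 1000.0)
    return statistics.mean(samples) <= cutoff_ms

print(looks_like_real_storage(lambda: None))      # dummy responder that answers instantly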

d'aniel
Sr. Member
October 15, 2013, 10:15:21 PM
#5

Also, it's important to note that C may not store the whole tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all of the tree or not, S would need to make many time measurements of C's responses, averaging the elapsed time.
IIUC, for an n-level tree, you'd save yourself from storing

(1 - 2^-n) / (2 - 2^-n)  ≈  1/2

of the nodes in the tree.  So could you just assume everybody is cheating and require an extra level?
gmaxwell (OP)
Moderator, Legendary
October 15, 2013, 10:28:24 PM
#6

Also, it's important to note that C may not store the whole tree, but only a precomputed top. To effectively detect whether C is in fact storing almost all of the tree or not, S would need to make many time measurements of C's responses, averaging the elapsed time.
IIUC, for an n-level tree, you'd save yourself from storing

(1 - 2^-n) / (2 - 2^-n)  ≈  1/2

of the nodes in the tree.  So could you just assume everybody is cheating and require an extra level?

Yea this was what I was referring to with "There are some possible optimizations in the client, like common prefix compression, but they're only small optimizations because the function is strongly pseudorandom".

The timing comment leaves me feeling a bit confused.  I don't hope to prevent outsourcing: if someone is storing the data then that creates the desired scarcity.  If the client instead wants to store no data but perform a billion computations (or what have you) per query, then I think that's okay: the queries are super cheap for the server and honest clients, so they can be performed frequently. I think the parameters can be chosen so that the storage-less solution of recomputing most of the answer space for every query is pretty costly (which would also achieve the goal of limiting abuse).
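To put rough, purely illustrative numbers on that (the table size and entry size here are my assumptions, not fixed parameters of the proposal):

Code:
# Back-of-the-envelope only; every parameter below is an assumption.
entry_size  = 16                          # bytes per stored leaf value
table_bytes = 1 << 30                     # 1 GiB target
leaves      = table_bytes // entry_size   # 2**26 = 67,108,864 leaves
server_hashes_per_challenge  = leaves.bit_length() - 1   # ~26, i.e. log2(table)
cheater_hashes_per_challenge = leaves - 1                # rebuild (up to) the whole tree
print(leaves, server_hashes_per_challenge, cheater_hashes_per_challenge)
# 67108864 26 67108863 -> a storage-less client burns tens of millions of hashes
# per challenge, while the server and an honest client stay trivially cheap.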
Peter Todd
Legendary
October 15, 2013, 11:03:39 PM
#7

The timing comment leaves me feeling a bit confused.  I don't hope to prevent outsourcing: if someone is storing the data then that creates the desired scarcity.  If the client instead wants to store no data but perform a billion computations (or what have you) per query, then I think that's okay: the queries are super cheap for the server and honest clients, so they can be performed frequently. I think the parameters can be chosen so that the storage-less solution of recomputing most of the answer space for every query is pretty costly (which would also achieve the goal of limiting abuse).

I think Sergio means the client could outsource the computation to another client.

gmaxwell (OP)
Moderator, Legendary
October 15, 2013, 11:15:13 PM
#8

I think Sergio means the client could outsource the computation to another client.
Okay, I thought that might be it. But Sergio's protocol improvement prevents unwitting outsourcing.  Voluntary outsourcing is okay, I think, as a scarce resource is still being consumed by someone involved in the activity.
Gavin Andresen
Legendary
Chief Scientist
October 16, 2013, 03:32:51 AM
#9

Great idea!

If implemented, it would probably make sense to create a little protocol for the server and client to negotiate the amount of storage needed to "rent" a connection slot. Maybe the server reports something like "I've got 0 slots free, min/median/max proof-of-storage for my other connected nodes is 0/1MB/11MB."
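For illustration, such a negotiation report could be as small as this (hypothetical field names; nothing like this exists in the actual P2P protocol):

Code:
# Hypothetical slot-status message a server might report to a prospective peer.
slot_status = {
    "free_slots": 0,
    "proof_of_storage_bytes": {      # over the currently connected peers
        "min":    0,
        "median": 1 * 1024 * 1024,   # 1 MB
        "max":    11 * 1024 * 1024,  # 11 MB
    },
}
# A client could then offer a bit more than the median to outbid the
# least-invested existing connections.
my_offer = slot_status["proof_of_storage_bytes"]["median"] + 1024 * 1024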

How often do you get the chance to work on a potentially world-changing project?
Sergio_Demian_Lerner
Hero Member
October 16, 2013, 12:49:16 PM
#10

Side topic: a node could also prioritize nodes which store the full block chain. This can be done with a variety of protocols known as Proof of Storage and Proof of Retrievability.
jgarzik
Legendary
October 17, 2013, 08:17:32 AM
#11

Related:

Similar to each bitcoin P2P peer's nonce value, which serves as a node's unique id, I've occasionally wished for each side of the P2P protocol to offer a random seed in "version" and "verack."

(a DH exchange is probably too much to wish for, in the context of bitcoin P2P, heh heh)


AnonyMint
Hero Member
October 17, 2013, 08:27:58 AM
#12

Note I invented Proof-of-Harddrive in my "Bitcoin: The Digital Kill Switch" thread several months prior. Thanks for finding a useful application of it, even if you may prefer DRAM. I had to abandon it as a substitute for PoW because it doesn't provide the entropy for selecting which peer is next.

Sergio_Demian_Lerner
Hero Member
October 17, 2013, 12:53:53 PM
#13

Note I invented Proof-of-Harddrive in my "Bitcoin: The Digital Kill Switch" thread several months prior. Thanks for finding a useful application of it, even if you may prefer DRAM. I had to abandon it as a substitute for PoW because it doesn't provide the entropy for selecting which peer is next.

Could you provide a link to the exact post where you elaborate the idea of Proof-of-Harddrive?
ShadowOfHarbringer
Legendary
October 19, 2013, 11:23:30 PM
#14

A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. like Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g.  {Left, Right} = H(seed) and so on, to build up the target size worth of data.  The client takes the final leaves of the tree and appends their index and sorts the data (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply lookup the value for a given index,  but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).

I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be? I am now convinced that most (if not all) of humanity's current problems can be solved by genius mathematical algorithms.

PS.
I wonder if this couldn't somehow be used in file sharing, by the way?

d'aniel
Sr. Member
October 20, 2013, 01:28:29 AM
#15

I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be?
Hey, give credit where it's due. :)  This guy in particular oozes genius ideas.
Sukrim
Legendary
October 20, 2013, 02:27:34 AM
#16

PS.
I wonder if this couldn't somehow be used in file sharing, by the way?
Likely not, as you are probably more interested in having counterparties that also share files with you than ones that provably waste hard drive space.

I initially thought "Proof of Storage" would mean that A has a large file that B previously had, and then A periodically proves to B that she still has the full file available.

The current idea is rather along the lines of "Proof of wasted HDD-space" (until CPUs catch up and it becomes faster to just generate the tables on the fly than to look them up from disk) and is intended to be used for deciding which connections to drop based on how "invested" other nodes are in you. This likely needs to be a longer-term relationship than the usual file sharing time frame to pay off.

gmaxwell (OP)
Moderator, Legendary
October 20, 2013, 02:34:48 AM
#17

I initially thought "Proof of Storage" would mean that A has a large file that B previously had, and then A periodically proves to B that she still has the full file available.
It's utterly trivial to do that. Just ask that they tell you parts of the file from time to time. You can even do this without remembering the file yourself if you just compute a hash tree over it, remember the root, and have them give you the hash fragments.
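A minimal sketch of that hash-tree variant (Python, SHA-256; the chunking and proof layout are illustrative assumptions):

Code:
import hashlib, os

def H(b):
    return hashlib.sha256(b).digest()

def merkle_layers(chunks):
    # Build all layers of the hash tree; layers[-1][0] is the root A remembers.
    layer = [H(c) for c in chunks]
    layers = [layer]
    while len(layer) > 1:
        layer = [H(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        layers.append(layer)
    return layers

def prove(layers, idx):
    # B's side (has the file): sibling hashes from leaf idx up to the root.
    path = []
    for layer in layers[:-1]:
        path.append(layer[idx ^ 1])
        idx //= 2
    return path

def verify(root, chunk, idx, path):
    # A's side (remembers only the root): check B's claimed chunk.
    h = H(chunk)
    for sib in path:
        h = H(h + sib) if idx % 2 == 0 else H(sib + h)
        idx //= 2
    return h == root

# Toy run: 8 chunks, challenge one of them.
chunks = [os.urandom(32) for _ in range(8)]
layers = merkle_layers(chunks)
root = layers[-1][0]
i = 5
assert verify(root, chunks[i], i, prove(layers, i))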

But that wouldn't be very useful as a decentralized anti-DoS mechanism: one copy of the file would allow you to prove to as many concurrent peers as you want.  If, instead, the peers ask you to store some unique file for them to address that issue, they have to undertake the cost of actually sending you that file, which is contrary to the goal of using this as anti-DoS (and you also have the risk Sergio addressed of someone doing that in order to invisibly delegate their proof work to someone else).

What this adds to that idea is the ability to send a few bytes that the peer can use to make a file of arbitrary size, and then they can convince you that they have actually done so... without you having to do non-trivial communication, computation, or storage. The price is, indeed, that the information stored isn't terribly useful. (You could perhaps replace the hash with some other kind of abstractly useful function "folding proteins" or something— but in general double-use for proof-of-whatever subsidizes the cost of attacking, since now your attack can be paid for by the secondary benefit of the barrier function, and other useful functions have some risk of surprising optimizations that strong cryptographic hashes lack).
super3
Legendary
October 20, 2013, 02:47:50 AM
#18

A proof of storage system:

Take some fast cryptographically strong pseudorandom function, e.g. like Blake2 or another hash function H().

The server gives the client a random seed and a target size. The client repeatedly applies H() in a tree, e.g.  {Left, Right} = H(seed) and so on, to build up the target size worth of data.  The client takes the final leaves of the tree and appends their index and sorts the data (or, equivalently, stores it in a binary search tree).

The server then can periodically pick a random index and perform log2(size) hash operations to determine the value at that index and then can challenge the client to provide the corresponding index for that value.

The tree structuring of the function means that you can cheaply lookup the value for a given index,  but looking up the index for a given value requires computing the whole thing (and storing it, if you want to efficiently perform multiple lookups).

I have just found time to read this post thoughtfully, and...
Mother of Hash... Another genius invention.

I mean, how brilliant can the Bitcoin community be? I am now convinced that most (if not all) of humanity's current problems can be solved by genius mathematical algorithms.

PS.
I wonder if this couldn't somehow be used in file sharing, by the way?
This could provide a basis for some sort of distributed file sharing, but you still need much more to make it work. 

Sukrim
Legendary
October 20, 2013, 12:37:23 PM
#19

This could provide a basis for some sort of distributed file sharing, but you still need much more to make it work. 
How?

This is only useful to rank/score peers; the data generated is quite random... you might be able (by brute-forcing it) to generate some meaningful data from time to time (then this would be kind of a compression mechanism), but the content of the data definitely is NOT the focus here, rather the guaranteed waste of resources, thus being "invested" into the relationship with you. This could help with bootstrapping issues, as a completely new peer has no data to offer and will have to leech initially. With Proof of Storage as described here, he can make sure that he will stay connected longer, as it otherwise would be quite a waste of resources for him to leech + leave.

Doing a hash tree over a file would then mean that the construction of my tree needs to be randomized too, otherwise the counterparty would compute the same tree and throw away the file. I thought that you wanted a mechanism to make sure that a random client actually DOES have the full block chain stored... This proposal is something different and more along the lines of the NashX proposal, where both parties show something of high value before proceeding further. Before, it was resources like data on one end and nothing for a new peer on the other end. Now you can at least "invest" resource exhaustion with this approach on the other end of the deal, which shows that you are committed to this relationship.

I'm not too sure if that can beat Tit-for-Tat though, as after some time you probably want to transfer from Proof of Storage (exhaustion) to something like seed ratios or similar.

Also I'm not too sure that you can infer from the willingness to waste resources the willingness to play fair.
Imagine this scheme in BitTorrent: seeds accept new leechers only if they either have at least 10% of a file available or agree to waste 1 GB (random numbers). Now I, as a new node, can connect to a handful of seeds and load my first 10%. This does not mean that I'm willing at all to contribute anything back to the swarm in the sense of upload. With tit-for-tat I'd get the whole file eventually even without uploading anything, but I'll be severely choked by any peer out there quite fast. With resource exhaustion, I just need to lie that I am a fresh new node and will get anything. My cost is the disk space + energy used for the calculations, BUT there is no gain on the seeder's end and also none for the network! The only way to deter me from doing these things would be to make it more expensive to run this attack than to participate. Given current electricity prices and HDD prices, there will be a lot of controversy about the "correct" size + computational requirements (you could build a tree where you have a much more "difficult" hash).

super3
Legendary
October 20, 2013, 04:53:22 PM
#20

This could provide a basis for some sort of distributed file sharing, but you still need much more to make it work. 
How?

This is only useful to rank/score peers; the data generated is quite random... you might be able (by brute-forcing it) to generate some meaningful data from time to time (then this would be kind of a compression mechanism), but the content of the data definitely is NOT the focus here, rather the guaranteed waste of resources, thus being "invested" into the relationship with you. This could help with bootstrapping issues, as a completely new peer has no data to offer and will have to leech initially. With Proof of Storage as described here, he can make sure that he will stay connected longer, as it otherwise would be quite a waste of resources for him to leech + leave.

Doing a hash tree over a file would then mean that the construction of my tree needs to be randomized too, otherwise the counterparty would compute the same tree and throw away the file. I thought that you wanted a mechanism to make sure that a random client actually DOES have the full block chain stored... This proposal is something different and more along the lines of the NashX proposal, where both parties show something of high value before proceeding further. Before, it was resources like data on one end and nothing for a new peer on the other end. Now you can at least "invest" resource exhaustion with this approach on the other end of the deal, which shows that you are committed to this relationship.

I'm not too sure if that can beat Tit-for-Tat though, as after some time you probably want to transfer from Proof of Storage (exhaustion) to something like seed ratios or similar.

Also I'm not too sure that you can infer from the willingness to waste resources the willingness to play fair.
Imagine this scheme in BitTorrent: seeds accept new leechers only if they either have at least 10% of a file available or agree to waste 1 GB (random numbers). Now I, as a new node, can connect to a handful of seeds and load my first 10%. This does not mean that I'm willing at all to contribute anything back to the swarm in the sense of upload. With tit-for-tat I'd get the whole file eventually even without uploading anything, but I'll be severely choked by any peer out there quite fast. With resource exhaustion, I just need to lie that I am a fresh new node and will get anything. My cost is the disk space + energy used for the calculations, BUT there is no gain on the seeder's end and also none for the network! The only way to deter me from doing these things would be to make it more expensive to run this attack than to participate. Given current electricity prices and HDD prices, there will be a lot of controversy about the "correct" size + computational requirements (you could build a tree where you have a much more "difficult" hash).
Could the seed of the tree perhaps be a file or the hash of the file that you want to verify is stored?

Indeed "correct" size will be very difficult to find. My desktop computer has 3x the hard drive space of my laptop. A large size might be negligible for my desktop, but costly for my laptop. Not to mention those portables with small SSD drives. Unlike GPUs and ASICs, cloud storage could be rented on demand for very cheap. As long as its not a sustained attack you could easily get significant space for pennies.
