Author Topic: Using k-SUM to Create an ASIC/FPGA Resistant PoW  (Read 1401 times)
optimiz3 (OP)
June 20, 2014, 10:43:54 PM
 #1

Following up on this thread, I went back and looked at Tromp's Cuckoo Cycle, dga's response, and Momentum.  A commenter suggested investigating k-SUM (also known as the Subset sum problem) as the basis for a PoW.

So here's what I've come up with:

1. Let d be the current difficulty
2. Let k be the number of hashes to SUM


    (k-SUM(hash(block, nonce)) % d) == 0


This function has the following properties:

1a. In 2-SUM, memory increases linearly as d increases (since d/2 hashes are stored on average until you find (hash_stored + hash_current) % d == 0).
1b. In 3-SUM, memory increases quadratically as d increases (there are algorithms that claim subquadratic running time, but none that claim subquadratic memory, which is what we care about for ASIC/FPGA resistance).
1c. Generalized, we end up with a d^k memory requirement.

2. A verifier only needs constant memory, since they only have to check the k submitted nonce values.
3. It is easy to understand and implement, and there is a massive body of work around k-SUM.
4. It fits easily into existing infrastructure as the hash function could very well be SHA256d.
4a. Bitcoin's PoW can be generalized in terms of k-SUM:
  - SHA256d(block, nonce) < Target is essentially equivalent to 1-SUM(SHA256d(block, nonce)) % d, as d ~= Max_Target/Target, with a starting d value of 2^32.
5. Simpler than anything else I've seen proposed so far.
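
To make property 2 concrete, here is a minimal Python sketch of the verification step (the SHA256d choice, the 8-byte nonce encoding, and the function names are illustrative assumptions, not part of the proposal).  The verifier hashes only the k submitted nonces, so memory stays constant:

import hashlib

def sha256d(data: bytes) -> int:
    # Double SHA-256, interpreted as a big-endian integer.
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

def verify_ksum_pow(block_header: bytes, nonces, d: int) -> bool:
    # Constant-memory verification: only the k submitted nonces are hashed and summed.
    total = sum(sha256d(block_header + n.to_bytes(8, "big")) for n in nonces)
    return total % d == 0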


Setting aside whether people think an ASIC/FPGA resistant PoW is a good thing, I'd like to put this one out there for folks to pick apart.


Thoughts?
tromp
June 20, 2014, 11:15:18 PM
Last edit: June 20, 2014, 11:33:55 PM by tromp
 #2


    (k-SUM(hash(block, nonce)) % d) == 0

1a. In 2-SUM, memory increases linearly as d increases (since d/2 hashes are stored on average until you find (hash_stored + hash_current) % d == 0).

Interesting idea...

But your analysis seems wrong.

With O(sqrt(d)) hashes you can form O(d) different 2-SUMs, and expect a solution,
just like with the birthday paradox.

For 3-SUM, O(d^{1/3}) hashes using O(d^{2/3}) memory should suffice.
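
A minimal Python sketch of this for the 2-SUM case (the hash choice, nonce encoding, and nonce bound are illustrative assumptions): each new hash is checked against the complements of the hashes stored so far, so on the order of sqrt(d) entries are typically stored before a pair summing to 0 mod d turns up.

import hashlib

def sha256d(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

def mine_2sum(block_header: bytes, d: int, max_nonce: int = 1 << 32):
    # Map residue -> nonce; a solution exists as soon as the current residue's
    # complement (-h) mod d has already been seen.  Expected table size is O(sqrt(d)).
    seen = {}
    for nonce in range(max_nonce):
        h = sha256d(block_header + nonce.to_bytes(8, "big")) % d
        if (-h) % d in seen:
            return seen[(-h) % d], nonce   # two nonces whose hashes sum to 0 mod d
        seen.setdefault(h, nonce)
    return None
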
optimiz3 (OP)
June 20, 2014, 11:18:41 PM
 #3

Tromp - appreciate the comment, was hoping you'd reply :)

Yes - it's quite likely there are improvements over naive mining algorithms; I'm very curious to see what the best in the community can come up with to break or improve upon it.
tromp
June 20, 2014, 11:44:01 PM
 #4

2. A verifier only needs constant memory, since they only have to check the k submitted nonce values.
5. Simpler than anything else I've seen proposed so far.

Cuckoo Cycle verification checks that k nonces form a cycle on k hash values;
I'd say that's comparably simple (with no need for modular arithmetic).
tromp
June 21, 2014, 01:44:17 AM
 #5

3 more observations:

To make the PoW progress-free, one must limit the number of nonces that can be used,
e.g. to d^{1/k}, and then put a further constraint H(k nonces) <= target
on any solution. So d is more of a size parameter than a difficulty parameter.

For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3.

The 2-sum PoW suffers from the same time-memory trade-offs that plague Momentum.
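
A rough Python sketch of the progress-free construction from the first observation above (the encodings, names, and the naive combination scan are assumptions for illustration only): each attempt fixes one header, draws at most about d^{1/k} hashes, and a k-SUM solution only counts if it also passes the final hash-below-target check; a failed attempt carries nothing over.

import hashlib, itertools

def sha256d(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

def attempt(header: bytes, d: int, k: int, target: int):
    # One progress-free attempt: restrict the nonce range to ~d^(1/k), then require
    # any k-SUM solution to also satisfy a hash <= target constraint.
    nonce_limit = int(round(d ** (1.0 / k)))
    hashes = [sha256d(header + n.to_bytes(8, "big")) % d for n in range(nonce_limit)]
    for combo in itertools.combinations(range(nonce_limit), k):
        if sum(hashes[i] for i in combo) % d == 0:
            proof = b"".join(n.to_bytes(8, "big") for n in combo)
            if sha256d(header + proof) <= target:
                return combo      # valid proof for this header
    return None                   # failed attempt: nothing carries over
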
optimiz3 (OP)
June 21, 2014, 03:53:09 PM
Last edit: June 21, 2014, 05:15:33 PM by optimiz3
 #6

I chose a modulus by d because reduction modulo a fixed value has pretty well-known optimizations.

Quote
With O(sqrt(d)) hashes you can form O(d) different 2-SUMs, and expect a solution,
just like with the birthday paradox.

For 3-SUM, O(d^{1/3}) hashes using O(d^{2/3}) memory should suffice.

I don't think this is an instance of the birthday paradox, as we're not trying to find collisions (as in Momentum); rather, we're trying to find a subset sum where P is large.

The number of unique size-k subsets of the sampled hashes is (number of hashes) choose k.  To make things concrete, let d = 16 and sqrt(16) = 4, so 4 hashes are picked.  Assume all are unique values and the sums don't collide. We can get all possible 2-sums of the 4 hashes via 4 choose 2, or 6.

If O(sqrt(d)) gave us O(d) different 2-SUMs, then we'd expect 4 hashes to give 16 unique 2-SUMs, but instead we have 6.

Am I off here?  How are you getting O(sqrt(d))?

To be clearer, the PoW I'm suggesting is:

SUM = 0;
for (i = 0; i < k; i++) {
  SUM += hash(block, nonce[i]);
}
return (SUM % d) == 0;

Collisions make it less likely that values will be found, not more likely.


EDIT: Going into this, I did read your paper and Momentum first - the multi-nonce submission was inspired by your work.  Structurally, one can think of kSUM as a form of graph generation: every unique number hashed adds a fully connected vertex, but the edges have weights; find a path of weight d and length k.
tromp
June 21, 2014, 05:16:39 PM
Last edit: June 21, 2014, 06:11:17 PM by tromp
 #7

Am I off here?  How are you getting O(sqrt(d))?

You seem somewhat unfamiliar with O() notation.
Loosely speaking, it ignores constant factors.
So for instance, 2*sqrt(d) = O(sqrt(d)). With that many nonces, you can form up to
  2*sqrt(d) * (2*sqrt(d)-1) / 2 = 2*d-sqrt(d)
unique 2-sums, making it pretty likely that one of them is 0 mod d.

Although not an exact instance of the birthday paradox,
it behaves similarly, in that you're looking for 2 nonces whose hash
is opposite (rather than identical) in value;
one hash is h mod d and the other is (-h) mod d.
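
As a rough numeric illustration of the same point (treating each 2-sum as an independent 1/d chance of hitting 0 mod d, which is only an approximation):

import math

d = 1 << 20
n = 2 * math.isqrt(d)            # ~2*sqrt(d) nonces
pairs = n * (n - 1) // 2         # = 2*d - sqrt(d) unique 2-sums
p = 1 - (1 - 1 / d) ** pairs     # ~1 - e^-2, roughly an 86% chance of a solution
print(n, pairs, p)
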
Vertcoin
June 21, 2014, 05:56:56 PM
 #8

An interesting topic; I will keep an eye on it.

bitwho
June 21, 2014, 06:08:48 PM
 #9

Very interested, please continue.

It would be nice to see scrypt modified to remove the ASIC effect.
optimiz3 (OP)
June 21, 2014, 07:23:34 PM
Last edit: June 21, 2014, 07:40:58 PM by optimiz3
 #10

It wasn't obvious to me how you were getting your O estimate of O(sqrt(d)) - I get that it represents complexity as d approaches infinity.

Quote
So for instance, 2*sqrt(d) = O(d). With that many nonces, you can form up to
  2*sqrt(d) * (2*sqrt(d)-1) / 2 = 2*d-sqrt(d)
2-sums, making it pretty likely that one of them is 0 mod d.

If I rephrase this,

2*sqrt(d) * (2*sqrt(d)-1) / 2 == Choose(2 * sqrt(d), 2) == 2*d-sqrt(d)

O(2*d - sqrt(d)) == O(2*d - d^{1/2}) == O(d)

However, concretely we also know that we have to sample d/2 unique hashes before we are guaranteed a 2-SUM solution - also O(d).

So I *think* we're agreeing, but are taking different routes to the destination.  Were you referring to O(sqrt(d)) before applying Choose, or something else?  I agree that d/2 hashes on average was incorrect - it should have been d/2 hashes worst case - I'll need to do a bit more thorough complexity analysis.



Separately - I'd like to dig deeper into making this progress-free, as right now I think both a modified kSUM and Cuckoo Cycle suffer from the same thing:

Consider adding a final hash-step construction to kSUM, similar to Cuckoo Cycle:

HASH(concat(nonce{i})) <= Target

Such a construction really just constrains the domain to the nonce range.  It's not obvious how adding the final HASH < Target buys anything for either kSUM or Cuckoo Cycle and may be a vector for simplification.  I say this because one can arbitrarily pick nonces from potentially many partial solutions for a nonce range, and the likelihood of a partial solution gets greater as both kSUM and Cuckoo Cycle populate their sets/graphs.
tromp
June 21, 2014, 07:48:42 PM
 #11

So I *think* we're agreeing, but are taking different routes to the destination.  Were you referring to O(sqrt(d)) before applying Choose, or something else?

Yes, we agree it takes on average on the order of sqrt(d) samples to find a 2-sum solution.
(btw, the text you quote had some typos that I since fixed).

Consider if one adds a final hash step construction to kSUM similar to Cuckoo Cycle:

HASH(concat(nonce{i})) <= Target

As my paper explains, this is needed for precise difficulty adjustment. If you need the PoW to
become, say, exactly 3.8 times more difficult, you can just divide Target by that much.

If instead you can only control PoW difficulty by restricting the nonce range, then it
may not be clear by how much to restrict it in order to reduce the chance of finding
a solution by exactly a factor of 3.8.
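
In code form, the fractional adjustment is just a division of the target (the constants here are made up for illustration):

MAX_TARGET = (1 << 256) - 1
target = MAX_TARGET >> 32        # hypothetical current target
harder = target * 10 // 38       # divide by 3.8: each otherwise-valid proof now
                                 # passes the HASH(...) <= Target check 3.8x less often
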
optimiz3 (OP)
June 21, 2014, 08:12:46 PM
 #12

Yes, while thinking about the design of memory-hard functions I realized there is a hard tradeoff here:

1. Progress-Free (scrypt; i.e. sequential memory hard)
2. Fast-To-Verify (cuckoo, ksum, momentum)

The Momentum paper indirectly identifies the mechanism in #2 by calling out independent intermediate solutions as a requirement for fast verification.  The very presence of independent intermediate solutions implies "progress", since they are remembered in order to construct a final solution.

Based on this, an ideal Fast-To-Verify memory-hard PoW would force a 1:1 mapping between intermediate results and input domain values.  An optimal one would do only this.

Does this sound like a correct design goal?




tromp
June 21, 2014, 08:28:10 PM
 #13

Yes, while thinking about the design of memory-hard functions I realized there is a hard tradeoff here:
1. Progress-Free (scrypt; i.e. sequential memory hard)

There is no trade-off. Cuckoo Cycle is progress free.
Each attempt only has 2.5% chance of finding a 42-cycle.
When an attempt fails, either by not finding a 42-cycle,
or by not satisfying the additional hash target constraint,
you're back at square one.

If you like you can view each cycle finding attempt as
an scrypt computation that has a 97.5% chance of aborting.
optimiz3 (OP)
June 21, 2014, 08:40:45 PM
 #14

How are you defining an attempt?  If it is a single nonce range scan then we agree.  I was referring to progress as state within a nonce range scan.
tromp
June 21, 2014, 09:00:04 PM
 #15

How are you defining an attempt?  If it is a single nonce range scan then we agree.

An attempt is looking for a cycle in a single graph defined by a specific header.
There is some confusion here as the word nonce is used in two senses.
There is a "macro"-nonce that's used to add entropy into the header,
and for some PoWs like Momentum and Cuckoo Cycle there are also micro-nonces
used in building a proof.
An attempt corresponds to a macro-nonce.

I was referring to progress as state within a nonce range scan.

That's not what progress-free means in the context of PoWs.

Progress free means that you need to make many (hundreds or more)
attempts of finding a proof, and that you gain nothing from each failing attempt.

It implies that having 10 slow computers is about as good
as having a single one that's 10 times faster.
optimiz3 (OP)
June 21, 2014, 09:02:23 PM
 #16

OK - there isn't any fundamental disagreement then.
tromp
June 21, 2014, 11:19:49 PM
 #17

OK - there isn't any fundamental disagreement then.

Next step: choose values of k and d and write down the algorithm for solving such k-SUM.
Then we can look at runtime and memory usage and possible time-memory trade-offs
in detail...
optimiz3 (OP)
June 22, 2014, 06:27:24 AM
Last edit: June 22, 2014, 06:41:24 AM by optimiz3
 #18

So the first thing I'd assert is that any value of d should be a power of 2.  The reasoning is that it avoids biasing any k-SUM when applying modulo d.  A convenient side benefit is that the modulo can be done via a bit mask.

3SUM is interesting because 3 is relatively prime to any power of 2 that could be d.  It also simplifies the discussion, because we know the addition of the 3rd number "shifts" the target.

I.e., for 2SUM we have

(k-SUM(hash(block, nonce)) % d) == 0  

and for 3SUM we roughly have

(k-SUM(hash(block, nonce)) % d) == -(3rd value % d)

However, the extra input in 3SUM doesn't really achieve much, and actually works against us complexity-wise.  In normal 3SUM the domain is +-Infinity, but here it is GF(d), and because of the wrap-around at d we know that once we pick a term, there is a 1/d chance that the sum of whatever is chosen for all additional terms will "wrap us around" back to 0 and yield a solution.  Plus, we can then reuse any items found as part of the solution for any other item.  It's relatively intuitive to extend this reasoning from 3SUM to kSUM.

Net: k should be 2, as you hinted at earlier, at least in this construction, since k > 2 doesn't buy us anything.

I also pondered an alternate construction to try to mitigate the wrap-around effect of GF(d):

(k-SUM(hash(block, nonce) % d)) == 0  

Here we apply the modulus before k-SUM (assume the MSB of the hash is a sign bit, since we don't have modulo wrap-around), but this has the adverse effect of making values closer to zero more valuable.  I.e., if k=3, d=8, and you draw a 7, you can form far fewer solutions than if you had drawn a 1, due to the lack of an outer modulus.  So this alternate construction is out.


Incorporating your observation that 2SUM is roughly equivalent to Momentum complexity-wise, I went back to the drawing board...


Taking the original 3SUM construction, I realized a significant source of weakness was the lack of ordering.  If we change the construction to something like:

SUM(
  hash(block, key0, nonce0),
  hash(block, key1, nonce1),
  hash(block, key2, nonce2)) % d == 0

(key0, key1, and key2 are unique/different)

Then it becomes much harder memory-wise.  Since we've established the average memory requirements for 2SUM are O(sqrt(d)), and we know 3SUM can be re-expressed as 2SUM(hash0, 2SUM(hash1, hash2)), 3SUM in this form should have complexity of O(sqrt(d)) * O(sqrt(d)) = O(d) average case, with a worst case of O(n^2).  Such a massive variance in potential memory use is interesting, but I would expect a miner to optimize for the 50% case at sqrt(d).  Assuming this is correct, it's pretty neat that we could tie the average memory usage linearly to d.


What do you think?  Does this construction seem correct/more robust?
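
For concreteness, a Python sketch of the check in this keyed construction (the key byte strings and nonce encoding are placeholders, not part of the proposal):

import hashlib

def sha256d(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

def verify_keyed_3sum(block: bytes, nonces, d: int) -> bool:
    # Each position gets its own key, so a hash found for one slot cannot be
    # reused in another slot.
    keys = (b"key0", b"key1", b"key2")
    total = sum(sha256d(block + key + n.to_bytes(8, "big"))
                for key, n in zip(keys, nonces))
    return total % d == 0
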
tromp
June 22, 2014, 02:31:49 PM
 #19

Then it becomes much harder memory-wise.  Since we've established the average memory requirements for 2SUM are O(sqrt(d)), and we know 3SUM can be re-expressed as 2SUM(hash0, 2SUM(hash1, hash2)), 3SUM in this form should have complexity of O(sqrt(d)) * O(sqrt(d)) = O(d) average case, with a worst case of O(n^2).  Such a massive variance in potential memory use is interesting, but I would expect a miner to optimize for the 50% case at sqrt(d).  Assuming this is correct, it's pretty neat that we could tie the average memory usage linearly to d.

What do you think?  Does this construction seem correct/more robust?

Let n be the maximum number of nonces, h_1...h_n be the n hashes modulo d,
and assume d = O(n^3).
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j =i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found

So your memory usage is only the square root of your running time. This is why I wrote earlier:

"For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3."
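
The same algorithm in runnable Python, assuming the hashes are given as integers (a direct translation of the pseudocode above):

def three_sum_mod(hashes, d):
    # O(n) memory: all residues in one set.  O(n^2) time: scan every pair and
    # look up the value that would complete a 3-SUM == 0 (mod d).
    residues = {h % d for h in hashes}
    n = len(hashes)
    for i in range(n):
        for j in range(i + 1, n):
            need = -(hashes[i] + hashes[j]) % d
            if need in residues:
                return hashes[i] % d, hashes[j] % d, need
    return None
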
optimiz3 (OP)
June 22, 2014, 03:12:48 PM
 #20

Quote
Let n be the maximum number of nonces, h_1...h_n be the n hashes modulo d,
and assume d = O(n^3).
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j =i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found

So your memory usage is only the square root of your running time. This is why I wrote earlier:

"For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3."


I agree the upper bound on running time is O(n^2), but isn't the expected running time O(n)?  We know the time to look up h_i and h_j is O(1), and we expect to find a collision 50% of the time within O(sqrt(n)) samples.  I'm using the assumption that d == n.










