Bitcoin Forum
  Show Posts
1  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 26, 2014, 10:15:37 PM
We explored this when looking at k-SUM, where it became clear there was an amortized O(sqrt(n)) compute cost and an O(n) space cost, rooted in the collision probability that the birthday paradox establishes.

AFAIK the probability of there being a k-cycle in a bipartite undirected G(n,p) graph, where k > 3, has not been proven, so we don't have a strong basis for judging the bounds of an ideal solution for Cuckoo Cycle.  This is where I would love to be proven wrong, as it would lay to rest the possibility of there being unknown backdoors in Cuckoo Cycle.
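For reference, the expected number of k-cycles in a plain (non-bipartite) G(n,p) graph does follow from a standard first-moment count; that gives an expectation, though, not the proven tail bounds I'm asking for, and the bipartite case differs.  A rough sketch, with illustrative parameters:

Code:
from math import exp, lgamma, log

def expected_k_cycles(n: int, k: int, p: float) -> float:
    # There are n!/(n-k)! / (2k) distinct k-cycles on n labeled vertices
    # (divide by 2k for rotations and reflection); each one is present
    # in G(n,p) with probability p^k.
    log_falling = lgamma(n + 1) - lgamma(n - k + 1)  # log of n!/(n-k)!
    return exp(log_falling - log(2 * k) + k * log(p))

# With average degree ~1 (p = 1/n), the expectation stays near 1/(2k):
n = 1 << 20
print(expected_k_cycles(n, 42, 1.0 / n))  # ~0.0119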
2  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 26, 2014, 02:07:47 PM
Quote
Quote from: optimiz3 on Today at 12:26:18 AM

Quote
What mathematical statement about Momentum is trivial to prove?
It is an instance of the birthday problem, whose complexity is understood.

I don't think this is understood.

Why don't you think the complexity of Momentum is understood/proven?

From your Cuckoo Cycle paper:

Quote
"For example, for L = 2 the problem reduces to finding a birthday collision as in the Momentum proof-
of-work."
3  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 26, 2014, 12:26:18 AM
What doesn't make sense?

Quote
What mathematical statement about Momentum is trivial to prove?

It is an instance of the birthday problem, whose complexity is understood.

Quote
What constitutes a vulnerability of a proof-of-work?

The possibility that implementations exist which exploit unaccounted-for (lower and amortized) bounds on memory and/or execution time.

Quote
What is a proof of difficulty?

A proof that the amortized bounds on memory and execution time cannot be less than those of the proposed implementation of a PoW.

Quote
And how do you rhyme the above with my Apr 30 observation that Momentum is a special case of Cuckoo Cycle where cycle length equals 2, a choice which leads it to be particularly memory-easy?

Cycle lengths of 2 and 3 are special cases because they can be restated in terms of the birthday problem and 3-cliques respectively.  They do not generalize to k-cycles (nor to 42-cycles), which is the concern.
4  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 25, 2014, 01:57:59 PM
Momentum is trivial to prove for any parameter selection because it is based on a well-known problem.

Cuckoo Cycle is based on a novel graph-theoretic approach which may or may not have vulnerabilities, but without a proof of difficulty for a particular set of parameters, there is no way to assess its vulnerability to future attacks.

Optimality comes through applying engineering requirements to a defined set of provable algorithm characteristics.  We don't have that (yet) for Cuckoo Cycle.

EDIT #2: Grammar; also, some searching turned up this recent (2009) paper, "Deviation inequality for the number of k-cycles in a random graph" by Wang and Gao.  Perhaps it may be of use?
5  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 24, 2014, 07:43:15 PM
Quote
You can find the motivation for choice of cycle length in the whitepaper; I devoted a section to it.
While the Cuckoo Cycle paper does devote a section to this, that section lacks any hard guarantees that the parameter choices are correct or optimal.  The graphs in the paper appear to be derived from experimentation, and the problem with such an approach is that it leaves open the possibility that more efficient and effective attacks exist.

There need to be provable formulas establishing the relationships between parameter choices, the likelihood of cycles, and in turn the theoretical bounds of any attacking implementation.  Without these, the possibility of as-yet-unidentified attacks remains open, which is no basis for a currency.

Quote
There is more literature on cycle detection in graphs than on k-SUM, and very little on k-SUM in cyclic groups as you consider.

I'm not advocating k-SUM; I think we've established it is a dead end.  That said, while there is much literature on cycle detection in graphs, I haven't found much in the way of proofs that concretely establish the probability of cycles of a given length existing in a G(n,p) graph.  I would certainly appreciate pointers if you are familiar with work in this area, and I think it would go a long way towards strengthening the fundamentals of Cuckoo Cycle as proposed.
6  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle: a new memory-hard proof-of-work system on: June 23, 2014, 10:24:29 PM
tromp - after our discussion on k-SUM, I went back and tried to categorize why I was uncomfortable with Cuckoo Cycle in its present state.  Basically it comes down to this:

- It seems the selection of 42 for the cycle length was based on experimentation, but there doesn't seem to be a hard proof behind it (dga mentions this).  I was originally looking at k-SUM because the difficulty of k-SUM is well known.  In the case of cycle finding, I'm not sure the difficulty is proven.

- One way of proving this might be to instead establish the probability of a cycle of length k occurring in a randomly generated graph, and then tweak the parameters so that the expectation is that only one cycle is found.

- Separately, might it make sense to look deeper at k-cliques?  Their probability seems to be much better understood, and they might provide a more provably secure basis.

Quick starting point:

Assume G(n,p) is an Erdős–Rényi graph.
let pK = Probability a fixed set of k nodes is a clique = p^(k choose 2)
let pNK = Probability no k-clique exists ≈ (1-pK)^(n choose k)  (approximating the k-node subsets as independent, which they are not, since overlapping subsets share edges)
Probability at least 1 k-clique exists ≈ 1-pNK
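Here's the same heuristic in runnable form (a sketch; the example numbers are arbitrary, and the independence approximation above is the weak point):

Code:
from math import comb

def p_some_k_clique(n: int, k: int, p: float) -> float:
    pK = p ** comb(k, 2)            # a fixed set of k nodes forms a clique
    pNK = (1.0 - pK) ** comb(n, k)  # no k-clique, treating the C(n,k)
                                    # candidate sets as independent (heuristic)
    return 1.0 - pNK                # at least one k-clique

print(p_some_k_clique(100, 4, 0.1))  # ~0.98 with these toy numbers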

Thoughts?
7  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 22, 2014, 06:39:10 PM
Yes - agreed.  Better to establish equivalence to an existing problem now than later.  Appreciate the feedback and glad this structure was explored more thoroughly.
8  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 22, 2014, 06:24:09 PM
Quote
Like I said before, you need to choose d and n so that you expect at most one solution.

I suppose so - working through all this, it seems we've established that kSUM is equivalent to the birthday problem, but with k colliding values instead of two.
9  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 22, 2014, 03:12:48 PM
Quote
Let n be the maximum number of nonces, h_1...h_n be the n hashes modulo d, and assume d = O(n^3).
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j = i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found

So your memory usage is only the square root of your running time. This is why I wrote earlier:

"For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3."


I agree the upper bound on running time is O(n^2), but isn't the expected running time O(n)?  We know the time to look up h_i and h_j is O(1), and we expect to find a collision 50% of the time in O(sqrt(n)).  I'm using the assumption that d == n.
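To make the disagreement concrete, here's a rough simulation of how many pairs the quoted scan examines before a hit, comparing d == n against d on the order of n^3 (a sketch; n and the trial count are arbitrary):

Code:
import random

def pairs_until_3sum(n: int, d: int) -> int:
    # Mirror the quoted algorithm: store the n hashes mod d in a set,
    # then scan pairs (i, j) until -(h_i + h_j) mod d is in the set.
    h = [random.randrange(d) for _ in range(n)]
    s = set(h)
    steps = 0
    for i in range(n):
        for j in range(i + 1, n):
            steps += 1
            if -(h[i] + h[j]) % d in s:
                return steps
    return steps  # no solution in this nonce range

n = 300
for d in (n, n ** 3):
    avg = sum(pairs_until_3sum(n, d) for _ in range(20)) / 20
    print(f"d={d}: ~{avg:.0f} pairs examined on average")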

10  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 22, 2014, 06:27:24 AM
So the first thing I'd assert is that any value of d should be a power of 2, the reasoning being that it avoids biasing the k-SUM when applying modulo d.  A convenient side benefit is that the modulo can be done via bit mask.
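A one-line sanity check of the bit-mask equivalence (the values here are arbitrary):

Code:
d = 1 << 20                  # d chosen as a power of two
h = 0xDEADBEEF
assert h % d == h & (d - 1)  # modulo d reduces to a bit mask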

3SUM is interesting because 3 is coprime to any power of 2 that d could be.  It also simplifies discussion, because we know the addition of the 3rd number "shifts" the target.

I.e. for 2SUM we have

(k-SUM(hash(block, nonce)) % d) == 0

and for 3SUM we roughly have

(k-SUM(hash(block, nonce)) % d) == -(3rd value % d)

However, the extra input in 3SUM doesn't really achieve much, and actually works against us complexity-wise.  In normal 3SUM the domain is +-Infinity, but here it is Z_d (the integers mod d), and because we wrap around at d, once we pick a term there is a 1/d chance that the sum of whatever is chosen for all additional terms will "wrap us around" back to 0 and yield a solution.  Plus, any item found as part of one solution can be reused for any other.  It's relatively intuitive to extend this reasoning from 3SUM to kSUM.

Net: k should be 2, as you hinted at earlier, at least in this construction, since k > 2 doesn't buy us anything.

I also pondered an alternate construction to try to mitigate the wrap-around effect of Z_d:

(k-SUM(hash(block, nonce) % d)) == 0  

Here we apply the modulus before the k-SUM (assume the MSB of the hash is a sign bit, since we no longer have modulo wrap-around), but this has the adverse effect of making values closer to zero more valuable.  I.e. if k=3, d=8, and you draw a 7, you can form far fewer solutions than if you had drawn a 1, due to the lack of an outer modulus.  So this alternate construction is out.


Incorporating your observation that 2SUM is roughly equivalent to Momentum complexity wise, I went back to the drawing board...


Taking the original 3SUM construction, I realized a significant source of weakness was the lack of ordering.  If we change the construction to something like:

SUM(
  hash(block, key0, nonce0),
  hash(block, key1, nonce1),
  hash(block, key2, nonce2)) % d == 0

(key0, key1, and key2 are unique/different)

Then it becomes much harder memory-wise.  Since we've established the average memory requirement for 2SUM is O(sqrt(d)), and we know 3SUM can be re-expressed as 2SUM(hash0, 2SUM(hash1, hash2)), 3SUM in this form should have complexity of O(sqrt(d)) * O(sqrt(d)) = O(d) average case, with a worst case of O(n^2).  Such a massive variance in potential memory use is interesting, but I would expect a miner to optimize for the 50% case at sqrt(d).  Assuming this is correct, it's pretty neat that we could tie the average memory usage linearly to d.
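To make the keyed construction concrete, here's a naive solver sketch (H, d, and the nonce limit are illustrative stand-ins; this brute-force scan is just to show the search problem, not the composed-2SUM strategy described above):

Code:
import hashlib

def H(block: bytes, key: int, nonce: int, d: int) -> int:
    # Stand-in hash: SHA-256 over (block, key, nonce), reduced mod d.
    data = block + bytes([key]) + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % d

def solve_keyed_3sum(block: bytes, d: int, limit: int):
    # Find (n0, n1, n2) with
    #   H(block,0,n0) + H(block,1,n1) + H(block,2,n2) == 0 (mod d)
    # by tabulating the key-2 hashes, then scanning (n0, n1) pairs.
    table = {H(block, 2, n2, d): n2 for n2 in range(limit)}
    for n0 in range(limit):
        h0 = H(block, 0, n0, d)
        for n1 in range(limit):
            need = -(h0 + H(block, 1, n1, d)) % d
            if need in table:
                return n0, n1, table[need]
    return None

print(solve_keyed_3sum(b"block header bytes", d=1 << 16, limit=512))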


What do you think?  Does this construction seem correct/more robust?
11  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 21, 2014, 09:02:23 PM
OK - there isn't any fundamental disagreement then.
12  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 21, 2014, 08:40:45 PM
How are you defining an attempt?  If it is a single nonce-range scan, then we agree.  I was referring to progress as state within a nonce-range scan.
13  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 21, 2014, 08:12:46 PM
Yes, while thinking about the design of memory-hard functions I realized there is a hard tradeoff here:

1. Progress-Free (scrypt; i.e. sequential memory hard)
2. Fast-To-Verify (cuckoo, ksum, momentum)

The Momentum paper indirectly identifies the mechanism in #2 by calling out that independent intermediate solutions are required for fast verification.  The very presence of independent intermediate solutions implies "progress", since they are remembered in order to construct a final solution.

Based on this, an ideal Fast-To-Verify memory-hard PoW would force a 1:1 mapping between intermediate results and input domain values.  An optimal one would do only this.

Does this sound like a correct design goal?

14  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 21, 2014, 07:23:34 PM
It wasn't obvious to me how you were getting your estimate of O(sqrt(d)) - I get that it represents the complexity as d approaches infinity.

Quote
So for instance, 2*sqrt(d) = O(sqrt(d)). With that many nonces, you can form up to
  2*sqrt(d) * (2*sqrt(d)-1) / 2 = 2*d-sqrt(d)
2-sums, making it pretty likely that one of them is 0 mod d.

If I rephrase this,

2*sqrt(d) * (2*sqrt(d)-1) / 2 == Choose(2 * sqrt(d), 2) == 2*d-sqrt(d)

O(2*d - sqrt(d)) == O(2*d - d^{1/2}) == O(d)

However, concretely, we also know that we have to sample d/2 unique hashes before we are guaranteed a 2SUM solution - also O(d).

So I *think* we're agreeing, but taking different routes to the destination.  Were you referring to O(sqrt(d)) before applying Choose, or to something else?  I agree that "d/2 hashes on average" was incorrect - it should have been d/2 hashes worst case - I'll need to do a more thorough complexity analysis.
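A quick numeric check of the Choose identity above (using a power-of-4 value of d so that sqrt(d) is exact):

Code:
from math import comb, isqrt

d = 1 << 20
r = isqrt(d)                         # sqrt(d), exact since d = 4^10
assert comb(2 * r, 2) == 2 * d - r   # Choose(2*sqrt(d), 2) == 2d - sqrt(d)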



Separately - I'd like to dig deeper into making this progress-free, as right now I think both a modified kSUM and Cuckoo Cycle suffer from the same thing:

Consider if one adds a final hash step construction to kSUM similar to Cuckoo Cycle:

HASH(concat(nonce{i})) <= Target

Such a construction really just constrains the domain to the nonce range.  It's not obvious what the final HASH <= Target buys either kSUM or Cuckoo Cycle, and it may be a vector for simplification.  I say this because one can arbitrarily pick nonces from potentially many partial solutions within a nonce range, and the likelihood of a partial solution grows as kSUM and Cuckoo Cycle populate their sets/graphs.
15  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 21, 2014, 03:53:09 PM
When considering modulus by d, I did so because modulus by a fixed value has well-known optimizations.

Quote
With O(sqrt(d)) hashes you can form O(d) different 2-SUMs, and expect a solution, just like with the birthday paradox.

For 3-SUM, O(d^{1/3}) hashes using O(d^{2/3}) memory should suffice.

I don't think this is an instance of the birthday paradox, as we're not trying to find collisions (as in Momentum); rather, we're trying to find a subset sum where P is large.

The number of unique subsets is d choose k.  To make things concrete, let d = 16: sqrt(16) = 4, so 4 hashes are picked.  Assume all are unique values and that the sums don't collide.  We can get all possible 2-sums of the 4 hashes via 4 choose 2 = 6.

If O(sqrt(d)) gave us O(d) different 2-SUMs, then with 4 hashes we'd expect 16 unique 2-SUMs, but instead we have 6.

Am I off here?  How are you getting O(sqrt(d))?
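One way to check this empirically: simulate how many uniform draws mod d get stored before some pair sums to 0 mod d (a sketch; the d values and trial count are arbitrary):

Code:
import random

def draws_until_2sum(d: int) -> int:
    # Draw uniform values mod d until two of them sum to 0 mod d.
    seen = set()
    draws = 0
    while True:
        h = random.randrange(d)
        draws += 1
        if -h % d in seen:
            return draws
        seen.add(h)

for d in (1 << 12, 1 << 16, 1 << 20):
    avg = sum(draws_until_2sum(d) for _ in range(200)) / 200
    print(f"d=2^{d.bit_length() - 1}: avg draws {avg:.0f}, sqrt(d) = {int(d ** 0.5)}")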

To be clearer, the PoW I'm suggesting is:

SUM = 0;
for (i = 0; i < k; ++i) {
  SUM += hash(block, nonce[i]);
}
return (SUM % d) == 0;

Collisions make it less likely that values will be found, not more likely.


EDIT: Going into this, I did read your paper and Momentum first - the multi-nonce submission was inspired by your work.  Structurally, one can think of kSUM as a form of graph generation: every unique number hashed adds a fully connected vertex, but the edges have weights; find a path of weight d and length k.
16  Alternate cryptocurrencies / Altcoin Discussion / Re: Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 20, 2014, 11:18:41 PM
Tromp - appreciate the comment, was hoping you'd reply :)

Yes - it's quite likely there are improvements over naive mining algorithms; very curious to see what the best in the community can come up with to break or improve upon it.
17  Alternate cryptocurrencies / Altcoin Discussion / Re: Creating an altcoin that will never perform well on GPUs/FPGAs/ASICs on: June 20, 2014, 11:09:06 PM
Hi all - I've posted an idea for an ASIC/FPGA resistant PoW here.
18  Alternate cryptocurrencies / Altcoin Discussion / Using k-SUM to Create an ASIC/FPGA Resistant PoW on: June 20, 2014, 10:43:54 PM
Following up on this thread, I went back and looked at Tromp's Cuckoo Cycle, dga's response, and Momentum.  A commenter suggested investigating k-SUM (a fixed-size variant of the subset sum problem) as the basis for a PoW.

So here's what I've come up with:

1. Let d be the current difficulty
2. Let k be the number of hashes to SUM


    (k-SUM(hash(block, nonce)) % d) == 0


This function has the following properties:

1a. In 2-SUM, memory increases linearly as d increases (since d/2 hashes are stored on average until you find (hash_stored + hash_current) % d == 0).
1b. In 3-SUM, memory increases quadratically as d increases (there are algorithms that claim subquadratic execution time, but none that claim subquadratic memory, which is what we care about for ASIC/FPGA resistance).
1c. Generalized, we end up with a d^k memory requirement.

2. A verifier only needs constant memory, since they only have to check the k submitted nonce values.
3. It is easy to understand and implement, and there is a massive body of work around k-SUM.
4. It fits easily into existing infrastructure as the hash function could very well be SHA256d.
4a. Bitcoin's PoW can be generalized in terms of k-SUM:
  - SHA256d(block, nonce) < Target is essentially equivalent to (1-SUM(SHA256d(block, nonce)) % d) == 0, where 1/d ~= Target/Max_Target, with a starting d value of 2^32.
5. Simpler than anything else I've seen proposed so far.
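A minimal end-to-end sketch of the k = 2 case, using SHA256d per point 4 (the names and parameter values here are illustrative, not normative):

Code:
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def H(block: bytes, nonce: int) -> int:
    return int.from_bytes(sha256d(block + nonce.to_bytes(8, "big")), "big")

def mine_2sum(block: bytes, d: int):
    # Miner for k = 2: find nonces a != b with (H(a) + H(b)) % d == 0.
    seen = {}  # hash mod d -> nonce that produced it
    nonce = 0
    while True:
        h = H(block, nonce) % d
        partner = seen.get(-h % d)
        if partner is not None:
            return partner, nonce
        seen[h] = nonce
        nonce += 1

def verify_2sum(block: bytes, d: int, nonces) -> bool:
    # The verifier needs constant memory: just re-hash the k submitted nonces.
    return sum(H(block, n) for n in nonces) % d == 0

block, d = b"example block header", 1 << 24
nonces = mine_2sum(block, d)
assert verify_2sum(block, d, nonces)
print(nonces)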


Setting aside whether people think an ASIC/FPGA resistant PoW is a good thing, I'd like to put this one out there for folks to pick apart.


Thoughts?
19  Alternate cryptocurrencies / Altcoin Discussion / Re: Creating an altcoin that will never perform well on GPUs/FPGAs/ASICs on: May 26, 2014, 07:45:10 PM
Quote
Hashcash with a memory bound hash function like Scrypt is a dead-end for CPU-friendly PoWs because verification is linear in memory usage, which prevents the use of the large amounts of memory required to present a serious obstacle to ASICs.

The ultimate CPU-friendly coin will have a PoW that takes GBs to prove, but can be verified instantly.

This implies memory is, or will become, cheaper on ASICs than on general-purpose machines.  ASICs are prevalent now because the opposite is true - compute is cheaper.

Until evidence indicates otherwise, I disagree.  A democratized PoW enables search and verification to be done by the same person.

That said, I think a memory-trapdoor function would be nice to have.  But it doesn't seem aligned with the requirements of a good PoW, and there isn't a good body of vetted peer-reviewed work in this area.
20  Alternate cryptocurrencies / Altcoin Discussion / Re: Creating an altcoin that will never perform well on GPUs/FPGAs/ASICs on: May 26, 2014, 02:40:23 PM
Here's a good starting point when thinking about the relative costs of gates vs flip flops on ASICs:

http://www.utdallas.edu/~mxl095420/EE6306/Final%20project/tsmc18_component.pdf

While it covers only TSMC 180nm, it is useful for napkin analyses, as the relative cell sizes are similar on more advanced process nodes, which tend to require NDAs.

XOR2X1 (1-bit XOR, drive strength 1) is 5.0 x 5.3um; DFFX1 is 5.0 x 11.2um.  So it's actually about 2 gates of compute for every bit of memory (which reinforces the tradeoff even more).

Also, rotates have no cost, as they are just wiring, and XORs are one of the larger logic cells - AND2X1 is 5.0 x 2.6um, so ~4 gates for every bit of memory.  N-bit adders are slightly trickier, as they tend to have on the order of N log N gates in fast (carry-lookahead) implementations.
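The napkin math, spelled out:

Code:
# Cell-area figures quoted above (TSMC 180nm library).
xor2x1 = 5.0 * 5.3   # um^2, 1-bit XOR, drive strength 1
and2x1 = 5.0 * 2.6   # um^2, 1-bit AND
dffx1  = 5.0 * 11.2  # um^2, D flip-flop = 1 bit of memory

print(dffx1 / xor2x1)  # ~2.1: ~2 XOR gates of area per stored bit
print(dffx1 / and2x1)  # ~4.3: ~4 AND gates of area per stored bit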



If hypothetically you were presented with the following proof-of-work construction:

Code:
uint64_t (*HashFns[10])(uint64_t); // each entry points to a unique hash fn
for (int i = 0; i < 10; ++i) {
  state = HashFns[state % 10](state); // data-dependent branch picks the next hash
}

You would likely implement it on an ASIC/FPGA with a number of cores proportionate to the size of each hash function.  Each core would have a FIFO in front of it.  All branching would do is "enqueue" the output of each hash function into the FIFO belonging to the core that owns the next hash function.  This way all cores would be kept occupied, and no core would need to go idle.  Other than the first few clocks, no work or silicon would be wasted.  So: 10 cores + 10 FIFOs.
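A toy software model of that routing (a sketch: the hash functions and workload are stand-ins, and real hardware would do all of this concurrently):

Code:
from collections import deque
import hashlib

def hash_fn(core: int, state: int) -> int:
    # Stand-in for the 10 distinct hash functions.
    digest = hashlib.sha256(bytes([core]) + state.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")

fifos = [deque() for _ in range(10)]
for seed in range(40):                  # keep the pipeline fed
    fifos[seed % 10].append((seed, 0))  # (state, rounds completed)

finished = []
while any(fifos):
    for core, fifo in enumerate(fifos):  # one "clock": each core pops one item
        if fifo:
            state, rounds = fifo.popleft()
            state = hash_fn(core, state)
            if rounds + 1 == 10:
                finished.append(state)
            else:
                fifos[state % 10].append((state, rounds + 1))

print(len(finished))  # all 40 work items complete; cores rarely sit idle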

This is what I meant by ASICs/FPGAs executing branches in parallel.  Well-designed circuits always break algorithms into separate steps/execution stages, which are then pipelined so each stage is kept occupied.


The implementation above is neither option A nor option B.  It exploits the lack of RAM usage to maximize compute area.  I agree that if options A and B were the only ways of implementing the design, then yes, branching would work; but professional designers would not choose options A or B, because they are inefficient, as you call out :)


Quote
Quote
ANY design that relies on difficulty to code rather than cost to manufacture WILL fall to ASICs/FPGAs (i.e. X11).
That's just not true. There is a limit of how many transistors you can place on a chip. If that's reached, you have to fall back to doing things serially. Considering that you need slow clock rates in order to be energy efficient, that will make it slow pretty quickly.

We're not disagreeing on the transistor limit.  I'm saying that when designing ASICs/FPGAs, hitting serial execution just means you pipeline.  This is exactly how SHA256d ASICs work (128 serial rounds of SHA-256 in SHA256d).


Separately - the reason cryptographic ASICs tend to run at lower clock rates is efficiency, not any speed limitation of the flip-flops.  Take the (obsolete) 180nm cells above: a DFF (D flip-flop) has a max propagation delay of 0.28ns, and 1 second / 0.28ns ≈ 3.57GHz.  Clearly there must be another reason for the lower clock speeds.

Clock signal propagation is one of the most power-expensive events on an ASIC (a pulse that has to propagate through the entire chip).  Lowering the clock speed lets you implement more logic per stage, since the signal has more time to propagate through gates (more time = lower drive voltage, fewer CMOS signal transitions).  It is a balance, because after too many gate levels the output becomes unstable due to asymmetries in most algorithms.  Another bonus: if you can do 2x the work per clock, you need fewer flip-flops (due to fewer stages, where the FFs sit at the beginning of each stage to capture state), which means less die area and power.

This is why most SHA256 ASICs perform at least 2 SHA rounds per clock/pipeline stage.