Bitcoin Forum
May 24, 2024, 12:49:10 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
  Show Posts
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 [28] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 »
541  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: July 11, 2016, 06:48:10 PM
The ASIC will always be at least two orders-of-magnitude more power efficient.

potentially up to two or three orders of magnitude.

potential for two orders of magnitude power efficiency improvement

allowing those to be 3 to 4 orders of magnitude more power efficient on the ASIC.

the ASIC will end up probably at least two orders-of-magnitude more power efficient

another 1/2 to 1 order-of-magnitude advantage on electricity cost

That's like an order of magnitude more orders of magnitude than I imagined!

Seriously though, the large DRAM row size may not be that much of an issue, since
all DRAM cells need refreshing on a frequent basis anyway. It's this refresh requirement
that makes it Dynamic RAM. So I doubt you can make row size much smaller.
And given that Cuckoo Cycle is constrained by memory latency, there's not that much
need to optimize the computation of the siphashes.
I would still say that an all ASIC solution for Cuckoo Cycle is at most one order
of magnitude more energy efficient than a commodity one.

Not ASIC proof, but certainly resistant...
542  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: June 20, 2016, 11:26:38 AM
Again, we must wait to see the final parameters and optimized implementations. It seems likely they'll pick
n=144 and k=5, which means they need to find collisions on 16 bits at a time. That is just a lookup in a 64K entry table;
I don't expect to see any real sorting there. It's less clear how the actual items are represented, and if they are moved
themselves, or by pointer.

I believe you have the constants wrong.  n=144, k=5 means that they're finding collisions
on n/(k+1) = n/6 = 24 bits at a time, repeated 4 times, and then finding final collisions
on the remaining 144-(4*24) = 48 bits, each time from a set of approximately 2^25
candidate hashes.

You're right; I somehow managed to mess up a trivial calculation. Thanks for the correction.
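For the record, the corrected arithmetic is easy to check. A small Python sketch (variable names are mine, purely illustrative):

```python
# Equihash collision-bit arithmetic for the corrected n=144, k=5 parameters.
# Variable names are illustrative, not from any Equihash implementation.
n, k = 144, 5

collision_bits = n // (k + 1)             # 24 bits collided per round
rounds = k - 1                            # 4 rounds of partial collisions
final_bits = n - rounds * collision_bits  # 48 bits in the final step
candidates_log2 = collision_bits + 1      # ~2^25 candidate hashes per set

print(collision_bits, rounds, final_bits, candidates_log2)  # 24 4 48 25
```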

Quote
It's quite intriguing algorithmically.  I think the original paper might be a little too glib about the constants, though.

Which constants are you referring to?

Btw, nice to see you back after a 10 month hiatus...
543  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 11, 2016, 10:15:03 AM
I forgot to thank tromp for his very valuable interaction. Thanks John.

You're welcome, Shelby.
544  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 03:25:38 PM
we could also consider enumerating multiple nonces and discarding those that don't match a subset of page rows, i.e. process the cuckoo table in chunks with multiple passes.

As mentioned before, this is exactly what higher values of PART_BITS achieve,
as you can see in cuckoo_miner.h and cuda_miner.cu

Each trimming round uses 2^PART_BITS passes with the number of counters reduced by the same factor.
With sufficient reduction, you end up using only one row in each memory bank.
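The effect of PART_BITS can be modeled in a few lines. This is a deliberately simplified toy, not the logic from cuckoo_miner.h: it just shows that 2^PART_BITS passes let the live counter storage shrink by that factor, at the cost of re-hashing every nonce in every pass.

```python
# Toy model of PART_BITS partitioning (simplified; not the cuckoo_miner.h code).
PART_BITS = 2
NODE_BITS = 10                       # toy graph with 2^10 node indices
PARTS = 1 << PART_BITS
NONCES = 1 << 11

def node(nonce):
    # stand-in for siphash: any fixed mixing function serves the model
    return (nonce * 2654435761) % (1 << NODE_BITS)

hash_calls = 0
updates = 0
for part in range(PARTS):
    counters = {}                    # per-pass storage: 1/2^PART_BITS of nodes
    for nn in range(NONCES):
        hash_calls += 1              # every nonce is re-hashed in every pass
        u = node(nn)
        if u >> (NODE_BITS - PART_BITS) == part:
            counters[u] = min(counters.get(u, 0) + 1, 3)  # saturating 2-bit counter
            updates += 1

# hashing work grows by 2^PART_BITS; counter storage shrinks by the same factor
print(hash_calls, updates)
```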
545  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 01:28:47 PM
The ASIC can choose to use DRAM instead and amortize the power consumption over the row buffer.

Here's a possible concrete design of what you are hinting at:

The ASIC will have as many (pipelined) siphash circuits as are needed to max out the memory
bandwidth of all the DRAM you're gonna hook up to it.
For each row on each memory bank, the ASIC has a queue of in-row counter-indices.
Each entry in the queue thus represents a "stalled thread".
The siphash circuits fill up these queues at a mostly uniform rate.
Once a queue fills up it is flushed to one of the memory controllers on the ASIC that
will perform the corresponding atomic updates on the memory row.

This looks pretty efficient, but all those queues together actually take up a lot of memory,
which the ASIC needs to access randomly.

So it seems we've just recreated the problem we were trying to solve.
But we can aim for the total queue memory to be maybe 1% of the total DRAM.

There's a lot of parameters here and many different ways to optimize with
combinations of speed, energy cost, and hardware cost...
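To make the trade-off concrete, here is a toy software model of those per-row queues. All sizes are invented for illustration; the point is just that flushing a full queue amortizes one row activation over many counter updates.

```python
# Toy model of the queue-per-row design: buffer updates per DRAM row and
# flush a row's queue in one burst when it fills. Sizes are invented.
from collections import defaultdict
import random

ROWS = 256                 # rows we queue for (think: rows in one bank)
QUEUE_DEPTH = 16           # entries buffered before a flush
UPDATES = 10_000

rng = random.Random(42)
queues = defaultdict(list)
activations = 0            # row openings; the expensive DRAM operation

for _ in range(UPDATES):
    row = rng.randrange(ROWS)
    queues[row].append(rng.randrange(1 << 14))   # in-row counter index
    if len(queues[row]) == QUEUE_DEPTH:
        activations += 1   # one activation serves QUEUE_DEPTH updates
        queues[row].clear()

# versus one activation per update without any batching
print(activations, "activations instead of", UPDATES)
```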

546  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 11:37:41 AM
In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being close to cryptographically secure. Your attack is about as conceivable to me as P being equal to NP, in which case there is no cryptographically secure hash function anyway.

Fact? Based on what published cryptanalysis?

I meant to say: In fact I believe that

Quote
You need millions to find 2^12 of them wanting to access a single row.
With 2^15 threads you expect 1 access per row.
Correct on 2^15 but not on the conclusion.

With 2^27 threads you expect 2^12 accesses per row. That's over a hundred million threads.

Quote
A CUDA gpu can apparently do 671 million threads

Not in hardware. It can only run a few thousand in hardware (simultaneously).
So those 671 million threads have to run in thousands of batches, one after the other.

Quote
I don't think it is safe to assume that ASICs can't be designed to support millions of very efficient threads for this very customized computation

Your ASIC would require an insanely huge die and be orders of magnitude more expensive than the DRAM it tries to optimize access to.

Quote
and again sharing the computation transistors amongst only the threads that aren't stalled, thus not needing millions of instances of the compute units.

How would you know what to stall without doing its computation first??

Quote
Also remember I need for instant transactions to have a very fast proving time, which thus means at 2^20 counters it could be trivially parallelized losing ASIC resistance with only thousands of threads.

For such small instances, latency is not that relevant as it all fits in SRAM cache.
547  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 09:40:21 AM
My calculation was 2^15, that isn't a million.

You need millions to find 2^12 of them wanting to access a single row.

With 2^15 threads you expect 1 access per row.
548  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 09:32:15 AM
I am just arguing why not be safer than sorry when so much is at stake? Why would you recommend adding unnecessary risk even where you think you are omniscient and there is no risk?

I only need good statistical properties of the hash function, not cryptographic security.
In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being
close to cryptographically secure.
Your attack is about as conceivable to me as P being equal to NP,
in which case there is no cryptographically secure hash function anyway.
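For readers who want to see the function under discussion: below is a Python sketch of siphash-2-4 as Cuckoo Cycle applies it to a single 64-bit nonce, based on my reading of the reference code at github.com/tromp/cuckoo. Treat it as illustrative; the C implementation in the repo is authoritative, and no official test vectors exist for this single-word variant.

```python
# SipHash-2-4 on a single 64-bit nonce, as used (per my reading) by Cuckoo
# Cycle: 2 compression rounds, 4 finalization rounds. Illustrative sketch.
MASK = (1 << 64) - 1

def rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK

def sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK; v1 = rotl(v1, 13) ^ v0; v0 = rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK; v3 = rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK; v1 = rotl(v1, 17) ^ v2; v2 = rotl(v2, 32)
    return v0, v1, v2, v3

def siphash24(k0, k1, nonce):
    # standard SipHash initialization constants, keyed with (k0, k1)
    v0 = k0 ^ 0x736f6d6570736575
    v1 = k1 ^ 0x646f72616e646f6d
    v2 = k0 ^ 0x6c7967656e657261
    v3 = k1 ^ 0x7465646279746573
    v3 ^= nonce
    for _ in range(2):                    # 2 compression rounds
        v0, v1, v2, v3 = sipround(v0, v1, v2, v3)
    v0 ^= nonce
    v2 ^= 0xff
    for _ in range(4):                    # 4 finalization rounds
        v0, v1, v2, v3 = sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```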

Quote
What I am saying is that the edge trimming eliminates most of the nodes from consideration and then you have a hash table representing a sparse array for the remainder of the nodes which aren't the pruned/trimmed leaf edges.

Yes, totally correct. But in my view the term "bucket" denotes a fixed size unordered container,
and my current algorithms use no such thing.

Quote
Well my wink is about using maximum nonce counts, i.e. edge fraction M/N > 1/2.

My algorithms break down in that case, since the expected number of cycles becomes non-constant,
causing the basic algorithm to overlook an increasing fraction of them,
and edge-trimming fails altogether to remove a large fraction of edges.
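For concreteness, here is a toy version of edge trimming at edge fraction M/N = 1/2. It is a simplified sketch: the real miner packs saturating 2-bit counters into memory words, where a dict stands in for them here.

```python
# Toy edge trimming at edge fraction M/N = 1/2: an edge with a degree-1
# endpoint cannot lie on a cycle, so repeated rounds discard it.
import random

rng = random.Random(1)
N = 1 << 16                    # total nodes, split into two sides of N/2
half = N // 2
M = N // 2                     # edge fraction M/N = 1/2
edges = [(rng.randrange(half), rng.randrange(half)) for _ in range(M)]

def trim(edges, side):
    deg = {}
    for e in edges:
        deg[e[side]] = min(deg.get(e[side], 0) + 1, 2)   # saturating counter
    return [e for e in edges if deg[e[side]] >= 2]

for _ in range(8):             # alternate trimming rounds over both sides
    edges = trim(edges, 0)
    edges = trim(edges, 1)

print(f"{len(edges)} of {M} edges survive trimming")
```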
549  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 10:55:11 PM
The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I stated earlier
Quote
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Your above proposal of simultaneously running millions of hardware threads and needing to sync them
incurs much worse increases in die area (to the point of not fitting on any single die),
hardware costs, and energy usage.
550  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 06:47:12 PM
Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and bear a repeatable relationship to certain changes of bits in the key from hash to hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming, or the way we optimize detecting cycles, may be impacted by certain patterning away from a perfect Random Oracle.
My comment applies similarly to multi-collision attacks. I believe detecting deviations
from perfect randomness to be not only way too expensive but more importantly to die out exponentially
fast as the cycle grows.

Quote
I am not comfortable with cheating margins of security on the hash function. Sorry.
I know you and Zooko disagree with me and Dan on this.
You feel such attacks are conceivable. I feel such attacks are quite inconceivable.
Clearly we're not going to convince each other. So let's just agree to disagree.
People are more than welcome to run Cuckoo Cycle with blake2b replacing siphash
if they feel that trade off is worthwhile. It's just not something I recommend.

Quote
Anyway, if the edge trimming is done after enumerating all possible nonces, then it is not a heuristic but an optimization that reduces memory consumption by eliminating all the edges which can't be in a cycle; and then I presume you re-enumerate the nonces in the "basic algorithm" but use a hash table to represent a sparse array, so you don't have to store all the buckets from the more naive version of the basic algorithm.

The basic algorithm doesn't use buckets. Just a cuckoo hash table.
In the edge trimming case, only on the order of 1% of edges remain, and
that table is a little more complicated, in order to translate the sparseness into memory savings.
But still no buckets.
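As an aside, the cycle-forming condition itself is easy to illustrate. The following union-find sketch is plainly a stand-in, not the basic algorithm (which instead follows paths through the cuckoo hash table): an edge whose two endpoints are already connected closes a cycle.

```python
# Union-find illustration of cycle detection in a bipartite graph
# (a stand-in for exposition, not Cuckoo Cycle's basic algorithm).
def find(parent, x):
    root = x
    while parent.get(root, root) != root:
        root = parent[root]
    parent[x] = root                     # one-step path compression
    return root

def cycle_edges(edges):
    parent, closing = {}, []
    for u, v in edges:
        ru = find(parent, ('u', u))      # tag the two sides of the graph
        rv = find(parent, ('v', v))
        if ru == rv:
            closing.append((u, v))       # endpoints already connected: a cycle
        else:
            parent[ru] = rv
    return closing

# a 4-cycle u0-v0-u1-v1-u0: the last edge closes it
print(cycle_edges([(0, 0), (1, 0), (1, 1), (0, 1)]))
```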

Quote
Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

Sure, a URL would help.

Quote
Also you do note that the edge trimming is applicable for the case where many/most of the edges will be leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

A much earlier version used edge fractions lower than 1/2 as a difficulty control, but
I abandoned it, since the relationship between edge fraction and cycle probability is strongly
non-linear, making dynamic difficulty control practically impossible.

Quote
Okay so now we are on the same page, so I can proceed to reply to what you had written before...

Gonna visit the local chess club now. Will read and respond to that later...
551  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 04:00:22 PM
Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Quote
I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

That's the basic algorithm, not the reference miner, which is in
https://github.com/tromp/cuckoo/blob/master/src/cuckoo_miner.h
for CPUs and
https://github.com/tromp/cuckoo/blob/master/src/cuda_miner.cu
for GPUs.
These spend most time doing edge trimming,
so that's what this discussion should focus on.
552  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 01:03:26 PM
Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.


Quote
Are you referring to the latest version of your code, because I am referring to the code that was in the Appendix of your December 31. 2014 white paper which references int buckets.

The bucketing in an earlier version was just something inessential to the algorithm that was
found to be beneficial to performance. Last year I replaced it by prefetching in commit
https://github.com/tromp/cuckoo/commit/5aef59659e77c599c730ece6a42a7a2597de80da
which turned out to be more beneficial.

Quote
Is this edge trimming coming from a suggestion from Dave Andersen?
Yes, implemented back in April 2014.

Quote
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

I don't see why you multiply with the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce and then most will want to access a unique row.
Then what? There's little to be synchronized.

Quote
Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.
No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.


In my code a thread is something that processes a sequence of nonces, hashing each one, and
then atomically updating a corresponding counter, without any coordination.

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing and how this syncing is supposed to be implemented.

Quote
Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

I'm talking about the dollar cost of SRAM vs DRAM.
553  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 07:09:40 AM
Cuckoo Cycle's insecure use of Siphash.

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.

Quote
I also expect that given Cuckoo Cycle is parallelizable

No need to expect. This is clearly stated in the white paper.

Quote
necessary to make Cuckoo Cycle's proving time reasonable for instant transactions

The white paper also explains that Cuckoo Cycle is not suitable for short block intervals,
and works best for intervals even longer than bitcoin's.
Instant transactions are best handled with off-chain payment channels,
reserving the main chain for settlement.

Quote
tromp thinks that Cuckoo Cycle has very low computation relative to memory latency. But this depends on only one 32-bit bucket being read in each memory page/row. With a massive number of h/w threads, we can get them all synced up (or sorted) so that we read 1024 32-bit buckets in one memory-latency cycle

Cuckoo Cycle randomly accesses 2-bit counters, not 32-bit buckets.
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
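Spelled out, the arithmetic in the paragraph above:

```python
# Counter/bank/row arithmetic for one round of edge trimming.
total_counters = 2**29               # counters updated per trimming round
banks = 2**8
per_bank = total_counters // banks   # 2^21 counters per bank
per_row = 2**14                      # counters that fit in one DRAM row
rows_per_bank = per_bank // per_row  # 2^7: the bank-multiplication factor
total_rows = banks * rows_per_bank   # 2^15 rows in total, so 2^15 threads
                                     # expect about one access per row
print(per_bank, rows_per_bank, total_rows)
```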

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.

Both of these approaches are rather cost prohibitive. You might as well avoid DRAM latency
by using SRAM instead. But that is also two orders of magnitude more expensive, while the
performance increase is closer to one order of magnitude.

Cuckoo Cycle may not be ASIC-proof, but it certainly seems to resist...
554  Alternate cryptocurrencies / Mining (Altcoins) / Re: How will this change the world of mining?? GTX 1080 / 1070 on: June 05, 2016, 10:33:38 AM
What do you suggest to test next?

Could you please test Cuckoo Cycle?

https://github.com/tromp/cuckoo

On a Linux system, after cloning the repo, can you run
  cd src
  make cuda30
  time ./cuda30 -h 0 -n 11 -t 16384
and report the output?
555  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: May 07, 2016, 06:58:26 PM
What if they don't need to scale? What if they break down a 2000 qubit problem into emulated segments of 5 qubits, just like an 8-bit processor can do with 16-bit or 32-bit operations?

Because those segments would not be quantum-entangled.

We can already emulate 5-qubit computations much faster on classical hardware
than on a real quantum computer, and we can combine 1000s of those emulations.
That does not a 2000 qubit system make...
556  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: May 07, 2016, 04:51:47 PM
Quantum computer are a problem for btc. thats true.
But it still need quite a lot of time until quantum computers are

made to scale.

Fixed that for you...


Btw, quantum mining is a tiny problem compared to being able to steal money from any address that's ever spent money,
by using Shor's algorithm to recover the private key from the public key...
557  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: April 29, 2016, 03:13:41 PM
So can we mine this already?

Not for real.

You can mine on the zcash testnet but the current "basic" solver is un-optimized (hence rather slow)
and uses reduced parameters.
558  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: April 29, 2016, 01:40:30 PM
I thought Wagner's algorithm sorted. I'll have to reread the paper more carefully.

That puzzled me too. I don't see a need for sorting, just for binning to find collisions on the next chunk of n/(k+1) bits.

In this case it turns out there is little difference between fully sorting and sorting into bins,
since the expected bin size is only 2 :-)

So equihash really seems to be all about sorting...
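The binning observation is easy to demonstrate: with 2^(b+1) items and b-bit chunks, plain indexing into 2^b bins replaces any comparison sort, and the expected bin size is exactly 2. A toy sketch, with the chunk width shrunk for illustration:

```python
# Binning instead of sorting: drop b-bit chunks into 2^b bins and pair up
# collisions within each bin. With 2^(b+1) items the mean bin size is 2.
import random

b = 12                         # toy chunk width (n=144, k=5 would use 24)
rng = random.Random(7)
items = [rng.randrange(1 << b) for _ in range(1 << (b + 1))]

bins = [[] for _ in range(1 << b)]
for i, chunk in enumerate(items):
    bins[chunk].append(i)      # no comparison sort anywhere, just indexing

avg = sum(len(bn) for bn in bins) / len(bins)
pairs = sum(len(bn) * (len(bn) - 1) // 2 for bn in bins)  # colliding pairs
print(avg, pairs)
```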
559  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] Wolf's GBT CPU HODL Miner - Bounty Fund on: April 26, 2016, 01:37:48 PM
... What would you have suggested in place of the current algorithm? ...

A proper KDF would work... many are even tunable to different needs.

I never thought of using KDF in this manner.

You never heard of scrypt?
560  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: April 21, 2016, 01:58:40 PM
Ho hum.. just talking to myself here..   Tongue

How about this :

Make it so that the miner's centralised hash power can't compete with the masses.

..

There are 1 billion android phones out there.

If Android phones (unsure if Apple would jump onboard with this..) started to come out with heat efficient, low power, mining chips and excellent low-bandwidth mining pool integration, would that help..

Phones already have heat efficient, low power, mining chips: DRAM.

You just need to use a PoW for which reading and writing billions of bits at random locations
forms the bottleneck...