Bitcoin Forum
May 24, 2024, 12:49:10 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
  Show Posts
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 [28] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 »
541  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: July 11, 2016, 06:48:10 PM
The ASIC will always be at least two orders-of-magnitude more power efficient.

potentially up to two or three orders of magnitude.

potential for two orders of magnitude power efficiency improvement

allowing those to be 3 to 4 orders of magnitude more power efficient on the ASIC.

the ASIC will end up probably at least two orders-of-magnitude more power efficient

another 1/2 to 1 order-of-magnitude advantage on electricity cost

That's like an order of magnitude more orders of magnitude than I imagined!

Seriously though, the large DRAM row size may not be that much of an issue, since
all DRAM cells need refreshing on a frequent basis anyway. It's this refresh requirement
that makes it Dynamic RAM. So I doubt you can make row size much smaller.
And given that Cuckoo Cycle is constrained by memory latency, there's not that much
need to optimize the computation of the siphashes.
I would still say that an all ASIC solution for Cuckoo Cycle is at most one order
of magnitude more energy efficient than a commodity one.

Not ASIC proof, but certainly resistant...
542  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: June 20, 2016, 11:26:38 AM
Again, we must wait to see the final parameters and optimized implementations. It seems likely they'll pick
n=144 and k=5, which means they need to find collisions on 16 bits at a time. That is just a lookup in a 64K entry table;
I don't expect to see any real sorting there. It's less clear how the actual items are represented, and if they are moved
themselves, or by pointer.

I believe you have the constants wrong.  n=144, k=5 means that they're finding collisions
on n/(k+1) = n/6 = 24 bits at a time, repeated 4 times, and then finding final collisions
on the remaining 144-(4*24) = 48 bits, each time from a set of approximately 2^25
candidate hashes.

You're right; I somehow managed to mess up a trivial calculation. Thanks for the correction.
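For the record, the corrected arithmetic is easy to check. A small Python sketch (variable names are mine, purely illustrative):

```python
# Equihash collision-bit arithmetic for the corrected n=144, k=5 parameters.
# Variable names are illustrative, not from any Equihash implementation.
n, k = 144, 5

collision_bits = n // (k + 1)             # 24 bits collided per round
rounds = k - 1                            # 4 rounds of partial collisions
final_bits = n - rounds * collision_bits  # 48 bits in the final step
candidates_log2 = collision_bits + 1      # ~2^25 candidate hashes per set

print(collision_bits, rounds, final_bits, candidates_log2)  # 24 4 48 25
```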

Quote
It's quite intriguing algorithmically.  I think the original paper might be a little too glib about the constants, though.

Which constants are you referring to?

Btw, nice to see you back after a 10 month hiatus...
543  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 11, 2016, 10:15:03 AM
I forgot to thank tromp for his very valuable interaction. Thanks John.

You're welcome, Shelby.
544  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 03:25:38 PM
we could also consider enumerating multiple nonces and discarding those that don't match a subset of page rows, i.e. process the cuckoo table in chunks with multiple passes.

As mentioned before, this is exactly what higher values of PART_BITS achieve,
as you can see in cuckoo_miner.h and cuda_miner.cu

Each trimming round uses 2^PART_BITS passes with the number of counters reduced by the same factor.
With sufficient reduction, you end up using only one row in each memory bank.
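The effect of PART_BITS can be modeled in a few lines. This is a deliberately simplified toy, not the logic from cuckoo_miner.h: it just shows that 2^PART_BITS passes let the live counter storage shrink by that factor, at the cost of re-hashing every nonce in every pass.

```python
# Toy model of PART_BITS partitioning (simplified; not the cuckoo_miner.h code).
PART_BITS = 2
NODE_BITS = 10                       # toy graph with 2^10 node indices
PARTS = 1 << PART_BITS
NONCES = 1 << 11

def node(nonce):
    # stand-in for siphash: any fixed mixing function serves the model
    return (nonce * 2654435761) % (1 << NODE_BITS)

hash_calls = 0
updates = 0
for part in range(PARTS):
    counters = {}                    # per-pass storage: 1/2^PART_BITS of nodes
    for nn in range(NONCES):
        hash_calls += 1              # every nonce is re-hashed in every pass
        u = node(nn)
        if u >> (NODE_BITS - PART_BITS) == part:
            counters[u] = min(counters.get(u, 0) + 1, 3)  # saturating 2-bit counter
            updates += 1

# hashing work grows by 2^PART_BITS; counter storage shrinks by the same factor
print(hash_calls, updates)
```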
545  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 01:28:47 PM
The ASIC can choose to use DRAM instead and amortize the power consumption over the row buffer.

Here's a possible concrete design of what you are hinting at:

The ASIC will have as many (pipelined) siphash circuits as are needed to max out the memory
bandwidth of all the DRAM you're gonna hook up to it.
For each row on each memory bank, the ASIC has a queue of in-row counter-indices.
Each entry in the queue thus represents a "stalled thread".
The siphash circuits fill up these queues at a mostly uniform rate.
Once a queue fills up it is flushed to one of the memory controllers on the ASIC that
will perform the corresponding atomic updates on the memory row.

This looks pretty efficient, but all those queues together actually take up a lot of memory,
which the ASIC needs to access randomly.

So it seems we've just recreated the problem we were trying to solve.
But we can aim for the total queue memory to be maybe 1% of the total DRAM.

There's a lot of parameters here and many different ways to optimize with
combinations of speed, energy cost, and hardware cost...
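To make the trade-off concrete, here is a toy software model of those per-row queues. All sizes are invented for illustration; the point is just that flushing a full queue amortizes one row activation over many counter updates.

```python
# Toy model of the queue-per-row design: buffer updates per DRAM row and
# flush a row's queue in one burst when it fills. Sizes are invented.
from collections import defaultdict
import random

ROWS = 256                 # rows we queue for (think: rows in one bank)
QUEUE_DEPTH = 16           # entries buffered before a flush
UPDATES = 10_000

rng = random.Random(42)
queues = defaultdict(list)
activations = 0            # row openings; the expensive DRAM operation

for _ in range(UPDATES):
    row = rng.randrange(ROWS)
    queues[row].append(rng.randrange(1 << 14))   # in-row counter index
    if len(queues[row]) == QUEUE_DEPTH:
        activations += 1   # one activation serves QUEUE_DEPTH updates
        queues[row].clear()

# versus one activation per update without any batching
print(activations, "activations instead of", UPDATES)
```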

546  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 11:37:41 AM
In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being close to cryptographically secure. Your attack is about as conceivable to me as P being equal to NP, in which case there is no cryptographically secure hash function anyway.

Fact? Based on what published cryptanalysis?

I meant to say: In fact I believe that

Quote
You need millions to find 2^12 of them wanting to access a single row.
With 2^15 threads you expect 1 access per row.
Correct on 2^15 but not on the conclusion.

With 2^27 threads you expect 2^12 accesses per row. That's over a hundred million threads.

Quote
A CUDA gpu can apparently do 671 million threads

Not in hardware. It can only run a few thousand in hardware (simultaneously).
So those 671 million threads have to run in thousands of batches, one after the other.

Quote
I don't think it is safe to assume that ASICs can't be designed to support millions of very efficient threads for this very customized computation

Your ASIC would require an insanely huge die and be orders of magnitude more expensive than the DRAM it tries to optimize access to.

Quote
and again sharing the computation transistors amongst only the threads that aren't stalled, thus not needing millions of instances of the compute units.

How would you know what to stall without doing its computation first??

Quote
Also remember I need for instant transactions to have a very fast proving time, which thus means at 2^20 counters it could be trivially parallelized losing ASIC resistance with only thousands of threads.

For such small instances, latency is not that relevant as it all fits in SRAM cache.
547  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 09:40:21 AM
My calculation was 2^15, that isn't a million.

You need millions to find 2^12 of them wanting to access a single row.

With 2^15 threads you expect 1 access per row.
548  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 10, 2016, 09:32:15 AM
I am just arguing why not be safer than sorry when so much is at stake? Why would you recommend adding unnecessary risk even where you think you are omniscient and there is no risk?

I only need good statistical properties of the hash function, not cryptographic security.
In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being
close to cryptographically secure.
Your attack is about as conceivable to me as P being equal to NP,
in which case there is no cryptographically secure hash function anyway.
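For readers who want to see the function under discussion: below is a Python sketch of siphash-2-4 as Cuckoo Cycle applies it to a single 64-bit nonce, based on my reading of the reference code at github.com/tromp/cuckoo. Treat it as illustrative; the C implementation in the repo is authoritative, and no official test vectors exist for this single-word variant.

```python
# SipHash-2-4 on a single 64-bit nonce, as used (per my reading) by Cuckoo
# Cycle: 2 compression rounds, 4 finalization rounds. Illustrative sketch.
MASK = (1 << 64) - 1

def rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK

def sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK; v1 = rotl(v1, 13) ^ v0; v0 = rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK; v3 = rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK; v1 = rotl(v1, 17) ^ v2; v2 = rotl(v2, 32)
    return v0, v1, v2, v3

def siphash24(k0, k1, nonce):
    # standard SipHash initialization constants, keyed with (k0, k1)
    v0 = k0 ^ 0x736f6d6570736575
    v1 = k1 ^ 0x646f72616e646f6d
    v2 = k0 ^ 0x6c7967656e657261
    v3 = k1 ^ 0x7465646279746573
    v3 ^= nonce
    for _ in range(2):                    # 2 compression rounds
        v0, v1, v2, v3 = sipround(v0, v1, v2, v3)
    v0 ^= nonce
    v2 ^= 0xff
    for _ in range(4):                    # 4 finalization rounds
        v0, v1, v2, v3 = sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```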

Quote
What I am saying is that the edge trimming eliminates most of the nodes from consideration and then you have a hash table representing a sparse array for the remainder of the nodes which aren't the pruned/trimmed leaf edges.

Yes, totally correct. But in my view the term "bucket" denotes a fixed size unordered container,
and my current algorithms use no such thing.

Quote
Well my wink is about using maximum nonce counts, i.e. edge fraction M/N > 1/2.

My algorithms break down in that case, since the expected number of cycles becomes non-constant,
causing the basic algorithm to overlook an increasing fraction of them,
and edge-trimming fails altogether to remove a large fraction of edges.
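For concreteness, here is a toy version of edge trimming at edge fraction M/N = 1/2. It is a simplified sketch: the real miner packs saturating 2-bit counters into memory words, where a dict stands in for them here.

```python
# Toy edge trimming at edge fraction M/N = 1/2: an edge with a degree-1
# endpoint cannot lie on a cycle, so repeated rounds discard it.
import random

rng = random.Random(1)
N = 1 << 16                    # total nodes, split into two sides of N/2
half = N // 2
M = N // 2                     # edge fraction M/N = 1/2
edges = [(rng.randrange(half), rng.randrange(half)) for _ in range(M)]

def trim(edges, side):
    deg = {}
    for e in edges:
        deg[e[side]] = min(deg.get(e[side], 0) + 1, 2)   # saturating counter
    return [e for e in edges if deg[e[side]] >= 2]

for _ in range(8):             # alternate trimming rounds over both sides
    edges = trim(edges, 0)
    edges = trim(edges, 1)

print(f"{len(edges)} of {M} edges survive trimming")
```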
549  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 10:55:11 PM
The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I stated earlier
Quote
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Your above proposal of simultaneously running millions of hardware threads and needing to sync them
incurs much worse increases in die area (to the point of not fitting on any single die),
hardware costs, and energy usage.
550  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 06:47:12 PM
Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and bear a repeatable relationship to certain changes of bits in the key from hash to hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming, or the way we optimize detecting cycles, may be impacted by certain patterning away from a perfect Random Oracle.
My comment applies similarly to multi-collision attacks. I believe detecting deviations
from perfect randomness to be not only way too expensive but more importantly to die out exponentially
fast as the cycle grows.

Quote
I am not comfortable with cheating margins of security on the hash function. Sorry.
I know you and Zooko disagree with me and Dan on this.
You feel such attacks are conceivable. I feel such attacks are quite inconceivable.
Clearly we're not going to convince each other. So let's just agree to disagree.
People are more than welcome to run Cuckoo Cycle with blake2b replacing siphash
if they feel that trade off is worthwhile. It's just not something I recommend.

Quote
Anyway, if the edge trimming is done after enumerating all possible nonces, then it is not a heuristic but an optimization that reduces memory consumption by eliminating all the edges which can't be in a cycle; and then I presume you re-enumerate the nonces in the "basic algorithm" but use a hash table to represent a sparse array, so you don't have to store all the buckets from the more naive version of the basic algorithm.

The basic algorithm doesn't use buckets. Just a cuckoo hash table.
In the edge trimming case, only on the order of 1% of edges remain, and
that table is a little more complicated, in order to translate the sparseness into memory savings.
But still no buckets.
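As an aside, the cycle-forming condition itself is easy to illustrate. The following union-find sketch is plainly a stand-in, not the basic algorithm (which instead follows paths through the cuckoo hash table): an edge whose two endpoints are already connected closes a cycle.

```python
# Union-find illustration of cycle detection in a bipartite graph
# (a stand-in for exposition, not Cuckoo Cycle's basic algorithm).
def find(parent, x):
    root = x
    while parent.get(root, root) != root:
        root = parent[root]
    parent[x] = root                     # one-step path compression
    return root

def cycle_edges(edges):
    parent, closing = {}, []
    for u, v in edges:
        ru = find(parent, ('u', u))      # tag the two sides of the graph
        rv = find(parent, ('v', v))
        if ru == rv:
            closing.append((u, v))       # endpoints already connected: a cycle
        else:
            parent[ru] = rv
    return closing

# a 4-cycle u0-v0-u1-v1-u0: the last edge closes it
print(cycle_edges([(0, 0), (1, 0), (1, 1), (0, 1)]))
```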

Quote
Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

Sure, a URL would help.

Quote
Also you do note that the edge trimming is applicable for the case where many/most of the edges will be leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

A much earlier version used edge fractions lower than 1/2 as a difficulty control, but
I abandoned it, since the relationship between edge fraction and cycle probability is strongly
non-linear, making dynamic difficulty control practically impossible.

Quote
Okay so now we are on the same page, so I can proceed to reply to what you had written before...

Gonna visit the local chess club now. Will read and respond to that later...
551  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 04:00:22 PM
Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Quote
I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

That's the basic algorithm, not the reference miner, which is in
https://github.com/tromp/cuckoo/blob/master/src/cuckoo_miner.h
for CPUs and
https://github.com/tromp/cuckoo/blob/master/src/cuda_miner.cu
for GPUs.
These spend most time doing edge trimming,
so that's what this discussion should focus on.
552  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 01:03:26 PM
Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.


Quote
Are you referring to the latest version of your code, because I am referring to the code that was in the Appendix of your December 31. 2014 white paper which references int buckets.

The bucketing in an earlier version was just something inessential to the algorithm that was
found to be beneficial to performance. Last year I replaced it by prefetching in commit
https://github.com/tromp/cuckoo/commit/5aef59659e77c599c730ece6a42a7a2597de80da
which turned out to be more beneficial.

Quote
Is this edge trimming coming from a suggestion from Dave Andersen?
Yes, implemented back in April 2014.

Quote
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

I don't see why you multiply with the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce and then most will want to access a unique row.
Then what? There's little to be synchronized.

Quote
Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.
No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.


In my code a thread is something that processes a sequence of nonces, hashing each one, and
then atomically updating a corresponding counter, without any coordination.

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing and how this syncing is supposed to be implemented.

Quote
Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

I'm talking about the dollar cost of SRAM vs DRAM.
553  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: June 09, 2016, 07:09:40 AM
Cuckoo Cycle's insecure use of Siphash.

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.

Quote
I also expect that given Cuckoo Cycle is parallelizable

No need to expect. This is clearly stated in the white paper.

Quote
necessary to make Cuckoo Cycle's proving time reasonable for instant transactions

The white paper also explains that Cuckoo Cycle is not suitable for short block intervals,
and works best for intervals even longer than bitcoin's.
Instant transactions are best handled with off-chain payment channels,
reserving the main chain for settlement.

Quote
tromp thinks that Cuckoo Cycle has very low computation relative to memory latency. But this depends on only one 32-bit bucket being read in each memory page/row. With a massive number of h/w threads, we can get them all synced up (or sorted) so that we read 1024 32-bit buckets in one memory-latency cycle

Cuckoo Cycle randomly accesses 2-bit counters, not 32-bit buckets.
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
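Spelled out, the arithmetic in the paragraph above:

```python
# Counter/bank/row arithmetic for one round of edge trimming.
total_counters = 2**29               # counters updated per trimming round
banks = 2**8
per_bank = total_counters // banks   # 2^21 counters per bank
per_row = 2**14                      # counters that fit in one DRAM row
rows_per_bank = per_bank // per_row  # 2^7: the bank-multiplication factor
total_rows = banks * rows_per_bank   # 2^15 rows in total, so 2^15 threads
                                     # expect about one access per row
print(per_bank, rows_per_bank, total_rows)
```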

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.

Both of these approaches are rather cost prohibitive. You might as well avoid DRAM latency
by using SRAM instead. But that is also two orders of magnitude more expensive, while the
performance increase is closer to one order of magnitude.

Cuckoo Cycle may not be ASIC-proof, but it certainly seems to resist...
554  Alternate cryptocurrencies / Mining (Altcoins) / Re: How will this change the world of mining?? GTX 1080 / 1070 on: June 05, 2016, 10:33:38 AM
What do you suggest to test next?

Could you please test Cuckoo Cycle?

https://github.com/tromp/cuckoo

On a Linux system, after cloning the repo, can you run
  cd src
  make cuda30
  time ./cuda30 -h 0 -n 11 -t 16384
and report the output?
555  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: May 07, 2016, 06:58:26 PM
What if they don't need to scale? What if they break down a 2000 qubit problem into emulated segments of 5 qubits, just like an 8-bit processor can do with 16-bit or 32-bit operations?

Because those segments would not be quantum-entangled.

We can already emulate 5-qubit computations much faster on classical hardware
than on a real quantum computer, and we can combine 1000s of those emulations.
That does not a 2000 qubit system make...
556  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: May 07, 2016, 04:51:47 PM
Quantum computer are a problem for btc. thats true.
But it still need quite a lot of time until quantum computers are

made to scale.

Fixed that for you...


Btw, quantum mining is a tiny problem compared to being able to steal money from any address that's ever spent money,
by using Shor's algorithm to recover the private key from the public key...
557  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: April 29, 2016, 03:13:41 PM
So can we mine this already?

Not for real.

You can mine on the zcash testnet but the current "basic" solver is un-optimized (hence rather slow)
and uses reduced parameters.
558  Alternate cryptocurrencies / Altcoin Discussion / Re: Zcash Equihash PoW Released on: April 29, 2016, 01:40:30 PM
I thought Wagner's algorithm sorted. I'll have to reread the paper more carefully.

That puzzled me too. I don't see a need for sorting, just for binning to find collisions on the next chunk of n/(k+1) bits.

In this case it turns out there is little difference between fully sorting and sorting into bins,
since the expected bin size is only 2 :-)

So equihash really seems to be all about sorting...
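The binning observation is easy to demonstrate: with 2^(b+1) items and b-bit chunks, plain indexing into 2^b bins replaces any comparison sort, and the expected bin size is exactly 2. A toy sketch, with the chunk width shrunk for illustration:

```python
# Binning instead of sorting: drop b-bit chunks into 2^b bins and pair up
# collisions within each bin. With 2^(b+1) items the mean bin size is 2.
import random

b = 12                         # toy chunk width (n=144, k=5 would use 24)
rng = random.Random(7)
items = [rng.randrange(1 << b) for _ in range(1 << (b + 1))]

bins = [[] for _ in range(1 << b)]
for i, chunk in enumerate(items):
    bins[chunk].append(i)      # no comparison sort anywhere, just indexing

avg = sum(len(bn) for bn in bins) / len(bins)
pairs = sum(len(bn) * (len(bn) - 1) // 2 for bn in bins)  # colliding pairs
print(avg, pairs)
```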
559  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] Wolf's GBT CPU HODL Miner - Bounty Fund on: April 26, 2016, 01:37:48 PM
... What would you have suggested in place of the current algorithm? ...

A proper KDF would work... many are even tunable to different needs.

I never thought of using KDF in this manner.

You never heard of scrypt?
560  Bitcoin / Development & Technical Discussion / Re: Fixing Bitcoin's 2nd big issue.. on: April 21, 2016, 01:58:40 PM
Ho hum.. just talking to myself here..   Tongue

How about this :

Make it so that the miner's centralised hash power can't compete with the masses.

..

There are 1 billion android phones out there.

If Android phones (unsure if Apple would jump onboard with this..) started to come out with heat efficient, low power, mining chips and excellent low-bandwidth mining pool integration, would that help..

Phones already have heat efficient, low power, mining chips: DRAM.

You just need to use a PoW for which reading and writing billions of bits at random locations
forms the bottleneck...