Bitcoin Forum
December 14, 2017, 09:04:09 AM *
News: Latest stable version of Bitcoin Core: 0.15.1  [Torrent].
 
   Home   Help Search Donate Login Register  
Pages: « 1 ... 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 [54] 55 56 57 58 »
  Print  
Author Topic: The Ethereum Paradox  (Read 84474 times)
finitemaz
Member
**
Offline Offline

Activity: 62


View Profile
June 07, 2016, 08:27:03 PM
 #1061

well above my absorption rate these days. Sad

Oregano oil very high dosed of sublingually and via enteric coated capsules. Do not delay.

Why talk about yourself in third person?

Because the mods are the FUD trolls:

https://bitcointalk.org/index.php?topic=1502277.msg15118630#msg15118630

Some dumb monkey wants to feel important by kissing Theymos's butthole delusions. I remember before Lauda became "Staff", he stood out to me as one of the prominent dumb trolls. That he became a mod or staff is so befitting of the dumber than dumb of any collectivized endeavor. It is simply impossible to not have a large social grouping not end up as the least common denominator.

And who is talking about themselves in the third person?  Wink Next time I will be forced to "not come back" in a drag queen persona.

I am wasting time. Bye.

"What is coming?", says finitemaz, "please tell me". 
The Bitcoin network protocol was designed to be extremely flexible. It can be used to create timed transactions, escrow transactions, multi-signature transactions, etc. The current features of the client only hint at what will be possible in the future.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1513242249
Hero Member
*
Offline Offline

Posts: 1513242249

View Profile Personal Message (Offline)

Ignore
1513242249
Reply with quote  #2

1513242249
Report to moderator
1513242249
Hero Member
*
Offline Offline

Posts: 1513242249

View Profile Personal Message (Offline)

Ignore
1513242249
Reply with quote  #2

1513242249
Report to moderator
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 08, 2016, 06:18:30 AM
 #1062

"What is coming?", says finitemaz, "please tell me".  

Last post for as long as I can resist the temptation to respond to the BS on BCT:

Re: Will Bitcoin be replaced by another cryptocurrency?

Yes it could happen but i doubt very much it will happen the short to medium term, to actually make bitcoin obsolete would be a achievement.

It would have to offer something very important for mass adoption which Bitcoin doesn't offer.

And that something would need viral uptake.

I am planning to make that Bitcoin killer.

And yes very difficult to achieve and implementation is a Herculian undertaking that is more likely to not be achieved quickly enough.

Although Bitcoin can copy any innovation...

Bitcoin can't copy every innovation, especially if that innovation turns all the existing mining farms into useless door stops. The vested interests in Bitcoin could never agree to copy radical block chain overhaul. Also that new design would already have the first mover advantage with millions more users than Bitcoin. Bitcoin can't scale so it won't have those millions of users to start with.

...yes it could but at this rate of popularity its going to be pretty hard for some other crypto to take over...

What popularity? Do you have any clue that Bitcoin has less users than any one of dozens of minority marketshare social networks.

Virtually no one uses Bitcoin for commerce. The main use of Bitcoin is for avoiding capital controls, speculation, gambling, and "nefarious" activities. Bitcoin isn't close to becoming mainstream.

Merchants accept Bitcoin not because they get a lot of sales from it, but because it is free geek-cool promotion to the geek demographic.

Note I added "Edit#3" for tromp to read.
ravens
Member
**
Offline Offline

Activity: 98


View Profile
June 08, 2016, 06:46:41 AM
 #1063

Ethereum is better than bitcoin.  End of story
Nope, bitcoin is better than eth. Eth is still new(compared to bit/doge/litecoin). The only thing Eth has going for it is it's contracts(which I admit, are useful), that's it. End of story.

▄ ▀ ▄ ▀   LET'S THINK ABOUT THE FUTURE   ▀ ▄ ▀ ▄
●▬●▬●▬●▬●▬●▬●▬●▬●▬●▬●   WhyFuture.com   ●▬●▬●▬●▬●▬●▬●▬●▬●▬●▬●
A Blog That Discusses About Artificial Intelligence, The Future, Ethics on AI, and more
Washika
Sr. Member
****
Offline Offline

Activity: 332


View Profile
June 08, 2016, 04:48:20 PM
 #1064

Ethereum is better than bitcoin.  End of story
Nope, bitcoin is better than eth. Eth is still new(compared to bit/doge/litecoin). The only thing Eth has going for it is it's contracts(which I admit, are useful), that's it. End of story.

So far, I prefer bitcoin as bitcoin has been tested by time and it is under active development. But things may change.
manselr
Hero Member
*****
Offline Offline

Activity: 840

ICO appreciator


View Profile
June 08, 2016, 06:24:06 PM
 #1065



Did anyone see this?
https://www.reddit.com/r/Bitcoin/comments/4n4vld/counterparty_announces_ethereum_virtual_machine/

As Bitcoin continues to rise and tech gets better and features added, ETH will be hardpressed to justify the existence of their token to be honest.
MyownBoss
Full Member
***
Offline Offline

Activity: 126


View Profile
June 08, 2016, 10:04:18 PM
 #1066

eth still scam?
Yes.

BTC is the experiment chasing the flawed central banks

ETH is the flawed experiment chasing THIS experiment

Iamnotback might argue THIS is already flawed and he might be right with some CORE.

BTC is the flawed experiment chasing the flawed central banks

ETH is the flawed experiement chasing THIS flawed experiment

in short...

Rule #1:Never lose money.
Rule #2: Never forget #
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 09, 2016, 07:09:40 AM
 #1067

Cuckoo Cycle insecure use of Siphash.

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.

Quote
I also expect that given Cuckoo Cycle is parallelizable

No need to expect. This is clearly stated in the white paper.

Quote
necessary to make Cuckoo Cycle's proving time reasonable for instant transactions

The white paper also explains that Cuckoo Cycle is not suitable for short block intervals,
and works best for intervals even longer than bitcoin's.
Instant transactions are best handled with off-chain payment channels,
reserving the main chain for settlement.

Quote
tromp thinks that Cuckoo Cycle has very low computation relative to memory latency. But this depends on only one 32-bit bucket being read in each memory page/row. With massive number of h/w threads, can get them all synced up (or sorted) so that we read 1024 of 32-bit buckets in one memory latency cycle

Cuckoo Cycle randomly accesses 2-bit counters, not 32-bit buckets.
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.

Both of these approaches are rather cost prohibitive. You might as well avoid DRAM latency
by using SRAM instead. But that is also two orders of magnitude more expensive, while the
performance increase is closer to one order of magnitude.

Cuckoo Cycle may not be ASIC-proof, but it certainly seems to resist...
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 07:41:57 AM
 #1068

Cuckoo Cycle insecure use of Siphash.

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.

Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.

I think perhaps you have misunderstood Dan or (more unlikely given how smart he is) Dan has misunderstood something. It can happen.

I also expect that given Cuckoo Cycle is parallelizable

No need to expect. This is clearly stated in the white paper.

I presume based on your birthplace, that English is not your native language. Please don't ignore then word 'given'. The 'expect' is referring to the remainder of the statement that you elided.

necessary to make Cuckoo Cycle's proving time reasonable for instant transactions

The white paper also explains that Cuckoo Cycle is not suitable for short block intervals,
and works best for intervals even longer than bitcoin's.

Yes but you have interleaved other reasons for that, afair not mentioning the reason that I added. So I am just adding analysis.

By no means was I indicating your work is worthless. There is something very important that will come from this.

Instant transactions are best handled with off-chain payment channels, reserving the main chain for settlement.

Technically incorrect.

tromp thinks that Cuckoo Cycle has very low computation relative to memory latency. But this depends on only one 32-bit bucket being read in each memory page/row. With massive number of h/w threads, can get them all synced up (or sorted) so that we read 1024 of 32-bit buckets in one memory latency cycle

Cuckoo Cycle randomly accesses 2-bit counters, not 32-bit buckets.

Are you referring to the latest version of your code, because I am referring to the code that was in the Appendix of your December 31. 2014February 1, 2014 white paper which references int buckets.

Is this edge trimming coming from a suggestion from Dave Andersen?

In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.

No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.

Both of these approaches are rather cost prohibitive. You might as well avoid DRAM latency
by using SRAM instead. But that is also two orders of magnitude more expensive, while the
performance increase is closer to one order of magnitude.

Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

Magnus Moreau. Estimating the Energy Consumption of Emerging Random Access Memory Technologies

Cuckoo Cycle may not be ASIC-proof, but it certainly seems to resist...

So far, I disagree. I appreciate your discussion, because I want to make sure my analysis is correct. I also have to strike a balance between full publishing and secrecy, because unlike you, I have objectives around releasing a Bitcoin killer altcoin.
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 10:04:09 AM
 #1069

The early days of Ethereum recounted by Charles:

https://www.youtube.com/watch?v=FIMk-A7L8S4#t=665
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 09, 2016, 01:03:26 PM
 #1070

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.


Quote
Are you referring to the latest version of your code, because I am referring to the code that was in the Appendix of your December 31. 2014 white paper which references int buckets.

The bucketing in an earlier version was just something inessential to the algorithm that was
found to be beneficial to performance. Last year I replaced it by prefetching in commit
https://github.com/tromp/cuckoo/commit/5aef59659e77c599c730ece6a42a7a2597de80da
which turned out to be more beneficial.

Quote
Is this edge trimming coming from a suggestion from Dave Andersen?
Yes, implemented back in April 2014.

Quote
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

I don't see why you multiply with the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce and then most will want to access a unique row.
Then what? There's little to be synchronized.

Quote
Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.
No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.


In my code a thread is something that processes a sequence of nonces, hashing each one, and
then atomically updating a corresponding counter, without any coordination.

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing and how this syncing is supposed to be implemented.

Quote
Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

I'm talking about the dollar cost of SRAM vs DRAM.
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 01:31:29 PM
 #1071

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

The basic issue is that can we guarantee that patterns will not leak through Siphash because the key is controllable by the attacker? By not using a general cryptographic hash function, there is not the setup overhead of randomizing the impact (or lack thereof) of a key, but I think this is very risky. If not necessary to take this risk, then why do it. I understand you think you need to minimize computation for ASIC resistance but I don't think that is the wise place to remove computation from the assurances on the Cuckoo Cycle PoW.

For example, look at the potential cryptanalysis break found just recently on break 256-bit random numbers by reducing the search space by eliminating certain patterns (although this is probably a hoax):

https://bitcointalk.org/index.php?topic=1504710.msg15138928#msg15138928

If the PoW function is broken, the entire block chain can be rewritten from the genesis block. This is not a trivial decision. We need large margin of error assurances.

Are you referring to the latest version of your code, because I am referring to the code that was in the Appendix of your December 31. 2014 white paper which references int buckets.

The bucketing in an earlier version was just something inessential to the algorithm that was
found to be beneficial to performance. Last year I replaced it by prefetching in commit
https://github.com/tromp/cuckoo/commit/5aef59659e77c599c730ece6a42a7a2597de80da
which turned out to be more beneficial.

I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

I am intentionally referring to the earlier version of your algorithm, to avoid all the source code noise implementing multi-threading. Also apparently you added some TMTO tradeoffs in the latter version of your code which I presume adds more complexity to the code.

Once we establish what we are referring to, then I can better respond to your other points. I will await first your reply.

Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

I'm talking about the dollar cost of SRAM vs DRAM.

I know you were. And I thinking electrical power consumption is what matters most for mining and anti-DDoS economics.
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 09, 2016, 04:00:22 PM
 #1072

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Quote
I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

That's the basic algorithm, not the reference miner, which is in
https://github.com/tromp/cuckoo/blob/master/src/cuckoo_miner.h
for CPUs and
https://github.com/tromp/cuckoo/blob/master/src/cuda_miner.cu
for GPUs.
These spend most time doing edge trimming,
so that's what this discussion should focus on.
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 05:33:30 PM
 #1073

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and are repeatable relationship with certain changes of bits in the key from hash-to-hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming or the way we optimize detecting them, may be impacting by certain patterning away from a perfect Random Oracle.

Think of a differential cryptanalysis where we look for the way a hash changes over differentials from one hash to the next. And also think of the parallelized rho attack. The weakness in Cuckcoo w.r.t. to hashing it is isn't doing one hash, but millions of them and so the dynamic randomness of correlation over hash differentials also has to be considered.

I am not comfortable with cheating margins of security on the hash function. Sorry. I think Zooko @ Zcash was correct to be concerned even though he didn't express what sort of attack might concern him. And remember he worked on the Blake2s cryptographic hash function but I understand he isn't a cryptographer per se (and neither am I). Crypto-currency is also about being able to trust the cryptography is strong and sound. I don't see the advantage of cheating on the hash function used in the Cuckoo Cycle, especially given I don't think you attain ASIC resistance any way, which was the entire motivation for replacing SHA256 with Siphash in order to lower the computation to memory consumption ratio.

You are apparently thinking of inversion and not about what matters. I doubt Dan really thought deeply about what he was commenting on off-the-cuff.

And Btw, you write about in our past discussions and also in your white paper (or blog) about 1/3 computation and 2/3 memory accesses, but this is execution time, not energy cost. The power cost of loading a row buffer on DRAM is very cheap because the circuits are much more dense and the page is localized. The cost of computation on a CPU is very costly because the circuits are spread out all over the chip and not as dense as 1 transistor per bit of DRAM, thus the parasitic capacitance and resistance of the interconnects is an order-of-magnitude or more greater. I think it is a losing strategy to try to say 1 hash per 1 memory access (meaning you expect 1 memory access to load a new memory row) is going to only result in a 1/3 speedup for the ASIC. And besides below I will explain how I think the assumption of 1 hash per memory row buffer load can I think be avoided by the ASIC running many h/w threads.

I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

That's the basic algorithm, not the reference miner, which is in
https://github.com/tromp/cuckoo/blob/master/src/cuckoo_miner.h
for CPUs and
https://github.com/tromp/cuckoo/blob/master/src/cuda_miner.cu
for GPUs.
These spend most time doing edge trimming,
so that's what this discussion should focus on.

I was referring to the "simple array in the basic algorithm" which was from the February 1, 2014 version of your white paper. I note you mentioned the edge trimming in the December 31, 2014 update.

I was avoiding studying the edge trimming code and was focusing on the "basic algorithm".

Any way, if the edge trimming is done after running enumerating all possible nonces, then it is not a heuristic and is an optimization to reduce memory consumption by eliminating all the edges which can't be in a cycle and then I presume you re-enumerate the nonces on the "basic algorithm" but use a hash table to represent a sparse array so you don't have to store all the buckets from the more naive version of the basic algorithm.

So it is reducing memory consumption and I presume eliminating the wasted code paths for walking leaf edges in the final basic algorithm.

Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

In any case, it seems it doesn't change my theory about the lack of ASIC resistance. Rather it just reduces the size of the data being randomly read/write from 32-bits to 2-bits for the majority of running time.

Also you do note that the edge trimming is applicable for the case where many/most of the edges will leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

Okay so now we are on the same page, so I can proceed to reply to what you had written before...

In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

I don't see why you multiply with the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce and then most will want to access a unique row.
Then what? There's little to be synchronized.

The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I suspect if the GPU is running at memory bandwidth, then it may be getting multiple hashes per row buffer load and thus more power efficient than the CPU with only 8 threads/cores. CPU h/w threads may not be coalesced to sync on row buffer loads either (and I am not sure if they are on the GPU).

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.
No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.


In my code a thread is something that processes a sequence of nonces, hashing each one, and
then atomically updating a corresponding counter, without any coordination.

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing and how this syncing is supposed to be implemented.

It doesn't matter that the thread proceeds to do something else after reading and updating the 2-bit count corresponding to a nonce. It only matters that enough h/w threads are running to enable syncing many reads on the same row buffer load without stalling all the threads and gridlock.

The ASIC will need some custom timing and syncing logic. I presume stalled threads (waiting for the other threads to coalesce on each row buffer load) don't cost much in terms of power consumption when implemented with custom circuitry on the ASIC.

I think you really didn't understand what Bill Cox was trying to explain.
finitemaz
Member
**
Offline Offline

Activity: 62


View Profile
June 09, 2016, 05:35:51 PM
 #1074

Cuckoo Cycle doesn't rely on cryptographic security of its underlying hash function,
and Dan Bernstein himself attested to its use in Cuckoo Cycle being perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it is potentially unsound in some logic I have written down and not yet revealed.
I spoke to him in person at BITCOIN'15 where I presented Cuckoo Cycle.

Did he consider  (especially parallelized) multi-collision attacks and their impact on the potential to find an algorithm that can find cycles faster? Can you paraphrase what he said or the gist of his reasoning?

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and are repeatable relationship with certain changes of bits in the key from hash-to-hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming or the way we optimize detecting them, may be impacting by certain patterning away from a perfect Random Oracle.

Think of a differential cryptanalysis where we look for the way a hash changes over differentials from one hash to the next. And also think of the parallelized rho attack. The weakness in Cuckcoo w.r.t. to hashing it is isn't doing one hash, but millions of them and so the dynamic randomness of correlation over hash differentials also has to be considered.

I am not comfortable with cheating margins of security on the hash function. Sorry. I think Zooko @ Zcash was correct to be concerned even though he didn't express what sort of attack might concern him. And remember he worked on the Blake2s cryptographic hash function but I understand he isn't a cryptographer per se (and neither am I). Crypto-currency is also about being able to trust the cryptography is strong and sound. I don't see the advantage of cheating on the hash function used in the Cuckoo Cycle, especially given I don't think you attain ASIC resistance any way, which was the entire motivation for replacing SHA256 with Siphash in order to lower the computation to memory consumption ratio.

You are apparently thinking of inversion and not about what matters. I doubt Dan really thought deeply about what he was commenting on off-the-cuff.

And Btw, you write about in our past discussions and also in your white paper (or blog) about 1/3 computation and 2/3 memory accesses, but this is execution time, not energy cost. The power cost of loading a row buffer on DRAM is very cheap because the circuits are much more dense and the page is localized. The cost of computation on a CPU is very costly because the circuits are spread out all over the chip and not as dense as 1 transistor per bit of DRAM. I think it is a losing strategy to try to say 1 hash per 1 memory access (meaning you expect 1 memory access to load a new memory row) is going to only result in a 1/3 speedup for the ASIC. And besides below I will explain how I think the assumption of 1 hash per memory row buffer load can I think be avoided by the ASIC running many h/w threads.

I think you are referring to something entirely unrelated to what I was referring to. I see int here in the code I am referring to:

https://github.com/tromp/cuckoo/blob/gh-pages/cuckoo.c#L10

That's the basic algorithm, not the reference miner, which is in
https://github.com/tromp/cuckoo/blob/master/src/cuckoo_miner.h
for CPUs and
https://github.com/tromp/cuckoo/blob/master/src/cuda_miner.cu
for GPUs.
These spend most time doing edge trimming,
so that's what this discussion should focus on.

I was referring to the "simple array in the basic algorithm" which was from the February 1, 2014 version of your white paper. I note you mentioned the edge trimming in the December 31, 2014 update.

I was avoiding studying the edge trimming code and was focusing on the "basic algorithm".

Any way, if the edge trimming is done after running enumerating all possible nonces, then it is not a heuristic and is an optimization to reduce memory consumption by eliminating all the edges which can't be in a cycle and then I presume you re-enumerate the nonces on the "basic algorithm" but use a hash table to represent a sparse array so you don't have to store all the buckets from the more naive version of the basic algorithm.

So it is reducing memory consumption and I presume eliminating the wasted code paths for walking leaf edges in the final basic algorithm.

Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

In any case, it seems it doesn't change my theory about the lack of ASIC resistance. Rather it just reduces the size of the data being randomly read/write from 32-bits to 2-bits for the majority of running time.

Also you do note that the edge trimming is applicable for the case where many/most of the edges will leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

Okay so now we are on the same page, so I can proceed to reply to what you had written before...

In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 multiplied by 2^21 bits for 2-bit counters divided by 2^14 bits per row) which have some efficient h/w mechanism for waiting to be synchronized on a memory page/row.

The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I don't see why you multiply with the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce and then most will want to access a unique row.
Then what? There's little to be synchronized.

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor 2^21,
over 2 million.
No that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.


In my code a thread is something that processes a sequence of nonces, hashing each one, and
then atomically updating a corresponding counter, without any coordination.

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing and how this syncing is supposed to be implemented.

It doesn't matter that the thread proceeds to do something else after read a nonce. It only matters that enough h/w threads are running to enable syncing many reads on the same row buffer load without stalling all the threads and gridlock.

The ASIC will need some custom timing and syncing logic.

I think you really didn't understand what Bill Cox was trying to explain.

Wasting time! I want to see the release! Wink
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 09, 2016, 06:47:12 PM
 #1075

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and are repeatable relationship with certain changes of bits in the key from hash-to-hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming or the way we optimize detecting them, may be impacting by certain patterning away from a perfect Random Oracle.
My comment applies similarly to multi-collision attacks. I believe detecting deviations
from perfect randomness to be not only way too expensive but more importantly to die out exponentially
fast as the cycle grows.

Quote
I am not comfortable with cheating margins of security on the hash function. Sorry.
I know you and Zooko disagree with me and Dan on this.
You feel such attacks are conceivable. I feel such attacks are quite inconceivable.
Clearly we're not going to convince each other. So let's just agree to disagree.
People are more than welcome to run Cuckoo Cycle with blake2b replacing siphash
if they feel that trade off is worthwhile. It's just not something I recommend.

Quote
Any way, if the edge trimming is done after running enumerating all possible nonces, then it is not a heuristic and is an optimization to reduce memory consumption by eliminating all the edges which can't be in a cycle and then I presume you re-enumerate the nonces on the "basic algorithm" but use a hash table to represent a sparse array so you don't have to store all the buckets from the more naive version of the basic algorithm.

The basic algorithm doesn't use buckets. Just a cuckoo hash table.
In the edge trimming case, only on the order of 1% of edges remains, and
that table is a little more complicated to translate the sparseness into memory savings.
But still no buckets.

Quote
Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

Sure, a URL would help.

Quote
Also you do note that the edge trimming is applicable for the case where many/most of the edges will leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

A much earlier version used edge fractions lower than 1/2 as a difficulty control, but
I abandoned it since there a relationship between edge fraction and cycle probability is strongly
non-linear, making dynamic difficulty control practically impossible.

Quote
Okay so now we are on the same page, so I can proceed to reply to what you had written before...

Gonna visit the local chess club now. Will read and respond to that later...
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 07:09:30 PM
 #1076

Attacks on hash functions are mostly about doing tons of computation (but less than try all inputs)
to try invert its outputs. Siphash may not stand up to that. But to think that a weakness in inversion
resistance can make finding cycles of length 42 in a graph defined by millions or billions of siphash
outputs easier is just... utterly inconceivable. That's the gist of it.

Please distinguish preimage attacks from multi-collision attacks. If we can find certain characteristics in the hash function which are not perfectly random and are repeatable relationship with certain changes of bits in the key from hash-to-hash, then we can perhaps impact the cycling in the Cuckoo. Cycle forming or the way we optimize detecting them, may be impacting by certain patterning away from a perfect Random Oracle.

My comment applies similarly to multi-collision attacks. I believe detecting deviations from perfect randomness to be not only way too expensive but more importantly to die out exponentially fast as the cycle grows.

Except if the cycle is growing in a patterned way and that is part of the algorithm of the attack, i.e. deviates from the algorithm you use but is still able to provide the cycles within the maximum nonce count (and does so with some advantage in performance obviously else there is no point to it).

I think it is naive for you to assume that a random walk becomes more convoluted as cycle length grows when the entire risk is about whether the uniform randomness could be subverted.

I have a very creative imagination, which is why I am often able to come up with things that others don't think of.

There are so many permutations of possibilities that you sweep away with a generalized notion of random walking which may not hold when the random oracle assumption is subverted.

It may be that there is some differential analysis or distinguisher (even just bias in even one of the bits between relative outputs to relative inputs between multiple hashes) that can be precomputed and amortized over all the hashes, because it is a table of differentials of the way the hash behaves over huge number of invocations where the spread between the enumeration of the space is smaller due to the large number of hashes computed. There are many different angles to analyse because we are not just considering one hash, but millions of them for one Cuckoo Cycle solution.

Of course I respect Dan Berstein as a much smarter cryptographer than I am, but unless I am reading that he has written, I am hesitant to be convinced he has considered all of what I am getting at just from your second hand summary of what he said verbally at a conference.

I am not comfortable with cheating margins of security on the hash function. Sorry.

I know you and Zooko disagree with me and Dan on this. You feel such attacks are conceivable. I feel such attacks are quite inconceivable. Clearly we're not going to convince each other. So let's just agree to disagree.
People are more than welcome to run Cuckoo Cycle with blake2b replacing siphash if they feel that trade off is worthwhile. It's just not something I recommend.

I am just arguing why not be safer than sorry when so much is at stake? Why would you recommend adding unnecessary risk even where you think you are omniscient and there is no risk?

Any way, if the edge trimming is done after running enumerating all possible nonces, then it is not a heuristic and is an optimization to reduce memory consumption by eliminating all the edges which can't be in a cycle and then I presume you re-enumerate the nonces on the "basic algorithm" but use a hash table to represent a sparse array so you don't have to store all the buckets from the more naive version of the basic algorithm.

The basic algorithm doesn't use buckets. Just a cuckoo hash table. In the edge trimming case, only on the order of 1% of edges remains, and that table is a little more complicated to translate the sparseness into memory savings. But still no buckets.

I don't see what you wrote that is different than what I wrote. You focus too literally on the use of the work 'buckets'. What I am saying is that the edge trimming eliminates most of the nodes from consideration and then you have a hash table representing a sparse array for the remainder of the nodes which aren't the pruned/trimmed leaf edges. Those remaining nodes were "buckets" in a regular array in the former untrimmed basic algorithm.

Btw, you may not remember that AnonyMint or TPTB_need_war wrote to you a long time ago (and I think perhaps before Dave Andersen did) that you could compress and count with bit flags. So you should credit Shelby too, lol (but Dave wrote a much more detailed exposition). Do I need to go find that post on BCT where AnonyMint or TPTB_need_war told you that? I remember reading it again this week when I was doing my deeper research on Cuckoo Cycle.

Sure, a URL would help.

Here it was:

https://bitcointalk.org/index.php?topic=1219023.msg13721473#msg13721473

Note that was quoting a document that AnonyMint wrote I believe in late 2013 or early 2014. But he didn't publish all of that document until January 2016. Note he had published excerpts of that document on BCT when Monero was first released and AnonyMint was having an argument with the guy who optimized Monero's hash function. So we could go correlate that if we wanted to prove it. Any way, I am fine with David getting the credit, as he developed the concept more than AnonyMint did.

Also you do note that the edge trimming is applicable for the case where many/most of the edges will leaf nodes, which is the case where M/N < 1/2. You may think that is acceptable, because your intended usage is apparently to run Cuckoo at that load. I note it for reasons I will not reveal now.  Wink

A much earlier version used edge fractions lower than 1/2 as a difficulty control, but I abandoned it since there a relationship between edge fraction and cycle probability is strongly non-linear, making dynamic difficulty control practically impossible.

Well my wink is about using maximum nonce counts, i.e. edge fraction M/N > 1/2.

Yeah I can see that dynamic difficulty control gets messed up if deviate too far from 1/2. But difficulty can also be controlled externally by hashing the result with SHA256, which you also noted in your paper.
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 09, 2016, 10:55:11 PM
 #1077

The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I stated earlier
Quote
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Your above proposal of simultaneously running millions of hardware threads and needing to sync them
incurs much worse increases in die area (to the point of not fitting on any single die),
hardware costs, and energy usage.
iamnotback
Sr. Member
****
Offline Offline

Activity: 336



View Profile
June 09, 2016, 11:18:37 PM
 #1078

The point is that if you have enough threads running, the likelihood is there are 2^12 threads wanting to read from the same row, thus your algorithm is no longer bounded on memory latency rather on memory bandwidth and moreover that there isn't 1 hash per memory row buffer load but instead 2^12 hashes, and thus actually your algorithm is computation bound and will be 1000s times more power efficient on the ASIC.

I stated earlier
Quote
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.

Your above proposal of simultaneously running millions of hardware threads and needing to sync them
incurs much worse increases in die area (to the point of not fitting on any single die),
hardware costs, and energy usage.

My calculation was 2^15, that isn't a million.

Did you really design these threads already. I bet they can use a minimal number of transistors each by reusing some components for threads that are stalled.

Multiple CPUs can access the same memory. Perhaps the same could be done if multiple dies are needed.

Afaik, the hardware cost is rather insignificant relative to the relative power consumption efficiency cost when it comes to mining economics for ASIC farms.
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 10, 2016, 09:32:15 AM
 #1079

I am just arguing why not be safer than sorry when so much is at stake? Why would you recommend adding unnecessary risk even where you think you are omniscient and there is no risk?

I only need good statistical properties of the hash function, not cryptographic security.
In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being
close to cryptographically secure.
Your attack is about as conceivable to me as P being equal to NP,
in which case there is no cryptographically secure hash function anyway.

Quote
What I am saying is that the edge trimming eliminates most of the nodes from consideration and then you have a hash table representing a sparse array for the remainder of the nodes which aren't the pruned/trimmed leaf edges.

Yes, totally correct. But in my view the term "bucket" denotes a fixed size unordered container,
and my current algorithms use no such thing.

Quote
Well my wink is about using maximum nonce counts, i.e. edge fraction M/N > 1/2.

My algorithms break down in that case, since the expected number of cycles becomes non-constant,
causing the basic algorithm to overlook an increasing fraction of them,
and edge-trimming fails altogether to remove a large fraction of edges.
tromp
Hero Member
*****
Offline Offline

Activity: 505


View Profile
June 10, 2016, 09:40:21 AM
 #1080

My calculation was 2^15, that isn't a million.

You need millions to find 2^12 of them wanting to access a single row.

With 2^15 threads you expect 1 access per row.
Pages: « 1 ... 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 [54] 55 56 57 58 »
  Print  
 
Jump to:  

Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!