Bitcoin Forum
May 24, 2024, 07:42:34 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 [40] 41 42 43 44 45 46 47 48 49 »
781  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 31, 2014, 08:04:49 PM
"the complexity of determining a shortest cycle of even length" (I couldn't find a non-paywalled PDF, but could send you a copy if you want).  A key difference is the shortest aspect, however.  And asymptotically, the V^2 runtime is worse than the |V||E| runtime of the simpler BFS-based approach on the graph you choose for cuckoo.
(Note:  I wasn't suggesting it's better -- I mostly wanted to avoid chasing down the Yuster algorithm if you'd already evaluated it. ;-)

Please email me a copy of that Monien paper. Btw, my runtime is O(|E|) = O(|V|), not |V| |E|. I didn't check much graph theory literature because my setting is pretty specific:

  the graph is (pseudo-)random. 
  it is very sparse; in fact a forest, apart from a constant expected number of additional cycle-inducing edges.
  it is only given implicitly, rather than by an adjacency matrix or adjacency lists.

That's why the cuckoo hashing literature seemed more relevant, since these graphs arise in that setting.

782  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 31, 2014, 05:09:34 PM
This thread in Bitcoin Development & Technical Discussion may also be of interest:

https://bitcointalk.org/index.php?topic=717267.0

For the record, I have no interest in developing my own coin. As a computer scientist, proof-of-work algorithms interest me, and Cuckoo Cycle has become one of my hobbies. Beyond the proof-of-work, cryptocurrencies have too many complexities (e.g. key management, networking issues) that I don't want to have to deal with.

Cuckoo Cycle will be adopted eventually. I suggested that the Ethereum guys take a serious look at it. Let's see what they come up with and how it compares...

Hi, John -

One more follow-up.  I posted in my public review that, beyond the speedups I suggested to you, I was somewhat nervous about algorithmic speedups to the problem still being possible, and I can't shake that nervousness yet.  (I haven't been able to exploit it either, but my failure to do so doesn't mean it's not possible, just that it's not immediately obvious.)

Please stay nervous:-) As you noted in the past, losing my nervousness is one of my problems...

Quote
One thing I'm curious about: your current union-find solution seems to echo that of Monien.  Before wasting my time thinking about it more if you've already done the work, did you attempt to apply Yuster & Zwick's algorithm (in "Finding Even Cycles Even Faster")?  The only difference is a factor of the inverse Ackermann function α(V), but it's mostly intriguing because it's simpler from a data structure perspective, which might lead to a faster real-world implementation.  Or not. Smiley

I thought their approaches are quite different. Can you point to a specific paper and page of Monien describing a union-find-like algorithm? My paper has a paragraph on both the similarity to and the crucial difference from union-find,
which explains why I can't use path compression (the source of the Ackermann function in union-find's complexity). My algorithm also crucially depends on the edge-to-vertex ratio being at most one half.
But any speedup of this part is mostly irrelevant anyway, as the running time of Cuckoo Cycle is entirely dominated by the edge trimming phase.

Quote
My nervousness remains because of the unique structure of your graphs - as you note, they're like GNMp graphs - and it's not clear that there are specific tricks that can be used here.  I'm a little less worried about this than I was, but I've still got this concern that the algorithm could be vulnerable to non-breakthrough-level theory developments (i.e., things easier than breaking discrete logs. :-).  Still trying to wrap my head around a good formulation of it, and people shouldn't take me too seriously on this, because it's a tickling hunch, not a "gosh, I'm sure" kind of feeling.

I still compare it in my head to Momentum, where there's less structure sitting around, and therefore I'm more confident in the analysis, which I view as a good property in a PoW.  But we know that Momentum, even if it were modified to use SIPhash, basically turns into a DRAM bandwidth problem.  That's not bad from an ASIC standpoint, but it is comparatively GPU-friendly.

Momentum suffers from having a linear time-memory trade-off though. It's essentially identical to Cuckoo Cycle with N=2^26 vertices, cycle length 2, and #edges = N rather than N/2. Note that the best known implementations (entirely or partly due to you) are also essentially identical. I believe that by generalizing cycle length, Cuckoo Cycle leaves less room for trade-offs, as witnessed by the fact that Momentum's linear TMTO does not carry over...
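For comparison, Momentum's cycle-of-length-2 search reduces to a birthday collision hunt, which a hash table solves in a single pass — precisely the linear-memory structure that its linear TMTO exploits. A toy sketch (SHA-256 over a handful of bits standing in for the real birthday hash; parameter values are illustrative assumptions):

```python
import hashlib

def bday_hash(header, nonce, bits=20):
    # toy stand-in for Momentum's birthday hash, truncated to `bits` bits
    h = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
    return int.from_bytes(h[:8], "little") >> (64 - bits)

def momentum_collisions(header, nnonces, bits=20):
    """Find nonce pairs with equal birthday hash: one table probe per edge.
    Memory grows linearly in nnonces, unlike a cycle of generalized length."""
    seen = {}                  # hash value -> first nonce that produced it
    pairs = []
    for nonce in range(nnonces):
        v = bday_hash(header, nonce, bits)
        if v in seen:
            pairs.append((seen[v], nonce))   # cycle of length 2 found
        else:
            seen[v] = nonce
    return pairs
```

The table can be shrunk k-fold at roughly k-fold extra hashing cost (rerun over k hash-value buckets), which is the linear trade-off referred to above.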
783  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 31, 2014, 02:29:26 PM
This thread in Bitcoin Development & Technical Discussion may also be of interest:

https://bitcointalk.org/index.php?topic=717267.0

For the record, I have no interest in developing my own coin. As a computer scientist, proof-of-work algorithms interest me, and Cuckoo Cycle has become one of my hobbies. Beyond the proof-of-work, cryptocurrencies have too many complexities (e.g. key management, networking issues) that I don't want to have to deal with.

Cuckoo Cycle will be adopted eventually. I suggested that the Ethereum guys take a serious look at it. Let's see what they come up with and how it compares...
784  Bitcoin / Development & Technical Discussion / Re: Cuckoo Cycle revisited on: July 31, 2014, 02:13:03 PM
the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?)
No, these are.

For CRAM to be effective, it must be possible to partition the data-set over memory banks,
such that each bank can do independent local computation on the part of the data-set that it stores.
I don't see Cuckoo Cycle fitting into this mold, as any subarray of vertex degree counters needs access
to the entire bitvector of live edges in order to compute degrees correctly.
Can you elaborate on what it is that you would have each memory bank store and process?
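To illustrate that dependency, here is a hedged sketch of one edge-trimming round (Python, with a plain list standing in for the packed 2-bit counter array, and `endpoint` a stand-in for the siphash endpoint computation): the counting loop must touch counters for arbitrary nodes while scanning the full live-edge set, so no bank-local partition of the counters can proceed independently.

```python
def trim_round(endpoint, nedges, nnodes, alive):
    """One trimming round for one side of the bipartite graph: count the
    degree of every live edge's endpoint, then kill edges whose endpoint
    has degree 1 (such edges cannot lie on a cycle)."""
    # 2-bit saturating counters, one per node
    ctr = [0] * nnodes
    for e in range(nedges):
        if alive[e]:
            u = endpoint(e)
            ctr[u] = min(ctr[u] + 1, 2)     # saturate at 2
    for e in range(nedges):
        if alive[e] and ctr[endpoint(e)] < 2:
            alive[e] = False                # degree-1 endpoint: trim the edge
    return alive
```

Note how any slice of `ctr` is potentially written by any live edge, which is what frustrates the data-set partitioning that CRAM/processing-in-memory needs.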

Quote
Unfortunately, bounties don't seem to work, especially if you're looking to (dis)prove security with them— I think it's sad, because your ideas are probably the least ill-advised in the memory-hard workfunction space.  (Though, seeing that you're now using siphash internally, I might worry some about your approximation-freeness...)

The bounties are to increase the likelihood of identifying opportunities for optimization that I may have overlooked. Of course, people finding such opportunities may decide to keep them secret in the hope of exploiting them later, but I'm hoping that at least some people will prefer to claim the bounty and help establish a more level playing field for when adoption does take place.

Quote
I say "least ill-advised" with no intended insult to your work: I'm concerned that the whole notion of memory hardness as a useful part of a work-function may be ill-advised.  If anyone is looking for a grant for research on the implications of how "memory hardness" moves costs from energy to die-area, and may or may not create differential advantages for attackers or defenders due to die-area related costs being amortized during long-term operation in either the PoW-blockchain model or the password-strengthening KDF model, I can very likely find funding for that (esp. for the KDF model).

It is becoming conventional wisdom that memory-hard KDFs are "well advised", but the analysis in the scrypt paper ignores operating costs— I talked to Colin Percival, and it seems he didn't really have enough visibility into the hardware low levels to even begin to estimate energy impact.  Experience on modern-process mass-produced computing hardware (e.g. GPUs) seems to show power costs exceeding retail, much less manufacturing, costs in only a couple months of operation, so I'm personally pretty worried that memory-hard functions go in the wrong direction, creating amortization advantages for attackers compared to functions whose large-scale execution ends up thermally bound— but it's an informal concern and I have no idea where the breakpoints are where this would start to strongly favor attacks (/centralization). At a minimum, existing "cost" arguments for memory-hard functions in some applications are ignoring a majority of the cost (energy), so more research here could be pretty interesting.

I think that the economic consequences of large-scale adoption of a memory-hard PoW are hard to predict. I am hoping that both many-core CPUs and FPGAs will become common features on desktop computers of the future, in which case many people will be able to mine with only a moderate efficiency handicap, which will probably never be the case for bitcoin.
785  Alternate cryptocurrencies / Marketplace (Altcoins) / Re: [SUSI] Looking for Sweet Daddies ♥ to Sponsor my Cryptocurrency Vision on: July 31, 2014, 02:11:03 AM
I am serious, this is not scam.

Funny how scams always feel compelled to state this...
786  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 30, 2014, 10:19:02 PM
That SHA256 is defined in openssl/sha.h for MacOS - I was mostly pointing out the deprecation.

For Ubuntu, I had to define a little macro to do the init/update/final.

Otherwise, I get a link error linking against SHA256.

I also updated the makefile to separate the LIBS, because the -l should appear *after* the .o files that reference them, and that was needed when I added -lssl for compiling on ubuntu.

Thanks for elaborating, and for providing a patch file.
I thought that every version of openssl/sha.h defines SHA256(), but apparently not.

Patches applied to https://github.com/tromp/cuckoo ...
787  Bitcoin / Development & Technical Discussion / Re: Cuckoo Cycle revisited on: July 30, 2014, 07:36:06 PM
Have you found a qualified ASIC expert with low-level hardware experience to evaluate it yet?

No; but that doesn't stop me from claiming that a good Cuckoo Cycle hardware setup might be an FPGA saturating (with random accesses) the bandwidth of a few dozen/hundred DRAM chips (are these the ASICs you're looking for?).
788  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 30, 2014, 06:57:10 PM
Just an fyi:

Code:
unsigned char *SHA256(const unsigned char *d, size_t n,unsigned char *md) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;

Where do I have that code? I don't see that in my repository?!

Quote
And does not compile on Ubuntu 14.04.

I've put a quick hack patch for making it work under Linux -

http://www.cs.cmu.edu/~dga/crypto/cuckoo-linux.patch

but didn't test it again under macos. Smiley

I can compile fine on my SUSE Linux box. But of course, I'd like it to work on all Linux distributions. Before applying your patch though, I'd like to identify the exact issue you're having...
789  Bitcoin / Development & Technical Discussion / Cuckoo Cycle revisited on: July 30, 2014, 03:29:01 PM
This is about the graph-theoretic proof-of-work algorithm that I originally proposed in January and which is still not in use in any cryptocurrency. Please see

https://bitcointalk.org/index.php?topic=707879.0

for an introduction to Cuckoo Cycle and the bounties I am offering to support its claims of optimality, memory-hardness, and GPU resistance.
790  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 30, 2014, 04:22:49 AM
Could you specify the exact program / command line arguments that must be compared against for each of those categories?
And how you expect the time to be measured, etc., etc.?

I will add relevant bounty Makefile targets that encompass all the required runs and report the total running time
for both my cuckoo_miner source and the alternate bounty_miner source.

The latest Makefile includes targets

Code:
bounty28:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty28 -DSIZESHIFT=28 bounty_miner.cpp

bounty30:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty30 -DSIZESHIFT=30 bounty_miner.cpp

bounty32:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty32 -DSIZESHIFT=32 bounty_miner.cpp

cuda28: cuda_miner.cu Makefile
        nvcc -o cuda28 -DSIZESHIFT=28 -arch sm_20 cuda_miner.cu -lcrypto

cuda30: cuda_miner.cu Makefile
        nvcc -o cuda30 -DSIZESHIFT=30 -arch sm_20 cuda_miner.cu -lcrypto

cuda32: cuda_miner.cu Makefile
        nvcc -o cuda32 -DSIZESHIFT=32 -arch sm_20 cuda_miner.cu -lcrypto

cpubounty:      cuckoo28 bounty28 cuckoo30 bounty30 cuckoo32 bounty32 Makefile
        for c in 28 30 32; do for t in 1 2 4 8; do time for h in {0..9}; do ./cuckoo$$c -t $$t -h $$h; done; time for h in {0..9}; do ./bounty$$c -t $$t -h $$h; done;done; done

gpubounty:      cuckoo28 cuda28 cuckoo30 cuda30 cuckoo32 cuda32 Makefile
        for c in 28 30 32; do time for h in {0..9}; do ./cuckoo$$c -t 16 -h $$h; done; time for h in {0..9}; do ./cuda$$c -h $$h; done; done

Target cpubounty runs the required 3x4=12 time comparison tests for the first two bounties while target gpubounty runs the required 3 time comparison tests for the third bounty.

The output should indicate what cycles are found. Long cycles (length > 64) may be ignored. If your program doesn't find all short cycles, then the short cycle finding rate should be determined somehow and used to discount the speedup.
791  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 29, 2014, 04:26:57 PM
I specified Xeons because Core i7 doesn't offer that many threads. But those Xeons are rather prohibitively
priced, so I want to keep dollars out of that equation. However, we can consider your metric if a factor of 2x
or more is demonstrated for a GPU compared to a price/performance leading Core i7 running at 8 threads.

This news story

http://www.kitguru.net/components/cpu/anton-shilov/intel-core-i7-5960x-haswell-e-de-lidded-interesting-discoveries-found/

suggests that 8+ core Core i7 CPUs are not far off.
Even with an expected initial price premium, they are bound to be way more performance/price competitive than Xeons.
792  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 29, 2014, 04:03:58 PM
Could you specify the exact program / command line arguments that must be compared against for each of those categories?
And how you expect the time to be measured, etc., etc.?

I will add relevant bounty Makefile targets that encompass all the required runs and report the total running time
for both my cuckoo_miner source and the alternate bounty_miner source.
Please allow me a day or so to update the repository.

Quote
What operating system will the test be conducted on?  What CPU(s) are acceptable for demonstrating the speedup?
Is a 2x speedup on some but not all CPUs adequate?

Tests are to be conducted on a common Linux distribution. Recent Intel Core i7 and Xeons are acceptable.
A 2x speedup on either of those is adequate.

Quote
For GPUs, I'd suggest a way of putting it that allows a little more flexibility in the hardware.
Perhaps cycles / second / hardware dollar?

I specified Xeons because Core i7 doesn't offer that many threads. But those Xeons are rather prohibitively
priced, so I want to keep dollars out of that equation. However, we can consider your metric if a factor of 2x
or more is demonstrated for a GPU compared to a price/performance leading Core i7 running at 8 threads.

Quote
Also - am I correct that you've updated your algorithm to use the faster:
Code:
  2*nonce + {left or right side}  % HALFSIZE
where HALFSIZE is a power of two (expressed as a mask) instead of the previous %N0 and %N1 where N0 and N1 were not powers of two?

The main reason for that change is to avoid limiting Cuckoo Cycle to sizes of 2^32.
It is now well-defined for sizes up to 2^64.
I also wanted to get rid of the modulo operations which are not essential, cause some architectural bias,
slow down the critical path, and make the bipartite graph imperfectly balanced.
The goal is to make everything as simple and uniform as possible.
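In code, the new endpoint derivation is roughly the following sketch (with `sip` standing in for the keyed siphash-2-4 call, an assumption of this sketch):

```python
def edge_endpoints(sip, nonce, halfsize_mask):
    """Endpoints of edge `nonce` in the bipartite graph. HALFSIZE is a power
    of two, so the reduction `% HALFSIZE` becomes a cheap mask: no modulo on
    the critical path, no architectural bias, perfectly balanced sides, and
    well-defined for sizes up to 2^64."""
    u = sip(2 * nonce) & halfsize_mask        # left side: even siphash inputs
    v = sip(2 * nonce + 1) & halfsize_mask    # right side: odd siphash inputs
    return u, v
```

Contrast this with the previous `% N0` / `% N1` reductions, where N0 and N1 were not powers of two.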
793  Alternate cryptocurrencies / Altcoin Discussion / Re: rpietila Altcoin Observer on: July 27, 2014, 10:30:44 PM
Zero transaction fees combined with a flexible blockchain structure that can support, e.g., arbitrary user data or extensibility are a recipe for letting people externalize their costs.

When you're consuming storage in a medium replicated 1000s of times around the world, paying a little rent is a good idea.  Otherwise, some clever schmuck will figure out how to store a copy of Windows ME in the blockchain, and we'll never be rid of it.

In the presence of perpetual debasement, transaction fees could optionally be replaced by proof-of-work,
where the difficulty threshold is some function of the transaction amount (this function could be varied
dynamically based on recent block fill rates).
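A hypothetical sketch of such a schedule (the function name, the formula, and all parameter choices here are illustrative assumptions, not a concrete proposal):

```python
def pow_target(amount, fill_pct, max_target=2**224):
    """Toy difficulty schedule: larger transactions must meet a smaller
    (harder) target, and fuller recent blocks harden the whole schedule.
    `fill_pct` is the recent block fill rate in percent (0..100)."""
    # every doubling of the amount adds one unit of difficulty;
    # the fill rate scales difficulty by up to 2x
    difficulty = (1 + amount).bit_length() * (100 + fill_pct) // 100
    return max_target // difficulty
```

A transaction would then be valid if its proof-of-work hash falls below `pow_target(...)` for its amount.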

794  Alternate cryptocurrencies / Altcoin Discussion / Re: rpietila Altcoin Observer on: July 27, 2014, 03:37:34 PM
Note I don't expect tromp's Cuckoo hash to be GPU resistant, because it can be (even if sublinear) parallelized. I don't know if anyone has tested parallelizing it on more than the two dozen cores he tested?
I tested it for him on 32 cores. I don't know the results; you would have to ask him (I just ran some code he gave me on an idle system).

I summarized the results in my private response on May 1, reproduced below.
Note that the version you tested is the older one that didn't include edge trimming.
From all my benchmarking, I notice that AMD Opteron servers, as well as Intel Xeon
servers with quad and octa CPUs, appear to saturate memory at under 32 threads.
Dual-CPU Xeons appear to have a superior (at least for Cuckoo Cycle) memory subsystem
that allows them to perform well beyond 32 threads.


----------------------------------------------------------------------------
Thanks, smooth!

As I feared, Xeons perform much worse in octa-cpu configuration
than in dual-cpu. The speedup flattens out at about 16 threads.

If you run the speedup test on one of your dual-Xeon systems
with 16+ (hyper)threads, you'll see performance scaling much
better and presumably not flattening out,
like the green speedup30 curve in my paper at page 6 bottom left.

My favourite test target would be a system with dual
Xeon Ey-28(70/80/90) that can run 60 hyperthreads...

-John
795  Alternate cryptocurrencies / Altcoin Discussion / Re: rpietila Altcoin Observer on: July 26, 2014, 10:56:20 PM
I don't like any of the proof-of-work algorithms over Bitcoin's thus far (at least given what I think we know about Cuckoo hash thus far, i.e. seems to be highly parallelizable even if slightly sublinear thus I don't think it will keep GPUs at parity? It might have some role if the number of lightweight cores on mobile increases to some huge number).

What my Cuckoo Cycle benchmarking has shown so far is that 40 Xeon threads is not quite enough to saturate memory. But I imagine a few hundred will. An FPGA or ASIC will be able to generate the memory requests at a much faster rate using hardwired siphash24 computation, and so will hit the parallelization limit much earlier.

Because GPU memory is ill-suited for Cuckoo Cycle's random access to bitpairs in global memory,
which resists coalescing (1M consecutive accesses are on average 512 bytes apart on a large instance),
I expect the GPU to struggle to put hundreds of threads to use. That's why
I posted a $1000 bounty on the speed parity of GPUs and server CPUs for Cuckoo Cycle in
  https://bitcointalk.org/index.php?topic=707879.0
which is duplicated in the README at
  https://github.com/tromp/cuckoo

So you don't think my bounty is safe... care to have a go at it yourself?!
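The 512-byte figure follows from simple arithmetic. As an assumed example of a "large instance", take a live-bitpair array of 512 MiB (2^32 bits):

```python
# a batch of 1M consecutive random accesses into a 512 MiB array
accesses = 2**20
array_bytes = 512 * 2**20
avg_gap = array_bytes // accesses   # average spacing once addresses are sorted
# 512-byte gaps dwarf the 32-128 byte granularity of GPU memory coalescing,
# so nearly every access costs its own memory transaction
print(avg_gap)
```

With gaps that large, the hardware cannot merge neighboring threads' accesses into shared transactions, which is the basis of the GPU speed-parity bounty.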
796  Alternate cryptocurrencies / Altcoin Discussion / Re: MemoryCoin 2.0 Proof Of Work on: July 26, 2014, 01:34:19 PM
Yes, big memory buffer needed to avoid recalculation of 64k blocks.

This makes PoW really memory bound.

It's not memory-hard though, as memory can trivially be traded off for computation time
(8192 times less memory with only a 50x slowdown).
Which is a bit sad for a coin whose very name emphasizes memory:-(

Quote
CryptoNight PoW is somehow conceptually similar to MMC 2.0 PoW (besides of much smaller block size and different - more aggressive - data dependency construct).

CryptoNight is just hashcash with a hash function that bears some similarities to MMC2's verification
(note that the crucial overwriting of main memory blocks has no equivalent in MMC2).

MMC2 is not hashcash, but one of the few asymmetric PoWs (like Primecoin, Momentum, and Cuckoo Cycle) where proof attempts need to perform much more work than verification.

So overall, I'd say they're conceptually quite different.
797  Alternate cryptocurrencies / Altcoin Discussion / Re: MemoryCoin 2.0 Proof Of Work on: July 25, 2014, 08:29:13 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB pseudorandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in the total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

Hmm, looking at the actual implementation on github, I apparently misunderstood,
as the block-pointer depends on the updated state as well.

So the miner does need 1GB, at least to avoid having to regenerate each block 49 times on average
(as FPGA/ASICs would be expected to do).
798  Alternate cryptocurrencies / Altcoin Discussion / Re: MemoryCoin 2.0 Proof Of Work on: July 25, 2014, 05:46:27 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB pseudorandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in the total number of nodes,
so this overhead is negligible.
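Modeling the next-block pointers as a random permutation of the 2^14 blocks (an assumption of this sketch; the real pointer map need not be an exact permutation), the expected number of cycles is the harmonic number H_n ≈ ln n ≈ 10, which a quick simulation confirms:

```python
import random

def count_cycles(perm):
    """Count the cycles of a permutation in one pass with a visited bitmap."""
    seen, cycles = [False] * len(perm), 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
    return cycles

n, rng = 2**14, random.Random(1)
avg = sum(count_cycles(rng.sample(range(n), n)) for _ in range(100)) / 100
# expectation is the harmonic number H_n ~ ln(2^14) + 0.577 ~ 10.3
print(avg)
```

So the extra 49 generation steps per cycle amount to roughly 50 * 10 ≈ 500 block generations against 2^14 in total: negligible, as claimed.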

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

-John
799  Alternate cryptocurrencies / Altcoin Discussion / Re: Cuckoo Cycle Speed Challenge; $2500 in bounties on: July 24, 2014, 01:32:46 PM
Speedup Bounty:
===========
$500 for an open source implementation that finds 42-cycles twice as fast (disregarding memory use).

I should add that the current default settings

  #define LOGNBUCKETS 8
  #define BUCKETSIZE 256

in cuckoo_miner.h are chosen to maximize performance on an Intel Xeon.
For running optimally on a Core i7, these may have to be lowered a bit...
800  Alternate cryptocurrencies / Altcoin Discussion / Re: rpietila Altcoin Observer on: July 24, 2014, 01:27:14 PM
I tried to find some hashing algorithms that would favor RISC processors, since they dominate modern smartphones, but I've failed to find any. If anybody has some info about such algos it would be very nice to share that info here.

Sounds like you may want to have a crack at the Cuckoo Cycle GPU Speed Parity challenge at https://bitcointalk.org/index.php?topic=707879.msg7996954#msg7996954

Among all PoWs, Cuckoo Cycle may be best suited to smartphones since it minimizes computation, and spends 2/3 of its running time on memory latency.