Bitcoin Forum
May 04, 2024, 07:23:59 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1] 2 3 »  All
  Print  
Author Topic: Cuckoo Cycle Speed Challenge; $2500 in bounties  (Read 6512 times)
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 24, 2014, 03:49:32 AM
Last edit: August 08, 2014, 08:51:57 PM by tromp
 #1

Cuckoo Cycle is the first graph-theoretic proof-of-work.

Proofs take the form of a length 42-cycle in a bipartite graph
with N nodes and N/2 edges, with N scalable from millions to billions and beyond.
This makes verification trivial: compute the 42*2 edge endpoints
with one initialising sha256 and 84 siphash24 computations, check that
each endpoint occurs twice, and that you come back to the
starting point only after traversing 42 edges.
A final sha256 can check whether the 42-cycle meets a difficulty target.

With siphash24 being a very lightweight hash function, this makes for
practically instant verification.

This is implemented in just 157 lines of C code (files cuckoo.h and cuckoo.c) at
https://github.com/tromp/cuckoo, where you can also find a whitepaper.

From this point of view, Cuckoo Cycle is a rather simple PoW.

Finding a 42-cycle, on the other hand, is far from trivial,
requiring considerable resources, and some luck
(for a given header, the odds of its graph having a 42-cycle are about 2.5%).

The algorithm implemented in cuckoo_miner.h runs in time linear in N.
(Note that running in sub-linear time is out of the question, as you could
only compute a fraction of all edges, and the odds of all 42 edges of a cycle
occurring in this fraction are astronomically small).
Memory-wise, it uses N/2 bits to maintain a subset of all edges (potential cycle edges)
and N additional bits (or N/2^k bits with corresponding slowdown)
to trim the subset in a series of edge trimming rounds.
Once the subset is small enough, an algorithm inspired by Cuckoo Hashing
is used to recognise all cycles, and recover those of the right length.

I'd like to claim that this implementation is a reasonably optimal Cuckoo miner,
and that trading off memory for running time, as implemented in tomato_miner.h,
incurs at least one order of magnitude extra slowdown.
I'd further like to claim that GPUs cannot achieve speed parity with server CPUs.

To that end, I offer the following bounties:

Speedup Bounty:

$500 for an open source implementation that finds 42-cycles twice as fast (disregarding memory use).

Linear Time-Memory Trade-Off Bounty:

$500 for an open source implementation that uses at most N/k bits while running up to 15*k times slower, for any k>=2.

Both of these bounties require N ranging over {2^28,2^30,2^32} and #threads ranging over {1,2,4,8},
and further assume a high-end Intel Core i7 or Xeon and recent gcc compiler with regular flags as in my Makefile.

GPU Speed Parity Bounty:

$1000 for an open source implementation for an AMD R9 280X or nVidia GeForce GTX 770 (or similar high-end GPU)
that is as fast as a high-end Intel Xeon running 16 threads. Again with N ranging over {2^28,2^30,2^32}.

Note that there is already a cuda_miner.cu, my attempted port of the edge trimming part
of cuckoo_miner.c to CUDA, but while it seems to run ok for medium graph sizes,
it crashes my computer at larger sizes, and I haven't had the chance to debug the issue yet
(I prefer to let my computer work on solving 8x8 connect-4:-)
Anyway, this could make a good starting point.

These bounties are to expire at the end of this year. They are admittedly modest in size, but then
claiming them might only require one or two insightful tweaks to my existing implementations.

I invite anyone who'd like to see my claims refuted to extend any of these bounties
with amounts of your crypto-currency of choice.

(I don't have any crypto-currencies to offer myself, but if need be, I might be able to convert
a bounty through a trusted 3rd party)

Happy bounty hunting!

-John
1714850639
Hero Member
*
Offline Offline

Posts: 1714850639

View Profile Personal Message (Offline)

Ignore
1714850639
Reply with quote  #2

1714850639
Report to moderator
1714850639
Hero Member
*
Offline Offline

Posts: 1714850639

View Profile Personal Message (Offline)

Ignore
1714850639
Reply with quote  #2

1714850639
Report to moderator
1714850639
Hero Member
*
Offline Offline

Posts: 1714850639

View Profile Personal Message (Offline)

Ignore
1714850639
Reply with quote  #2

1714850639
Report to moderator
The block chain is the main innovation of Bitcoin. It is the first distributed timestamping system.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1714850639
Hero Member
*
Offline Offline

Posts: 1714850639

View Profile Personal Message (Offline)

Ignore
1714850639
Reply with quote  #2

1714850639
Report to moderator
1714850639
Hero Member
*
Offline Offline

Posts: 1714850639

View Profile Personal Message (Offline)

Ignore
1714850639
Reply with quote  #2

1714850639
Report to moderator
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 24, 2014, 01:32:46 PM
 #2

Speedup Bounty:
===========
$500 for an open source implementation that finds 42-cycles twice as fast (disregarding memory use).

I should add that the current default settings

  #define LOGNBUCKETS 8
  #define BUCKETSIZE 256

in cuckoo_miner.h are chosen to maximize performance on an Intel Xeon.
For running optimally on a Core i7, these may have to be lowered a bit...
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
July 29, 2014, 02:59:47 PM
Last edit: July 29, 2014, 03:20:01 PM by dga
 #3

Could you specify the exact program / command line arguments that must be compared against for each of those categories?
And how you expect the time to be measured, etc., etc.?
What operating system will the test be conducted on?  What CPU(s) are acceptable for demonstrating the speedup?
Is a 2x speedup on some but not all CPUs adequate?

For GPUs, I'd suggest a way of putting it that allows a little more flexibility in the hardware.
Perhaps cycles / second / hardware dollar?

Also - am I correct that you've updated your algorithm to use the faster:

Code:
  2*nonce + {left or right side}  % HALFSIZE

where HALFSIZE is a power of two (expressed as a mask) instead of the previous %N0 and %N1 where N0 and N1 were not powers of two?

tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 29, 2014, 04:03:58 PM
Last edit: July 30, 2014, 03:16:57 PM by tromp
 #4

Could you specify the exact program / command line arguments that must be compared against for each of those categories?
And how you expect the time to be measured, etc., etc.?

I will add relevant bounty Makefile targets that encompasses all the required runs, and reports total running time
for both my cuckoo_miner source and the alternate bounty_miner source.
Please allow me a day or so to update the repository.

Quote
What operating system will the test be conducted on?  What CPU(s) are acceptable for demonstrating the speedup?
Is a 2x speedup on some but not all CPUs adequate?

Tests are to be conducted on a common Linux distribution. Recent Intel Core i7 and Xeons are acceptable.
A 2x speedup on either of those is adequate.

Quote
For GPUs, I'd suggest a way of putting it that allows a little more flexibility in the hardware.
Perhaps cycles / second / hardware dollar?

I specified Xeon's because Core i7 doesn't offer that many threads. But those Xeons are rather prohibitively
priced, so I want to keep dollars out of that equation. However, we can consider your metric if a factor of 2x
or more is demonstrated for a GPU compared to a price/performance leading Core i7 running at 8 threads.

Quote
Also - am I correct that you've updated your algorithm to use the faster:
Code:
  2*nonce + {left or right side}  % HALFSIZE
where HALFSIZE is a power of two (expressed as a mask) instead of the previous %N0 and %N1 where N0 and N1 were not powers of two?

The main reason for that change is to avoid limiting Cuckoo Cycle to sizes of 2^32.
It is now well-defined for sizes up to 2^64.
I also wanted to get rid of the modulo operations which are not essential, cause some architectural bias,
slow down the critical path, and make the bipartite graph imperfectly balanced.
The goal is to make everything as simple and uniform as possible.
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 29, 2014, 04:26:57 PM
 #5

I specified Xeon's because Core i7 doesn't offer that many threads. But those Xeons are rather prohibitively
priced, so I want to keep dollars out of that equation. However, we can consider your metric if a factor of 2x
or more is demonstrated for a GPU compared to a price/performance leading Core i7 running at 8 threads.

This news story

http://www.kitguru.net/components/cpu/anton-shilov/intel-core-i7-5960x-haswell-e-de-lidded-interesting-discoveries-found/

suggests that 8+ core Core i7 CPUs are not far off.
Even with an expected initial price premium, they are bound to be way more perfomance/price competitive than Xeons.
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 30, 2014, 04:22:49 AM
 #6

Could you specify the exact program / command line arguments that must be compared against for each of those categories?
And how you expect the time to be measured, etc., etc.?

I will add relevant bounty Makefile targets that encompasses all the required runs, and reports total running time
for both my cuckoo_miner source and the alternate bounty_miner source.

The latest Makefile includes targets

Code:
bounty28:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty28 -DSIZESHIFT=28 bounty_miner.cpp

bounty30:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty30 -DSIZESHIFT=30 bounty_miner.cpp

bounty32:       cuckoo.h bounty_miner.h bounty_miner.cpp Makefile
        $(GPP) -o bounty32 -DSIZESHIFT=32 bounty_miner.cpp

cuda28: cuda_miner.cu Makefile
        nvcc -o cuda28 -DSIZESHIFT=28 -arch sm_20 cuda_miner.cu -lcrypto

cuda30: cuda_miner.cu Makefile
        nvcc -o cuda30 -DSIZESHIFT=30 -arch sm_20 cuda_miner.cu -lcrypto

cuda32: cuda_miner.cu Makefile
        nvcc -o cuda32 -DSIZESHIFT=32 -arch sm_20 cuda_miner.cu -lcrypto

cpubounty:      cuckoo28 bounty28 cuckoo30 bounty30 cuckoo32 bounty32 Makefile
        for c in 28 30 32; do for t in 1 2 4 8; do time for h in {0..9}; do ./cuckoo$$c -t $$t -h $$h; done; time for h in {0..9}; do ./bounty$$c -t $$t -h $$h; done;done; done

gpubounty:      cuckoo28 cuda28 cuckoo30 cuda30 cuckoo32 cuda32 Makefile
        for c in 28 30 32; do time for h in {0..9}; do ./cuckoo$$c -t 16 -h $$h; done; time for h in {0..9}; do ./cuda$$c -h $$h; done; done

Target cpubounty runs the required 3x4=12 time comparison tests for the first two bounties while target gpubounty runs the required 3 time comparison tests for the third bounty.

The output should indicate what cycles are found. Long cycles (length > 64) may be ignored. If your program doesn't find all short cycles, then the short cycle finding rate should be determined somehow and used to discount the speedup.
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
July 30, 2014, 05:46:31 PM
 #7

Just an fyi:

Code:
unsigned char *SHA256(const unsigned char *d, size_t n,unsigned char *md) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;

And does not compile on Ubuntu 14.04.

I've put a quick hack patch for making it work under Linux -

http://www.cs.cmu.edu/~dga/crypto/cuckoo-linux.patch

but didn't test it again under macos. Smiley

tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 30, 2014, 06:57:10 PM
 #8

Just an fyi:

Code:
unsigned char *SHA256(const unsigned char *d, size_t n,unsigned char *md) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;

Where do I have that code? I don't see that in my repository?!

Quote
And does not compile on Ubuntu 14.04.

I've put a quick hack patch for making it work under Linux -

http://www.cs.cmu.edu/~dga/crypto/cuckoo-linux.patch

but didn't test it again under macos. Smiley

I can compile fine on my SUSE Linux box. But of course, I'd like it to work on all Linux distributions. Before applying your patch though, I'd like to identify the exact issue you're having...
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
July 30, 2014, 09:48:12 PM
 #9

Just an fyi:

Code:
unsigned char *SHA256(const unsigned char *d, size_t n,unsigned char *md) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;

Where do I have that code? I don't see that in my repository?!

Quote
And does not compile on Ubuntu 14.04.

I've put a quick hack patch for making it work under Linux -

http://www.cs.cmu.edu/~dga/crypto/cuckoo-linux.patch

but didn't test it again under macos. Smiley

I can compile fine on my SUSE Linux box. But of course, I'd like it to work on all Linux distributions. Before applying your patch though, I'd like to identify the exact issue you're having...

That SHA256 is defined in openssl/sha.h for MacOS - I wa smostly pointing out the deprecation.

For Ubuntu, I had to define a little macro to do the init/update/final.

Otherwise, I get a link error linking against SHA256.

I also updated the makefile to separate the LIBS, because the -l should appear *after* the .o files that reference them, and that was needed when I added -lssl for compiling on ubuntu.

tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 30, 2014, 10:19:02 PM
 #10

That SHA256 is defined in openssl/sha.h for MacOS - I wa smostly pointing out the deprecation.

For Ubuntu, I had to define a little macro to do the init/update/final.

Otherwise, I get a link error linking against SHA256.

I also updated the makefile to separate the LIBS, because the -l should appear *after* the .o files that reference them, and that was needed when I added -lssl for compiling on ubuntu.

Thanks for elaborating, and for providing a ptach file.
I thought that all openssl/sha.h define SHA256(), but apparently not.

Patches applied to https://github.com/tromp/cuckoo ...
peter378
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
July 30, 2014, 10:56:26 PM
 #11



This news story

http://www.kitguru.net/components/cpu/anton-shilov/intel-core-i7-5960x-haswell-e-de-lidded-interesting-discoveries-found/

suggests that 8+ core Core i7 CPUs are not far off.
Even with an expected initial price premium, they are bound to be way more perfomance/price competitive than Xeons.

They should be called Core i8 CPUs or Core i9 CPUs if they have more than 7 cores. Core i7 does not sound right for an 8+ core.
mtve
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile WWW
July 31, 2014, 10:20:58 AM
 #12

Just a wild idea - wouldn't it be easier to start one more altcoin and see market reaction?
fluffypony
Donator
Legendary
*
Offline Offline

Activity: 1274
Merit: 1060


GetMonero.org / MyMonero.com


View Profile WWW
July 31, 2014, 10:45:22 AM
 #13

Just a wild idea - wouldn't it be easier to start one more altcoin and see market reaction?

The market reaction is irrelevant, this thread revolves around deep technical tests.

itod
Legendary
*
Offline Offline

Activity: 1974
Merit: 1076


^ Will code for Bitcoins


View Profile
July 31, 2014, 11:44:30 AM
 #14

Just a wild idea - wouldn't it be easier to start one more altcoin and see market reaction?

The market reaction is irrelevant, this thread revolves around deep technical tests.

I agree with you that this is not about the "market", but you may have misunderstood the question. I believe he asked if launching the altcoin will bring far greater incentive than $2500 for skilled developers to work on this, since making the GPU miner while all others mine on CPU can get you reward a few orders of magnitude bigger than $2500. That way these technical test could be done much, much quicker. Not a bad idea.
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
July 31, 2014, 11:55:13 AM
 #15

Just a wild idea - wouldn't it be easier to start one more altcoin and see market reaction?

The market reaction is irrelevant, this thread revolves around deep technical tests.

I agree with you that this is not about the "market", but you may have misunderstood the question. I believe he asked if launching the altcoin will bring far greater incentive than $2500 for skilled developers to work on this, since making the GPU miner while all others mine on CPU can get you reward a few orders of magnitude bigger than $2500. That way these technical test could be done much, much quicker. Not a bad idea.

It would.  It's what most other proof-of-works have done.

But John is doing it the ethical way by trying to have a firm foundation for it established in a way that doesn't let someone (ahem - like me) come along and make a huge profit first.  (And I don't think he wants to run a coin or have his name associated with yet-another-ripoff coin!)

That's the *right* way to do it if you believe that mining should be relatively equitable.

fluffypony
Donator
Legendary
*
Offline Offline

Activity: 1274
Merit: 1060


GetMonero.org / MyMonero.com


View Profile WWW
July 31, 2014, 11:56:32 AM
 #16

I agree with you that this is not about the "market", but you may have misunderstood the question. I believe he asked if launching the altcoin will bring far greater incentive than $2500 for skilled developers to work on this, since making the GPU miner while all others mine on CPU can get you reward a few orders of magnitude bigger than $2500. That way these technical test could be done much, much quicker. Not a bad idea.

The downside to that is that the GPU miner will be kept on the DL for weeks or even months until the coin has been bled dry. This approach is far more ethical, and it means that as and when an existing cryptocurrency adopts Cuckoo Cycle it will already have reasonably performant CPU and GPU miners.

itod
Legendary
*
Offline Offline

Activity: 1974
Merit: 1076


^ Will code for Bitcoins


View Profile
July 31, 2014, 12:23:47 PM
 #17

I agree with you that this is not about the "market", but you may have misunderstood the question. I believe he asked if launching the altcoin will bring far greater incentive than $2500 for skilled developers to work on this, since making the GPU miner while all others mine on CPU can get you reward a few orders of magnitude bigger than $2500. That way these technical test could be done much, much quicker. Not a bad idea.

It would.  It's what most other proof-of-works have done.

But John is doing it the ethical way by trying to have a firm foundation for it established in a way that doesn't let someone (ahem - like me) come along and make a huge profit first.  (And I don't think he wants to run a coin or have his name associated with yet-another-ripoff coin!)

That's the *right* way to do it if you believe that mining should be relatively equitable.

The downside to that is that the GPU miner will be kept on the DL for weeks or even months until the coin has been bled dry. This approach is far more ethical, and it means that as and when an existing cryptocurrency adopts Cuckoo Cycle it will already have reasonably performant CPU and GPU miners.

Don't want to troll this wonderful topic, so my last post just to clarify the point of view: If there's already a bounty on the bitcointalk.org, then this is not quite an "academic only" attempt. There's no harm in making the bounty much, much bigger, and there is no scam in launching such a coin with this background. I also believe that tromp is expecting this challenge to be unclaimed, then why not making it more explicit by involving the highest quality coders, which will not make a move for $2500?

If he doesn't want to mess with launching the coin, there are more than enough people who would like to help him doing it. I would help without expecting any monetary reward, simply because I believe this algo is the future and we really need the coin which is CPU only for cryptocurrencies to be widely adopted. Tromp just has to ask and the team around the coin would be gathered in no time.
fluffypony
Donator
Legendary
*
Offline Offline

Activity: 1274
Merit: 1060


GetMonero.org / MyMonero.com


View Profile WWW
July 31, 2014, 12:41:27 PM
 #18

Don't want to troll this wonderful topic, so my last post just to clarify the point of view: If there's already a bounty on the bitcointalk.org, then this is not quite an "academic only" attempt. There's no harm in making the bounty much, much bigger, and there is no scam in launching such a coin with this background. I also believe that tromp is expecting this challenge to be unclaimed, then why not making it more explicit by involving the highest quality coders, which will not make a move for $2500?

If he doesn't want to mess with launching the coin, there are more than enough people who would like to help him doing it. I would help without expecting any monetary reward, simply because I believe this algo is the future and we really need the coin which is CPU only for cryptocurrencies to be widely adopted. Tromp just has to ask and the team around the coin would be gathered in no time.

I think at this stage the bounty is more to refine Cuckoo Cycle at a working-PoC level (for want of a better term). If, thereafter, a cryptocurrency were to adopt it, you can bet they'd discuss it with John and would put paid-for resources on cryptanalysis and code optimisation of the PoW, which would negate the need to launch a separate cryptocurrency merely as a vehicle for this.

tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1085


View Profile
July 31, 2014, 02:29:26 PM
 #19

This thread in Bitcoin Development & Technical Discussion may also be of interest:

https://bitcointalk.org/index.php?topic=717267.0

For the record, I have no interest in developing my own coin. As a computer scientist, proof-of-work algorithms interest me, and Cuckoo Cycle has become one of my hobbies. Beyond the proof-of-work, crypto currencies have too many complexities (e.g. key management, networking issues) that I don't want to have to deal with.

Cuckoo Cycle will be adopted eventually. I suggested that the Ethereum guys take a serious look at it. Let's see what they come up with and how it compares...
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
July 31, 2014, 02:48:36 PM
 #20

This thread in Bitcoin Development & Technical Discussion may also be of interest:

https://bitcointalk.org/index.php?topic=717267.0

For the record, I have no interest in developing my own coin. As a computer scientist, proof-of-work algorithms interest me, and Cuckoo Cycle has become one of my hobbies. Beyond the proof-of-work, crypto currencies have too many complexities (e.g. key management, networking issues) that I don't want to have to deal with.

Cuckoo Cycle will be adopted eventually. I suggested that the Ethereum guys take a serious look at it. Let's see what they come up with and how it compares...

Hi, John -

One more follow-up.  I posted in my public review that, beyond the speedups I suggested to you, I was somewhat nervous about algorithmic speedups to the problem still being possible, and I can't shake that nervousness yet.  (But I haven't been able to exploit it either, but my failure to do so doesn't mean it's not possible, just that it's not immediately obvious).

One thing I'm curious about: your current union-find solution seems to echo that of Monien.  Before wasting my time thinking about it more if you've already done the work, did you attempt to apply Yuster & Zwick's algorithm (in "Finding Even Cycles Even Faster")?  The only difference is a factor of inverse-of-ackerman's in (V), but it's mostly intriguing because it's simpler from a data structure perspective, which might lead to a faster real-world implementation.  Or not. Smiley

My nervousness remains because of the unique structure of your graphs - as you note, they're like GNMp graphs - and it's not clear that there are specific tricks that can be used here.  I'm a little less worried about this than I was, but I've still got this concern that the algorithm could be vulnerable to non-breakthrough-level theory developments (i.e., things easier than breaking discrete logs. :-).  Still trying to wrap my head around a good formulation of it, and people shouldn't take me too seriously on this, because it's a tickling hunch, not a "gosh, I'm sure" kind of feeling.

I still compare it in my head to Momentum, where there's less structure sitting around, and therefore I'm more confident in the analysis, which I view as a good property in a PoW.  But we know that Momentum, even if it were modified to use SIPhash, basically turns into a DRAM bandwidth problem.  That's not bad from an ASIC standpoint, but it is comparatively GPU-friendly.

Pages: [1] 2 3 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!