Bitcoin Forum
May 14, 2024, 12:32:02 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Cuckoo Cycle proof-of-work: $5000 in bounties  (Read 1775 times)
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1087


View Profile
November 09, 2016, 08:52:37 PM
Merited by ABCbits (1)
 #1

Cuckoo Cycle is an instantly verifiable memory bound Proof-of-Work,
whose run time is dominated by memory latency.

I invite anyone to try claim one of the following bounties for performance doubling
of my C++/CUDA solvers:

CPU   $2000
TMTO $2000
GPU   $1000

or half the above bounties for a sqrt(2) performance improvement. Details at

https://github.com/tromp/cuckoo

Happy bounty hunting...
1715646722
Hero Member
*
Offline Offline

Posts: 1715646722

View Profile Personal Message (Offline)

Ignore
1715646722
Reply with quote  #2

1715646722
Report to moderator
1715646722
Hero Member
*
Offline Offline

Posts: 1715646722

View Profile Personal Message (Offline)

Ignore
1715646722
Reply with quote  #2

1715646722
Report to moderator
1715646722
Hero Member
*
Offline Offline

Posts: 1715646722

View Profile Personal Message (Offline)

Ignore
1715646722
Reply with quote  #2

1715646722
Report to moderator
In order to get the maximum amount of activity points possible, you just need to post once per day on average. Skipping days is OK as long as you maintain the average.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1087


View Profile
March 03, 2017, 09:31:04 PM
 #2

Since the bounties are (partially) backed by bitcoin in the Cuckoo Cycle Bounty Fund,
the recent boost in bitcoin value allows me to increase the bounties by 50%:

performance improvement   4x           2x           sqrt(2)x
CPU                                    $6000     $3000     $1500
TMTO                                  $6000     $3000     $1500
GPU                                    $3000     $1500     $ 750

Happy bounty hunting...
AlexGR
Legendary
*
Offline Offline

Activity: 1708
Merit: 1049



View Profile
March 04, 2017, 03:09:55 PM
 #3

I tried to compile this but failed due to non avx2 use, so it did compile some of the files... so I run cuckoo28:

perf stat -d -d -d ./cuckoo28

Looking for 42-cycle on cuckoo28("",0) with 50% edges, 7 trims, 1 threads
Using 16MB edge and 32MB node memory, 1-way siphash, and 4-byte counters
initial load 3200%
round  1 partition loads U0 2022% V0  947%
round  2 partition loads U0  560% V0  373%
round  3 partition loads U0  267% V0  201%
round  4 partition loads U0  157% V0  126%
round  5 partition loads U0  103% V0   86%
round  6 partition loads U0   73% V0   63%
round  7 partition loads U0   55% V0   48%
nonce 0: 7 trims completed  final load 48%
   4-cycle found at 0:67%
0 total solutions

 Performance counter stats for './cuckoo28':

      79535.505140      task-clock (msec)         #    0.998 CPUs utilized          
             1,567      context-switches          #    0.020 K/sec                  
                 2      cpu-migrations            #    0.000 K/sec                  
             5,366      page-faults               #    0.067 K/sec                  
   146,892,319,199      cycles                    #    1.847 GHz                      (6.66%)
   104,304,503,009      instructions              #    0.71  insn per cycle          (13.35%)
     4,551,218,923      branches                  #   57.222 M/sec                    (13.35%)
       348,290,653      branch-misses             #    7.65% of all branches          (6.67%)
    16,992,792,470      L1-dcache-loads           #  213.650 M/sec                    (6.68%)
     1,043,305,676      L1-dcache-load-misses     #    6.14% of all L1-dcache hits    (6.67%)
     1,034,482,209      LLC-loads                 #   13.007 M/sec                    (6.67%)
       698,460,299      LLC-load-misses           #    0.95% of all LL-cache hits     (6.67%)
   145,872,608,950      L1-icache-loads           # 1834.056 M/sec                    (6.67%)
        11,774,481      L1-icache-load-misses                                         (6.66%)
    16,954,141,954      dTLB-loads                #  213.164 M/sec                    (6.67%)
       384,883,546      dTLB-load-misses          #    2.27% of all dTLB cache hits   (6.67%)
   104,269,293,009      iTLB-loads                # 1310.978 M/sec                    (13.34%)
            17,157      iTLB-load-misses          #    0.00% of all iTLB cache hits   (13.34%)
         9,380,542      L1-dcache-prefetches      #    0.118 M/sec                    (6.67%)
   <not supported>      L1-dcache-prefetch-misses                                  

      79.674461632 seconds time elapsed


(that's on a q8200 running @ 1.86 GHz). Ins/cycle are seemingly quite low - which is normal-ish for ram tasks, but still).

Further perf stats indicate a lot of overhead here (the cmpxchg stalls the jump?): http://imgur.com/a/zwEmP
(data obtained by running perf top while ./cuckoo28 was running).

Given that this was on a single thread, couldn't this be done with plain unlocked moves? Also, in multithreaded applications, would it be possible to gamble faster unlocked moves trading data accuracy in order to produce a faster version with possible data corruption?

Just some thoughts...
ArcCsch
Full Member
***
Offline Offline

Activity: 224
Merit: 117


▲ Portable backup power source for mining.


View Profile
March 05, 2017, 12:17:44 AM
 #4

the recent boost in bitcoin value allows me to increase the bounties by 50%:
This seems to imply you are holding your bounties in bitcoin.
Since this is targeted at the bitcoin community, I assume you are also paying them in bitcoin.
Why, then, do you advertise the bounty amounts in dollars instead of in bitcoins?

If you don't have sole and complete control over the private keys, you don't have any bitcoin!  Signature campaigns are OK, zero tolorance for spam!
1JGYXhfhPrkiHcpYkiuCoKpdycPhGCuswa
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1087


View Profile
March 05, 2017, 03:26:51 AM
 #5

I tried to compile this but failed due to non avx2 use, so it did compile some of the files... so I run cuckoo28:

perf stat -d -d -d ./cuckoo28

Further perf stats indicate a lot of overhead here (the cmpxchg stalls the jump?): http://imgur.com/a/zwEmP
(data obtained by running perf top while ./cuckoo28 was running).

Given that this was on a single thread, couldn't this be done with plain unlocked moves? Also, in multithreaded applications, would it be possible to gamble faster unlocked moves trading data accuracy in order to produce a faster version with possible data corruption?

Indeed, for single threaded use (-t 1, which is the default) one should run the singlethreaded
./cuckoo28st instead. That one doesn't use atomic updates like the cmpxchg.

For multi-threaded use, increasing the number of threads also increases the likelihood of overlooking cycles
due to data-races. It's not clear though at how many threads the use of atomics becomes well-advised.
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1087


View Profile
March 05, 2017, 03:30:36 AM
 #6

the recent boost in bitcoin value allows me to increase the bounties by 50%:
This seems to imply you are holding your bounties in bitcoin.
Since this is targeted at the bitcoin community, I assume you are also paying them in bitcoin.
Why, then, do you advertise the bounty amounts in dollars instead of in bitcoins?

Two reasons:
1) I want to limit my financial liability, and if people keep claiming bounties while bitcoin explodes in value, I might need to buy bitcoin at exorbitant prices.
2) I thought the dollar amounts sound more impressive:-)

But, on request, and while supplies last, I'd be happy to make a payout in bitcoin.
AlexGR
Legendary
*
Offline Offline

Activity: 1708
Merit: 1049



View Profile
March 20, 2017, 09:06:14 AM
 #7

I tried to compile this but failed due to non avx2 use, so it did compile some of the files... so I run cuckoo28:

perf stat -d -d -d ./cuckoo28

Further perf stats indicate a lot of overhead here (the cmpxchg stalls the jump?): http://imgur.com/a/zwEmP
(data obtained by running perf top while ./cuckoo28 was running).

Given that this was on a single thread, couldn't this be done with plain unlocked moves? Also, in multithreaded applications, would it be possible to gamble faster unlocked moves trading data accuracy in order to produce a faster version with possible data corruption?

Indeed, for single threaded use (-t 1, which is the default) one should run the singlethreaded
./cuckoo28st instead. That one doesn't use atomic updates like the cmpxchg.

For multi-threaded use, increasing the number of threads also increases the likelihood of overlooking cycles
due to data-races. It's not clear though at how many threads the use of atomics becomes well-advised.

Aha, ok, yes much better now. I haven't got an avx2 cpu so it breaks down when I try to compile everything and I had just some of the files to check out. Btw, I see that there are some versions with very large ram windows, which obviously exceed cache levels. There is a (v)movntdq(a) instruction (sse 4.1+ I think) that bypasses cache operations for faster mem writing if the data involved aren't going to be cached in any useful way. You might want to check whether it's useful for this case.
tromp (OP)
Legendary
*
Offline Offline

Activity: 978
Merit: 1087


View Profile
March 20, 2017, 02:04:30 PM
 #8

There is a (v)movntdq(a) instruction (sse 4.1+ I think) that bypasses cache operations for faster mem writing if the data involved aren't going to be cached in any useful way. You might want to check whether it's useful for this case.

I spend most time updating random bits, which I access at word (32 bit) granularity.
I tried using the corresponding intrinsic _mm_stream_si32, but it is a 45% slowdown...
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!