Bitcoin Forum
November 10, 2024, 07:22:30 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 5 [6]  All
  Print  
Author Topic: MemoryCoin 2.0 Proof Of Work  (Read 21422 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
Evil-Knievel
Legendary
*
Offline Offline

Activity: 1260
Merit: 1168



View Profile
January 14, 2014, 09:00:07 PM
Last edit: April 17, 2016, 09:24:15 PM by Evil-Knievel
 #101

This message was too old and has been purged
rmmh
Newbie
*
Offline Offline

Activity: 1
Merit: 0


View Profile
February 03, 2014, 11:16:49 PM
 #102

All a MemoryCoin 2.0 ASIC requires is two SHA512 blocks going into an AES block-- if you can generate the pseudorandom data on the fly quickly enough, there's no need to store it in memory at all.
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 12, 2014, 03:33:24 AM
Last edit: February 12, 2014, 04:24:13 AM by tromp
 #103

With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 13, 2014, 03:28:41 AM
 #104

With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 13, 2014, 06:18:26 PM
Last edit: February 13, 2014, 06:33:11 PM by tromp
 #105

With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.

Since yesterday, the implementation allows for multi-threading, and the README now says:

6) running time for the current implementation on high end x86 is under 24s/GB single-threaded,
   and under 3s/GB for 12 threads.

So 16GB would take well under a minute on a server.

In any case, I disagree that 16GB is needed to defeat botnets. Most machines in a botnet
have less than 4GB of *unused* memory, and running a 4GB cuckoo would send them into
swap-hell, not only making them useless for mining, but also alerting their owner.
So Cuckoo Cycle might actually help to shrink botnets...
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 15, 2014, 11:27:39 AM
 #106

I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 15, 2014, 03:30:46 PM
 #107

I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.

Cuckoo Cycle lets you set whatever memory requirement you want.
If you think 4GB is too little, then just go for more.
Validation is instant in any case. Why do you bring up these point that were addressed already?

You haven't solved the major problem of your PoW yet.
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 15, 2014, 07:57:50 PM
Last edit: February 15, 2014, 08:35:45 PM by AnonyMint
 #108

Validation is instant in any case. Why do you bring up these point that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less.

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 15, 2014, 08:05:02 PM
 #109

Validation is instant in any case. Why do you bring up these point that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less.

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.

Cuckoo is not suitable for very short block intervals (unless you shrink the memory requirement
below 1GB, but then you lose botnet resistance).
For a description of the algorithm, see the paper. It's not that long.

The problem of your PoW is the very fact that you have not published it...
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 15, 2014, 08:34:34 PM
Last edit: February 15, 2014, 09:28:19 PM by AnonyMint
 #110

I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 16, 2014, 12:54:02 AM
 #111

I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 20, 2014, 01:22:48 AM
Last edit: February 20, 2014, 02:16:45 AM by AnonyMint
 #112

I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

I disagree. I think it may be one of the most important features of superior altcoin if it is done correctly because it could help prevent off-chain fractional reserve banking as was failing in the 1800s (and thus centralization and right back to central banking again):

https://bitcointalk.org/index.php?topic=465474.msg5166632#msg5166632
https://bitcointalk.org/index.php?topic=465474.msg5182519#msg5182519

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

Good point. Smiley

Defeating those high-end server CPUs was another of my objectives.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
February 20, 2014, 02:03:10 AM
 #113

Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.
AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
February 20, 2014, 02:21:28 AM
Last edit: February 20, 2014, 02:37:44 AM by AnonyMint
 #114

Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

I am trying to figure out which altcoin to give my algorithm to. I want it to have a first-mover advantage. Otherwise we end up with a sea of copycat coins and no focused challenger to Bitcoin.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.

Granted Oracle's chip is not the most cost effective. I was too lazy to go research for the name (Tilera) of the lower cost per thread CPU.

Are you claiming that massively parallel CPUs cost more per thread than Intel and AMD CPUs for consumer-level PCs?

Doesn't the link you provided refute that, or the Tilera CPU that someone else mentioned.

Your algorithm is not computation intensive and is main memory DRAM latency bound, thus lower-powered threads suitable for server tasks would be a better fit than the very high-powered PC CPUs. For example, web-server or chat-server threads are typically not compute bound rather I/O bound. The Oracle chip is more high powered I think because it is designed for database threads.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
Agamemnus
Full Member
***
Offline Offline

Activity: 267
Merit: 100



View Profile
May 28, 2014, 07:35:38 AM
 #115

I saw a whole lot of talk from AnthonyMint and a whole lack of any kind of proof. Just words, no good links, no algorithms of his own. Just a whole lot of crap-talk.

And I had a good giggle over it.

AnonyMint
Hero Member
*****
Offline Offline

Activity: 518
Merit: 521


View Profile
July 19, 2014, 12:59:56 AM
 #116

In Upthread discussion, reorder claimed the GPU implementation was random access bound, and I claimed that with enough threads it would likely be AES computation bound instead.

I add now the point that I don't think he accounted for what percentage of the execution time is random access bound on the CPU?

Thus I am positing that as the number of the threads on the GPU increases, then the random access portion of the algorithm is actually detrimental to giving an advantage to the CPU.

Does this make sense?

And I had a good giggle over it.

That is why you are a nobody, because you don't comprehend and you laugh at those who do (or are in the process of) instead of participating in sharing and learning.

unheresy.com - Prodigiously Elucidating the Profoundly ObtuseTHIS FORUM ACCOUNT IS NO LONGER ACTIVE
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
July 25, 2014, 05:46:27 PM
 #117

Ok, here's the latest modification to the PoW.

1. Generate 1GB PsuedoRandom data using SHA512
2. For each 64K block - Repeat 10 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

-John
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
July 25, 2014, 08:29:13 PM
 #118

Ok, here's the latest modification to the PoW.

1. Generate 1GB PsuedoRandom data using SHA512
2. For each 64K block - Repeat 10 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

Hmm, looking at the actual implementation on github, I apparently misunderstood,
as the block-pointer depends on the updated state as well.

So the miner does need 1GB, at least to avoid having to regenerate each block 49 times on average
(as FPGA/ASICs would be expected to do).
yvg1900
Member
**
Offline Offline

Activity: 66
Merit: 10

Dev of yam miner


View Profile
July 26, 2014, 04:37:11 AM
 #119

Yes, big memory buffer needed to avoid recalculation of 64k blocks.

This makes PoW really memory bound.

CryptoNight PoW is somehow conceptually similar to MMC 2.0 PoW (besides of much smaller block size and different - more aggressive - data dependency construct).


 

Follow @yvg1900 on Twitter for yam miner updates
tromp
Legendary
*
Offline Offline

Activity: 990
Merit: 1110


View Profile
July 26, 2014, 01:34:19 PM
 #120

Yes, big memory buffer needed to avoid recalculation of 64k blocks.

This makes PoW really memory bound.

It's not memory-hard though, as memory can trivially traded be traded off for computation time
(8192 times less memory with only 50x slowdown).
Which is a bit sad for a coin whose very name emphasizes memory:-(

Quote
CryptoNight PoW is somehow conceptually similar to MMC 2.0 PoW (besides of much smaller block size and different - more aggressive - data dependency construct).

CryptoNight is just hashcash with a hashfunction that bears some similarities to MMC2's verification
(note that the crucial overwriting of main memory blocks has no equivalent in MMC2).

MMC2 is not hashcash, but one of the few asymmetric PoWs (like Primecoin, Momentum, and Cuckoo Cycle) where proof attempts need to perform much more work than verification.

So overall, I'd say they're conceptually quite different.
Pages: « 1 2 3 4 5 [6]  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!