Bitcoin Forum

Alternate cryptocurrencies => Altcoin Discussion => Topic started by: FreeTrade on December 03, 2013, 06:41:10 AM



Title: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 06:41:10 AM
I'm replacing the Proof of Work in MemoryCoin for the new 2.0 version. There was some speculation that the MemoryCoin PoW might not be GPU resistant and the long verification time was causing numerous speed and stability issues.

I'm basing the new PoW on ByteMaster's original Momentum PoW and updated ideas,
http://bitsharestalk.org/index.php?topic=962.0

and with respect to Anonymint's concerns that memory latency rather than memory bandwidth was the real limiting factor.
https://bitcointalk.org/index.php?topic=325261.msg3520992#msg3520992


Here is a description of the new PoW (already implemented in code).

Step 1. Generate 512MB of pseudorandom data using Scrypt (originally SHA512)
Step 2. XOR each 512K chunk against each other 512K chunk
Step 2.1 With the result of each XOR, treat the result as an array of 32 bit ints, XOR each 32bit int sequentially - the final result is the 'answer'
Step 2.2 If the answer is < X, it is a solution or match
Step 2.3 The locations of the two chunks, as well as the penultimate result of the 32bit XOR are attached to the block and a SHA256 is performed - this is the hash of the block.
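
A minimal Python sketch of Steps 1 to 2.2 as described above, not the actual MemoryCoin code. Sizes are scaled down (16 chunks of 4 KB instead of 1024 chunks of 512 KB) so it runs quickly, SHA512 in counter mode stands in for the data generator, and the names generate_data, fold_xor and find_matches, the little-endian word order, the treatment of chunk pairs as unordered, and the threshold value are all assumptions made for illustration. At full size there would be 1024 chunks, hence 1024 * 1023 / 2 = 523,776 chunk pairs to test. Step 2.3 (attaching the result to the block and hashing with SHA256) is left out.

```python
import hashlib
import struct
from itertools import combinations

CHUNK_SIZE = 4096   # assumption: stand-in for the 512 KB chunk in the post
NUM_CHUNKS = 16     # assumption: stand-in for 1024 chunks (512 MB total)


def generate_data(seed: bytes, total: int) -> bytes:
    """Fill a buffer with SHA512(seed || counter) blocks (stand-in generator)."""
    out = bytearray()
    counter = 0
    while len(out) < total:
        out += hashlib.sha512(seed + struct.pack("<Q", counter)).digest()
        counter += 1
    return bytes(out[:total])


def fold_xor(a: bytes, b: bytes) -> int:
    """Fold the XOR of two equal-sized chunks into one 32-bit 'answer'."""
    answer = 0
    for (wa,), (wb,) in zip(struct.iter_unpack("<I", a), struct.iter_unpack("<I", b)):
        answer ^= wa ^ wb
    return answer


def find_matches(seed: bytes, threshold: int):
    """Return (i, j, answer) for every chunk pair whose answer is below threshold."""
    data = generate_data(seed, CHUNK_SIZE * NUM_CHUNKS)
    chunks = [data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] for i in range(NUM_CHUNKS)]
    matches = []
    for i, j in combinations(range(NUM_CHUNKS), 2):
        answer = fold_xor(chunks[i], chunks[j])
        if answer < threshold:
            matches.append((i, j, answer))
    return matches


if __name__ == "__main__":
    # The threshold plays the role of X in Step 2.2; 2**28 is an arbitrary choice.
    for i, j, answer in find_matches(b"block header", threshold=2**28):
        print(f"chunks {i} and {j} match, answer={answer:#010x}")
```

Lowering the threshold makes matches rarer, which is how difficulty would be tuned in this reading of the description.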

The principles that should help in GPU and ASIC resistance are thus -
1. 512MB must be allocated to store the data. (It is much faster to read the data from memory than to generate it)
2. Each read from memory is a 512K chunk (so we're saturating the memory bandwidth rather than latency)
3. The operation is on two 512K chunks (this should take place in fast L2 cache, which is limited in supply)
4. Relatively fast verification (only 1MB of pseudorandom data needs to be generated to verify - about 60 millisecs, revised from 10 millisecs)
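
Point 4 can be sketched as follows, assuming each chunk can be regenerated on its own from the block seed and its index (the post implies this but does not show how). generate_chunk, verify, the SHA512 counter-mode generator and the scaled-down CHUNK_SIZE are illustrative assumptions, not the real client code.

```python
import hashlib
import struct

CHUNK_SIZE = 4096  # assumption: stand-in for the 512 KB chunk in the post


def generate_chunk(seed: bytes, index: int) -> bytes:
    """Regenerate one chunk from (seed, index) without building the whole buffer."""
    out = bytearray()
    counter = 0
    while len(out) < CHUNK_SIZE:
        out += hashlib.sha512(seed + struct.pack("<QQ", index, counter)).digest()
        counter += 1
    return bytes(out[:CHUNK_SIZE])


def fold_xor(a: bytes, b: bytes) -> int:
    """Fold the XOR of two equal-sized chunks into one 32-bit 'answer'."""
    answer = 0
    for (wa,), (wb,) in zip(struct.iter_unpack("<I", a), struct.iter_unpack("<I", b)):
        answer ^= wa ^ wb
    return answer


def verify(seed: bytes, i: int, j: int, threshold: int) -> bool:
    """Check a claimed match by regenerating only the two chunks involved."""
    if i == j:
        return False  # assumption: a chunk may not be paired with itself
    return fold_xor(generate_chunk(seed, i), generate_chunk(seed, j)) < threshold
```

The verifier touches two chunks (1MB at full size), while the miner has to hold all of them, which is the asymmetry the list above relies on.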

Thanks to ByteMaster and AnonyMint for their contributions - there will be a premine in MemoryCoin 2.0 for some of the beta participants of the first coin, I'll include tips for you guys too.




Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Kazahstanec on December 03, 2013, 07:48:46 AM
And when will it finally start? About the beta test: I am ready to take part, I have several machines of various configurations.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 07:52:38 AM
You are making progress towards CPU only, but as far as I see you haven't defeated botnets. They are a serious problem. You can buy them for as low as $100, including Asian gaming machines that contain up to 8 - 16 GB of memory. Their owners prefer warez to paying for software, and thus are easy targets.

http://www.forbes.com/sites/eliseackerman/2012/05/19/i-run-a-small-botnet-and-sell-stolen-information-ask-me-anything/

Is your domain name for sale? IMO, would be more efficient for you to piggyback on a coin which has really solved the cpu only design quest. But I don't want to discourage you also.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 07:56:46 AM
You are making progress towards CPU only, but as far as I see you haven't defeated botnets.

Thanks. I have not included any measures to defeat botnets. I wish there was, but I don't see there is any way to combat botnets that is compatible with a wide mining base.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 07:59:33 AM
You are making progress towards CPU only, but as far as I see you haven't defeated botnets.

Thanks. I have not included any measures to defeat botnets. I wish there was, but I don't see there is any way to combat botnets that is compatible with a wide mining base.

There is a tradeoff for sure. Should we ask users to add 16 GB to their PC so coins are going to the owners of the capital instead of the hackers and botnets thieves?

Then the technical issue of designing a hash that can work at that memory scale.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Kazahstanec on December 03, 2013, 08:03:34 AM
16 GB? Then the coin will lose a huge audience of users, and you, who stand up for such restrictions, will take your profit and escape to another coin.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: superresistant on December 03, 2013, 08:03:57 AM
You are making progress towards CPU only, but as far as I see you haven't defeated botnets.
Thanks. I have not included any measures to defeat botnets. I wish there was, but I don't see there is any way to combat botnets that is compatible with a wide mining base.
There is a tradeoff for sure. Should we ask users to add 16 GB to their PC so coins are going to the users instead of the hackers and botnets?
Then the technical issue of designing a hash that can work at that memory scale.

Do you mean that putting a requirement of 16GB for mining will defeat botnets ?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 08:06:00 AM
You are making progress towards CPU only, but as far as I see you haven't defeated botnets.
Thanks. I have not included any measures to defeat botnets. I wish there was, but I don't see there is any way to combat botnets that is compatible with a wide mining base.
There is a tradeoff for sure. Should we ask users to add 16 GB to their PC so coins are going to the users instead of the hackers and botnets?
Then the technical issue of designing a hash that can work at that memory scale.

Do you mean that putting a requirement of 16GB for mining will defeat botnets ?

It won't defeat all bots, yet it will greatly diminish their scale.

16 GB? Then the coin will lose a huge audience of users, and you, who stand up for such restrictions, will take your profit and escape to another coin.

Profit scales to the investment required for the difficulty as set by the free market. Actually the more investment, the more the coin is worth. You can verify this by correlating mining hash rate growth and bitcoin's price.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: superresistant on December 03, 2013, 08:08:23 AM
16 GB? Then the coin will lose a huge audience of users, and you, who stand up for such restrictions, will take your profit and escape to another coin.

Yes but if a CPU coin have no botnets it is more valuable, so it could be seen as an investment for miners.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 08:11:01 AM
A slow hash is a huge problem in terms of denial of service attacks on the mining nodes.

This is not an easily surmountable issue.

You won't solve it by thinking about a high-level algorithm for a few hours.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: superresistant on December 03, 2013, 08:12:43 AM
Hey wait...

MemoryCoin = the coin that require memory


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 08:13:26 AM
Hey wait...

MemoryCoin = the coin that require memory

That is why I am asking him if his domain might be for sale at a high enough price, assuming he owns memorycoin.com and .org. And hopefully memcoin.* also.

Memory is generally available. It is an investment in an asset you use. The extra memory can be employed in a ramdisk when not mining to aid compiling speed and other tasks. ASICs not.

Edit: Memory is fungible, meaning you can sell/repurpose it in parts and separate from the PC.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Kazahstanec on December 03, 2013, 08:17:26 AM
You are right about some things. But understand that a huge share of the world's machines (I think no less than 80%) have less than 16 GB of memory.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 08:21:03 AM
You are right about some things. But understand that a huge share of the world's machines (I think no less than 80%) have less than 16 GB of memory.

Do we want to create a socialistic coin where everything is equal (e.g. ObamaCare requires men to carry maternity insurance) or do you want a capitalist coin where people are rewarded for effort, initiative, insight, and correct calculations on expenditures?

Motivating people to improve their computers and differentiate from botnets can't be nearly as bad as motivating them to buy useless ASIC bricks. When your ASIC is outdated, you can't repurpose it, e.g. if the Bitcoin ponzi crashes (okay you might not agree with that ponzi assertion, yet ASICs might become outdated if you prefer to mine MemoryCoin, selling used goods is sometimes lossy and time hassle losses too).

Edit: The advantage of a general purpose computer is it is general purpose, so economically it is not fragile and is a replacement good for many things even unforeseen.

Hope my feedback was helpful. Buzz me if I forget to come back and missed something important. Thanks for helping me refine my thoughts on ASICs.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Kazahstanec on December 03, 2013, 08:27:40 AM
And do you understand that the coin will be popular and in use only if it is used by a large mass of people?

Addition: Rather than by a narrow circle of miners who will push the difficulty sky-high and then safely forget about it. (An example - TRC)


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 08:31:57 AM
A slow hash is a huge problem in terms of denial of service attacks on the mining nodes.

The hash looks like being 0.01 seconds - maybe faster with some optimization. That should be sufficiently fast.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 08:37:26 AM
Do we want to create a socialistic coin where everything is equal (e.g. ObamaCare requires men to carry maternity insurance) or do you want a capitalist coin where people are rewarded for effort, initiative, insight, and correct calculations on expenditures?

I'm primarily concerned with creating a wide distribution - the hope is to do that by making mining unprofitable as a business (because of the capital acquisition cost, or capital rental cost), but profitable where capital is already a sunk cost (the PC, laptop, game console etc) and alternative acquisition costs are high (banking, regulation etc).


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 08:41:25 AM
That is why I am asking him if his domain might be for sale at a high enough price, assuming he owns memorycoin.com and .org. And hopefully memcoin.* also.

Just the .org - but not for sale.

It's an interesting idea regarding having very large memory requirements. It would certainly be better and more de-centralized than ASICs. For me though, it doesn't allow for the distribution I want.

I want to see news stories on TV about how simple it is for ordinary people to download software, switch it on, and start to earn coins.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Kazahstanec on December 03, 2013, 08:42:49 AM
But if mining isn't profitable, only a small group of enthusiasts like us will engage in it. Expecting a large number of people to buy components just for the sake of owning them is a utopia.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: bahamapascal on December 03, 2013, 11:36:05 AM
regarding the botnet,

is it possible to force the client to only mine when 50% or more of the CPU is used? As far as I know botnets are working in the background with little CPU use to stay undetected, so if the client can only mine when the CPU is used more than 50% it could stop botnets I hope :)
Is that technically possible?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 12:23:22 PM
Not with open source :(, even with closed source, it would be tough to impose limitations like that.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 03, 2013, 12:44:37 PM
regarding the botnet,

is it possible to force the client to only mine when 50% or more of the CPU is used? As far as I know botnets are working in the background with little CPU use to stay undetected, so if the client can only mine when the CPU is used more than 50% it could stop botnets I hope :)
Is that technically possible?

It's not possible, because an alternate miner would just use 100%.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: bahamapascal on December 03, 2013, 12:47:41 PM
Well OK :(
I guess there is a way to stop botnets, we just have to get the right idea, but this one does not seem to be the right one :D


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: superresistant on December 03, 2013, 02:10:45 PM
I like the RAM idea, anyone can afford some RAM contrary to a $10000 ASIC hardware for Bitcoin.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Stinky_Pete on December 03, 2013, 03:07:34 PM
I like the RAM idea too. It will rule out many of the PCs in a botnet - assuming that PC enthusiasts with 16GB are the sort of people who will notice that their machines are compromised. But it will also cut out many potential users, which is not a good thing. Perhaps (hastily checks own machines) 8GB is the right level?

It's a tricky one - to get mass adoption of the coins requires it to run on the most basic of machines, which are also those most likely to be in a botnet. Do machines in a botnet automatically send their mined coins to the same address?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 04:03:40 PM
You might consider how much DRAM is necessary to cause the user to notice his PC isn't performing correctly even with CPU usage scaled down to 50%. If his paged virtual memory in his games are now swapping to hard-disk, they may slow down considerably.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 04:07:52 PM
A slow hash is a huge problem in terms of denial of service attacks on the mining nodes.

The hash looks like being 0.01 seconds - maybe faster with some optimization. That should be sufficiently fast.

That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 04:17:28 PM
That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. But a specific hash takes .01 seconds, but to perform related hashes in bulk on a CPU is more like .0006 seconds per hash . . a GPU can't just scale up because it is missing the memory bandwidth and L2 cache.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 03, 2013, 05:15:05 PM
That can only be true because you've eliminated what you thought I meant to eliminate and thus made the GPU faster. I said you were getting closer, meaning you are going to learn an important lesson.

Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Not following you. But a specific hash takes .01 seconds, but to perform related hashes in bulk on a CPU is more like .0006 seconds per hash . . a GPU can't just scale up because it is missing the memory bandwidth and L2 cache.

Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache or within a factor of 2 or 3, e.g. 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

Worse as far as I can see your algorithm can be trivially parallel.

P.S. ASICs scale too well and result in centralization of mining:

http://www.kotaku.com.au/2013/11/bitcoin-mining-is-getting-out-of-control/


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 03, 2013, 06:31:10 PM
Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache or within a factor of 2 or 3, e.g. 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine - as long as those GPUs are of a similar cost and/or have higher energy requirements than comparable CPUs. Even if the GPUs are 2 or 3 times more efficient than CPUs - the capital investment still precludes it from being a viable business, which is the aim.

Worse as far as I can see your algorithm can be trivially parallel.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: ludd on December 04, 2013, 07:02:05 AM
All my Core i7 are ready - just waiting a sign! :)


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 04, 2013, 08:58:14 AM
Top-of-the-line GPUs have nearly the same main memory bandwidth as L2 cache or within a factor of 2 or 3, e.g. 2012 model AMD Tahiti at 264 GB per second. Some of the latest GPUs may reach 1 TB per second, nearly as fast as L1.

That's fine - as long as those GPUs are of a similar cost and/or have higher energy requirements than comparable CPUs. Even if the GPUs are 2 or 3 times more efficient than CPUs - the capital investment still precludes it from being a viable business, which is the aim.

However, the L2 are 256 KB on Intel so you need to adjust your 512 KB downwards.

Also AMD has no L2 and the L3 is significantly slower. Maybe you are not concerned about losing those who run AMD.

Worse as far as I can see your algorithm can be trivially parallel.

The memory bus is the bottleneck - you can parallelize until you run out of bandwidth there.

I see one definite problem that makes your assumption false and potentially a second problem, but if I tell you what they are then I will give away a lot of the work I have done to make a truly CPU-only proof-of-work.

CPU-only will always have a slow hash. There is no way around it.

I guess you will find out when you release this and it is attacked by GPUs (and botnets), and if I release my open source, then you can copy it (although you won't be able to because the slow hash won't work in your overall design). I don't want to give you first mover advantage by telling you now.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 04, 2013, 09:09:40 AM
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 04, 2013, 09:11:05 AM
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This impacts on denial of service rejection. There must be a holistic design to deal with that I think. But I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

He has other features and ideas for his coin, so there is probably much room for differentiation. Let a 1000 flowers bloom. May the best be picked for a bouquet.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 04, 2013, 09:14:04 AM
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

Why don't you create a coin together with him? This will be best for the community.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 04, 2013, 09:15:23 AM
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

Excuse me I meant 256 KB. I will correct my typo.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 04, 2013, 09:18:55 AM
However, the L2 are 256 MB on Intel so you need to adjust your 512 MB downwards.

L2 on Intel Core processors is 1-2MB, not 256.

Excuse me I meant 256 KB. I will correct my typo.

Also as a separate problem I see which I will reveal, it appears he is hitting main memory bandwidth on the CPU not L2 bandwidth due to the 512 MB, which is 10 - 30 times slower than the GPU's main memory bandwidth.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but appears to me that he reads from the entire written 512 MB space thus there is no locality of cache.

I haven't verified any of this with his algorithm, so he would need to test to verify. I can only go by what I believe the description means it is doing. Perhaps my interpretation is incorrect.
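
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch using only figures quoted in this thread: the 512 MB buffer, the 264 GB per second GPU figure, and the "10 - 30 times slower" estimate for CPU main memory. These are claims from the posts, not measurements, and treating 1 MB as 2^20 bytes is an assumption.

```python
BUFFER_BYTES = 512 * 1024 * 1024   # the 512 MB buffer from the opening post
GPU_BANDWIDTH = 264e9              # bytes per second, figure quoted earlier in the thread
SLOWDOWN_FACTORS = (10, 30)        # "10 - 30 times slower", as claimed in the post above


def stream_seconds(num_bytes: int, bytes_per_second: float) -> float:
    """Time to read num_bytes once at a sustained bandwidth."""
    return num_bytes / bytes_per_second


def compare():
    """Return (gpu_seconds, [cpu_seconds for each slowdown factor])."""
    gpu = stream_seconds(BUFFER_BYTES, GPU_BANDWIDTH)
    cpu = [stream_seconds(BUFFER_BYTES, GPU_BANDWIDTH / f) for f in SLOWDOWN_FACTORS]
    return gpu, cpu


if __name__ == "__main__":
    gpu, cpu = compare()
    print(f"GPU: {gpu * 1e3:.2f} ms per full pass over the buffer")
    for factor, seconds in zip(SLOWDOWN_FACTORS, cpu):
        print(f"CPU at 1/{factor} of that bandwidth: {seconds * 1e3:.2f} ms per pass")
```

If the working set really is the whole 512 MB rather than a cache-resident 512 KB, the per-pass time scales directly with main memory bandwidth, which is the point being made about locality.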


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: traderCJ on December 04, 2013, 09:36:56 AM
Sorry I can't give away my algorithms sooner. They will soon be open sourced.

Will you release the algo so that it could be used in Memorycoin, or will it be a separate coin?

Both. But I don't know if MemoryCoin can use it, because the hash is necessarily slow. This impacts on denial of service rejection. There must be a holistic design to deal with that I think. But I don't claim to be omniscient. I wish FreeTrade the best in all his endeavors.

And I hate to talk about my vaporware. FreeTrade invited me to comment on this thread. I wish I could help him more now. The best way for me to help him, is rush my release.

He has other features and ideas for his coin, so there is probably much room for differentiation. Let a 1000 flowers bloom. May the best be picked for a bouquet.

Looking forward to seeing what you're working on!


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 04, 2013, 10:02:34 AM
Also as a separate problem I see which I will reveal, it appears he is hitting main memory bandwidth on the CPU not L2 bandwidth due to the 512 MB, which is 10 - 30 times slower than the GPU's main memory bandwidth.

It appears from the OP that he thinks he is staying within L2 by computing XORs in 512 KB chunks, but appears to me that he reads from the entire written 512 MB space thus there is no locality of cache.

I haven't verified any of this with his algorithm, so he would need to test to verify. I can only go by what I believe the description means it is doing. Perhaps my interpretation is incorrect.

Yes, it will not be in L2 (especially since it's 256KB per core), but probably in L3. GPUs have a L2 Cache of 256-768KB, but usually no L3. The problem is GDDR5 bandwidth is probably as good as Intels L3 bandwidth, but latency with the L3 is much shorter.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 04, 2013, 10:42:26 AM
So - small update to the algorithm. Rather than use SHA512 to fill the pseudorandom data, I've decided to use Scrypt instead. This takes longer to generate, so should protect against the possibility of a GPU just generating the pseudorandom data and processing it as it needs it rather than storing it and fetching it from main memory.
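
A small sketch of what filling a buffer with Scrypt output could look like, using Python's hashlib.scrypt (which needs an OpenSSL build with scrypt support). The thread does not give the Scrypt parameters or how the data is keyed, so n=1024, r=1, p=1, the index-based salt, the function name fill_with_scrypt and the tiny buffer size are all assumptions for illustration.

```python
import hashlib
import struct

OUTPUT_BYTES = 64  # bytes taken from each scrypt call (assumption)


def fill_with_scrypt(seed: bytes, total: int) -> bytes:
    """Build `total` bytes by calling scrypt once per 64-byte slice."""
    out = bytearray()
    index = 0
    while len(out) < total:
        out += hashlib.scrypt(
            seed,
            salt=struct.pack("<Q", index),  # slice index as salt (assumption)
            n=1024, r=1, p=1,               # illustrative cost parameters
            dklen=OUTPUT_BYTES,
        )
        index += 1
    return bytes(out[:total])


if __name__ == "__main__":
    data = fill_with_scrypt(b"block header", 4096)
    print(len(data), data[:8].hex())
```

Each slice now costs a full scrypt evaluation instead of one SHA512, so regenerating data on demand becomes more expensive relative to fetching it from memory, which is the trade-off the post is aiming for.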

Here's a comparison of Intel and AMD processors and includes measures of L2 and L3 cache -

http://en.wikipedia.org/wiki/Comparison_of_AMD_processors
http://en.wikipedia.org/wiki/Comparison_of_Intel_processors

I think the L2/L3 caches on newer and older processors are not directly comparable, and it'll be difficult to tell how efficient a given processor will be without testing it.

In order for a process to be efficient it'll need

1. Reasonably Fast access to 512MB memory - main memory
2. Very Fast access to 512KB memory  - L2/L3 cache memory

The first few processes on a GPU will have these, but will run out of them in the same way a CPU does. The additional processing power of the GPU won't help it, because it won't have data to operate on.




Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 04, 2013, 10:45:40 AM
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 04, 2013, 11:18:17 AM
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.

Wow, yes actually I hadn't realized the differential between newer GPUs and CPUs was so great. I'll need to reconsider.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 04, 2013, 07:10:13 PM
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.

Wow, yes actually I hadn't realized the differential between newer GPUs and CPUs was so great. I'll need to reconsider.

As far as I can see, Scrypt won't suffice. I think Percival was mistaken (http://bitbin.it/E68HeKkM) when he wrote 512 MB would defeat the GPU, because the GPU has much faster main memory bandwidth. And latency can be masked on the GPU by running many threads. And computation is faster on the GPU by running many threads.

You could try to run a 4GB Scrypt to cut down on the number of threads the GPU can run, but it will be so slow (1 minute per hash (https://bitcointalk.org/index.php?topic=122256.msg1318485#msg1318485)) your denial of service rejection and pools likely won't work well. Also it's still vulnerable because Scrypt requires your latency be significantly less than your BlockMix execution time, else "lookup gap" (cpuminer) can replace the memory with computation, and GPUs can accelerate the computation of BlockMix because Salsa20 is partially parallelizable.

Percival thinks (http://bitbin.it/7bmKZqTx) Litecoin reduced the advantage of ASICs by a factor of 10.
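
The "lookup gap" trade-off mentioned above can be illustrated with a simplified ROMix loop. This is a sketch, not real Scrypt: SHA256 stands in for BlockMix/Salsa20, and the names H, romix and the gap parameter are made up for the example. With gap=1 every table entry is stored; with a larger gap only every gap-th entry is kept and the missing ones are recomputed on demand.

```python
import hashlib


def H(x: bytes) -> bytes:
    """Stand-in mixing function (real Scrypt uses BlockMix over Salsa20/8)."""
    return hashlib.sha256(x).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def romix(seed: bytes, n: int, gap: int = 1) -> bytes:
    """Simplified ROMix; stores only every `gap`-th table entry."""
    x = H(seed)
    stored = []
    for i in range(n):            # fill phase: V[i] = H applied i times to V[0]
        if i % gap == 0:
            stored.append(x)      # keep V[i] only at multiples of gap
        x = H(x)
    for _ in range(n):            # mix phase: data-dependent lookups into V
        j = int.from_bytes(x[:8], "little") % n
        v = stored[j // gap]      # nearest stored entry at or below j
        for _ in range(j % gap):  # recompute the skipped steps up to V[j]
            v = H(v)
        x = H(xor_bytes(x, v))
    return x


if __name__ == "__main__":
    full = romix(b"header", 1024, gap=1)
    quarter = romix(b"header", 1024, gap=4)   # a quarter of the memory
    print(full == quarter)
```

Both calls return the same digest; the gap=4 run keeps a quarter of the table and pays for it with extra H evaluations per lookup, which is cheap for hardware with spare compute.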


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Adamlm on December 04, 2013, 07:14:00 PM
TL;DR

I was mining first MemoryCoin - will I be able to keep the wallet from that client and convert all mined coins to 2.0 ?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Etlase2 on December 04, 2013, 08:37:21 PM
Excuse me I meant 256 KB. I will correct my typo.

It wasn't a typo, there was a logic error there too, and this is the second time you've made this exact mistake. Curious for someone who seems to be very well versed in the subject matter.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: superresistant on December 04, 2013, 08:43:50 PM
TL;DR
I was mining first MemoryCoin - will I be able to keep the wallet from that client and convert all mined coins to 2.0 ?

No.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 05, 2013, 04:25:26 AM
Excuse me I meant 256 KB. I will correct my typo.

It wasn't a typo, there was a logic error there too, and this is the second time you've made this exact mistake. Curious for someone who seems to be very well versed in the subject matter.

Possibly. I've been sleepless lately, even just woke up with a headache and dizzy. Feel free to point it out if you want. I hadn't been working on the overall aspect of the CPU-only mining aspect lately (note there are 3 components to Scrypt the overall ROMix, the inner BlockMix, and the innermost Salsa20 choice), so I had to reload into mind the various issues, and I was simultaneously taking on a wide range of topics throughout the forums.

Note I challenged some rich folks on the forum to see if they can spend their money to develop a CPU-only independent of me. So perhaps someone will beat me to it.

There are also very wealthy Chinese investors lurking behind the scenes who want to buy into altcoin development, for example you see the $500,000 that was injected into bytemaster's corporation by Chinese investors. I heard his ProtoShares launch already has a market cap of $24 million but I did not verify.

We could possibly see a proliferation of altcoins soon. I am hoping the quality ones can still be distinguished from the chaff.

Etlase2 remember I told you in April or so that it was urgent and your altcoin needed to be finished within 2013 or mid-2014 at the latest. And you scoffed at me. Do I need to go quote that post for you? I hope you are progressing well and I wish you the best of course. I wish you would stop the occasional spiteful remarks. Let everything be decided on the merits of the code released. "Talk is cheap, show us the code." - Linus Torvalds.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 05, 2013, 08:17:08 AM
Freetrade you forget that GPUs can have up to 300GB/s main memory bandwidth. So you get only a latency advantage with L3, not a bandwidth advantage. The data will not be in L2 at all, as you have only 256KB/core.

Wow, yes actually I hadn't realized the differential between newer GPUs and CPUs was so great. I'll need to reconsider.


I'm coming to the conclusion that the only place (other than size of main memory) where CPUs can keep pace with GPUs is the L2 cache (either size or bandwidth of L2), but that the GPU can compensate by running more processes slowly from main memory. Maybe it is only possible to delay GPUs with sheer complexity - ala Quark.
 


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 05, 2013, 08:40:46 AM
Warning - offtopic and inflammatory posts will be removed. Please stay on topic.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: cryptrol on December 05, 2013, 10:01:01 AM

I'm coming to the conclusion that the only place (other than size of main memory) where CPUs can keep pace with GPUs is the L2 cache (either size or bandwidth of L2), but that the GPU can compensate by running more processes slowly from main memory. Maybe it is only possible to delay GPUs with sheer complexity - ala Quark.

I think the conclusion you are coming to is the right one.

IMHO trying to defeat GPU, Botnets or future ASICs is nonsense, and a waste of resources, it just can't be done. That's especially true for botnets for obvious reasons and in the end GPUs are just massively parallel slow CPUs.

I would just focus on improving some of the well known PoW algorithms, most are just fine but can be fine tuned to get better results or more GPU resistance (you did it with scrypt).

Just my 2 cents.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 05, 2013, 11:11:41 AM
Maybe I can help you soon FreeTrade, or you can help me. Lets see when there is something tangible to evaluate. Feel free to delete if off-topic. I don't like to leave this hanging, but I better not speak more at this time.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: ludd on December 09, 2013, 06:40:28 PM
Any news about MEC 2.0?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 10, 2013, 03:01:09 PM
I'm coming to the conclusion that the only place (other than size of main memory) where CPUs can keep pace with GPUs is the L2 cache (either size or bandwidth of L2), but that the GPU can compensate by running more processes slowly from main memory. Maybe it is only possible to delay GPUs with sheer complexity - ala Quark.
 

I came to the same conclusion after thinking about it for a week. You can only get an advantage from latency (the L2 bandwidth does not matter much, as GPUs also have L2 for every core block, but a smaller size than CPU), which means you would need a shitload of different memory operations within L2 (I would make it about ~ 200 KB, not 256, to ensure that it really sticks to L2). GPU could compensate by running 200 threads at a much slower pace, but still at 10x the speed of the CPU overall. If you make the algo complex enough a GPU miner will still take at least 1 month to code, maybe 3-5, so CPU users get a head start.



Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 13, 2013, 08:18:09 AM
Ok, here's the latest modification to the PoW.

1. Generate 1GB PseudoRandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved

Expect 1 solution per set.

This will offer a good level of GPU resistance for the following reasons -
1. Complexity - requires SHA512 hashing and AES CBC encryption
2. CPU instruction sets - many have specific AES instruction sets, GPU's don't
3. SHA512 - more efficient with 64 bit operations, but most GPUs are 32bit
4. Multiple AES encryption and XORing will keep the L2 cache busy, GPUs will be forced to use slower memory for massive parallelization.

I think more efficient GPU miners will be possible, but they should be delayed and not offer performance gains of more than 2X or 3X.
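
The steps above can be sketched at toy scale as follows. This is only an illustration, not the MemoryCoin code: the sizes are shrunk (64 chunks of 64 bytes instead of 16,384 chunks of 64K), the pointer is reduced modulo the number of chunks (an assumption), and because Python's standard library has no AES, a SHA-512 based mixing function stands in for the AES CBC step. All names (`make_chunk`, `mix`, `walk`, `mine`, `TARGET`) are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64    # bytes per chunk (64K in the post; shrunk for the sketch)
NUM_CHUNKS = 64    # number of chunks (2**14 in the post)
ROUNDS = 50        # "Repeat 50 times"
MODULUS = 64       # the post uses 2**14; kept equal to NUM_CHUNKS here
TARGET = 19        # arbitrary stand-in for the post's 1968


def make_chunk(seed: bytes, index: int) -> bytes:
    """Pseudo-random chunk derived from SHA-512 of seed + chunk index."""
    return hashlib.sha512(seed + index.to_bytes(4, "big")).digest()[:CHUNK_SIZE]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def mix(block: bytes) -> bytes:
    """Stand-in for AES CBC keyed by the block's last 256 bits (NOT real AES)."""
    key = block[-32:]
    return hashlib.sha512(key + block).digest()[:CHUNK_SIZE]


def last32(block: bytes) -> int:
    """Last 32 bits of a block as an unsigned integer."""
    return int.from_bytes(block[-4:], "big")


def walk(data: list, start: int) -> int:
    """Run the 50-step pointer walk from one starting chunk; return the answer."""
    block = data[start]
    for _ in range(ROUNDS):
        other = data[last32(block) % NUM_CHUNKS]   # step 2.1: follow the pointer
        block = mix(xor_bytes(block, other))       # steps 2.2 and 2.3
    return last32(block) % MODULUS                 # step 3


def mine(seed: bytes) -> list:
    """Return every starting index whose walk ends on TARGET."""
    data = [make_chunk(seed, i) for i in range(NUM_CHUNKS)]
    return [i for i in range(NUM_CHUNKS) if walk(data, i) == TARGET]


solutions = mine(b"example block header")
print("solutions found:", solutions)
```

With MODULUS equal to the number of starting points, a run finds one solution on average, but any particular seed may give none or several.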

 


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 13, 2013, 08:23:30 AM
A dedicated CPU miner that uses hardware AES should be at least 10 times faster. I hope such a miner is released to the public soon after launch and not being kept private.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 13, 2013, 07:11:28 PM
A dedicated CPU miner that uses hardware AES should be at least 10 times faster. I hope such a miner is released to the public soon after launch and not being kept private.

Can you explain more about what you mean by hardware AES? Are you talking about the AES instructions built into CPUs? If so then I'm hoping these will be compiled into the QT client so we should have a pretty efficient miner there off the bat for any chips with the AES instruction sets - more details here -

http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni

and

http://en.wikipedia.org/wiki/AES_instruction_set#Supporting_CPUs


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 13, 2013, 07:31:13 PM
Yes, I did mean those.

p.s.

FreeTrade, please read the PM I sent you yesterday.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 12:47:09 PM
I didn't see this until just now.

3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved

How is the difficulty altered if the solution is a fixed value and not instead a variable value which the output of the hash must be less than?

Ok, here's the latest modification to the PoW.

1. Generate 1GB PseudoRandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved

Expect 1 solution per set.

This will offer a good level of GPU resistance for the following reasons -
1. Complexity - requires SHA512 hashing and AES CBC encryption
2. CPU instruction sets - many have specific AES instruction sets, GPU's don't
3. SHA512 - more efficient with 64 bit operations, but most GPUs are 32bit
4. Multiple AES encryption and XORing will keep the L2 cache busy, GPUs will be forced to use slower memory for massive parallelization.

I think more efficient GPU miners will be possible, but they should be delayed and not offer performance gains of more than 2X or 3X.

Even Haswell has 10X less FLOPS than top-of-the-line GPUs. Memory latency will be masked away to 0, if the GPU can run enough parallel copies. Your algorithm is going to be hobbled relative to the GPU by the 10X slower bandwidth of 20 GB/s speed to write out the initial 1 GB.

The use of dedicated AES instructions on CPUs might help a little but I doubt enough to stop the GPU from being 10X faster, and  ASICs could implement AES to run faster.

The latency on L2 is still several cycles and the latency on the CPU will asymptotically go to 0, although you have that 1 GB to limit the number of copies the GPU can run, i.e. 6 parallel copies on a 6 GB GPU (which might be sufficient to eliminate most latency).

You might get the 3X faster expected result, but I am not confident of that.

How does this algorithm validate faster and with less memory? I think I know, but I don't want to say. I want to know what you came up with.

On a rough guesstimate (note I am quite sleepy at the moment), I can't see that you've accomplished anything Litecoin did not already, except the extra DRAM requirement, i.e. probably 10X faster on GPU and vulnerable to ASICs (running with DRAM). Or did I miss something?

Note Litecoin ASICs are apparently in development now.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 15, 2013, 01:02:18 PM

How is the difficulty altered if the solution is a fixed value and not instead a variable value which the output of the hash must be less than?

So the solution is then SHA256'd and this value is the variable.
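
A minimal sketch of how a fixed-value match can still carry a variable difficulty, assuming the final hash is SHA256 over the block header plus the location of the match. The serialization, `block_hash` and `meets_difficulty` are hypothetical names for illustration, not the MemoryCoin format.

```python
import hashlib


def block_hash(header: bytes, start_index: int) -> int:
    """SHA256 over header + match location, read as a 256-bit integer (assumed layout)."""
    payload = header + start_index.to_bytes(4, "little")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def meets_difficulty(header: bytes, start_index: int, target: int) -> bool:
    """A match only solves the block if its SHA256 value falls below the target."""
    return block_hash(header, start_index) < target


header = b"example block header"
easy_target = 1 << 255   # roughly half of all matches would pass
print(meets_difficulty(header, 1234, easy_target))
```

Lowering `target` makes fewer matches acceptable, which is how the difficulty can be adjusted even though the match value itself stays fixed.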

Even Haswell has 10X less FLOPS than top-of-the-line GPUs. Memory latency will be masked away to 0, if the GPU can run enough parallel copies. Your algorithm is going to be hobbled relative to the GPU by the 10X slower bandwidth of 20 GB/s speed to write out the initial 1 GB.

The 1GB write is a small fraction of the overall time required - the 50GB AES encryption is the lion's share of the hash.

The use of dedicated AES instructions on CPUs might help a little but I doubt enough to stop the GPU from being 10 - 100X faster, and  ASICs could implement AES to run faster.

The latency on L2 is still several cycles and the latency on the CPU will asymptotically go to 0, although you have that 1 GB to limit the number of copies the GPU can run, i.e. 6 parallel copies on a 6 GB GPU (which might be sufficient to eliminate most latency).

You might get the 3X faster expected result, but I am not confident of that.

So there are two possible bottlenecks - the memory-bus access and AES speed. A GPU miner will need to solve both. Agreed both bottlenecks can be addressed in a GPU miner - but with the lack of AES-NI, slower cores, and lack of L2 cache to match the cores, I'm hoping it'll be 2X to 3X max. And it should take some time too.


How does this algorithm validate faster and with less memory? I think I know, but I don't want to say. I want to know what you came up with.

The algorithm essentially looks for a pattern in the data. The validation is told where the pattern is, so only produces a fraction of the psuedo-random data, and doesn't need to search for the start location of the pattern.
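
A rough sketch of why validation is cheap, assuming each chunk can be regenerated on its own from a seed and that the walk does not modify shared data (both are assumptions for illustration). The verifier is handed the starting location and only generates the chunks the walk actually touches. Chunks are 64 bytes here rather than 64K, a SHA-512 mix stands in for AES CBC, and `make_chunk`, `step` and `verify` are hypothetical names.

```python
import hashlib

NUM_CHUNKS = 1 << 14   # 16,384 starting points, as in the thread
ROUNDS = 50            # steps per walk
TARGET = 1968          # the fixed match value from the post


def make_chunk(seed: bytes, index: int) -> bytes:
    """Regenerate one chunk independently of all others (assumed property)."""
    return hashlib.sha512(seed + index.to_bytes(4, "big")).digest()


def step(block: bytes, other: bytes) -> bytes:
    """XOR two chunks, then mix with a SHA-512 stand-in (NOT real AES CBC)."""
    xored = bytes(a ^ b for a, b in zip(block, other))
    return hashlib.sha512(xored[-32:] + xored).digest()


def verify(seed: bytes, start: int):
    """Replay one walk; return (is_match, number_of_chunks_generated)."""
    cache = {}

    def chunk(i: int) -> bytes:
        if i not in cache:
            cache[i] = make_chunk(seed, i)
        return cache[i]

    block = chunk(start)
    for _ in range(ROUNDS):
        pointer = int.from_bytes(block[-4:], "big") % NUM_CHUNKS
        block = step(block, chunk(pointer))
    answer = int.from_bytes(block[-4:], "big") % (1 << 14)
    return answer == TARGET, len(cache)


is_match, generated = verify(b"example block header", 42)
print(is_match, generated, "of", NUM_CHUNKS, "chunks generated")
```

At most 51 chunks are generated per check, which at 64K per chunk would be a little over 3MB out of the full 1GB.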


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 01:20:49 PM
Even Haswell has 10X less FLOPS than top-of-the-line GPUs. Memory latency will be masked away to 0, if the GPU can run enough parallel copies. Your algorithm is going to be hobbled relative to the GPU by the 10X slower bandwidth of 20 GB/s speed to write out the initial 1 GB.

The 1GB write is a small fraction of the overall time required - the 50GB AES encryption is the lion's share of the hash.

So the hash rate is slower than 1 hash per second per CPU core?

All CPU cores share the same main memory bottleneck, so on an 8 core CPU the 1 GB per core is 8GB relative to the 50 GB.

Also the GPU can use its 10X greater FLOPS to likely remove the speed advantage of the AES instructions on the CPU. And this massive parallelization will likely eliminate the L2 advantage on memory latency, while (although I am not sure without studying in detail the code) cache memory bandwidth will not be the bottleneck rather computation of the AES.

I know you don't expect to beat the GPU, perhaps you are only hoping to be near par on hashes per watt.

My wild guesstimate is you see 10X instead of 3X and have a 3X disadvantage on hash per watt. Readers I don't know. I am only wild guessing.

The use of dedicated AES instructions on CPUs might help a little but I doubt enough to stop the GPU from being 10 - 100X faster, and  ASICs could implement AES to run faster.

The latency on L2 is still several cycles and the latency on the CPU will asymptotically go to 0, although you have that 1 GB to limit the number of copies the GPU can run, i.e. 6 parallel copies on a 6 GB GPU (which might be sufficient to eliminate most latency).

You might get the 3X faster expected result, but I am not confident of that.

So there are two possible bottlenecks - the memory-bus access and AES speed. A GPU miner will need to solve both.

Parallelization attacks main memory latency and computation, but not maximum main memory bandwidth. L2 might win on cumulative cache bandwidth with multiple cores (since each cache is independent). Some GPUs have large caches now though.

Agreed both bottlenecks can be addressed in a GPU miner - but with the lack of AES-NI, slower cores, and lack of L2 cache to match the cores, I'm hoping it'll be 2X to 3X max. And it should take some time too.

Maybe. I can't say for sure. I am just giving you feedback.


How does this algorithm validate faster and with less memory? I think I know, but I don't want to say. I want to know what you came up with.

The algorithm essentially looks for a pattern in the data. The validation is told where the pattern is, so only produces a fraction of the psuedo-random data, and doesn't need to search for the start location of the pattern.

That is what I expected. That is what I am doing.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 15, 2013, 01:35:35 PM
My wild guesstimate is you see 10X instead of 3X and have a 3X disadvantage on hash per watt. Readers I don't know. I am only wild guessing.

I think to really boil it down - it's going to be about how fast GPUs can perform AES encryption.

If we have a look at gKrypt -

http://gkrypt.com/

they're talking about 80 gigabits per second on a single GPU on their homepage. That's 10 GB/s

On an i7 4770 (Haswell), I'm seeing about 4 hashes per minute - at 50GB of AES per hash that's 200GB per minute, or about 3.3GB/s

So after some optimization, and 64bit compile, we'll hopefully see that up to 4 or 5GB/s.

So I'm sticking with my 2X or 3X - and interesting point you make about power consumption - much more power efficient on CPU.
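
The arithmetic behind the 2X to 3X estimate can be checked with a few lines. This only restates the figures quoted in this post (80 gigabits per second for gKrypt, 4 hashes per minute on the i7, 50GB of AES per hash); the variable names are made up for the sketch.

```python
GKRYPT_GBIT_PER_S = 80    # gKrypt's advertised single-GPU AES rate
AES_GB_PER_HASH = 50      # 1GB of data, 50 AES passes
CPU_HASHES_PER_MIN = 4    # observed on the i7 4770

gpu_gb_per_s = GKRYPT_GBIT_PER_S / 8                          # bits to bytes
cpu_gb_per_s = CPU_HASHES_PER_MIN * AES_GB_PER_HASH / 60      # per minute to per second

ratio_now = gpu_gb_per_s / cpu_gb_per_s
ratio_if_4 = gpu_gb_per_s / 4.0   # if the CPU reaches 4GB/s after tuning
ratio_if_5 = gpu_gb_per_s / 5.0   # if the CPU reaches 5GB/s after tuning

print(f"GPU {gpu_gb_per_s:.1f} GB/s, CPU {cpu_gb_per_s:.1f} GB/s")
print(f"GPU advantage now {ratio_now:.1f}X, after tuning {ratio_if_4:.1f}X to {ratio_if_5:.1f}X")
```

This treats AES throughput as the only bottleneck, which is the premise of the estimate rather than a measured result.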






Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 01:55:32 PM
Okay that seems correct. I was focused on the GPU eliminating the L2 memory advantages at a guessed 10X factor and I didn't know how much faster GPUs could compute AES (now you show evidence of only 2 - 3X relative to Intel's AES-NI), but then if computation bound is the goal, it seems the 1GB memory is not necessary except I guess you are aiming at complexity to implement an ASIC that interfaces with DRAM.

And ASICs can provide dedicated AES circuits.

So the main threat will come from ASICs but that won't be until your market cap is large enough to justify the ASIC development.

The reason I don't favor being compute bound is because I want to eliminate ASICs entirely.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 02:01:44 PM
3X would be a significant improvement over Litecoin's 15X w.r.t. GPUs and an order-of-magnitude better than the 30X for Bitcoin:

https://bitsharestalk.org/index.php?topic=22.msg2663#msg2663

However, both Litecoin and Bitcoin will be ASICs dominated, so it is irrelevant except that both got their start by being CPU, then GPU.

It appears MemoryCoin will be CPU then ASIC and mostly skip the GPU stage (due to the GPU being less energy efficient than Intel's AES-NI even though slightly faster). This conclusion hinges on gKrypt being fully optimized for GPUs.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 02:35:46 PM
I see 150 Kgates and 2 milliWatts per GBps (not Gbps) of throughput with ASICs:

http://www.martes-itea.org/public/papers/Hamalainen-Design_and_Implementation_2.pdf#page=6

ASICs have fast caches.

That should be very inexpensive to produce and obliterate the AES-NI both on performance and hashes per watt. Perhaps the memory bandwidth of the cache becomes the limiting factor.

I don't know about the DRAM memory controller.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 03:39:32 PM
I don't understand this.

https://github.com/memorycoin/memorycoin/blob/psforkinit/src/momentum.cpp#L71

Code:
int searchNumber=comparisonSize/totalThreads;
int startLoc=threadNumber*searchNumber;

Does that mean each thread searches a different section of the pseudo-random 1GB? But wouldn't that mean the result found could vary depending on the number of threads? Since the first value of 1968 found is taken as the solution.

I am thinking that is a design bug. Or perhaps I just don't understand the algorithm employed yet.
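
To make the question concrete, here is a small sketch of how those two lines would split the starting points, assuming each thread scans the contiguous range from startLoc up to startLoc + searchNumber (that part is not shown in the snippet, so it is a guess). The function names and the value 16384 for comparisonSize are hypothetical.

```python
def search_ranges(comparison_size: int, total_threads: int) -> list:
    """Per-thread (start, end) ranges using the same integer division as the snippet."""
    search_number = comparison_size // total_threads
    ranges = []
    for thread_number in range(total_threads):
        start_loc = thread_number * search_number
        ranges.append((start_loc, start_loc + search_number))
    return ranges


def unsearched(comparison_size: int, total_threads: int) -> list:
    """Starting points that no thread covers under the assumed scanning scheme."""
    covered = set()
    for start, end in search_ranges(comparison_size, total_threads):
        covered.update(range(start, end))
    return sorted(set(range(comparison_size)) - covered)


print(unsearched(16384, 8))   # thread count divides evenly
print(unsearched(16384, 6))   # integer division leaves a remainder
```

Under that assumption the thread count only changes which thread examines a given starting point, except that a thread count that does not divide the total would leave the last few starting points unexamined.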


P.S. you typo-ed pseudo as 'psuedo' in the code.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 15, 2013, 03:45:25 PM
Does that mean each thread searches a different section of the pseudo-random 1GB? But wouldn't that mean the result found could vary depending on the number of threads? Since the first value of 1968 found is taken as the solution.

I am thinking that is a design bug. Or perhaps I just don't understand the algorithm employed yet.


P.S. you typo-ed pseudo as 'psuedo' in the code.

There are 16,384 (2^14) different starting points - each thread takes a section to search. But there are 50 steps from each starting point, and each step can range over the whole 1GB, so every thread needs to have random access to the whole 1GB.

Every 1968 found is a solution, and can create a different SHA256 result. On average there should be 1 per 1GB data, but there might be 0 or 2 or more.
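
The "1 on average, but maybe 0 or 2 or more" statement can be put into numbers with a simple model, assuming each of the 2^14 starting points independently hits 1968 with probability 1 in 2^14. That independence is an idealization for the sketch, not something established about the real algorithm.

```python
from math import comb

N = 1 << 14   # starting points per 1GB data set
P = 1 / N     # chance that one walk ends on the fixed value


def prob(k: int) -> float:
    """Binomial probability of exactly k matches in one data set."""
    return comb(N, k) * P**k * (1 - P) ** (N - k)


expected = N * P
p_none = prob(0)
p_one = prob(1)
p_two_or_more = 1 - p_none - p_one

print(f"expected matches per set: {expected:.2f}")
print(f"P(0) = {p_none:.3f}, P(1) = {p_one:.3f}, P(2 or more) = {p_two_or_more:.3f}")
```

Under this model a little over a third of the data sets would contain no match at all, and roughly a quarter would contain more than one.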


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 15, 2013, 04:04:57 PM
Okay so now I understand you are sharing the same 1 GB for all threads and starting their walk from one of the 16,384 chunks in the 1GB. Chunk size is 64 KB.

So the GPU and ASIC will only need 1 GB for up to 16,384 (1<<14) threads. This was one of the criticisms about massive parallelization I made against Momentum for ProtoShares.

So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

The main memory bandwidth on the CPU is 20 GB/s on desktop grade Intel Core. So clearly your algorithm is AES compute bound at 3 GB/s.

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  ;D


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 15, 2013, 06:11:40 PM
It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  ;D

There cannot be a home-CPU only algo forever, no matter what you do, since ASICs are CPUs too, although not programmable. But this algo is way more difficult to implement cheaply than SHA256 or scrypt. So it buys a lot of time for home-CPU users (and botnets, but since it uses 1 GB of RAM and 90% CPU time, many users whose PCs were taken over by a botnet should notice that something is wrong).


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 15, 2013, 06:28:43 PM
So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

So there's some XORing going on before the AES, hoping the L2 cache gives a bit of an advantage there too.

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

I'll take that as high praise! Appreciate all your analysis. ASIC design and production is not my area of expertise, but I'm skeptical that an ASIC can be designed and manufactured at a rate that makes mining a viable business against the vast multitude of CPU owners with zero capital costs.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: NineLives on December 15, 2013, 07:24:04 PM
Is PTS crew behind this?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 16, 2013, 06:02:26 AM
It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  ;D

There cannot be a home-CPU only algo forever, no matter what you do, since ASICs are CPUs too

I assure you it can be done.

So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

So there's some XORing going on before the AES, hoping the L2 cache gives a bit of an advantage there too.

I don't think so, because it appears you are reading both source chunks from the 1GB and writing back to the 1GB. I see you copying to separate buffers, and I don't understand why you do that instead of XORing directly from their original memory location.

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

I'll take that as high praise! Appreciate all your analysis. ASIC design and production is not my area of expertise, but I'm skeptical that an ASIC can be designed and manufactured at a rate that makes mining a viable business against the vast multitude of CPU owners with zero capital costs.

Well imho yes, you've apparently done better than Litecoin. The ASIC can definitely be done if your coin has a high enough market cap, and its power-efficiency advantage looks to be several orders-of-magnitude, so it should wipe out the CPUs, but by then you will be rich anyway. ;)

Your near-term threat is botnets. They could 51% attack your coin.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 18, 2013, 06:31:28 PM
Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much space for optimization left.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Sharky444 on December 18, 2013, 06:35:34 PM
Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much space for optimization left.

Good news! This is what we've hoped for!


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: markj113 on December 18, 2013, 06:48:52 PM
Is PTS crew behind this?

yes


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 18, 2013, 09:05:46 PM
Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much space for optimization left.

Thanks so much for sharing. It's really good news for the coin - any plans for your GPU miner?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 18, 2013, 09:16:37 PM

also no.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 18, 2013, 09:54:21 PM
Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much space for optimization left.

Thanks so much for sharing. It's really good news for the coin - any plans for your GPU miner?

For now, to try to make an Nvidia build and try some Amazon mining, but no further plans yet.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 18, 2013, 10:17:39 PM
And by the way, when trying to build an optimized Quark miner, I noticed that the AES-NI version of the Groestl hash performed worse than the AVX version on Intel when called by multiple threads. For a single thread it was the other way around. It could be an implementation fault, but it could also mean, for example, a single on-chip AES module shared by hyperthreads with serialized access. Maybe it has some implications for MemoryCoin as well.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 19, 2013, 02:07:45 PM
A bit of (not so) bad news: by coalescing main RAM access I have sped it up by ~40%, and opened some more vectors for minor optimizations. For now it runs at 5.86hpm on 7870.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 19, 2013, 02:11:46 PM
And by the way, when trying to build an optimized Quark miner, I noticed that the AES-NI version of the Groestl hash performed worse than the AVX version on Intel when called by multiple threads. For a single thread it was the other way around. It could be an implementation fault, but it could also mean, for example, a single on-chip AES module shared by hyperthreads with serialized access. Maybe it has some implications for MemoryCoin as well.

Hmm - seeing hashing improvements linear with number of cores, so think those NI must be part of each core.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 19, 2013, 02:14:33 PM
A bit of (not so) bad news: by coalescing main RAM access I have sped it up by ~40%, and opened some more vectors for minor optimizations. For now it runs at 5.86hpm on 7870.

Okay thanks. Worth noting our CPU algorithm hasn't been optimised or tuned at all, so we may have some room to catch up. What are you using for the AES encryption?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 19, 2013, 02:24:37 PM
And by the way, when trying to build an optimized Quark miner, I noticed that the AES-NI version of the Groestl hash performed worse than the AVX version on Intel when called by multiple threads. For a single thread it was the other way around. It could be an implementation fault, but it could also mean, for example, a single on-chip AES module shared by hyperthreads with serialized access. Maybe it has some implications for MemoryCoin as well.

Hmm - seeing hashing improvements linear with number of cores, so think those NI must be part of each core.

Well, here is what I get on two E5-2620 Xeons (6 cores, 12 threads each):
Code:
[root@xxx ~]# openssl speed aes-256-cbc -multi 12
...
aes-256 cbc     382580.34k   517842.22k   521875.46k   525670.06k   527021.40k
[root@xxx ~]# openssl speed aes-256-cbc -multi 24
...
aes-256 cbc     588586.78k   611764.04k   617288.53k   618816.17k   619241.47k

Not linear at all. Hashing also does not scale linearly with the number of threads.
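
For reference, the scaling in the two runs above works out as follows. The sketch only uses the last (largest block size) column of each result line, which openssl speed reports in thousands of bytes per second; the variable names are made up.

```python
T12_KBPS = 527021.40   # aes-256-cbc, -multi 12, largest block size column
T24_KBPS = 619241.47   # aes-256-cbc, -multi 24, largest block size column

speedup = T24_KBPS / T12_KBPS       # gain from doubling the process count
efficiency = speedup / (24 / 12)    # 1.0 would be perfectly linear scaling
per_process_12 = T12_KBPS / 12
per_process_24 = T24_KBPS / 24

print(f"12 -> 24 processes: {speedup:.3f}X throughput, {efficiency:.0%} scaling efficiency")
print(f"per process: {per_process_12:.0f}k vs {per_process_24:.0f}k bytes/s")
```

Doubling the process count on this 12-core, 24-thread machine adds well under 20% throughput, so the second hardware thread on each core contributes little for AES.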


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 19, 2013, 02:32:52 PM
A bit of (not so) bad news: by coalescing main RAM access I have sped it up by ~40%, and opened some more vectors for minor optimizations. For now it runs at 5.86hpm on 7870.

Okay thanks. Worth noting our CPU algorithm hasn't been optimised or tuned at all, so we may have some room to catch up. What are you using for the AES encryption?

I use a bitsliced implementation from OpenSSL, reversed to little-endian to avoid conversion. I am still bounded by RAM latencies rather than computation. IIRC random global memory access is 0.18 words per cycle on Radeons, despite the huge bandwidth, so it is the major bottleneck.

In fact, I do not see room for much improvement of CPU hashing, OpenSSL is already (almost) perfect at AES.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 19, 2013, 04:30:15 PM
Well, here is what I get on two E5-2620 Xeons (6 cores, 12 threads each):
Code:
[root@xxx ~]# openssl speed aes-256-cbc -multi 12
...
aes-256 cbc     382580.34k   517842.22k   521875.46k   525670.06k   527021.40k
[root@xxx ~]# openssl speed aes-256-cbc -multi 24
...
aes-256 cbc     588586.78k   611764.04k   617288.53k   618816.17k   619241.47k

Not linear at all. Hashing also does not scale linearly with the number of threads.

Scales linearly with the number of cores maybe - so each core might have dedicated AES-NI instructions, but 2 or 4 processes for each core might not be able to access them.  I bet you see linear scaling with 1, 2, 4 cores . . . then little drop off at 8, bigger at 16, and massive drop off at 32.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 19, 2013, 04:31:32 PM
In fact, I do not see room for much improvement of CPU hashing, OpenSSL is already (almost) perfect at AES.

Yeah, but you should see my code!


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 19, 2013, 04:52:24 PM
Well, here is what I get on two E5-2620 Xeons (6 cores, 12 threads each):
Code:
[root@xxx ~]# openssl speed aes-256-cbc -multi 12
...
aes-256 cbc     382580.34k   517842.22k   521875.46k   525670.06k   527021.40k
[root@xxx ~]# openssl speed aes-256-cbc -multi 24
...
aes-256 cbc     588586.78k   611764.04k   617288.53k   618816.17k   619241.47k

Not linear at all. Hashing also does not scale linearly with the number of threads.

Scales linearly with the number of cores maybe - so each core might have dedicated AES-NI instructions, but 2 or 4 processes for each core might not be able to access them.  I bet you see linear scaling with 1, 2, 4 cores . . . then little drop off at 8, bigger at 16, and massive drop off at 32.

So this is what I was trying to say - single AES circuit for both hyperthreads. Of course, all cores are identical and there has to be the circuit in each.

In fact, I do not see room for much improvement of CPU hashing, OpenSSL is already (almost) perfect at AES.

Yeah, but you should see my code!

It's not like I could skip your momentum.cpp when writing the miner :) Other than being somewhat hard to read (all those constants), it is pretty straightforward. I cannot think of a way it could be optimized significantly. Page-lock those caches maybe, but you cannot do that in a portable way.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: eddilicious on December 20, 2013, 06:23:21 AM
Hey wait...

MemoryCoin = the coin that requires memory

I just figured out when I started to read this thread that memorycoin is not about the good old memory of my first gf, but about computer memory of the latest standard. So, to be a miner, I either buy 16GB of memory or buy an R9 280. I can either complain about people setting up 125-GPU farms (my rig only has 4 so far), or complain about people who have lots of credit on Amazon server farms, b/c I only have one droplet on a cloud hashing at 0.26. It is a money game, one way or another, and a faith game: how much am I willing to commit to it.

So, as small-time miners, all we need is just a pool.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Stinky_Pete on December 21, 2013, 12:05:59 AM
Hey wait...

MemoryCoin = the coin that requires memory

I just figured out when I started to read this thread that memorycoin is not about the good old memory of my first gf, but about computer memory of the latest standard. So, to be a miner, I either buy 16GB of memory or buy an R9 280. I can either complain about people setting up 125-GPU farms (my rig only has 4 so far), or complain about people who have lots of credit on Amazon server farms, b/c I only have one droplet on a cloud hashing at 0.26. It is a money game, one way or another, and a faith game: how much am I willing to commit to it.

So, as small-time miners, all we need is just a pool.

MemoryCoin only needs 1GB to run.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 22, 2013, 01:47:13 PM
I have some more numbers from GPU mining field to share. Currently 7870GE mines at 8.42hpm at stock clocks, this is 7.12s per work. 4 of these 7 seconds are spent loading precalculated hashes from global RAM. I have also attempted calculating sha hashes on the fly instead of storing them, but, obviously, calculating each hash 50 times is about 20 times slower than caching it once.

Essentially, it is not AES that makes it GPU-hostile but huge amount of random RAM access required.
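
The figures in this post can be cross-checked with a little arithmetic. The CPU comparison uses the roughly 4 hashes per minute reported earlier in the thread for an i7 4770, so it compares different hardware at different stages of tuning and is only a rough guide; the variable names are made up for the sketch.

```python
GPU_HPM = 8.42            # 7870GE at stock clocks
SECONDS_PER_WORK = 7.12   # as stated in the post
LOAD_SECONDS = 4.0        # time spent loading precalculated hashes
CPU_HPM = 4.0             # i7 4770 figure quoted earlier in the thread

seconds_per_hash = 60 / GPU_HPM                # should agree with the stated 7.12s
load_share = LOAD_SECONDS / SECONDS_PER_WORK   # fraction of each work spent on RAM loads
gpu_vs_cpu = GPU_HPM / CPU_HPM                 # rough GPU advantage over that CPU

print(f"{seconds_per_hash:.2f}s per hash, {load_share:.0%} of it loading from global RAM")
print(f"GPU vs CPU: about {gpu_vs_cpu:.1f}X")
```

More than half of each work unit goes to random global memory reads, which fits the conclusion that memory access rather than AES is what holds the GPU back.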


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: FreeTrade on December 22, 2013, 02:44:12 PM
I have some more numbers from GPU mining field to share. Currently 7870GE mines at 8.42hpm at stock clocks, this is 7.12s per work. 4 of these 7 seconds are spent loading precalculated hashes from global RAM. I have also attempted calculating sha hashes on the fly instead of storing them, but, obviously, calculating each hash 50 times is about 20 times slower than caching it once.

Essentially, it is not AES that makes it GPU-hostile but huge amount of random RAM access required.

Music to my ears, thank you!

For commercialization, it looks like we're going to have pools soon, so there might be a good opportunity to run a GPU miners pool. Alternatively you could consider a binary release of the GPU miner that sends a small percentage of each block mined to you.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 27, 2013, 09:53:19 AM
Essentially, it is not AES that makes it GPU-hostile but huge amount of random RAM access required.

On the GPU, increase the parallelization of computation with random access, so that random access is masked by computation.

Also try to coalesce memory accesses so that latency is masked by the memory bandwidth that can load data faster than the CPU. If I am not mistaken, I believe you can essentially accomplish this statistically by running more copies of the same hash simultaneously, so it means you need to increase the amount of memory on your GPU to say 16 or 32GB. I believe if you increase this enough (128 GB?), you will eventually become computation bound. From upthread conjecture, we would expect the performance would top out at roughly 12 hashes per minute and be AES computation bound.

Can you experiment and confirm, as it impacts what I am designing as well as Memorycoin 2.0?


Also a reminder on upthread conjecture (https://bitcointalk.org/index.php?topic=355532.msg3977088#msg3977088), that an ASIC would not need to use a GPU's very slow memory latency design, thus Memorycoin remains vulnerable to ASICs.


Okay the 4 hashes per minute is very slow, but that is okay if the validation is much faster than the search for a hash solution. How much faster? Because denial-of-service is a threat. If each peer can only validate say 10 hashes per second, how will your system fend off a botnet denial-of-service attack that floods the network with bogus hashes?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 27, 2013, 10:31:18 AM
Essentially, it is not AES that makes it GPU-hostile but huge amount of random RAM access required.

You've discovered my key insight (which I had mentioned during the Protoshares launch). But you probably don't yet know how to capitalize on it to make a CPU-only that is ASIC-resistant. You will eventually figure it out, but probably not before I have released the whitepaper.

On the GPU, increase the parallelization of computation with random access, so that random access is masked by computation.

Also try to coalesce memory accesses so that latency is masked by the memory bandwidth that can load data faster than the CPU. If I am not mistaken, I believe you can essentially accomplish this statistically by running more copies of the same hash simultaneously, so it means you need to increase the amount of memory on your GPU to say 16 or 32GB. I believe if you increase this enough (128 GB?), you will eventually become computation bound. From upthread conjecture, we would expect the performance would top out at roughly 12 hashes per minute and be AES computation bound.

Can you experiment and confirm, as it impacts what I am designing as well as Memorycoin 2.0?


Also a reminder on upthread conjecture (https://bitcointalk.org/index.php?topic=355532.msg3977088#msg3977088), that an ASIC would not need to use a GPU's very slow memory latency design, thus Memorycoin remains vulnerable to ASICs.


Also I wasn't paying attention before. 4 hashes per minute! Are you kidding me? I assumed per second. That is much too slow to fend off denial-of-service attacks. How are you going to test whether hash solutions are valid fast enough to prevent a denial-of-service attack on the proof-of-work?


To begin with, there are no consumer-grade GPUs with more than 6GB on the market. Besides, I have already done all coalescing possible, both statistically and logically, and overall it only yields 3x-4x advantage over CPU (10hpm on 7870). The PoW itself is not parallelizable due to CBC encryption.

Of course, an ASIC may employ different techniques to reduce the latency, 3D memory etc, as the memory is not exactly randomly accessed like in scrypt, but in 64-byte chunks of 64k linear ranges. It can even ditch the RAM entirely, replacing it with SHA calculation on the fly. Good luck in designing such an ASIC though. But GPU is pretty much limited in what it can and cannot do.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 27, 2013, 10:57:35 AM
To begin with, there are no consumer-grade GPUs with more than 6GB on the market.

Demand is a funny economics 101 thing, it causes supply to rise to meet it on the price curve. Assuming Memorycoin became significant. Although ASICs would probably take over before that point any way.

The point is we want to test what are the technical limitations, not just what the market currently bears, because a coin needs to be future-proof.

Because if GPUs can become more efficient at solving the hash by adding more memory, then we need to factor that into our analysis.

However see below, I now don't think more memory is necessary to increase parallelization.

Besides, I have already done all coalescing possible, both statistically and logically, and overall it only yields 3x-4x advantage over CPU (10hpm on 7870).

10 hpm is 2.5x correct? (FreeTrade reported 4 hpm on CPU) That is faster than the last report I had seen from you on this thread.

That is congruent with the conjecture when it is AES computation bound. Do you have any measurement giving an estimate of how close to compute bound your implementation is?


The PoW itself is not parallelizable due to CBC encryption.

I am forgetting from upthread discussion that the hash can run up to 16,384 threads simultaneously without needing more than 1GB.

How many threads are you running? Did you try increasing the number of threads?

The point I believe is to get multiple random memory accesses to overlap statistically and they will be stored in the 768 KB cache so latency is masked by memory bandwidth. Although I am not sure how sophisticated the GPU is on merging coincident random memory accesses across threads into a sequential memory access.

Of course, ASIC may employ different techniques to reduce the latency, 3D memory etc,

As far as I can see, it simply needs to have a similar main memory as the CPU (and perhaps an L2 cache), or perhaps even be PCIe card that runs on your PC.

The point is AES can be made to run much faster, if the CPU is compute bound, as I showed (see the link to the upthread post).

but in 64-byte chunks of 64k linear ranges.

I thought it was working on a random chunk of 64 KB in size? So the random access latency shouldn't be a factor, except that perhaps 64 KB is loaded so fast due to the very fast memory bandwidth of the GPU. I am wondering if you did something wrong or are misinterpreting some statistics you've analyzed or I am not understanding the algorithm? Or if you are not running enough threads to statistically mask the latency?

Good luck in designing such an ASIC though.

Upthread I cited references for low transistor counts ASIC designs which run AES much faster.

But GPU is pretty much limited in what it can and cannot do.

GPU is limited only by very slow memory latency. And the lack of specialized AES instructions. The former can't be rectified as it is fundamental to what makes the memory bandwidth so fast. The latter could maybe be added to GPUs, since the transistor counts required are relatively small as I cited with references upthread.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 27, 2013, 11:26:57 AM

Okay the 4 hashes per minute is very slow, but that is okay if the validation is much faster than the search for a hash solution. How much faster? Because denial-of-service is a threat. If each peer can only validate say 10 hashes per second, how will your system fend off a botnet denial-of-service attack that floods the network with bogus hashes?

Additionally how will pool share hashes work if the hash rate is only 4 per minute for each pool miner?

Won't the variance be incredibly high for block times in the few minute range.

Aren't you going to need at least a 10 minute block time thus no improvement over Bitcoin?


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 27, 2013, 11:27:53 AM
Do you have any measurement giving an estimate of how close to compute bound your implementation is?
It is about 50% now.

How many threads are you running? Did you try increasing the number of threads?
As many as GPU can start. 7870 has 4 CUs of 256 threads (and some more not exposed to OpenCL I believe).

The point I believe is to get multiple random memory accesses to overlap statistically and they will be stored in the 768 KB cache so latency is masked by memory bandwidth.
GPUs do not have automatically controlled L2 cache. I'd guess you are referring to 'local' memory which is a totally different beast. Controller is smart enough though to 'stream' simultaneous access to adjacent RAM areas.

I thought it was working on a random chunk of 64 KB in size? So the random access latency shouldn't be a factor, except that perhaps 64 KB is loaded so fast due to the very fast memory bandwidth of the GPU. I am wondering if you did something wrong or are misinterpreting some statistics you've analyzed or I am not understanding the algorithm? Or if you are not running enough threads to statistically mask the latency?
You cannot load 64K anywhere, you have about 240*4 bytes of registers per thread and about 128 bytes of that 'local' memory per thread, and that's it.



Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 27, 2013, 11:35:39 AM
Thanks for the feedback. It would be interesting to see if more latency is statistically masked with a higher number of threads. I wonder if there is any GPU simulator or actual GPU which can run more than 1024 threads?

In any case, your results are in the range expected by FreeTrade, even if you eliminate the remaining 50% of memory latency bound. So I suppose he is happy. It appears to be an improvement over Litecoin, yet I have some pending questions above about the impact of the slow hash rate.

Did you do any power measurements? Because one of the points I made upthread is that GPU may be less power efficient even though it achieves a faster hash rate. However if it is latency bound (idle) 50% of the time, it may not be maxing out its power consumption.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: reorder on December 27, 2013, 11:52:46 AM
Thanks for the feedback. It would be interesting to see if more latency is statistically masked with a higher number of threads. I wonder if there is any GPU simulator or actual GPU which can run more than 1024 threads?
Of course it would be masked. Random global RAM access is about 0.18 words/thread (<1 byte) per cycle in Radeons on average, while sequentially you can load/store 16 bytes/thread. Higher range Teslas have ~2048 threads I believe (and compiler that crashes on my kernel, so I did not test with it yet). Also Nvidia ships a pretty nice analyzer for CUDA where you can see subsystems utilisation in runtime.

In any case, your results are in the range expected by FreeTrade, even if you eliminate the remaining 50% of memory latency bound. So I suppose he is happy. It appears to be an improvement over Litecoin, yet I have some pending questions above about the impact of the slow hash rate.
Yes, it is like Litecoin without the infamous lookup-gap shortcut.

Did you do any power measurements? Because one of the points I made upthread is that GPU may be less power efficient even though it achieves a faster hash rate.
~200W/10hpm for 7870, but this varies across GPUs of course.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on December 27, 2013, 11:58:34 AM
~200W/10hpm for 7870, but this varies across GPUs of course.

So very roughly parity with the CPU on power efficiency, assuming the CPU is maxed out at 80W. Although CPU systems usually consume more than 100W when they are not idle, so a rack of GPUs might be slightly more power efficient.
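
The parity estimate can be made explicit as energy per hash. A rough sketch only: the GPU figure (200 W at 10 hpm) and the CPU rate (4 hpm, as reported by FreeTrade upthread) come from the thread, while the 80 W and 100 W CPU draws are the assumptions stated in the post, not measurements.

```python
def joules_per_hash(watts: float, hashes_per_minute: float) -> float:
    """Energy spent per hash attempt: power times seconds per hash."""
    return watts * 60.0 / hashes_per_minute

gpu = joules_per_hash(200, 10)        # 7870 as reported by reorder
cpu_chip = joules_per_hash(80, 4)     # assumed 80 W CPU at 4 hpm
cpu_system = joules_per_hash(100, 4)  # assumed 100 W whole system at 4 hpm

print(gpu, cpu_chip, cpu_system)
```

Under these assumptions the GPU and the bare CPU come out equal per hash, and the GPU only pulls ahead once the rest of the CPU system's power draw is counted.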


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: hownowbrowncow on January 14, 2014, 10:16:31 AM
The latest CPU optimizations are a fantastic improvement.

I am running close to 2000HPM without using any GPU...

Thank you all!!!


What is your setup?


Title: This message was too old and has been purged
Post by: Evil-Knievel on January 14, 2014, 07:35:37 PM
This message was too old and has been purged


Title: This message was too old and has been purged
Post by: Evil-Knievel on January 14, 2014, 09:00:07 PM
This message was too old and has been purged


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: rmmh on February 03, 2014, 11:16:49 PM
All a MemoryCoin 2.0 ASIC requires is two SHA512 blocks going into an AES block-- if you can generate the pseudorandom data on the fly quickly enough, there's no need to store it in memory at all.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 12, 2014, 03:33:24 AM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 13, 2014, 03:28:41 AM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 13, 2014, 06:18:26 PM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.

Since yesterday, the implementation allows for multi-threading, and the README now says:

6) running time for the current implementation on high end x86 is under 24s/GB single-threaded,
   and under 3s/GB for 12 threads.

So 16GB would take well under a minute on a server.

In any case, I disagree that 16GB is needed to defeat botnets. Most machines in a botnet
have less than 4GB of *unused* memory, and running a 4GB cuckoo would send them into
swap-hell, not only making them useless for mining, but also alerting their owner.
So Cuckoo Cycle might actually help to shrink botnets...
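
For reference, the two README rates translate into the following upper bounds for a 16GB instance. This is just the arithmetic behind "well under a minute", assuming run time scales linearly with memory size, which the README figures suggest but do not guarantee.

```python
MEMORY_GB = 16
SINGLE_THREAD_S_PER_GB = 24   # README upper bound, single-threaded
TWELVE_THREAD_S_PER_GB = 3    # README upper bound, 12 threads

# Assumes run time grows linearly with the memory size.
single_thread_seconds = MEMORY_GB * SINGLE_THREAD_S_PER_GB
twelve_thread_seconds = MEMORY_GB * TWELVE_THREAD_S_PER_GB

print(f"1 thread:   under {single_thread_seconds} s "
      f"({single_thread_seconds / 60:.1f} min)")
print(f"12 threads: under {twelve_thread_seconds} s")
```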


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 15, 2014, 11:27:39 AM
I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 15, 2014, 03:30:46 PM
I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.

Cuckoo Cycle lets you set whatever memory requirement you want.
If you think 4GB is too little, then just go for more.
Validation is instant in any case. Why do you bring up these points that were addressed already?

You haven't solved the major problem of your PoW yet.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 15, 2014, 07:57:50 PM
Validation is instant in any case. Why do you bring up these points that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less (https://bitcointalk.org/index.php?topic=455141.msg5157380#msg5157380).

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 15, 2014, 08:05:02 PM
Validation is instant in any case. Why do you bring up these points that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less (https://bitcointalk.org/index.php?topic=455141.msg5157380#msg5157380).

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.

Cuckoo is not suitable for very short block intervals (unless you shrink the memory requirement
below 1GB, but then you lose botnet resistance).
For a description of the algorithm, see the paper. It's not that long.

The problem of your PoW is the very fact that you have not published it...


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 15, 2014, 08:34:34 PM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 16, 2014, 12:54:02 AM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 20, 2014, 01:22:48 AM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

I disagree. I think it may be one of the most important features of superior altcoin if it is done correctly because it could help prevent off-chain fractional reserve banking as was failing in the 1800s (and thus centralization and right back to central banking again):

https://bitcointalk.org/index.php?topic=465474.msg5166632#msg5166632
https://bitcointalk.org/index.php?topic=465474.msg5182519#msg5182519

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

Good point. :)

Defeating those high-end server CPUs was another of my objectives.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on February 20, 2014, 02:03:10 AM
Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on February 20, 2014, 02:21:28 AM
Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

I am trying to figure out which altcoin to give my algorithm to. I want it to have a first-mover advantage. Otherwise we end up with a sea of copycat coins and no focused challenger to Bitcoin.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.

Granted Oracle's chip is not the most cost effective. I was too lazy to go research for the name (Tilera) of the lower cost per thread CPU.

Are you claiming that massively parallel CPUs cost more per thread than Intel and AMD CPUs for consumer-level PCs?

Doesn't the link (http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/) you provided refute that, or the Tilera CPU that someone else mentioned (https://bitcointalk.org/index.php?topic=342848.msg3681618#msg3681618)?

Your algorithm is not computation intensive and is main memory DRAM latency bound, thus lower-powered threads suitable for server tasks would be a better fit than the very high-powered PC CPUs. For example, web-server or chat-server threads are typically not compute bound rather I/O bound. The Oracle chip is more high powered I think because it is designed for database threads.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: Agamemnus on May 28, 2014, 07:35:38 AM
I saw a whole lot of talk from AnthonyMint and a whole lack of any kind of proof. Just words, no good links, no algorithms of his own. Just a whole lot of crap-talk.

And I had a good giggle over it.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: AnonyMint on July 19, 2014, 12:59:56 AM
In Upthread discussion (https://bitcointalk.org/index.php?topic=355532.msg4088580#msg4088580), reorder claimed the GPU implementation was random access bound, and I claimed that with enough threads it would likely be AES computation bound instead.

I now add the point that I don't think he accounted for what percentage of the execution time is random access bound on the CPU.

Thus I am positing that as the number of the threads on the GPU increases, then the random access portion of the algorithm is actually detrimental to giving an advantage to the CPU.

Does this make sense?

And I had a good giggle over it.

That is why you are a nobody, because you don't comprehend and you laugh at those who do (or are in the process of) instead of participating in sharing and learning.


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on July 25, 2014, 05:46:27 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB of pseudorandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

-John


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on July 25, 2014, 08:29:13 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB of pseudorandom data using SHA512
2. For each 64K block - Repeat 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

Hmm, looking at the actual implementation on github, I apparently misunderstood,
as the block-pointer depends on the updated state as well.

So the miner does need 1GB, at least to avoid having to regenerate each block 49 times on average
(as FPGA/ASICs would be expected to do).
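
The corrected reading can be illustrated with a scaled-down toy. This is a sketch under loud assumptions, not the MemoryCoin code: blocks are 64 bytes instead of 64K, there are 256 of them instead of 2^14, little-endian byte order is a guess, and a keyed SHA-512 stands in for AES CBC because the Python standard library has no AES. All function names are hypothetical. It shows that the next pointer comes from the updated state, and that a miner without the table gets the same answers at the cost of regenerating blocks.

```python
import hashlib

BLOCK_SIZE = 64     # toy value; the post uses 64K blocks
NUM_BLOCKS = 256    # toy value; 1GB / 64K = 2**14 in the post
ROUNDS = 50         # "Repeat 50 times"
TARGET = 1968       # step 3 of the post

def make_block(seed: bytes, index: int) -> bytes:
    """One pseudorandom block; a SHA-512 digest is exactly 64 bytes."""
    return hashlib.sha512(seed + index.to_bytes(4, "little")).digest()

def mix(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES CBC (NOT AES): a keyed SHA-512 of the data."""
    return hashlib.sha512(key + data).digest()

def work(start: int, fetch) -> int:
    """Run the 50-round chain from block `start`; fetch(i) returns block i."""
    state = fetch(start)
    for _ in range(ROUNDS):
        # The pointer is read from the updated state, so the access order
        # is unknown until the previous round has finished.
        pointer = int.from_bytes(state[-4:], "little") % NUM_BLOCKS
        xored = bytes(a ^ b for a, b in zip(state, fetch(pointer)))
        state = mix(xored, key=xored[-32:])   # last 256 bits as the key
    return int.from_bytes(state[-4:], "little") % 2**14

seed = b"toy block header"

# Strategy 1: precompute every block once and keep the table in memory.
table = [make_block(seed, i) for i in range(NUM_BLOCKS)]
cached = [work(i, table.__getitem__) for i in range(NUM_BLOCKS)]

# Strategy 2: keep no table and regenerate each block whenever it is read.
regenerated = 0
def regenerate(index: int) -> bytes:
    global regenerated
    regenerated += 1
    return make_block(seed, index)

on_the_fly = [work(i, regenerate) for i in range(NUM_BLOCKS)]

print("same answers:", cached == on_the_fly)
print("block generations per block:", regenerated / NUM_BLOCKS)
print("solved starts:", [i for i, s in enumerate(cached) if s == TARGET])
```

In this toy the table-free miner generates each block 51 times on average (one start plus 50 rounds) instead of once, which is the same order as the roughly 49 regenerations mentioned above.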


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: yvg1900 on July 26, 2014, 04:37:11 AM
Yes, a big memory buffer is needed to avoid recalculation of 64k blocks.

This makes the PoW really memory bound.

CryptoNight PoW is somewhat conceptually similar to MMC 2.0 PoW (apart from a much smaller block size and a different, more aggressive, data dependency construct).


 


Title: Re: MemoryCoin 2.0 Proof Of Work
Post by: tromp on July 26, 2014, 01:34:19 PM
Yes, a big memory buffer is needed to avoid recalculation of 64k blocks.

This makes the PoW really memory bound.

It's not memory-hard though, as memory can trivially be traded off for computation time
(8192 times less memory with only 50x slowdown).
Which is a bit sad for a coin whose very name emphasizes memory:-(
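
One way to read the 8192x figure, as a sketch: the block counts follow from the 1GB and 64K sizes quoted in the thread, while the interpretation that the miner holds only the running state plus one regenerated block is an assumption made here for illustration.

```python
TOTAL_BYTES = 2**30          # 1GB of pseudorandom data
BLOCK_BYTES = 64 * 2**10     # 64K per block
REDUCTION = 8192             # memory reduction factor from the post
ROUNDS = 50                  # rounds per work item

num_blocks = TOTAL_BYTES // BLOCK_BYTES       # blocks in the full table
reduced_bytes = TOTAL_BYTES // REDUCTION      # memory left after reduction
blocks_held = reduced_bytes // BLOCK_BYTES    # blocks that still fit

print(num_blocks, "blocks in the full table")
print(reduced_bytes // 1024, "KB kept, i.e.", blocks_held, "blocks")
print("about", ROUNDS, "block regenerations per work item instead of table lookups")
```

Two blocks would be just enough for the running state and the block it is XORed with in each round.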

Quote
CryptoNight PoW is somewhat conceptually similar to MMC 2.0 PoW (apart from a much smaller block size and a different, more aggressive, data dependency construct).

CryptoNight is just hashcash with a hashfunction that bears some similarities to MMC2's verification
(note that the crucial overwriting of main memory blocks has no equivalent in MMC2).

MMC2 is not hashcash, but one of the few asymmetric PoWs (like Primecoin, Momentum, and Cuckoo Cycle) where proof attempts need to perform much more work than verification.

So overall, I'd say they're conceptually quite different.