Bitcoin Forum
May 08, 2024, 07:49:29 AM *
Author Topic: MemoryCoin 2.0 Proof Of Work  (Read 21394 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
FreeTrade (OP)
December 15, 2013, 01:35:35 PM
 #61

My wild guesstimate is you see 10X instead of 3X and have a 3X disadvantage on hash per watt. Readers I don't know. I am only wild guessing.

I think to really boil it down - it's going to be about how fast GPUs can perform AES compression.

If we have a look at gKrypt -

http://gkrypt.com/

they're talking about 80 gigabits per second on a single GPU on their homepage. That's 10 GB/s.

On an i7 4770 (Haswell), I'm seeing about 4 hashes per minute - at 50GB per hash that's 200GB per minute, or about 3.3GB/s.

So after some optimization, and a 64-bit compile, we'll hopefully see that up to 4 or 5GB/s.

So I'm sticking with my 2X or 3X - and interesting point you make about power consumption - much more power efficient on CPU.
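
The arithmetic behind these figures can be checked with a small sketch. The function names are made up for illustration, and the 50 GB-per-hash figure is only what "4 hashes per minute = 200GB per minute" implies, not a number taken from the MemoryCoin source.

```cpp
#include <cassert>

// Gigabits per second -> gigabytes per second (8 bits per byte).
constexpr double gbitToGByte(double gbitPerSec) { return gbitPerSec / 8.0; }

// Data rate in GB/s for a miner doing `hashesPerMinute`, assuming each hash
// processes `gbPerHash` GB (50 GB is implied by the figures in the post).
inline double gbPerSecond(double hashesPerMinute, double gbPerHash) {
    assert(hashesPerMinute >= 0.0 && gbPerHash >= 0.0);
    return hashesPerMinute * gbPerHash / 60.0;
}

// How many times faster one data rate is than another.
inline double speedRatio(double fasterGBs, double slowerGBs) {
    assert(slowerGBs > 0.0);
    return fasterGBs / slowerGBs;
}
```

With these, gbitToGByte(80.0) gives 10 GB/s for the gKrypt figure, gbPerSecond(4.0, 50.0) gives roughly 3.3 GB/s for the i7 4770, and their ratio comes out at about 3X. If the CPU reaches 4 or 5 GB/s after optimization, the same ratio drops to 2.5X or 2X.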





Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
AnonyMint
December 15, 2013, 01:55:32 PM
Last edit: December 15, 2013, 02:16:13 PM by AnonyMint
 #62

Okay, that seems correct. I was focused on the GPU eliminating the L2 cache advantage at a guessed 10X factor, and I didn't know how much faster GPUs could compute AES (you now show evidence of only 2-3X relative to Intel's AES-NI). But if being compute-bound is the goal, the 1GB of memory seems unnecessary, except that I guess you are aiming to make an ASIC harder to implement by forcing it to interface with DRAM.

And ASICs can provide dedicated AES circuits.

So the main threat will come from ASICs but that won't be until your market cap is large enough to justify the ASIC development.

The reason I don't favor being compute bound is because I want to eliminate ASICs entirely.

unheresy.com - Prodigiously Elucidating the Profoundly Obtuse
THIS FORUM ACCOUNT IS NO LONGER ACTIVE
AnonyMint
December 15, 2013, 02:01:44 PM
 #63

3X would be a significant improvement over Litecoin's 15X w.r.t. GPUs and an order of magnitude better than the 30X for Bitcoin:

https://bitsharestalk.org/index.php?topic=22.msg2663#msg2663

However, both Litecoin and Bitcoin will be ASIC-dominated, so it is irrelevant except that both got their start by being CPU, then GPU.

It appears MemoryCoin will be CPU then ASIC and mostly skip the GPU stage (due to the GPU being less energy efficient than Intel's AES-NI even though slightly faster). This conclusion hinges on gkrypt being fully optimized for GPUs.

AnonyMint
December 15, 2013, 02:35:46 PM
Last edit: December 15, 2013, 02:51:28 PM by AnonyMint
 #64

I see 150 Kgates and 2 milliWatts per GBps (not Gbps) of throughput with ASICs:

http://www.martes-itea.org/public/papers/Hamalainen-Design_and_Implementation_2.pdf#page=6

ASICs have fast caches.

That should be very inexpensive to produce and obliterate the AES-NI both on performance and hashes per watt. Perhaps the memory bandwidth of the cache becomes the limiting factor.

I don't know about the DRAM memory controller.
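
As a rough illustration of what 2 mW per GB/s would mean, here is a back-of-envelope sketch. It covers the AES datapath only and ignores the cache and DRAM interface mentioned above. The function names are hypothetical, and the 84 W CPU package power used in the note below is an assumption, not a figure from this thread.

```cpp
#include <cassert>

// AES datapath power for a given throughput, using the 2 mW per GB/s
// figure quoted from the linked paper. Memory interface power is ignored.
inline double asicPowerMilliwatts(double throughputGBs, double mwPerGBs = 2.0) {
    assert(throughputGBs >= 0.0 && mwPerGBs >= 0.0);
    return throughputGBs * mwPerGBs;
}

// How many times more power the CPU draws than the ASIC datapath
// for the same throughput (cpuWatts is an assumed package power).
inline double powerRatio(double cpuWatts, double asicMilliwatts) {
    assert(asicMilliwatts > 0.0);
    return cpuWatts * 1000.0 / asicMilliwatts;
}
```

At the 3.3 GB/s reported for the i7 4770, asicPowerMilliwatts(3.3) is about 6.6 mW. Against an assumed 84 W for the CPU, powerRatio(84.0, 6.6) is on the order of 10,000X, though a real ASIC miner would spend far more power on memory than on AES.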

AnonyMint
December 15, 2013, 03:39:32 PM
 #65

I don't understand this.

https://github.com/memorycoin/memorycoin/blob/psforkinit/src/momentum.cpp#L71

Code:
    int searchNumber = comparisonSize / totalThreads;
    int startLoc = threadNumber * searchNumber;

Does that mean each thread searches a different section of the pseudo-random 1GB? But wouldn't that mean the result found could vary depending on the number of threads? Since the first value of 1968 found is taken as the solution.

I am thinking that is a design bug. Or perhaps I just don't understand the algorithm employed yet.


P.S. you typo-ed pseudo as 'psuedo' in the code.

FreeTrade (OP)
December 15, 2013, 03:45:25 PM
 #66

Does that mean each thread searches a different section of the pseudo-random 1GB? But wouldn't that mean the result found could vary depending on the number of threads? Since the first value of 1968 found is taken as the solution.

I am thinking that is a design bug. Or perhaps I just don't understand the algorithm employed yet.


P.S. you typo-ed pseudo as 'psuedo' in the code.

There are 16,000 different starting points - each thread takes a section to search. But there are 50 steps from each starting point, and each step can range over the whole 1GB, so every thread needs to have random access to the whole 1GB.

Every 1968 found is a solution, and can create a different SHA256 result. On average there should be 1 per 1GB data, but there might be 0 or 2 or more.
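
The way the starting points get split between threads can be sketched as follows. This is only an illustration built around the two quoted lines from momentum.cpp. The struct, the function names and the value 16,384 for comparisonSize are assumptions, and the 50-step walk itself is not modelled.

```cpp
#include <cassert>

struct SearchRange {
    int startLoc;      // first starting point this thread examines
    int searchNumber;  // how many starting points it examines
};

// Same arithmetic as the two quoted lines: each thread gets an equal,
// contiguous slice of the starting points.
SearchRange threadRange(int comparisonSize, int totalThreads, int threadNumber) {
    assert(totalThreads > 0 && threadNumber >= 0 && threadNumber < totalThreads);
    int searchNumber = comparisonSize / totalThreads;  // integer division
    int startLoc = threadNumber * searchNumber;
    return SearchRange{startLoc, searchNumber};
}

// Number of starting points examined by all threads together.
int coveredStartingPoints(int comparisonSize, int totalThreads) {
    assert(totalThreads > 0);
    return (comparisonSize / totalThreads) * totalThreads;
}
```

With 8 threads, thread 3 gets starting points 6144 to 8191 and all 16,384 are covered, so the set of solutions found does not depend on the thread count. With 6 threads the integer division covers only 16,380, so in this sketch the last 4 starting points would go unsearched. Whether the real code handles that remainder is not visible in the quoted lines.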

AnonyMint
December 15, 2013, 04:04:57 PM
Last edit: December 15, 2013, 05:09:09 PM by AnonyMint
 #67

Okay so now I understand you are sharing the same 1 GB for all threads and starting their walk from one of the 16,384 chunks in the 1GB. Chunk size is 64 KB.

So the GPU and ASIC will only need 1 GB for up to 16,384 (1<<14) threads. This was one of the criticisms about massive parallelization I made against Momentum for ProtoShares.

So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

The main memory bandwidth on the CPU is 20 GB/s on desktop grade Intel Core. So clearly your algorithm is AES compute bound at 3 GB/s.
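
The numbers in the last few paragraphs fit together as in this small sketch. The constant and function names are made up for illustration. The 20 GB/s and 3 GB/s figures are the ones stated above, not measurements.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBufferBytes = 1ull << 30;  // the shared 1 GB
constexpr std::uint64_t kChunkBytes = 64ull << 10;  // 64 KB per chunk

// Number of chunks, which is also the number of possible starting points.
constexpr std::uint64_t chunkCount(std::uint64_t bufferBytes,
                                   std::uint64_t chunkBytes) {
    return bufferBytes / chunkBytes;
}

static_assert(chunkCount(kBufferBytes, kChunkBytes) == (1ull << 14),
              "1 GB / 64 KB should give 16,384 chunks");

// Fraction of the peak memory bandwidth that the miner actually uses.
inline double bandwidthUtilization(double achievedGBs, double peakGBs) {
    assert(peakGBs > 0.0);
    return achievedGBs / peakGBs;
}
```

bandwidthUtilization(3.0, 20.0) is 0.15, so at 3 GB/s the CPU miner uses about 15% of the stated main memory bandwidth, which is why it looks compute bound rather than bandwidth bound. This says nothing about latency, which is a separate limit.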

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  Grin

Sharky444
December 15, 2013, 06:11:40 PM
 #68

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  Grin

There cannot be a home-CPU-only algo forever, no matter what you do, since ASICs are CPUs too, although not programmable. But this algo is far more difficult to implement cheaply than SHA256 or scrypt, so it buys a lot of time for home-CPU users (and botnets, but since it uses 1 GB of RAM and 90% CPU time, many users whose PCs have been taken over by a botnet should notice that something is wrong).

Radix - just imagine
FreeTrade (OP)
December 15, 2013, 06:28:43 PM
 #69

So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

So there's some XORing going on before the AES, hoping the L2 cache gives a bit of an advantage there too.

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

I'll take that as high praise! Appreciate all your analysis. ASIC design and production is not my area of expertise, but I'm skeptical that an ASIC can be designed and manufactured at a rate that makes mining a viable business against the vast multitude of CPU owners with zero capital costs.

NineLives
December 15, 2013, 07:24:04 PM
 #70

Is PTS crew behind this?

Bitcoin Mining Hardware:   www.mininghardware.co.uk
AnonyMint
December 16, 2013, 06:02:26 AM
Last edit: December 16, 2013, 06:28:47 PM by AnonyMint
 #71

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

P.S. I am guessing 1968 is the year you were born.  Grin

There cannot be a home-CPU-only algo forever, no matter what you do, since ASICs are CPUs too

I assure you it can be done.

So the GPU will be compute bound on AES, and the ASIC will likely be memory bandwidth bound.

I don't see how L2 is even being employed in your algorithm. You are reading 64KB chunks of data from 1 GB, so you are not even in L3. So it appears you are compute bound on AES.

So there's some XORing going on before the AES, hoping the L2 cache gives a bit of an advantage there too.

I don't think so, because it appears you are reading both source chunks from the 1GB and writing back to the 1GB. I see you copying to separate buffers, and I don't understand why you do that instead of XORing directly from their original memory locations.

I don't know what it will cost to put a fast memory bandwidth interface together with an ASIC. It should be orders-of-magnitude faster and lower power than the CPU.

The only thing holding the GPU back is the lack of specialized AES circuitry, but note my prior post documenting that such circuitry doesn't require many transistors. GPU manufacturers could decide to add this, or perhaps someone will figure out a way to piggyback a cheap $50 ASIC on a mid-range GPU memory bus to get 50X performance.

Also perhaps someone can put an ASIC (or FPGA) or several Intel Core i5 on a PCIe card, since there is no sequential memory bound in this algorithm.

Note there is a way to make an scrypt-like hash sequential, CPU-only, and fast validating. That was my major breakthrough recently.

It appears you bought some time against GPUs and ASICs, but as far as I see you don't have a CPU-only coin forever into the future.

I'll take that as high praise! Appreciate all your analysis. ASIC design and production is not my area of expertise, but I'm skeptical that an ASIC can be designed and manufactured at a rate that makes mining a viable business against the vast multitude of CPU owners with zero capital costs.

Well, imho, yes, you've apparently done better than Litecoin. The ASIC can definitely be done if your coin has a high enough market cap, and its power-efficiency advantage looks to be several orders of magnitude, so it should wipe out the CPUs, but by then you will be rich anyway. Wink

Your near-term threat is botnets. They could 51% attack your coin.

reorder
December 18, 2013, 06:31:28 PM
 #72

Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much room for optimization left.
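
To put these numbers next to the earlier 2X or 3X estimate, here is a minimal sketch. The function name is hypothetical. The CPU baseline of about 4 hashes per minute is the i7 4770 figure reported earlier in this thread, measured before any optimization or 64-bit compile, so the real baseline may be higher.

```cpp
#include <cassert>

// Hash rate of one device relative to a baseline device,
// both given in hashes per minute (hpm).
inline double relativeRate(double hpm, double baselineHpm) {
    assert(baselineHpm > 0.0);
    return hpm / baselineHpm;
}
```

relativeRate(10.0, 4.0) is 2.5 for the 280X and relativeRate(4.0, 4.0) is 1.0 for the 7870, so these first GPU numbers sit at or below the predicted 2X or 3X advantage.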
Sharky444
December 18, 2013, 06:35:34 PM
 #73

Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much room for optimization left.

Good news! This is what we've hoped for!

markj113
December 18, 2013, 06:48:52 PM
 #74

Is PTS crew behind this?

yes
FreeTrade (OP)
December 18, 2013, 09:05:46 PM
 #75

Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much room for optimization left.

Thanks so much for sharing. It's really good news for the coin - any plans for your GPU miner?

FreeTrade (OP)
December 18, 2013, 09:16:37 PM
 #76


also no.

reorder
December 18, 2013, 09:54:21 PM
 #77

Guys, I have implemented a GPU miner for the coin and have some numbers to share. So far it yields 4hpm on a 7870 GHz Edition and just above 10hpm on a 280X. Even with some handcrafted prefetch it is still heavily RAM latency-bound. I believe there is not much room for optimization left.

Thanks so much for sharing. It's really good news for the coin - any plans for your GPU miner?

For now, I'll try to make an Nvidia build and try some Amazon mining, but no further plans yet.
reorder
December 18, 2013, 10:17:39 PM
 #78

And by the way, when trying to build an optimized Quark miner, I noticed that the AES-NI version of the Groestl hash performed worse than the AVX version on Intel when called by multiple threads. For a single thread it was the other way around. It could be an implementation fault, but it could also mean, for example, a single on-chip AES module shared by hyperthreads with serialized access. Maybe it has some implications for MemoryCoin as well.
reorder
December 19, 2013, 02:07:45 PM
 #79

A bit of (not so) bad news: by coalescing main RAM access I have sped it up by ~40%, and opened some more vectors for minor optimizations. For now it runs at 5.86hpm on 7870.
FreeTrade (OP)
December 19, 2013, 02:11:46 PM
 #80

And by the way, when trying to build an optimized Quark miner, I noticed that the AES-NI version of the Groestl hash performed worse than the AVX version on Intel when called by multiple threads. For a single thread it was the other way around. It could be an implementation fault, but it could also mean, for example, a single on-chip AES module shared by hyperthreads with serialized access. Maybe it has some implications for MemoryCoin as well.

Hmm - I'm seeing hashing improvements linear with the number of cores, so I think the AES-NI units must be part of each core.
