AnonyMint
|
|
August 18, 2013, 01:27:34 AM |
|
And all Alt Coin advances are incremental by nature.
Not necessarily. Stay tuned...
You can only achieve some degree of "GPU resistance"... Without taking drastic, undemocratic measures like saying... "Must be run on 64-bit with 16 GB RAM", etc.
Correct, but we can make that potential advantage (i.e. GPUs with any amount of GDDR memory the user chooses) memory-bound:
https://bitcointalk.org/index.php?topic=273197.msg2950451#msg2950451
https://bitcointalk.org/index.php?topic=273197.msg2954801#msg2954801
So if the coin requires a 32 KB inner Scrypt, then the HD 7970 is going to be running at the 0.25 to 0.5 TB/s of its GDDR RAM, but with much more latency and only 4 threads, so much slower than the CPU. Even if the coin requires only a 16 KB inner Scrypt, or a later version of the GPU has 32 KB of L1 cache, the GPU is still going to be employing only 4 threads, the same as the CPU, but may run at twice the speed of the CPU because of the double L1 cache speed.
...
My other idea is to force the total memory requirement of the outer Scrypt higher than any GPU, since I know of no GPU which allows addon GDDR memory. There is no retail market for GDDR memory.
The idea for nested scrypt is to keep the duty-cycle of memory-bound to L1 cache near to 100% and it also provides the flexibility to control the duty-cycle of the main memory, i.e. 100% - percentage of the time that the algorithm is memory-bound in L1 cache while not writing out to the main memory. Lowering the latter duty-cycle will increase the execution-time of the hash, yet also lower the individual and society-wide electricity requirements of the coin, because for the same hardware less electricity is consumed.
Without taking drastic, undemocratic measures like saying... "Must be run on 64-bit with 16 GB RAM", etc.
I don't consider that to be undemocratic. It kills Botnets also if serious users spend $50 to upgrade their memory.
|
|
|
|
AnonyMint
|
|
August 18, 2013, 01:33:20 AM |
|
Thanks for your comments and links on this. I haven't had time to fully digest them - there's a lot of information there and there are some competing demands on my time.
Wow, a guy who is worth working with. Let's see if we can combine our resources to get the job done.
|
|
|
|
AnonyMint
|
|
August 18, 2013, 02:08:32 AM Last edit: August 18, 2013, 02:25:53 AM by AnonyMint |
|
1. Got the Scrypt wrong and can still be supermined by GPUs
Thanks for your comments and links on this. I haven't had time to fully digest them - there's a lot of information there and there are some competing demands on my time. From what I've seen, adding cores does not produce linear improvements, even with the small numbers of cores in a CPU. But if MC is vulnerable to GPU mining, I'd like to see it fail fast, so I'd encourage you to post any ideas you have for how one might go about creating a GPU miner.
It doesn't need to be linear, because the FLOPS cost in GPUs is so much lower than in a CPU system.
It appears to me that what happens in a GPU (which is why Intel's hyperthreading is faster than just 4 hardware cores) is that when there are many logical threads, then thread blocks on main memory latency are not a factor, because some other thread can run which has already loaded its main memory access into cache. Thus the GPU is always able to achieve the 200+GB/s main memory throughput, because the latency is masked by the probability of numerous threads.
My idea is that the way to defeat this is to require so much memory for the Scrypt that the GPU cannot run enough threads to get that probability to work in its favor. (Or even better, run in a larger main memory footprint than any known GPU can handle, but this is not an absolute, since demand for GPUs from miners can in theory drive larger main memories.)
But hashing over such a large main memory footprint makes the hash slow, as MemoryCoin demonstrates at a claimed 1 second. Also, CPU main memory bandwidth is an order of magnitude slower than that of top-of-the-line GPUs.
So my idea is to run an inner Scrypt (think of a for loop inside of a for loop) and an outer Scrypt, linking them together in cryptographic sequentiality (I will explain this later in pseudo-code), such that the entire algorithm becomes memory-bound at the speed of L1 cache, i.e. we force the GPU to compete with us on L1 cache speed, not on FLOPS and main-memory bandwidth.
So then the computation of the hash is very fast (relative to MemoryCoin's thrashing on main memory latency), and the GPU then has to compete on L1 cache speed and on having enough multiples of the main memory footprint to run that many extra threads. So then the GPU cost/ROI becomes tied to GDDR memory cost versus CPU system cost. I worked some numbers, and the GPU can't gain an order-of-magnitude advantage in any theoretical case (not to mention market dynamics pragmatism).
Also, we need to make the Salsa hash in the BlockMix run much more slowly than the Salsa hash in the ROMix, so that GPUs can't use their superior FLOPS (and the CU cores left idle by our large main memory footprint) to lower the memory requirements by recomputing at say modulo 4 of every element.
I am willing to rewrite your Scrypt if you think my idea has merit.
4. Afaik, has made no advances on improved anonymity
Actually, MC has gone backwards on this a little by always sending change to first address in a wallet to support the voting system, and because this is more intuitive for new users. Generally with MC where there is a choice between ease of use and anonymity, I'll choose ease of use. I care about privacy too, but I care about adoption more. The privacy options of Bitcoin are obviously still there in the protocol, but users will need to take more care to ensure they are utilized.
I am going to avoid talking about your holistic design because we have some disagreement, but it does not preclude me from helping you on the Scrypt refinements, which will help me prove them for the holistic coin design I want. Open source (sharing what is mutual, separating orthogonal concepts so we can) at its best!
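The remark above about "recomputing at say modulo 4 of every element" refers to the usual Scrypt time-memory trade-off: a miner keeps only every k-th element of the scratchpad and recomputes the others when they are needed. A rough sketch of the numbers follows. The 2048 KB footprint and k=4 are only illustrative inputs, and the function names are made up for this example, not taken from any MemoryCoin code.

```shell
#!/bin/bash
# Sketch: memory saved versus extra hashing when only every k-th
# scratchpad element is stored. Inputs are illustrative assumptions.

# Memory needed (KB) when only every k-th element is kept.
reduced_footprint_kb() {
    local footprint_kb=$1 k=$2
    echo $(( footprint_kb / k ))
}

# Average extra hash steps per lookup: a uniformly random index lies
# 0..k-1 steps past a stored element, so the mean is (k-1)/2.
avg_recompute_steps() {
    local k=$1
    awk -v k="$k" 'BEGIN { printf "%.1f\n", (k - 1) / 2 }'
}

echo "footprint with k=4: $(reduced_footprint_kb 2048 4) KB"
echo "extra hash steps per lookup: $(avg_recompute_steps 4)"
```

The trade only pays off for the GPU if those extra hash steps are cheap, which is why the post wants that hash to be slow.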
|
|
|
|
jasinlee
|
|
August 18, 2013, 02:40:26 AM |
|
1. Got the Scrypt wrong and can still be supermined by GPUs
Thanks for your comments and links on this. I haven't had time to fully digest them - there's a lot of information there and there are some competing demands on my time. From what I've seen, adding cores does not produce linear improvements, even with the small numbers of cores in a CPU. But if MC is vulnerable to GPU mining, I'd like to see it fail fast, so I'd encourage you to post any ideas you have for how one might go about creating a GPU miner.
4. Afaik, has made no advances on improved anonymity
Actually, MC has gone backwards on this a little by always sending change to first address in a wallet to support the voting system, and because this is more intuitive for new users. Generally with MC where there is a choice between ease of use and anonymity, I'll choose ease of use. I care about privacy too, but I care about adoption more. The privacy options of Bitcoin are obviously still there in the protocol, but users will need to take more care to ensure they are utilized.
Hey FreeTrade, read the posts surrounding the ones he linked. Be careful who you associate with; he has a tendency to expect worship... unless you were planning on changing religion, in which case that's cool I guess.
|
|
|
|
FreeTrade (OP)
Legendary
Offline
Activity: 1470
Merit: 1030
|
|
August 18, 2013, 03:00:57 AM |
|
It doesn't need to be linear, because the FLOPS cost in GPUs is so much lower than in a CPU system.
It appears to me that what happens in a GPU (which is why Intel's hyperthreading is faster than just 4 hardware cores) is that when there are many logical threads, then thread blocks on main memory latency are not a factor, because some other thread can run which has already loaded its main memory access into cache. Thus the GPU is always able to achieve the 200+GB/s main memory throughput, because the latency is masked by the probability of numerous threads.
So it's not just that the improvements in adding cores are not linear . . . it's that they appear to be converging to an upper value, suggesting that they're hitting a limitation other than cycles per second. I'm assuming that's related to memory access, either cache memory size or main memory access. I'm assuming too that a GPU will hit that limit too, and any increased performance because of wider memory bus, speed or cache would be offset by the slower cores.
You make other good points and I don't have the time to give them the attention they deserve immediately. I'm sure there are improvements to be made in the hashing algorithm to make it more GPU resistant, but I'm more concerned with having one that is 'good enough' rather than a perfect one that might fork the coin or cause loss of momentum.
I am going to avoid talking about your holistic design because we have some disagreement,
Yes, there's a lot we disagree on. Have you considered implementing your ideas in a new coin? That's what I did when I decided all the other coins had it wrong!
|
RepNet is a reputational social network blockchain for uncensored Twitter/Reddit style discussion. 10% Interest On All Balances. 100% Distributed to Users and Developers.
|
|
|
FreeTrade (OP)
|
|
August 18, 2013, 04:29:30 AM |
|
|
|
|
|
SlyWax
|
|
August 18, 2013, 04:53:53 AM Last edit: August 18, 2013, 06:42:53 AM by SlyWax |
|
Here is a new script. It's meant to recap everything when you come back to your mining PC.
What I like about it is that if I've mined some blocks since the last time I checked, it displays a blue bar with the number of coins mined in that period. There is always some suspense when I run it, and the good thing is that I don't have to remember my last balance to see if I caught something.

The output:

path to .memorycoin -- the date/time you executed the script
New minted coin or balance change since last run (nothing if nothing new)
Stats: {
"blocks" : 2993,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"difficulty" : 0.00023274,
"errors" : "",
"generate" : true,
"genproclimit" : X,
"hashespersec" : xx.xxxxxx,
"pooledtx" : 0,
"testnet" : false
}
Confirmed Balance: xxx.xxxxxxxx
Immature Balance: xxx.xxxxxxxx TX: x
Total Balance: xxx.xxxxxxxx

Last 3 minted date :
date in your local time (system pref)

Blocks since start ( date of start UTC ) : XXX
Connections: XXX

--

Save the script below to a file named mpeek, for example, then type: chmod 700 mpeek to make it executable.
To use it, type ./mpeek, or just mpeek if it's in your $PATH.
Replace the line "miner=memorycoind" with "miner=/pathTo/theNameOfYourExe" if you changed the name of the memorycoin executable.

#!/bin/bash
coindir=~/.memorycoin
miner=memorycoind

echo ${coindir} -- $(date)

# Confirmed balance plus the sum of all immature (freshly mined) amounts.
conf=$(${miner} getbalance)
transaction=$( ${miner} listtransactions "" 9999 )
immature=$( echo "${transaction}" | grep -A1 immature | grep amount | awk -F':' '{ print $2 }' | awk -F',' '{ s1+=$1 } END { printf "%5.8f", s1 }' )
total=$(echo "scale=4;${conf} + ${immature}" | bc)
mybalance=$(cat ${coindir}/mybalance)

# If the total changed since the last run, show the difference on a
# colored bar and remember the new total.
if test ! "$mybalance" = "$total"
then
    echo -en "\033[46m "
    minted=$(echo "scale=8;${total} - ${mybalance}" | bc)
    echo -e "${minted} \033[0m"
    echo -n "${total}">${coindir}/mybalance
fi

echo "Stats: $(${miner} getmininginfo)"
printf "Confirmed Balance: %*s\n" 13 "${conf}"
printf "Immature Balance: %*s" 13 "${immature}"
echo " TX: " $(echo "${transaction}" | grep immature | wc -l)
printf "Total Balance: \033[32m%*s \033[0m \n" 13 "${total}"
echo
echo Last 3 minted date :
# tr strips the space and trailing comma around the timestamp before date parses it.
echo "${transaction}" | grep -E -A9 "\"generate\"|immature" | grep timereceived | tail -n 3 | cut --delimiter=":" --fields=2 | tr -d ' ,' | while read VAR ; do date -d "@$VAR" ; done
echo

# Count the blocks logged since the last "Startup time:" line in debug.log.
numligneA=$( grep -n "Startup time:" ${coindir}/debug.log | tail -n 1 )
numligne=$( echo "${numligneA}" | awk -F: '{ print $1 }' )
numligneDate=$( echo "${numligneA}" | awk -F: '{ print $3":"$4":"$5 }' | sed 's/^ *//g' )

echo -en "Blocks since start ( \033[35m${numligneDate} UTC\033[0m ) : "
tail -n +${numligne} ${coindir}/debug.log | grep SetBestChain: | grep -c date

echo "Connections: $(${miner} getconnectioncount)"
echo

Edit : you have to run it once to initialize the balance.
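The script reads its previous total from ${coindir}/mybalance, a file that does not exist before the first run, which is why a first pass is needed to initialize it. One possible way around that, as a minimal sketch: fall back to 0 when the file is missing. The helper name read_last_balance is made up for this example and is not part of the script above.

```shell
#!/bin/bash
# Sketch: read the saved balance, falling back to 0 on the first run.
# read_last_balance is a hypothetical helper, not part of mpeek as posted.
read_last_balance() {
    local file=$1
    if [ -f "$file" ]; then
        cat "$file"
    else
        echo 0
    fi
}

coindir=~/.memorycoin
mybalance=$(read_last_balance "${coindir}/mybalance")
echo "last recorded balance: ${mybalance}"
```

With this fallback, the very first run would report the whole current balance as newly minted, so an initial throwaway run may still be preferable.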
|
|
|
|
AnonyMint
|
|
August 18, 2013, 05:45:00 AM |
|
It doesn't need to be linear, because the FLOPS cost in GPUs is so much lower than in a CPU system.
It appears to me that what happens in a GPU (which is why Intel's hyperthreading is faster than just 4 hardware cores) is that when there are many logical threads, then thread blocks on main memory latency are not a factor, because some other thread can run which has already loaded its main memory access into cache. Thus the GPU is always able to achieve the 200+GB/s main memory throughput, because the latency is masked by the probability of numerous threads.
So it's not just that the improvements in adding cores are not linear . . . it's that they appear to be converging to an upper value, suggesting that they're hitting a limitation other than cycles per second. I'm assuming that's related to memory access, either cache memory size or main memory access. I'm assuming too that a GPU will hit that limit too, and any increased performance because of wider memory bus, speed or cache would be offset by the slower cores.
I am not sure until I run some tests, but my strong belief (based on good analysis) is that you are hitting memory latency in your CPU with the 2MB Scrypt, but the GPU will not be. Thus it is going to toast your CPU. This is not a minor problem, rather I conjecture it is total failure to get what you intended. The reason I suspect this, is because the GPU can run 6 x 1024 MB ÷ 2 MB = 3072 threads and thus entirely mask away the latency. Whereas you do nothing to mask the latency on the CPU as far as I can see in your code.
You make other good points and I don't have the time to give them the attention they deserve immediately. I'm sure there are improvements to be made in the hashing algorithm to make it more GPU resistant, but I'm more concerned with having one that is 'good enough' rather than a perfect one that might fork the coin or cause loss of momentum.
I think yours is worse than Litecoin's by a factor of 10 or 100, i.e. the CPU is 100 to 1000 times slower than the GPU, since with Litecoin the CPU is already 10 times slower than the GPU! At least Litecoin stays within L2 cache, and it is not just the memory bandwidth but the TLB and L2 cache reloads, which run 100+ cycles, that kill you. You appear to have a serious fail here, but I would need to build a CPU miner to test my hypothesis.
I am going to avoid talking about your holistic design because we have some disagreement,
Yes, there's a lot we disagree on. Have you considered implementing your ideas in a new coin? That's what I did when I decided all the other coins had it wrong!
Yes I am, but as I said, if I can help you and at the same time benefit from having my idea tested earlier, then it is win-win for both of us, in spite of our disagreement on orthogonal features of MemoryCoin. There is no need for me to comment further about your grants. Best is you try it, then we all learn from the result. I wouldn't copy your grants, nor your 2% deflationary coin, so there is no competition coming from me against MemoryCoin on those features. I am delighted you introduced a coin with continuous inflation (albeit only 2%) and not ridiculously high as with Inflatacoin, so if you get adoption then it opens the door for a coin with a slightly higher rate of inflation.
If instead you don't want to share synergy or simply have other priorities demanding your attention, then that is fine too.
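The thread-count argument in this post is just GPU memory divided by the per-hash Scrypt footprint. A quick check of that division, using the 6 x 1024 MB and 2 MB figures quoted in the post (the function name max_threads is only illustrative):

```shell
#!/bin/bash
# Sketch: how many independent Scrypt instances fit in GPU memory.
# Both arguments are in MB; the figures below come from the post.
max_threads() {
    local gpu_mem_mb=$1 footprint_mb=$2
    echo $(( gpu_mem_mb / footprint_mb ))
}

gpu_mem_mb=$(( 6 * 1024 ))
echo "2 MB Scrypt on a 6 GB card: $(max_threads "$gpu_mem_mb" 2) threads"
```

Dividing 6 x 1024 MB by 2 MB gives 3072 concurrent instances. Whether a real GPU can actually schedule that many is a separate question that this sketch does not address.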
|
|
|
|
AnonyMint
|
|
August 18, 2013, 05:57:46 AM |
|
I think it is better I let you launch as is, then later when I build my test suite, I will confirm if I was correct or incorrect.
Agreed?
I don't have the time immediately right now to create a build environment and compile and test your code, thus the best I could offer would be code or pseudo-code and without a test suite you are not likely to believe my hypothesis, thus you wouldn't be motivated to make my code work properly.
So at least I made you aware of this possibility, then we deal with it later when I have the goods in hand.
|
|
|
|
FreeTrade (OP)
|
|
August 18, 2013, 06:02:37 AM |
|
I am not sure until I run some tests, but my strong belief (based on good analysis) is that you are hitting memory latency in your CPU with the 2MB Scrypt, but the GPU will not be. Thus it is going to toast your CPU. This is not a minor problem, rather I conjecture it is total failure to get what you intended. The reason I suspect this, is because the GPU can run 6 x 1024 MB ÷ 2 MB = 3072 threads and thus entirely mask away the latency. Whereas you do nothing to mask the latency on the CPU as far as I can see in your code.
Well it's anticipated that specialized hardware will be able to run parts of the algorithm more efficiently. However to run the whole hash efficiently you'd need to run all three parts efficiently (four counting SHA512), and specialized hardware to do that would look a lot like . . . . . . a CPU.
You appear to have a serious fail here, but I would need to build a CPU miner to test my hypothesis.
Interested in any diagnostics you're seeing on the CPU miners we've got available.
If instead you don't want to share synergy or simply have other priorities demanding your attention, then that is fine too.
I would have been interested in a detailed discussion on these points 2 weeks ago when there was more scope for re-engineering. As it stands now, the show is on the road so I don't want to be switching the engine unless it's clear the one we've got is failing. If you can demonstrate that you will have the pleasure of my undivided attention.
|
|
|
|
SlyWax
|
|
August 18, 2013, 06:05:45 AM |
|
Now that we have some nice script, let's try something :
I'm calling on you, the small miners, to make a revolution and take those grants from the hands of the wealthiest. We need to kick the butt of those hidden rich people who are taking our grants away!
That is why I'll open a new REVOLUTION POOL GRANT MINING.
The pool will give you back a % of the grant proportional to your vote (coins in the voting address). You have to vote for this pool address:
MVTEoEoiMwtXEeHDUYfuwA9ZvbKSH8Jfqb
The address has to be your first choice (send the smallest amount to this address), so that I can calculate your reward. Feel free to vote for another address, like the memorycoin foundation, as second choice.
When we reach a grant, I'll calculate your share and send it to you. For the first few grants, I'll double-check manually to see that everything is working well (so expect some delay), then I'll go fully automatic, so you get your reward earlier. For this work I'll only take 1 coin from the grant, and it might be even lower after some time if the automation works flawlessly.
PS: I'm reusing the previous grant address since it wasn't used anyway.
|
|
|
|
AnonyMint
|
|
August 18, 2013, 06:20:45 AM |
|
Well it's anticipated that specialized hardware will be able to run parts of the algorithm more efficiently. However to run the whole hash efficiently you'd need to run all three parts efficiently (four counting SHA512), and specialized hardware to do that would look a lot like . . . . . . a CPU.
I bet you find the 2MB Scrypt is the dominant time factor.
I would have been interested in a detailed discussion on these points 2 weeks ago when there was more scope for re-engineering.
Understood. I just discovered my deeper insight into Scrypt and Litecoin this week. I will get back to you when I have something concrete to grab your attention, even if it proves I was wrong, so that the issue is resolved.
|
|
|
|
FreeTrade (OP)
|
|
August 18, 2013, 07:31:20 AM |
|
I bet you find the 2MB Scrypt is the dominant time factor.
I bet you'll find (on a CPU) the 64MB Scrypt takes twelve times as long as the 2MB Scrypt which takes ten times as long as the 128K Scrypt, so that when the 128K Scrypt is run 120 times interleaved with the 2MB Scrypt that is run 12 times and the 64MB Scrypt that is run once, they each take 1/3 of the time required for the overall hash.
I will get back to you when I have something concrete to grab your attention, even if it proves I was wrong, so that the issue is resolved.
Thanks, looking forward to it. Consider also that the clock is ticking for any GPU implementations. Should have 90% minted within a year, and 10% the next year. It would be regrettable if GPUs or ASICs were to capture the 10% and/or the 2%/year distribution, but not catastrophic.
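The balance described in this post (128K run 120 times, 2MB run 12 times, 64MB run once, each taking a third of the total) follows from the stated cost ratios. A small check, taking one 128K run as the unit of time. The 10x and 12x ratios are the ones claimed in the post, not measurements, and the variable names are illustrative.

```shell
#!/bin/bash
# Sketch: relative time per stage, in units of one 128K Scrypt run.
# Ratios are the ones claimed in the post, not measured values.
t_128k=1
t_2m=$(( 10 * t_128k ))   # 2MB takes ten times as long as 128K
t_64m=$(( 12 * t_2m ))    # 64MB takes twelve times as long as 2MB

# Total time for a stage = cost per run * number of runs.
stage_time() {
    echo $(( $1 * $2 ))
}

total_128k=$(stage_time "$t_128k" 120)
total_2m=$(stage_time "$t_2m" 12)
total_64m=$(stage_time "$t_64m" 1)
echo "128K: ${total_128k}  2MB: ${total_2m}  64MB: ${total_64m}"
```

All three stages come out at 120 units, i.e. one third each, provided the ratios hold on the hardware in question.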
|
|
|
|
maxocoin
Member
Offline
Activity: 102
Merit: 11
|
|
August 18, 2013, 12:14:23 PM |
|
Hi,
Something is wrong :-(
I started my client with the -gen switch and started mining. After a night I got 2 blocks and the client crashed.
After starting it again and synchronizing with the network, I don't have my 45 MCoins any more :-(
I was checking the get miner info from time to time and was getting around 10 mh, and the "green" tick was always on. I thought that I was on the safe side.
How can I check that everything is fine, and not get a sad surprise the day after?
Cheers MC
|
|
|
|
AnonyMint
|
|
August 18, 2013, 01:27:16 PM |
|
I bet you find the 2MB Scrypt is the dominant time factor.
I bet you'll find (on a CPU) the 64MB Scrypt takes twelve times as long as the 2MB Scrypt which takes ten times as long as the 128K Scrypt, so that when the 128K Scrypt is run 120 times interleaved with the 2MB Scrypt that is run 12 times and the 64MB Scrypt that is run once, they each take 1/3 of the time required for the overall hash.
I meant the largest memory Scrypt you are using. So if it is 64MB, then I agree it would use the most time if you were running them each the same number of times. Now I see you are running the shorter Scrypts more times to compensate. And still, the 6GB on the GPU means it can run 96 threads on the 64MB Scrypt to defeat memory latency (and take advantage of the 10 times higher main memory bandwidth in the GPU), whereas the CPU can't. So on 2/3 of your Scrypts the GPU can trounce the CPU by 100 to 1000 times in theory. I am expecting roughly 0.33 x 10 + 0.67 x 100 (or 1000), so roughly the GPU 70 to 700 times faster.
I will get back to you when I have something concrete to grab your attention, even if it proves I was wrong, so that the issue is resolved.
Thanks, looking forward to it. Consider also that the clock is ticking for any GPU implementations. Should have 90% minted within a year, and 10% the next year. It would be regrettable if GPUs or ASICs were to capture the 10% and/or the 2%/year distribution, but not catastrophic.
I am thinking 99% of the 2% per annum could go to GPUs. Tangentially, I don't understand how you expect to maintain interest in your coin for enough years for it to gain traction, when 90% will already be awarded in one year. You need new adopters continuously, not just the few who got in early.
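The figures in this post can be reproduced with a few lines of arithmetic. The sketch below uses the post's own inputs (6 GB of GPU memory, a 64 MB Scrypt, time shares of 0.33 and 0.67, assumed speed-ups of 10 and 100 or 1000). The function names are illustrative, and the second estimate is an alternative added here for comparison, not something the post proposes.

```shell
#!/bin/bash
# Sketch of the arithmetic in the post above; names are illustrative.

# Scrypt instances that fit in GPU memory (both arguments in MB).
max_threads() {
    echo $(( $1 / $2 ))
}

# The post's estimate: each time share multiplied by its assumed
# GPU speed-up, then summed.
weighted_sum() {
    awk -v a="$1" -v x="$2" -v b="$3" -v y="$4" \
        'BEGIN { printf "%.1f\n", a * x + b * y }'
}

# Alternative for comparison: divide each time share by its speed-up
# and invert the total (an Amdahl-style combination).
combined_speedup() {
    awk -v a="$1" -v x="$2" -v b="$3" -v y="$4" \
        'BEGIN { printf "%.1f\n", 1 / (a / x + b / y) }'
}

echo "threads on 64 MB Scrypt: $(max_threads $(( 6 * 1024 )) 64)"
echo "weighted sum (x100):  $(weighted_sum 0.33 10 0.67 100)"
echo "weighted sum (x1000): $(weighted_sum 0.33 10 0.67 1000)"
echo "combined (x100):      $(combined_speedup 0.33 10 0.67 100)"
```

The weighted sum gives about 70 and 673, matching the post's "70 to 700". If the speed-ups are instead applied to each stage's share of the run time, the same x100 inputs give about 25, so the headline factor depends on which way the shares are combined.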
|
|
|
|
Stinky_Pete
|
|
August 18, 2013, 01:45:44 PM |
|
Is there a way to find the results of rounds of voting - either just the latest, or historical?
|
|
|
|
FreeTrade (OP)
|
|
August 18, 2013, 02:46:42 PM |
|
Is there a way to find the results of rounds of voting - either just the latest, or historical?
Start the latest version with the -debugvote=1 switch - should have these available on the web soon.
|
|
|
|
FreeTrade (OP)
|
|
August 18, 2013, 06:45:28 PM |
|
|
|
|
|
B.T.Coin
|
|
August 19, 2013, 09:54:22 AM |
|
Is the stability issue still being worked on? New releases used to come every day but now I haven't seen one for a while. The software has gotten a bit more reliable with the last few releases but I still get runtime errors or it hangs on a grant-block. My 6-core and 4-core still can't run all night long without stopping.
Yes - got an update here and I'm going to be posting further updates here - this thread is becoming very crowded with many different issues, and I don't want to spam the main altcoins forum with continual MC posts - http://21stcenturymoneytalk.org/index.php/topic,38.0.html
I see a list with known problems there, but no new client.
I have taken my 6-core and 4-core machines off this coin and put them on something that runs overnight without crashing. Looking at the difficulty, which has dropped down again, more miners have done this too, or their machines hang and they don't know they are not mining. As soon as there is a stable release I will get back on board again.
|
A fine is a tax you pay for something you did wrong. A tax is a fine you pay for something you did right.
|
|
|
SlyWax
|
|
August 19, 2013, 11:00:51 AM |
|
Now that we have some nice script, let's try something :
I'm calling on you, the small miners, to make a revolution and take those grants from the hands of the wealthiest. We need to kick the butt of those hidden rich people who are taking our grants away!
That is why I'll open a new REVOLUTION POOL GRANT MINING.
The pool will give you back a % of the grant proportional to your vote (coins in the voting address). You have to vote for this pool address:
MVTEoEoiMwtXEeHDUYfuwA9ZvbKSH8Jfqb
The address has to be your first choice (send the smallest amount to this address), so that I can calculate your reward. Feel free to vote for another address, like the memorycoin foundation, as second choice.
When we reach a grant, I'll calculate your share and send it to you. For the first few grants, I'll double-check manually to see that everything is working well (so expect some delay), then I'll go fully automatic, so you get your reward earlier. For this work I'll only take 1 coin from the grant, and it might be even lower after some time if the automation works flawlessly.
PS: I'm reusing the previous grant address since it wasn't used anyway.
We already have 4667 votes in one day. We can do it! Don't miss your spot: get the reward by voting for the pool now.
|
|
|
|
|