Click the following link for details:
https://bitcointalk.org/index.php?topic=45849.msg2940005#msg2940005
Something I've wondered:
Why are you using N=1024, r=1, and p=1 for scrypt? Why didn't the recommended values from the paper, N=1024, r=8, p=1 get used?
If I remember correctly, ArtForz said that the parameters (1024, 1, 1) resulted in a lower GPU/CPU performance ratio.
Some analysis by him can be found here:
https://bitcointalk.org/index.php?topic=45849.0
I have addressed this point in my link above.
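For anyone wondering what those parameters actually mean in terms of memory, here is a minimal Python sketch (using the standard library's hashlib.scrypt and a made-up input rather than a real block header) of how the scratchpad size follows from N and r:

```python
import hashlib

# scrypt's big scratchpad is roughly 128 * r * N bytes, so Litecoin's
# (N=1024, r=1, p=1) needs ~128 KiB (cache-friendly), while the paper's
# recommended (N=1024, r=8, p=1) needs ~1 MiB.
for n, r, p in [(1024, 1, 1), (1024, 8, 1)]:
    scratchpad_kib = 128 * r * n // 1024
    digest = hashlib.scrypt(b"dummy header", salt=b"dummy header",
                            n=n, r=r, p=p, dklen=32)
    print(f"N={n} r={r} p={p}: ~{scratchpad_kib} KiB, hash={digest.hex()[:16]}...")
```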
From what I know of the GPU miner, option 3 of modifying the scrypt parameters will have minimal impact. The pad size did not seem to matter much, and it can be "compressed", for lack of a better word, with on-the-fly value reconstruction. So any increase in pad size will have a roughly equal impact on CPU miners until you exceed their cache size, at which point GPUs may become even more efficient.
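To illustrate what "compressed with on-the-fly value reconstruction" means in practice, here is a rough Python sketch of that time-memory trade-off (the "lookup gap" trick used by some GPU scrypt miners). SHA-256 stands in for scrypt's real BlockMix, so this is a toy model, not miner code:

```python
import hashlib

def H(x: bytes) -> bytes:
    # stand-in for scrypt's BlockMix (real miners mix 128*r-byte blocks with Salsa20/8)
    return hashlib.sha256(x).digest()

def romix_with_gap(x: bytes, N: int = 1024, gap: int = 2) -> bytes:
    """Simplified ROMix that stores only every `gap`-th pad entry and
    recomputes the rest on the fly, trading extra compute for less memory.
    x is expected to be a 32-byte value (e.g. a SHA-256 digest)."""
    stored = {}
    v = x
    for i in range(N):
        if i % gap == 0:
            stored[i] = v                  # keep only 1/gap of the pad
        v = H(v)
    for _ in range(N):
        j = int.from_bytes(v[:4], "little") % N
        u = stored[j - (j % gap)]          # nearest stored entry
        for _ in range(j % gap):           # rebuild V[j] on the fly
            u = H(u)
        v = H(bytes(a ^ b for a, b in zip(v, u)))
    return v

# e.g. romix_with_gap(hashlib.sha256(b"dummy header").digest(), gap=4)
```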
I think you will be stuck with option 2, finding a completely different hashing algorithm.
Until you put a Scrypt inside of a Scrypt, such that the inner one stays in the cache. See my link above.
Are you saying he has disproved the sequential memory hardness for the ROMix algorithm from the original scrypt paper?
No, apparently the issue is the relative memory bandwidths of the different features of the hardware (and the ability to hide memory latency with multithreading), and the original Scrypt sequential-memory-hard proof doesn't factor that in. My link above proposes a way to elevate the CPU's cache to a large memory size to overcome the discrepancy.
Anyway, this outlines the difficulty of actually creating an algorithm that cannot easily be parallelized, so that it runs faster on a CPU than on a GPU.
See my link above for an idea for the algorithm.
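For what it's worth, here is a toy Python sketch of the nested-Scrypt idea (the structure and parameters are my own illustration, not necessarily what the linked proposal specifies): each step of a large, sequentially-walked outer pad uses a small inner scrypt whose scratchpad fits in cache.

```python
import hashlib

def nested_scrypt(header: bytes, outer_n: int = 4096, inner_n: int = 256) -> bytes:
    """Toy 'scrypt inside scrypt': the outer loop is ROMix-like over a large pad,
    while the mixing function is itself a small scrypt that stays in cache."""
    def inner(x: bytes) -> bytes:
        # 128 * 1 * inner_n bytes of scratchpad = 32 KiB for inner_n=256
        return hashlib.scrypt(x, salt=x, n=inner_n, r=1, p=1, dklen=32)

    x = inner(header)
    v = []
    for _ in range(outer_n):               # build the outer pad (32 bytes per entry here;
        v.append(x)                        # real parameters would target hundreds of MB or more)
        x = inner(x)
    for _ in range(outer_n):               # second pass with data-dependent indexing
        j = int.from_bytes(x[:4], "little") % outer_n
        x = inner(bytes(a ^ b for a, b in zip(x, v[j])))
    return x
```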
Well, without a decent network hashrate it can be attacked easily by any botnet...
See my link above for another idea of how to eliminate botnets with a CPU-only coin.
any further thoughts on this?
I think it would be good to be a cpu only coin again,
tho the retards have already said they will enjoy the challenge of getting it working on gpu...
IMO Litecoin loses its point unless it's CPU only
The speedup over a CPU is less than an order of magnitude. It's not fatal.
Appears to be greater than an order-of-magnitude. See my link above.
2) I suppose that increasing the memory size parameter of scrypt to a very large amount (megabytes...), which doesn't fit in the cache, would mean that it'd be infeasible to do hash attempts in parallel with a GPU (and maybe even with several CPU cores), but it also most likely means that people couldn't use their computer to do other stuff while mining litecoins due to system responsiveness issues. Therefore it's possible that the current scrypt parameters as chosen by ArtForz and Lolcust are the best, especially if bitcoin GPU mining remains more profitable than litecoin GPU mining.
Either the CPU is compute-bound (in small cache memory) or memory-bound in large memory. Either way you can't use your computer for other work that requires the same bound if you want to get maximum hashing rate.
IMO Litecoin loses its point unless it's CPU only
Clearly, yes. Now instead of being more accessible to everyone than BTC (because everyone has an okay CPU while not everyone has an okay GPU), LTC is accessible only to the few who are lucky enough to have the GPU miner working properly... Huge step backwards. The only positive effect is that it seems the BTC hashrate lowered a bit recently.
I became interested in mining a second cryptocurrency only because Tenebrix (and then others) came up with a way to make CPU mining viable again... I figured, why not put the two or three decent machines I run at home, as well as a couple of strays at my shop, onto a useful task instead of simply letting them sit around growing slowly more obsolete by the day? I suspect I represent a fairly typical LTC/SC enthusiast, in that regard.
Put bluntly, if GPU mining becomes viable for LTC and/or SC, their entire raison d'etre vanishes.
Since I was asked to clarify "significantly more efficient", I guess I will post some hash-per-watt numbers.
According to litecoin wiki mining hardware comparison, an AMD Phenom X4 955 at 3.6ghz gets 24kh @ 125 watts. This translates to 0.192kh per watt.
A gpu rig consisting of 69xx series gpus can produce 998kh @ 920 watts at the wall. This translates to 1.08kh per watt.
So does at least a 5.6 factor increase in *efficiency* qualify as "significantly more"?
Consider the litecoin wiki entry for the Intel Core i7 860 which produces 25kh at 153 watts (a believable wattage consumption for the entire system). It gives a system kh/watt score of only 0.163. The gpu example is now a factor of 6.6 times more efficient.
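For reference, the arithmetic behind those ratios (numbers as quoted above; kh = kilohashes per second):

```python
rigs = {
    "Phenom X4 955 @ 3.6GHz": (24, 125),   # (kh/s, system watts)
    "Core i7 860":            (25, 153),
    "69xx GPU rig":           (998, 920),
}
eff = {name: kh / watts for name, (kh, watts) in rigs.items()}
for name, e in eff.items():
    print(f"{name}: {e:.3f} kh per watt")
print("GPU rig vs Phenom X4 955:", round(eff["69xx GPU rig"] / eff["Phenom X4 955 @ 3.6GHz"], 1))  # ~5.6
print("GPU rig vs Core i7 860: ", round(eff["69xx GPU rig"] / eff["Core i7 860"], 1))              # ~6.6
```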
PS, Mtrlt has gotten better kh/watt scores by playing with the clocks and voltages, but I figured I would give you an initial test result.
More hardware comparisons for Litecoin:
http://litecoin.info/Mining_Hardware_Comparison
http://coinpolice.com/gpu/
coblee,
Given that:
* in a couple of years every (consumer) CPU sold will have opencl-capable integrated graphics
* bitcoin mining will move more and more towards FPGAs/ASICs
I don't think any changes are necessary. Very soon anybody will have a computer capable of GPU-mining Litecoin. Also, since eventually people will stop using GPUs to mine bitcoin, the swings in difficulty from people switching between the two chains won't be a problem.
As far as I can see, the consumer-grade GPU integrated on the CPU motherboard won't be the threat that exists from the stand-alone cards, which will always have order-of-magnitude greater large-memory bandwidth, unless the motherboard becomes something of an amalgamation with only GDDR5 memory, e.g. the Sony PS4 (see the link in the post that I linked at the top of this post of mine). If and when that ever becomes ubiquitous, then the CPU-only coin will still be mined competitively by the amalgamated system.
One thing (somewhat theoretical) I would throw out there is that as GPUs become more "CPU-like" they will devote the necessary resources (transistors and chip yield) to increased L1 cache. GPUs long since outstripped the growth in pixel counts, so they devoted more resources to improved image quality at a fixed number of pixels and/or polygons. GPU resources are growing faster than developers' ability to use them, as devising more complex and realistic "effects" requires more human capital than simply doubling the polygon count or going from an 800x600 pixel count to a 1920x1200 pixel count. So it will be increases in GPGPU workload that increasingly drive development of future GPUs.
Given my idea of nested Scrypt, if the GPU has adequate L1 cache per CU, the problem remains that if I set the parameters to be, for example, 4 cores running a 32 KB inner scrypt with a 1.5 GB outer scrypt, then a GPU (with 6 GB of GDDR RAM) can only employ 4 of its CUs (cores). So it can only use a fraction of its hardware.
For example, the HD 7970 has 32 CUs (cores), each with 16 KB of L1 cache and 24 KB of L2 cache, running at 2 TB/s and 0.7 TB/s respectively. But the Intel Haswell Core i7/i5 has 4 cores, each with 32 KB of L1 cache and 256 KB of L2 cache, running at 1 TB/s and 0.33 TB/s respectively.
So if the coin requires a 32 KB inner Scrypt, then the HD 7970 is going to be at the 0.25 to 0.5 TB/s of its GDDR RAM, but with much latency and only 4 threads, so much slower than the CPU. Even if the coin requires only a 16 KB inner Scrypt, or a later version of the GPU has 32 KB of L1 cache, the GPU is still going to be employing only 4 threads, the same as the CPU, but it may run at twice the speed of the CPU because of the doubled L1 cache speed.
GPUs traditionally had very little local cache because there was no need for it when performing traditional graphics work. That dynamic likely won't hold true in the future. NVIDIA Tesla cards, for example, can be configured to double the amount of L1 cache because it is so useful in boosting the performance of some GPGPU functionality. Larger L1 caches will eventually trickle down into consumer-grade products too.
My other idea is to force the total memory requirement of the outer Scrypt higher than any GPU's onboard memory, since I know of no GPU which allows add-on GDDR memory. There is no retail market for GDDR memory.
No coin today is "anti-GPU"; rather, they can be described as "large L1 cache dependent".
Not "today", but my idea is the nested Scrypt idea should make them more "anti-GPU" when coupled with a large L1 or L2 cache dependent.
Employing the 256 KB L2 cache of the Intel Core family would mean the HD 7970 can only run three threads and still stay in L2 cache, but its L2 cache is 2x faster than the CPU's, so 2 x 3/4 = 3/2 the speed of the CPU. Or the HD 7970 could run 4 threads and be main-memory bound, which has comparable bandwidth, but the memory latency would accumulate, so it would be slower than the CPU.
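Putting rough numbers on that argument (cache sizes and bandwidths as quoted above; treat these as back-of-the-envelope figures, not measurements):

```python
# Outer pad limits how many instances fit in GPU memory at all
gpu_mem_gb, outer_pad_gb = 6, 1.5
print("GPU instances:", int(gpu_mem_gb // outer_pad_gb))       # 4, same as a quad-core CPU

# 256 KB L2-resident inner scrypt case
hd7970_l2_total_kb = 32 * 24                                   # 32 CUs x 24 KB = 768 KB
gpu_threads = hd7970_l2_total_kb // 256                        # 3 threads fit in L2
cpu_threads = 4                                                # Haswell: 256 KB L2 per core
l2_speed_ratio = 0.7 / 0.33                                    # ~2x, per the quoted TB/s figures
print("GPU/CPU throughput ~", round(l2_speed_ratio * gpu_threads / cpu_threads, 2))  # ~1.6, i.e. ~3/2
```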
SolidCoin's Hashing Algorithm
Actually, the SolidCoin hash was developed to be fairly equal in performance on GPUs and CPUs, watt for watt. It is currently delivering that and has been for some time (with a small favor to CPUs). SolidCoin targets all viable consumer hardware so we can let the widest range of people mine it fairly. Unlike Bitcoin, which is going to be FPGA soon, and Litecoin (which was supposed to be CPU-only and is now a GPU coin), we want everyone to be able to mine.
What looks interesting is that they still claim the SC2 algorithm to be GPU-resistant. I'm not at all convinced. Any technical opinion on this?
It's not GPU-resistant, but the random reads on a constant 4 MB buffer make the CPU/GPU difference slightly lower, because the GPU lacks a lot of cache. But it's still 4-6 times faster on a GPU than on a CPU.
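As a toy model of the kind of construction being described (definitely not SolidCoin's actual SC2 algorithm): data-dependent reads into a fixed 4 MB buffer, where each read address depends on the running digest so it can't be prefetched, though a GPU can still hide the latency by interleaving many independent nonces.

```python
import hashlib, os

BUF = os.urandom(4 * 1024 * 1024)              # stand-in for a constant 4 MB lookup table

def toy_hash(header: bytes, rounds: int = 64) -> bytes:
    h = hashlib.sha256(header).digest()
    for _ in range(rounds):
        off = int.from_bytes(h[:4], "little") % (len(BUF) - 32)
        h = hashlib.sha256(h + BUF[off:off + 32]).digest()   # latency-bound random read
    return h
```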
I don't see anything correct in your claims about the hashing algorithm: I've implemented a 2.5x faster version of the CPU miner for it, and despite the lack of time and profit in mining, there is even a 3x-8x better GPU miner implementation for SolidCoin. So it's nothing different.
That SolidCoin link shows roughly the same advantage for the AMD HD 7970 GPU over the Intel Core CPU as for Litecoin.
I didn't take the time to study the linked SolidCoin hashing algorithm, but if it is based on a claimed advantage from randomized memory latency, note the point I make in my link at the top: this latency can be hidden by multithreading many threads.