wknight
Legendary
Offline
Activity: 889
Merit: 1000
Bitcoin calls me an Orphan
|
|
February 10, 2012, 06:27:49 PM |
|
Well.. since everyone knows now that cobblee and artforz have a top secret GPU miner.. and I am part of the dozen people who are allowed to use it.. let me share it with all of you:
Code:
The miner is written in plain C# and doesn't need any special libraries besides what you have anyway. I haven't tested it, but since it is written in Mono on Windows it might be cross-platform compatible...
For more hash you have to tweak lines 31 and 32.. but don't expect too much, it's just a preview of what cobblefart is..
By the way.. the fun part is: you can set it to any pool you like.. its speed sucks on Bitcoin but is a bomb on Solidcoin and scrypt - and its wattage is awesome. Not much more than simple idle power consumption at 200 kH/s.
Code: http://pastebin.com/aCR53QcY
We have found a bug in the code to make it look like we are mining at 1 MH/s on a simple dual core.. however only 1 share was submitted.. again.. no real proof
|
Mining Both Bitcoin and Litecoin.
|
|
|
coblee
Donator
Legendary
Offline
Activity: 1654
Merit: 1350
Creator of Litecoin. Cryptocurrency enthusiast.
|
|
February 10, 2012, 06:43:46 PM |
|
I also don't really care about the price because I'm not in this for the money, but on last check the price of each SC is nearly 4x that of LiteCoin.
Seriously, CoinHunter. If you didn't care about price, then why do you keep manipulating block rewards to try to reach dollar parity? The SC block reward is now only ~0.05 (1/1000 of Litecoin's). And the SC price is only about 3x that of LTC. That's rather weak in my opinion. Maybe that's why you come here to spread your FUD?
|
|
|
|
Roadhog2k5
|
|
February 10, 2012, 07:21:20 PM |
|
I also don't really care about the price because I'm not in this for the money, but on last check the price of each SC is nearly 4x that of LiteCoin.
Seriously, CoinHunter. If you didn't care about price, then why do you keep manipulating block rewards to try to reach dollar parity? The SC block reward is now only ~0.05 (1/1000 of Litecoin's). And the SC price is only about 3x that of LTC. That's rather weak in my opinion. Maybe that's why you come here to spread your FUD?
He needs someone to pick on since he is butthurt that he killed Solidcoin again.
|
|
|
|
Snapman
|
|
February 10, 2012, 07:33:35 PM |
|
Typical CoinHunter, continuing to be worthless to the cryptocurrency community.
Are you really that sad? Your coin comes to a crashing halt, and your response, instead of improving your system, is to bash another. You would do quite well in Washington, DC (slumming it with the underlords of douchebaggery).
|
BTCRadio: 17cafKShokyQCbaNuzaDo5HLoSnffMNPAs
|
|
|
Gabi
Legendary
Offline
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
|
|
February 10, 2012, 07:36:53 PM |
|
Another pathetic attempt to attack Litecoin? Ridiculous. Oh, and I have made some dozens of full custom chips at 28nm specifically tailored to mine bitcoins and litecoins. 1 terahash per chip. True story.
|
|
|
|
StewartJ
|
|
February 10, 2012, 08:08:34 PM |
|
I've learned to ignore the solidcoin shills.
They are actually bots programmed to rile you.
|
|
|
|
Mousepotato
|
|
February 10, 2012, 08:14:25 PM |
|
Oh, and I have made some dozens of full custom chips at 28nm specifically tailored to mine bitcoins and litecoins. 1 terahash per chip. True story.
Ships in 4-6 weeks?
|
Mousepotato
|
|
|
wknight
|
|
February 10, 2012, 08:17:21 PM Last edit: February 10, 2012, 08:49:01 PM by wknight |
|
Ohh my goodness.. look out.. We have found the ultimate GPU miner for litecoin!!!! We did this with a GeForce mx2
|
|
|
|
Schwede65
|
|
February 10, 2012, 09:28:31 PM |
|
It currently gets ~250KH/s on a 6990.
The LTC-mining CPU and GPU efficiency in W per kH/s seems to be similar:
core i3 / 3.1 GHz / 3 threads / ~15 W only for this / 12.5 kH/s => 1.2 W/kH
core i7 / 3.6 GHz / 7 threads / ~46 W only for this / 32 kH/s => 1.437 W/kH
6990 / ~350 W (don't know the correct wattage) for 250 kH/s => 1.4 W/kH
But with more improvement there will be an even better W/kH rate for the 6990 and the other mining cards, so the end of LTC CPU mining is not too far away, because one 6990 does the job of ~8 core i7s or ~20 core i3s.
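The watts-per-kH/s comparison above can be reproduced in a few lines of Python. This is only a sketch built from the figures quoted in the post (the ~350 W for the 6990 is the poster's own guess); the names `RIGS` and `watts_per_khs` are made up for illustration.

```python
# Power and hashrate figures exactly as quoted in the post above.
# The 6990 wattage is the poster's rough guess, not a measurement.
RIGS = {
    "core i3": {"watts": 15.0, "khs": 12.5},
    "core i7": {"watts": 46.0, "khs": 32.0},
    "6990": {"watts": 350.0, "khs": 250.0},
}


def watts_per_khs(watts, khs):
    """Power cost of one kH/s of scrypt hashrate."""
    return watts / khs


if __name__ == "__main__":
    gpu_khs = RIGS["6990"]["khs"]
    for name, rig in RIGS.items():
        efficiency = watts_per_khs(rig["watts"], rig["khs"])
        equivalent = gpu_khs / rig["khs"]
        print(f"{name}: {efficiency:.3f} W per kH/s; "
              f"one 6990 matches {equivalent:.1f} of these")
```

On these numbers the three rigs land within about 20% of each other per kH/s, so the case for the GPU rests on density (one card replacing many CPUs) rather than on power efficiency.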
|
|
|
|
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
|
|
February 10, 2012, 09:39:56 PM |
|
It currently gets ~250KH/s on a 6990.
The LTC-mining CPU and GPU efficiency in W per kH/s seems to be similar:
core i3 / 3.1 GHz / 3 threads / ~15 W only for this / 12.5 kH/s => 1.2 W/kH
core i7 / 3.6 GHz / 7 threads / ~46 W only for this / 32 kH/s => 1.437 W/kH
6990 / ~350 W (don't know the correct wattage) for 250 kH/s => 1.4 W/kH
But with more improvement there will be an even better W/kH rate for the 6990 and the other mining cards, so the end of LTC CPU mining is not too far away, because one 6990 does the job of ~8 core i7s or ~20 core i3s.
A unicorn can run 500 miles on one gallon (of beer). A gallon of beer is more expensive than a gallon of gas, but likely we are still seeing the end of hybrid cars because one unicorn is equal to almost 10x Toyota Prius. Plus, with improvement in unicorn-beer technology you are getting an even better m/goa rate (that's miles to gallons of alcohol).
|
|
|
|
Mousepotato
|
|
February 10, 2012, 09:48:06 PM |
|
At 250 kH/s with your 6990 your daily yield is right around 160 LTC at current difficulty (estimating with http://www.litecoinpool.org/stats, and ignoring pool fees of course). At an exchange rate of .002802 BTC per LTC that means you can exchange your daily take of LTC for roughly .44 BTC. Now if you were mining straight BTC with that 6990, you'd be doing around 820 MH/s which yields around .60 BTC per day, maybe a little more. I'm guessing that you'd need that GPU Litecoin miner to hit around 350 kH/s or so before you start breaking even versus mining straight BTC.
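The break-even estimate can be checked with a short calculation. This is a rough sketch that assumes LTC yield scales linearly with hashrate at fixed difficulty and uses only the numbers quoted in the post (160 LTC/day at 250 kH/s, 0.002802 BTC per LTC, 0.60 BTC/day from mining BTC directly); the function names are illustrative.

```python
def ltc_income_in_btc(ltc_per_day, btc_per_ltc):
    """Daily LTC yield converted to BTC at the given exchange rate."""
    return ltc_per_day * btc_per_ltc


def break_even_khs(current_khs, ltc_per_day, btc_per_ltc, btc_per_day_target):
    """Scrypt hashrate at which LTC income equals the BTC mining income.

    Assumes yield is proportional to hashrate at constant difficulty.
    """
    btc_per_khs = ltc_income_in_btc(ltc_per_day, btc_per_ltc) / current_khs
    return btc_per_day_target / btc_per_khs


if __name__ == "__main__":
    income = ltc_income_in_btc(160, 0.002802)
    needed = break_even_khs(250, 160, 0.002802, 0.60)
    print(f"LTC income at 250 kH/s: {income:.3f} BTC/day")
    print(f"Break-even hashrate: {needed:.0f} kH/s")
```

With these inputs the break-even point comes out a little under 340 kH/s, which is consistent with the post's "around 350 kH/s or so".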
|
|
|
|
ssvb
Newbie
Offline
Activity: 39
Merit: 0
|
|
February 10, 2012, 09:48:41 PM |
|
I am no GPGPU expert, but I think ArtForz made some very good points in the following thread: https://bitcointalk.org/index.php?topic=45849.0
CoinHunter could make his claims convincing by simply explaining how to address the GPU limitations outlined by ArtForz.
At least ArtForz was mistaken about Cell earlier. Let's just do some simple math. The PlayStation 3 has 6 SPE cores, each clocked at 3.2GHz, and 25GB/s of total memory bandwidth. Calculating one hash needs approximately 434176 ADD/ROL/XOR operations on 128-bit vectors in the performance-critical part of salsa20/8, which are executed in the even pipe (shuffles and the other instructions are executed in the odd pipe). Calculating one hash also needs 256KB of memory traffic (128KB is written sequentially, 128KB is read in scattered 128-byte chunks).
So, taking into account that an SPE core can execute one instruction from the even pipe each cycle, the theoretical performance limit based on computational power is (6 * 3200000000) / 434176 ~= 44.2 khash/s. The theoretical performance limit based on memory bandwidth is 25GB / 256KB ~= 95.4 khash/s. There is a lot of headroom for the memory bandwidth, and arithmetic calculations are the bottleneck. Cell also has precise control over memory operations by scheduling DMA transfers, and can overlap DMA transfers with calculations. This allows memory bandwidth to be utilized very efficiently for the scrypt algorithm.
This page seems to say that the HD 6990 has 320GB/s of memory bandwidth. And here ArtForz tells us that it is possible to achieve < 20% peak BW with a GPU. Doing some math again, we get 320GB * 0.2 / 256KB ~= 244 khash/s. Looks rather believable to me.
edit: corrected HD 6990 memory bandwidth (it is 320GB/s and not 350GB/s)
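The arithmetic in this post can be replayed as a small Python sketch. The function names are hypothetical, and one unit convention is an assumption inferred from the results: the quoted 95.4 and 244 khash/s figures only come out if "GB" is read as 10^9 bytes and "KB" as 1024 bytes.

```python
KB = 1024      # assumption: the post's "256KB" means 256 * 1024 bytes
GB = 10**9     # assumption: the post's "GB/s" figures are decimal gigabytes


def scrypt_traffic_bytes(n=1024, r=1):
    """Bytes moved per hash: the 128*r*n scratchpad is written once, read once."""
    scratchpad = 128 * r * n
    return 2 * scratchpad


def compute_bound(cores, clock_hz, ops_per_hash):
    """Hashes/s if every core retires one even-pipe instruction per cycle."""
    return cores * clock_hz / ops_per_hash


def bandwidth_bound(bytes_per_s, bytes_per_hash, efficiency=1.0):
    """Hashes/s if memory traffic is the only limit."""
    return bytes_per_s * efficiency / bytes_per_hash


if __name__ == "__main__":
    traffic = scrypt_traffic_bytes()  # 256 KB for scrypt(1024,1,1)
    ps3_compute = compute_bound(6, 3.2e9, 434176)
    ps3_memory = bandwidth_bound(25 * GB, traffic)
    gpu_memory = bandwidth_bound(320 * GB, traffic, efficiency=0.2)
    print(f"PS3 compute bound:   {ps3_compute / 1000:.1f} khash/s")
    print(f"PS3 bandwidth bound: {ps3_memory / 1000:.1f} khash/s")
    print(f"6990 at 20% of peak: {gpu_memory / 1000:.0f} khash/s")
```

The smaller of the two bounds is the one that matters: on the PS3 that is the compute bound, while for the 6990 the post considers only the bandwidth bound.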
|
|
|
|
Mousepotato
|
|
February 10, 2012, 09:53:12 PM |
|
At least ArtForz was mistaken about Cell earlier. Let's just do some simple math. The PlayStation 3 has 6 SPE cores, each clocked at 3.2GHz, and 25GB/s of total memory bandwidth. Calculating one hash needs approximately 434176 ADD/ROL/XOR operations on 128-bit vectors in the performance-critical part of salsa20/8, which are executed in the even pipe (shuffles and the other instructions are executed in the odd pipe). Calculating one hash also needs 256KB of memory traffic (128KB is written sequentially, 128KB is read in scattered 128-byte chunks).
So, taking into account that an SPE core can execute one instruction from the even pipe each cycle, the theoretical performance limit based on computational power is (6 * 3200000000) / 434176 ~= 44.2 khash/s. The theoretical performance limit based on memory bandwidth is 25GB / 256KB ~= 95.4 khash/s. There is a lot of headroom for the memory bandwidth, and arithmetic calculations are the bottleneck. Cell also has precise control over memory operations by scheduling DMA transfers, and can overlap DMA transfers with calculations. This allows memory bandwidth to be utilized very efficiently for the scrypt algorithm.
This page seems to say that the HD 6990 has 350GB/s of memory bandwidth. And here ArtForz tells us that it is possible to achieve < 20% peak BW with a GPU. Doing some math again, we get 350GB * 0.2 / 256KB ~= 267 khash/s. Looks rather believable to me.
Check out the big brain on Brett!
|
|
|
|
DeathAndTaxes
|
|
February 10, 2012, 10:06:08 PM |
|
I am no GPGPU expert, but I think ArtForz made some very good points in the following thread: https://bitcointalk.org/index.php?topic=45849.0
CoinHunter could make his claims convincing by simply explaining how to address the GPU limitations outlined by ArtForz.
At least ArtForz was mistaken about Cell earlier. Let's just do some simple math. The PlayStation 3 has 6 SPE cores, each clocked at 3.2GHz, and 25GB/s of total memory bandwidth. Calculating one hash needs approximately 434176 ADD/ROL/XOR operations on 128-bit vectors in the performance-critical part of salsa20/8, which are executed in the even pipe (shuffles and the other instructions are executed in the odd pipe). Calculating one hash also needs 256KB of memory traffic (128KB is written sequentially, 128KB is read in scattered 128-byte chunks).
So, taking into account that an SPE core can execute one instruction from the even pipe each cycle, the theoretical performance limit based on computational power is (6 * 3200000000) / 434176 ~= 44.2 khash/s. The theoretical performance limit based on memory bandwidth is 25GB / 256KB ~= 95.4 khash/s. There is a lot of headroom for the memory bandwidth, and arithmetic calculations are the bottleneck. Cell also has precise control over memory operations by scheduling DMA transfers, and can overlap DMA transfers with calculations. This allows memory bandwidth to be utilized very efficiently for the scrypt algorithm.
This page seems to say that the HD 6990 has 320GB/s of memory bandwidth. And here ArtForz tells us that it is possible to achieve < 20% peak BW with a GPU. Doing some math again, we get 320GB * 0.2 / 256KB ~= 244 khash/s. Looks rather believable to me.
edit: corrected HD 6990 memory bandwidth (it is 320GB/s and not 350GB/s)
Bandwidth has nothing to do w/ scrypt. LATENCY does. Which is why the amount of L1 cache is so important.
|
|
|
|
tacotime
Legendary
Offline
Activity: 1484
Merit: 1005
|
|
February 10, 2012, 10:20:35 PM |
|
I am no GPGPU expert, but I think ArtForz made some very good points in the following thread: https://bitcointalk.org/index.php?topic=45849.0
CoinHunter could make his claims convincing by simply explaining how to address the GPU limitations outlined by ArtForz.
At least ArtForz was mistaken about Cell earlier. Let's just do some simple math. The PlayStation 3 has 6 SPE cores, each clocked at 3.2GHz, and 25GB/s of total memory bandwidth. Calculating one hash needs approximately 434176 ADD/ROL/XOR operations on 128-bit vectors in the performance-critical part of salsa20/8, which are executed in the even pipe (shuffles and the other instructions are executed in the odd pipe). Calculating one hash also needs 256KB of memory traffic (128KB is written sequentially, 128KB is read in scattered 128-byte chunks).
So, taking into account that an SPE core can execute one instruction from the even pipe each cycle, the theoretical performance limit based on computational power is (6 * 3200000000) / 434176 ~= 44.2 khash/s. The theoretical performance limit based on memory bandwidth is 25GB / 256KB ~= 95.4 khash/s. There is a lot of headroom for the memory bandwidth, and arithmetic calculations are the bottleneck. Cell also has precise control over memory operations by scheduling DMA transfers, and can overlap DMA transfers with calculations. This allows memory bandwidth to be utilized very efficiently for the scrypt algorithm.
This page seems to say that the HD 6990 has 320GB/s of memory bandwidth. And here ArtForz tells us that it is possible to achieve < 20% peak BW with a GPU. Doing some math again, we get 320GB * 0.2 / 256KB ~= 244 khash/s. Looks rather believable to me.
edit: corrected HD 6990 memory bandwidth (it is 320GB/s and not 350GB/s)
What about smix and the mul operations in scrypt? I thought the reason for the speed of Cell as implemented in the PS3 (~35 kh/s) was due to the 256 KB of onboard local memory...
The slowdown in scrypt(1024,1,1) has little to do with the speed of the memory and everything to do with the speed of random accesses to that memory. Cache (or onboard memory in the case of Cell) is way, way faster in terms of random access to data (L1 and L2 are 4 and 10 clock cycles respectively for an i7).
With DRAM memory, random access is never efficient. In fact, the GPU hardware looks at all memory addresses that the running threads want to access at a given cycle, and attempts to coalesce them into a single DRAM access - in case they are not random. Effectively the contiguous range from i to i+#threads is reverse-engineered from the explicitly computed i,i+1,i+2… - another cost of replicating the index in the first place. If the indexes are in fact random and can not be coalesced, the performance loss depends on “the degree of randomness”. This loss results from the DRAM architecture quite directly, the GPU being unable to do much about it - similarly to any other processor.
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
GPUs generally have little onboard cache (16-32 KB) because the data they process is intended to be sequential (and it usually is for 3D applications).
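The random-access argument is easier to see with the access pattern written out. The sketch below mimics the two phases of scrypt's smix/ROMix step, but SHA-256 stands in for the real BlockMix/salsa20/8 mixing and the index derivation is simplified, so the output is not a scrypt hash. Only the shape of the memory accesses is meant to be representative, and `romix_access_pattern` is a hypothetical name.

```python
import hashlib


def romix_access_pattern(seed: bytes, n: int = 1024):
    """Return (write_indices, read_indices) for a scrypt-style ROMix pass.

    SHA-256 is a stand-in for scrypt's BlockMix, so this does NOT compute
    scrypt; it only shows which scratchpad slots are touched, and when.
    """
    x = hashlib.sha256(seed).digest()
    scratchpad = []
    writes = []
    # Phase 1: fill the scratchpad in order (sequential writes).
    for i in range(n):
        scratchpad.append(x)
        writes.append(i)
        x = hashlib.sha256(x).digest()
    reads = []
    # Phase 2: every read index is derived from the current state, so the
    # next address is unknown until the previous step has finished.
    for _ in range(n):
        j = int.from_bytes(x[:4], "little") % n
        reads.append(j)
        mixed = bytes(a ^ b for a, b in zip(x, scratchpad[j]))
        x = hashlib.sha256(mixed).digest()
    return writes, reads


if __name__ == "__main__":
    writes, reads = romix_access_pattern(b"example block header")
    print("first writes:", writes[:8])
    print("first reads: ", reads[:8])
```

The write phase is the kind of contiguous traffic a GPU can coalesce, while the read phase produces data-dependent indices that neighbouring threads cannot share, which is the situation the quoted passage describes.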
|
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
|
|
|
ssvb
|
|
February 10, 2012, 10:31:58 PM |
|
Bandwidth has nothing to do w/ scrypt. LATENCY does. Which is why the amount of L1 cache is so important.
L1 cache is just less important than you think. For example, my scrypt miner optimizations for Cell do not use the 256KB of fast local memory at all. It is insufficient for the 4x unrolling which is needed in order to eliminate pipeline stalls, and at least half of the performance would be lost. But scrypt is not memory heavy enough, so I can easily get away with working from main memory and still have a lot of memory bandwidth headroom. LATENCY is not important in my case, because memory accesses are pipelined, get executed asynchronously and do not block execution. But you can check the scrypt_spu_core8 function in the code yourself.
If GPUs have excessive computational resources, then even waiting on memory much of the time (80% or so per execution core) is likely not a problem, as long as all of them are competing for the precious memory bandwidth and fully saturating it. I did not think about GPU mining earlier just because I did not have any experience with GPU programming and honestly did not expect them to have that much memory bandwidth (more than 10x advantage over Cell).
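The latency-versus-bandwidth disagreement can be framed with a back-of-the-envelope model: one hash makes 1024 dependent reads, so a single stream is latency-bound, but many independent hashes in flight can overlap their waits until the bandwidth limit is reached. The sketch below is only that model, not the Cell miner's code; the 400 ns latency is a made-up illustrative value, and the bandwidth cap reuses the thread's 320GB/s at 20% figure.

```python
def hashes_per_second(latency_s, streams, reads_per_hash=1024):
    """Latency-bound rate: each stream waits one full latency per dependent read."""
    return streams / (latency_s * reads_per_hash)


def streams_to_saturate(latency_s, bandwidth_cap_hps, reads_per_hash=1024):
    """Independent hashes that must be in flight to reach the bandwidth cap."""
    single_stream = hashes_per_second(latency_s, 1, reads_per_hash)
    return bandwidth_cap_hps / single_stream


if __name__ == "__main__":
    latency = 400e-9                  # hypothetical DRAM round trip, not from the thread
    cap = 320e9 * 0.2 / (256 * 1024)  # the thread's ~244 khash/s bandwidth estimate
    one = hashes_per_second(latency, 1)
    needed = streams_to_saturate(latency, cap)
    print(f"one stream:  {one:.0f} hash/s")
    print(f"bandwidth cap: {cap:.0f} hash/s")
    print(f"streams needed to hide latency: {needed:.0f}")
```

Under these assumed numbers a single stream manages only a few thousand hashes per second, and on the order of a hundred concurrent hashes are needed before bandwidth rather than latency becomes the limit. Whether a GPU can keep that many 128KB scratchpads in flight is exactly what the thread is arguing about.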
|
|
|
|
RoloTonyBrownTown
|
|
February 10, 2012, 10:37:27 PM |
|
What's funny to me about all this is that GPU mining the coin early, before anyone else could, is exactly what CoinHunter did with SC. Such a douche. You should be banned from this forum.
|
|
|
|
Mousepotato
|
|
February 10, 2012, 10:47:05 PM |
|
What's funny to me about all this is that GPU mining the coin early, before anyone else could, is exactly what CoinHunter did with SC. Such a douche. You should be banned from this forum.
Well.. yeah, SC started out as a GPU chain.
|
|
|
|
P4man
|
|
February 10, 2012, 10:53:14 PM |
|
So I guess CH is really telling us it's time to quickly buy some litecoins, because difficulty is about to explode.
|
|
|
|
Joshwaa
|
|
February 10, 2012, 10:57:27 PM |
|
It currently gets ~250KH/s on a 6990.
The LTC-mining CPU and GPU efficiency in W per kH/s seems to be similar:
core i3 / 3.1 GHz / 3 threads / ~15 W only for this / 12.5 kH/s => 1.2 W/kH
core i7 / 3.6 GHz / 7 threads / ~46 W only for this / 32 kH/s => 1.437 W/kH
6990 / ~350 W (don't know the correct wattage) for 250 kH/s => 1.4 W/kH
But with more improvement there will be an even better W/kH rate for the 6990 and the other mining cards, so the end of LTC CPU mining is not too far away, because one 6990 does the job of ~8 core i7s or ~20 core i3s.
A unicorn can run 500 miles on one gallon (of beer). A gallon of beer is more expensive than a gallon of gas, but likely we are still seeing the end of hybrid cars because one unicorn is equal to almost 10x Toyota Prius. Plus, with improvement in unicorn-beer technology you are getting an even better m/goa rate (that's miles to gallons of alcohol).
OMG I laughed for about 4 mins from that one. Priceless!
|
|
|
|
|