http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2
Because NVIDIA has essentially traded a fewer number of higher clocked units for a larger number of lower clocked units, NVIDIA had to go in and double the size of each functional unit inside their SM. Whereas a block of 16 CUDA cores would do when there was a shader clock, now a full 32 CUDA cores are necessary. The same is true for the load/store units and the special function units, all of which have been doubled in size in order to adjust for the lack of a shader clock. Consequently, this is why we can’t just immediately compare the CUDA core count of GK104 and GF114 and call GK104 4 times as powerful; half of that additional hardware is just to make up for the lack of a shader clock. But of course NVIDIA didn’t stop there, as swapping out the shader clock for larger functional units only gives us the same throughput in the end. After doubling the size of the functional units in a SM, NVIDIA then doubled the number of functional units in each SM in order to grow the performance of the SM itself. 3 groups of CUDA cores became 6 groups of CUDA cores, 2 groups of load/store units, 16 texture units, etc. At the same time, with twice as many functional units NVIDIA also doubled the other execution resources, with 2 warp schedulers becoming 4 warp schedulers, and the register file being doubled from 32K entries to 64K entries.
|
|
|
Let's see... A single GTX 580 with 512 shaders at default clocks (core 772MHz, shaders 1544MHz) gives around 140MH/s. It has 2 warp schedulers (each doing 2 instructions per core clock cycle) per SM. A GTX 680 has 4 warp schedulers per SMX, so the theoretical performance should be around 140*(2*1536*1058)/(512*1544) = 576MH/s.
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2
In GF114 each SM contained 48 CUDA cores, with the 48 cores organized into 3 groups of 16. Joining those 3 groups of CUDA cores were 16 load/store units, 16 interpolation SFUs, 8 special function SFUs, and 8 texture units. Feeding all of those blocks was a pair of warp schedulers, each of which could issue up to 2 instructions per core clock cycle, for a total of up to 4 instructions in flight at any given time.
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2
Ultimately where the doubling of the size of the functional units allowed NVIDIA to drop the shader clock, it’s the second doubling of resources that makes GK104 much more powerful than GF114. The SMX is in nearly every significant way twice as powerful as a GF114 SM. At the end of the day NVIDIA already had a strong architecture in Fermi, so with Kepler they’ve gone and done the most logical thing to improve their performance: they’ve simply doubled Fermi. Altogether the SMX now has 15 functional units that the warp schedulers can call on. Each of the 4 schedulers in turn can issue up to 2 instructions per clock if there’s ILP to be extracted from their respective warps, allowing the schedulers as a whole to issue instructions to up to 8 of the 15 functional units in any clock cycle.
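The scaling estimate above can be sanity-checked with a few lines of Python. This is just the post's own arithmetic made explicit; the linear-scaling assumption (throughput proportional to ALU count times clock) is speculative until real hardware is benchmarked:

```python
# Speculative hash-rate scaling from GTX 580 to GTX 680, as in the post.
# Assumes throughput scales linearly with ALU count * ALU clock.

gtx580_mhs = 140          # measured baseline, MH/s
gtx580_alus = 512
gtx580_shader_mhz = 1544  # hot clock (2x the 772 MHz core clock)

gtx680_alus = 1536
gtx680_core_mhz = 1058    # boost clock; Kepler has no separate shader clock

# The factor of 2 reflects the doubled scheduler count per SMX, matching
# the post's expression 140*(2*1536*1058)/(512*1544).
estimate_mhs = gtx580_mhs * (2 * gtx680_alus * gtx680_core_mhz) / (gtx580_alus * gtx580_shader_mhz)
print(round(estimate_mhs))  # ~576 MH/s
```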
|
|
|
Newegg listed "Shader clock: 2012MHz".
Interesting... 1536 ALUs at 2012MHz would give it more processing power than a HD 6990. Either the GTX 680 is going to be the fastest GPU for mining, or it is a blatant mistake from Newegg or the microarchitecture has undisclosed limitations that would prevent exploiting all this apparent power.
Per the NDA lifted today, this was a blatant error. The ALUs will run at 1006-1058 MHz, which should allow this card to mine at an upper bound of 450-470 Mh/s (80-85% the speed of a HD 7970). This is assuming, of course, that Nvidia added a BFI_INT-like instruction to the architecture, which is not certain. If not, performance would be much lower... WOW. This would be heaven if true. Can you say 14 GPU Nvidia mining rig with a relatively cheap motherboard? The only thing that may kill it is the crap MH/$ value, but prices will drop anyway...
http://www.techpowerup.com/162935/ZOTAC-Working-On-GeForce-GTX-680-with-2-GHz-Core-Clock-Speed.html
|
|
|
The Kepler architecture whitepaper should clarify those issues, I don't think they have released it yet? (...) (Unless an expert CUDA programmer can map out bitcoin mining effectively over 1536 nVidia streams?)
The CUDA compiler should take care of that successfully. Now we just need someone with a GTX 680 to run Ufasoft's miner; it will use CUDA by default on nVidia hardware. EDIT: http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf Still, no specific integer references that I could see; we need real performance numbers.
|
|
|
This is gonna be fun, since nVidia has got huge margins on this GPU. This was supposed to be the "performance/$" oriented card, like the GTX 560 Ti and GTX 570 were. I'm expecting heavy price drops on this one as soon as supply stabilizes.
|
|
|
1) nVidia purposely severely caps double-precision (FP64) performance of "gaming" cards, so they don't diminish the sales of Quadro and Tesla boards. Single-precision (FP32) is uncapped, double-precision should be around 1/2 of single-precision performance (on Quadros and Teslas). Nevertheless, Bitcoin uses integer math, not floating-point.
2) Any bench using OpenCL is pointless, since nVidia doesn't really care much about OpenCL. It does care a lot about CUDA, and to a lesser degree DirectCompute, since it is being used more and more by games for advanced effects (realistic depth-of-field, etc).
|
|
|
This. OpenCL is currently only supported at the CPU level, in Intel CPUs. It will be supported in future on-die GPUs of Ivy Bridge, but it isn't supported in Sandy Bridge GPUs atm. Remember that "OpenCL" doesn't mean that it's GPU-only. You can even compile CUDA to run code on CPU, right now.
|
|
|
Newegg listed "Shader clock: 2012MHz".
Interesting... 1536 ALUs at 2012MHz would give it more processing power than a HD 6990. Either the GTX 680 is going to be the fastest GPU for mining, or it is a blatant mistake from Newegg or the microarchitecture has undisclosed limitations that would prevent exploiting all this apparent power.
Shaders are default clocked at 1411MHz on the highest profile, although there seems to be a big overclock headroom. Need some serious reviews to come up!
|
|
|
Guys, DirectCompute and OpenCL benchmarks are really pointless for nVidia cards atm.
Both are direct competitors to CUDA, so nVidia does what it can to "ignore" them. I've seen at times CUDA be 20-30% faster than the "same" code in OpenCL, because they don't care as much about optimizing their OpenCL compiler as they care about their precious CUDA compiler, or their new OpenACC initiative.
|
|
|
Add to that the four +12V rail design... Seriously Antec
|
|
|
GK104 will have 1536 stream processors clocked at 1GHz.
Here's my totally speculative math:
A GTX 570 with 480 SP at 732MHz gets about 150 MH/s, or 0.3125 MH s^-1 SP^-1.
Scaling linearly with clock speed, we would expect 0.4269 MH s^-1 SP^-1 at 1GHz.
For 1536 SPs, that's 656 MH/s at an estimated TDP of 200w.
Of course, it's likely that these new SPs will be a little slower than the old ones, but even if they're 30% slower they should still be competitive with AMD cards.
The stream processors ("cuda cores") are dynamically clocked, and can go to 1411MHz (non-oc'ed).
http://www.overclock.net/t/1231113/gigabyte-gtx-680-2gb-already-arrive-at-my-shop
Also, you should be using hot-clocks in your calcs, e.g.: GTX 570, 480SP at 1464MHz (2*732MHz) shaders ~150MH/s, so (1411*1536)/(1464*480) = 3.084, and 3.084*150MH/s = 463MH/s. The thing is... they've changed the architecture from Fermi, so until someone tests it with real hardware, it's all a gamble.
http://www.techpowerup.com/162500/GK104-Block-Diagram-Explained.html
nVidia presentation slides: https://imgur.com/a/aQmuA
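For what it's worth, the hot-clock-corrected calculation in the reply works out as follows (numbers taken straight from the post; the 1411MHz boost clock is rumored, and linear scaling across two different architectures is pure speculation):

```python
# Hot-clock-corrected speculative scaling from GTX 570 to GTX 680.
gtx570_mhs = 150          # measured baseline, MH/s
gtx570_alus = 480
gtx570_hot_mhz = 1464     # hot clock = 2 * 732 MHz core clock

gtx680_alus = 1536
gtx680_boost_mhz = 1411   # rumored dynamic boost clock, no hot clock on Kepler

scale = (gtx680_boost_mhz * gtx680_alus) / (gtx570_hot_mhz * gtx570_alus)
print(round(scale, 3))                 # ~3.084
print(round(scale * gtx570_mhs))       # ~463 MH/s
```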
|
|
|
It's never as good as it seems and it is never as bad as it seems.
^ Words of wisdom!
|
|
|
You should sell those obsolete 5970s. Better hurry though before word gets out how noisy and ancient they are.
Are people expecting the secondary market to be flooded with GPU's once singles are being easily accessible? i.e., are you ordering more watercooling blocks D&T?
|
|
|
Someone once said that having separate bathrooms is the key to marriage success! I wouldn't go that far, but certainly having separate bank accounts plus one common account for certain types of standard expenses (household, groceries, utility bills, etc) "avoids" a lot of problems: I won't get upset about expensive new shoes, and she won't get upset about expensive new toys!
|
|
|
But... but... that Wikipedia example was soooooooo staring at me! I'd be toast if I quoted from Wikipedia for a research paper. As in "really burnt" toast. I'm also no expert at probabilities, but I did have to chew a good dosage of stochastic processes while finishing college (EE). I'm not very good at stats, but this discussion falls under the heading of something that I understand, which was the reason for my very first comment. And you were right on the money!
|
|
|
A pic is in order "Simulation of coin tosses: Each frame, a coin is flipped which is red on one side and blue on the other. The result of each flip is added as a colored dot in the corresponding column. As the pie chart shows, the proportion of red versus blue approaches 50-50 (the Law of Large Numbers). But the difference between red and blue does not systematically decrease to zero." ( http://en.wikipedia.org/wiki/Gamblers_fallacy)
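The caption's claim can be reproduced with a quick simulation (a sketch only; the seed and flip count are arbitrary choices): the proportion of heads converges toward 0.5, while the absolute head/tail difference does not shrink toward zero and typically grows on the order of sqrt(n).

```python
import random

# Simulate n coin flips, as in the Wikipedia figure.
random.seed(1)
n = 100_000
heads = sum(random.randint(0, 1) for _ in range(n))
tails = n - heads

proportion = heads / n
difference = abs(heads - tails)

print(proportion)  # close to 0.5 (Law of Large Numbers)
print(difference)  # does NOT systematically decrease; typically ~ sqrt(n)
```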
|
|
|
I believe the jump in hash rate is due to hoppers calculating that there's a high probability of us finding blocks. In other words, we're so down on our luck that there's bound to be a bright future.
<3 hoppers on ECM! Time to cash in DGM benefits! DGM does not benefit from pool hoppers. I don't know why it says that in the FAQ. That is false though. I think the extra hash rate may be from GPUMAX, they can use up to 200gh here. Most of the hashrate is coming from 2 large miners. Reduced variance is a good enough benefit for me!
|
|
|
I believe the jump in hash rate is due to hoppers calculating that there's a high probability of us finding blocks. In other words, we're so down on our luck that there's bound to be a bright future.
<3 hoppers on ECM! Time to cash in DGM benefits!
|
|
|
Very early days yet, but I'll most likely be making the front end for this. My basic ideas for that so far are: a Python library, to enable easy communication with the miner in your own code should you want to, plus command line and GUI tools (which use the library). For the GUI I'm still not settled on a framework. Whatever I choose, I intend for all of it to be cross-platform, of course. I may also make a web front end; comments on this are also welcome. Integration with cgminer would be really great, since it already supports the BFL Single and the Icarus, and it also supports RPC for frontend/easy remote management, if required. No need to reinvent the wheel, I'd say!
|
|
|
|