MrTeal
Legendary
Offline
Activity: 1274
Merit: 1004
|
|
April 21, 2012, 04:20:33 AM |
|
Mining doesn't use FP -_- Also not sure why you quoted the part on GF114. The GTX480 and Quadro/Tesla are GF100, not GF114.
Only the first sentence really talks about GF114, and it's just there to give context as to the difference in approach between GF114 and GK104. As for mining, that's true. However, Tesla cards are not sold for mining. Most scientific computing uses FP, so when discussing whether the new 7B-transistor Tesla card is 2 x 3.5B GK104 or a single GK110, floating-point performance is a huge factor.
|
|
|
|
MrTeal
|
|
April 21, 2012, 04:23:59 AM |
|
Edit: Anyway, if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.
I don't buy the new architecture = no tuned miner theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data parallel tasks, like hashing.
It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.
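To put those core counts in perspective, here is a rough back-of-the-envelope sketch in Python. The 192 and 8 figures are the ones quoted in the post; the function name is made up for illustration, and this only compares unit counts, not measured throughput.

```python
from fractions import Fraction


def fp64_to_fp32_ratio(fp32_cores: int, fp64_cores: int) -> Fraction:
    """Ratio of FP64 units to regular (FP32) CUDA cores within one SMX."""
    return Fraction(fp64_cores, fp32_cores)


# Figures from the post: 192 normal CUDA cores and 8 FP64 cores per SMX.
ratio = fp64_to_fp32_ratio(192, 8)
print(ratio)  # 1/24
```

So on paper, double-precision work has about 1/24 of the execution units that single-precision work has on this chip.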
|
|
|
|
AzN1337c0d3r
Full Member
Offline
Activity: 238
Merit: 100
★YoBit.Net★ 350+ Coins Exchange & Dice
|
|
April 21, 2012, 04:33:36 AM |
|
Edit: Anyway, if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.
I don't buy the new architecture = no tuned miner theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data parallel tasks, like hashing.
It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.
Ahh, I misread and equated "blocks" with "SMs". Apologies. But as far as I know, mining depends on integer performance, which should run on the normal CUDA cores, right? Why is the performance so poor then? As for FP performance, most scientific applications should use FP32 to save memory/time (although with the number of lazy programmers I deal with every day, that may not be the case....)
|
|
|
|
MrTeal
|
|
April 21, 2012, 04:39:14 AM |
|
Edit: Anyway, if it is right that 1 of the 3 blocks cannot do anything but FP mathematics, that means that there are still 1024 cores which can. Coupled with the halved speed, it should still match the GTX580. I wonder why the performance is so poor then.
I don't buy the new architecture = no tuned miner theory. It should be relatively easy to write a compiler for a MIMD architecture like the GTX680 and get 99% utilization with data parallel tasks, like hashing.
It's not 1/3 blocks. There are 192 normal CUDA cores in an SMX and only 8 FP64 cores. That's why performance is so poor.
Ahh, I misread and equated "blocks" with "SMs". Apologies. But as far as I know, mining depends on integer performance, which should run on the normal CUDA cores, right? Why is the performance so poor then? As for FP performance, most scientific applications should use FP32 to save memory/time (although with the number of lazy programmers I deal with every day, that may not be the case....)
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580. As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple of instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.
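The post doesn't name the instructions. The usual explanation is that SHA-256 leans heavily on 32-bit rotates and a bit-select ("choose") step, and that AMD hardware can do each in a single instruction (commonly cited as BIT_ALIGN_INT and BFI_INT; treat those names as an assumption here). A minimal Python sketch of what hardware without such instructions has to compute out of plain shifts and logic ops:

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def rotr32(x: int, n: int) -> int:
    """32-bit rotate right built from two shifts and an OR (three basic ops)."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(e: int, f: int, g: int) -> int:
    """SHA-256 'choose': take bits from f where e is 1, otherwise from g."""
    return ((e & f) ^ (~e & g)) & MASK32


def big_sigma1(e: int) -> int:
    """SHA-256 Sigma1: three rotates XORed together, used in every round."""
    return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)
```

With a native rotate, each `rotr32` is one instruction instead of three, and a native bit-select does the same for `ch`. Since these run in every one of the 64 rounds, the per-hash instruction count differs noticeably, which is one reason shaders*clocks is a poor comparison.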
|
|
|
|
AzN1337c0d3r
|
|
April 21, 2012, 04:50:40 AM |
|
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.
Except 6970 FP64 performance is even higher than Tesla. People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple of instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.
I'm talking about Fermi vs Kepler.
|
|
|
|
MrTeal
|
|
April 21, 2012, 04:59:23 AM |
|
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.
Except 6970 FP64 performance is even higher than Tesla. People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.
That, and the better support NVIDIA offers to HPC developers. Interestingly, the hardware scheduler is gone in GK104.
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple of instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.
I'm talking about Fermi vs Kepler.
For that, I have no idea. I would imagine it's a lack of optimizations for the new arch, but I don't know much about the CUDA miners.
|
|
|
|
AzN1337c0d3r
|
|
April 21, 2012, 05:07:19 AM |
|
Does anyone have a GTX680 here? I would be interested in the results of CUDA-Z. http://cuda-z.sourceforge.net/
If the integer performance there is good, it's probably an optimization issue. If it is bad, then more than likely the hardware is just not as good as Fermi in some way.
|
|
|
|
Gabi
Legendary
Offline
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
|
|
April 21, 2012, 01:52:01 PM |
|
Many scientific simulations require the precision of FP64, so FP32 isn't really an option. That's why you pay thousands for Tesla instead of just buying a GTX580.
Except 6970 FP64 performance is even higher than Tesla. People don't buy Tesla for its FP64 performance. They buy it because it is a dynamically-scheduled architecture (which tends to extract a ton of thread-level parallelism), with ECC support (important if you are into HPC), and TCC mode.
As for AMD vs NVIDIA mining performance, there are other factors there. AMD supports a couple of instructions that significantly improve hashing performance. You can't directly compare shaders*clocks between the two.
I'm talking about Fermi vs Kepler.
The 7900 series has ECC support.
|
|
|
|
mc_lovin
Legendary
Offline
Activity: 1190
Merit: 1000
www.bitcointrading.com
|
|
May 16, 2012, 05:28:55 AM |
|
Any updates on 680 hashrates?
|
|
|
|
bulanula
|
|
May 16, 2012, 08:18:22 PM |
|
Any updates on 680 hashrates?
Nope, it looks like Nfail again. Too bad, because I would have loved to get rid of this shitty ATI mining monopoly game.
|
|
|
|
mc_lovin
|
|
May 16, 2012, 09:21:01 PM |
|
The thing that got me excited was the quad-GPU VGX and the endless racks FULL of GPUs. Since they offer enterprise cloud computing, I was wondering if we could get all those GPUs hashing? EXCEPT for the fact that nVidia sucks balls at mining.
Edit: the VGX has only 768 CUDA cores total across all 4 GPUs? SAD! No longer excited.
|
|
|
|
AzN1337c0d3r
|
|
May 16, 2012, 09:37:46 PM |
|
Edit: the VGX has only 768 CUDA cores total across all 4 GPUs? SAD! No longer excited.
It seems more reasonable to me that it would be 768 cores per GPU.
|
|
|
|
mc_lovin
|
|
May 17, 2012, 12:24:32 AM |
|
Edit: the VGX has only 768 CUDA cores total across all 4 GPUs? SAD! No longer excited.
It seems more reasonable to me that it would be 768 cores per GPU.
Yeah, but it says:
GPU Specifications
Number of GPUs: 4
Total NVIDIA CUDA® Cores: 768
Shader Perf (TFLOPS): 1.3
Power (W): 150
So... total CUDA cores = 768? What a rip!
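A quick sanity check on that spec sheet, as a small Python sketch. It assumes the 192-cores-per-SMX figure quoted earlier in the thread also applies to the GPUs on the VGX board, which the spec sheet itself does not state.

```python
total_cores = 768    # "Total NVIDIA CUDA Cores" from the quoted spec sheet
num_gpus = 4         # "Number of GPUs" from the quoted spec sheet
cores_per_smx = 192  # assumption: Kepler SMX size quoted earlier in the thread

cores_per_gpu = total_cores // num_gpus
smx_per_gpu = cores_per_gpu / cores_per_smx
print(cores_per_gpu, smx_per_gpu)  # 192 1.0
```

Under that assumption, 768 total works out to a single SMX per GPU, so the "total" reading of the spec sheet is at least self-consistent.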
|
|
|
|
ataranlen
|
|
May 23, 2012, 04:09:59 AM Last edit: May 23, 2012, 04:34:15 AM by ataranlen |
|
I'm getting set up for doing some test mining on my dual 680s now.
Does anyone have a GTX680 here? I would be interested in the results of CUDA-Z. http://cuda-z.sourceforge.net/
If the integer performance there is good, it's probably an optimization issue. If it is bad, then more than likely the hardware is just not as good as Fermi in some way.
Tested. Here are the results (it gives more stable performance on the second card in my SLI setup): http://dl.dropbox.com/u/20040127/CUDA-Z.txt
I'll be interested to see performance with CUDA 5.
|
|
|
|
|
bulanula
|
|
May 25, 2012, 03:51:41 PM |
|
How do you guys think the K10/K20 Teslas will do? K10 is dual 680s and K20 is big Kepler.
Fail again most likely. Nvidia are a bunch of retards for letting AMD and their shitty drivers be the only mining GPUs.
|
|
|
|
AzN1337c0d3r
|
|
May 28, 2012, 04:15:54 AM |
|
There's a table of operations per clock cycle per SMX.
Given that K20 has 15 SMX, K10 has 16 SMX, and GTX680 has 8 SMX, and we all know how fast GTX680 is at mining... it doesn't look hopeful.
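The reasoning here can be sketched as SMX count x operations per clock per SMX x clock rate. The Python below is only an illustration: the SMX count for the GTX680 is from the post, but the per-clock shift throughputs and clock speeds are placeholder figures recalled from memory, not taken from the thread, so substitute the values from the table being referred to.

```python
def peak_ops_per_second(units: int, ops_per_clock_per_unit: int, clock_hz: int) -> int:
    """Theoretical peak throughput for one instruction type."""
    return units * ops_per_clock_per_unit * clock_hz


# Assumed placeholder inputs (not from the thread except the 8 SMX count):
# 32-bit shifts per clock per SM/SMX and core clocks in Hz.
gtx680_shifts = peak_ops_per_second(8, 32, 1_006_000_000)
gtx580_shifts = peak_ops_per_second(16, 16, 1_544_000_000)

print(gtx680_shifts / gtx580_shifts)
```

If the assumed figures are roughly right, the ratio comes out below 1, which would mean shift-heavy code like SHA-256 gets less peak throughput on the GTX680 than on the GTX580 despite the much larger core count.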
|
|
|
|
mc_lovin
|
|
May 29, 2012, 09:50:24 PM |
|
How do you guys think the K10/K20 Teslas will do? K10 is dual 680s and K20 is big Kepler.
Fail again most likely. Nvidia are a bunch of retards for letting AMD and their shitty drivers be the only mining GPUs.
I think the mining community is too small for nVidia to focus on. Thankfully! If the whole world were mining, the difficulty would be pretty darn high.
|
|
|
|
|