Is my question. Anyone using POCLBM with a CUDA-enabled NVIDIA card has probably noticed that the card draws only a fraction of the power, and puts out only a fraction of the heat, that it is capable of. It seems reasonable to infer that many of the transistors are not being used effectively, or at all.
In particular, I am fairly sure the OpenCL implementation in POCLBM makes poor use of the GPU in terms of blocks and threads:
http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/
The PyCUDA documentation is here:
http://documen.tician.de/pycuda/index.html#contents
I don't see any reason why NVIDIA cards, with every CUDA core kept busy in parallel, should be any slower than ATI cards, but hopefully someone here with a better understanding can figure out what is going on and explain it.
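
For anyone not familiar with the blocks/threads terminology, here is a rough sketch in plain Python (no GPU needed) of the arithmetic behind a 1-D CUDA launch: how many blocks it takes to cover a batch of work, how many threads end up idle in the last block, and how each thread works out which item is its own. The function names and the default of 256 threads per block are my own assumptions for illustration. I have not checked what POCLBM actually uses.

```python
# Illustrative only: models 1-D CUDA launch arithmetic on the CPU.
# launch_config, global_index and the 256-thread default are hypothetical
# names/values chosen for this example, not taken from POCLBM or PyCUDA.

def launch_config(work_items, threads_per_block=256):
    """Return (blocks, total_threads, idle_threads) for a 1-D launch."""
    # Round up so that every work item gets a thread.
    blocks = (work_items + threads_per_block - 1) // threads_per_block
    total_threads = blocks * threads_per_block
    # Threads in the final block that have nothing to do.
    idle_threads = total_threads - work_items
    return blocks, total_threads, idle_threads


def global_index(block_idx, thread_idx, threads_per_block=256):
    """Same formula as blockIdx.x * blockDim.x + threadIdx.x in a kernel."""
    return block_idx * threads_per_block + thread_idx


if __name__ == "__main__":
    for n in (1000, 4096, 1 << 20):
        blocks, total, idle = launch_config(n)
        print(f"{n} items: {blocks} blocks, {total} threads, {idle} idle")
```

This only shows how work is divided up. It says nothing about whether a particular card is actually kept busy, which also depends on things like register use and how many blocks each multiprocessor can hold at once.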