There's something wrong if we have a card that performs on par with a 6950 (or a GTX 570) in 3D and other parallel single-precision (SP) workloads, but is only 1/4 as fast at hashing Bitcoin blocks.
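Bitcoin mining is an integer algorithm: it repeatedly computes double SHA-256 over block headers using 32-bit adds, rotates, shifts and bitwise logic, so neither DP nor SP floating-point throughput matters there.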
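To make the "integer algorithm" point concrete, here is a rough CPU sketch (assuming Python 3.8+ and only the standard library) of the double SHA-256 a miner computes over an 80-byte block header. Every step is a 32-bit add, rotate, shift or bitwise op; even the round constants are derived with integer square/cube roots, so no floating point is involved anywhere. This is a plain reference implementation, not pyoclbm's OpenCL kernel, and the names `sha256`, `double_sha256` and `GENESIS_HEADER` are my own, chosen for illustration.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def _first_primes(n):
    # Trial division; only needs the first 64 primes.
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    # floor(cube root of n) by integer binary search, no floats.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round constants: first 32 fractional bits of cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK32 for p in _first_primes(64)]
# Initial state: first 32 fractional bits of square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK32 for p in _first_primes(8)]


def _rotr(x, n):
    # 32-bit rotate right.
    return ((x >> n) | (x << (32 - n))) & MASK32


def _compress(state, block):
    # Message schedule: expand 16 big-endian words to 64.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK32
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK32
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
    return [(x + y) & MASK32 for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(data: bytes) -> bytes:
    # Pad with 0x80, zeros, then the 64-bit big-endian bit length.
    padded = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64) + struct.pack(">Q", len(data) * 8)
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = _compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


def double_sha256(data: bytes) -> bytes:
    return sha256(sha256(data))


# Bitcoin genesis block header (little-endian fields, merkle root byte-reversed).
GENESIS_HEADER = struct.pack(
    "<I32s32sIII",
    1,                      # version
    bytes(32),              # previous block hash
    bytes.fromhex("4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")[::-1],
    1231006505,             # timestamp
    0x1D00FFFF,             # bits (difficulty target)
    2083236893,             # nonce
)

if __name__ == "__main__":
    digest = double_sha256(GENESIS_HEADER)
    # Cross-check the integer-only version against hashlib.
    assert digest == hashlib.sha256(hashlib.sha256(GENESIS_HEADER).digest()).digest()
    # Block hashes are displayed byte-reversed; this prints the genesis hash.
    print(digest[::-1].hex())
```

A miner does exactly this loop with a different nonce each time, which is why the ALU's integer and bit-rotate throughput, not its FLOPS rating, decides the hash rate.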
You could try doing the port yourself, though, since pyoclbm is open source. Good luck!