The reason ATI cards are faster for this specific mining work is that mining is built on SHA-256 hashing, which does a huge number of 32-bit bit rotations.
Pretty much it boils down to this: the NVIDIA card has to run a sequence of instructions (say, 4) for an operation that the ATI card can do in 1 instruction.
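To give an idea of how rotate-heavy this is, here's a rough sketch (in Python, just for readability) of the SHA-256 helper functions that use 32-bit right rotates. It assumes the mining in question is SHA-256 based, it's only the rotate-heavy part (not a full SHA-256 and definitely not a miner), and the function names are my own. Counting the calls gives 576 rotates per 64-round compression.

```python
MASK32 = 0xFFFFFFFF
rotate_count = 0  # hypothetical counter, only here to tally rotates


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    global rotate_count
    rotate_count += 1
    return ((x >> n) | (x << (32 - n))) & MASK32


# "Big sigma" functions: each is applied once per round (64 rounds).
def big_sigma0(a: int) -> int:
    return rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)


def big_sigma1(e: int) -> int:
    return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)


# "Small sigma" functions: used to expand the message schedule W[16..63].
# Their last term is a plain right shift, not a rotate.
def small_sigma0(x: int) -> int:
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)


def count_rotates_per_compression() -> int:
    """Call each sigma as often as one compression does and count rotates."""
    global rotate_count
    rotate_count = 0
    word = 0x6A09E667  # arbitrary test word; the value doesn't affect the count
    for _ in range(64 - 16):  # message schedule expansion
        small_sigma0(word)
        small_sigma1(word)
    for _ in range(64):  # compression rounds
        big_sigma0(word)
        big_sigma1(word)
    return rotate_count


print(count_rotates_per_compression())  # 576
```

That's 6 rotates per round times 64 rounds, plus 4 per expanded schedule word times 48, so how cheap a rotate is on the card matters a lot.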
So no matter what, until NVIDIA adopts OpenCL or gets some similar instructions, they're going to suck at this.
NVIDIA does support OpenCL, but their hardware lacks a 32-bit right-rotate instruction, so it takes 3 instructions to emulate one.
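Roughly, the 3-instruction emulation looks like this: shift right, shift left, then OR the two pieces together. This is a Python sketch, not actual GPU code; the function names are made up, and the mask just stands in for the fact that a real 32-bit register drops the overflow bits for free. The loop checks it against an independent string-based rotate.

```python
MASK32 = 0xFFFFFFFF


def rotr32_emulated(x: int, n: int) -> int:
    """Right-rotate built from the 3 ops used when there's no rotate instruction."""
    hi = x >> n                    # op 1: logical shift right
    lo = (x << (32 - n)) & MASK32  # op 2: shift left (mask mimics a 32-bit register)
    return hi | lo                 # op 3: OR the two non-overlapping halves


def rotr32_reference(x: int, n: int) -> int:
    """Independent check: rotate the 32-character binary string right by n."""
    bits = format(x, "032b")
    return int(bits[-n:] + bits[:-n], 2)


for value in (0x00000001, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    for shift in range(1, 32):
        assert rotr32_emulated(value, shift) == rotr32_reference(value, shift)

print(hex(rotr32_emulated(0x12345678, 8)))  # 0x78123456
```

So every rotate costs 2 extra instructions compared to a card that has a single-instruction rotate, and SHA-256 is full of them.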