Am I to understand that none of the software out there for mining is taking advantage of the new GCN arch?
And if so, could I expect more performance out of my card?
Short answer: Yes, slightly better performance is possible.
Long answer: You can expect a little more performance, but unless there's a detail I'm not aware of, there is really not much left to gain (1-2% at best). Let me explain why.
To the best of my knowledge, the closest estimate of the number of mathematical operations required to compute 1 hash is ~3375 (according to Phateus). If we consider an ideally efficient processor to be one that computes one mathematical operation per cycle, then hashing would take ~3375 cycles on this ideally efficient processor.
Now let's take a look at what kind of performance we can measure with today's kernels. The 7970 has 2048 stream processors and a stock frequency of 925MHz, and with the best known kernels it computes 550MH/s. Knowing this, we can work out the average number of cycles each stream processor takes to compute one hash using the following equation:
Cycles/Hash = (Stream Processor Count x GPU Frequency) / (Hashes per second)
            = (2048 x 925MHz) / (550MH/s)
            = ~3444 cycles
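If you want to check that arithmetic yourself, here is a minimal Python sketch. The function name `cycles_per_hash` is just something made up for illustration; the inputs are the 7970 numbers quoted above.

```python
def cycles_per_hash(stream_processors, clock_mhz, hashrate_mhs):
    """Average cycles one stream processor spends per hash.

    Clock is in MHz and hashrate in MH/s, so the "mega" cancels out.
    """
    return stream_processors * clock_mhz / hashrate_mhs


# Radeon HD 7970: 2048 stream processors, 925 MHz stock, 550 MH/s
print(round(cycles_per_hash(2048, 925, 550)))  # prints 3444
```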
Now if we consider that each stream processor can at best perform one ALU instruction per cycle, then the 7970 is extremely efficient (in cycles per hash), since this 3444 cycle measurement is really close to the ideal value of 3375 cycles at one instruction per cycle. That is only a ~2% difference from ideal and might even be due to measurement error. It's so efficient that, unless there is a breakthrough that reduces the number of operations required per hash, or there's some new GCN instruction I'm unaware of that lets the GPU compute several steps of the hashing function in one cycle, or kernels are modified to somehow take advantage of fixed-function hardware, then to the best of my knowledge ~550MH/s at stock clocks is pretty much all we're ever going to get.
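To put a number on that ceiling, you can turn the equation around and ask what hashrate the 7970 would reach at exactly one operation per cycle. This is only a rough sketch: the 3375 figure is Phateus's estimate, not something measured, and the names `IDEAL_CYCLES_PER_HASH` and `ideal_hashrate_mhs` are made up for the example.

```python
# Estimated operations per hash (Phateus's figure), assumed here to cost
# exactly one cycle each on an ideal processor.
IDEAL_CYCLES_PER_HASH = 3375


def ideal_hashrate_mhs(stream_processors, clock_mhz,
                       cycles=IDEAL_CYCLES_PER_HASH):
    """Hashrate in MH/s if every stream processor needed only `cycles` per hash."""
    return stream_processors * clock_mhz / cycles


ceiling = ideal_hashrate_mhs(2048, 925)    # 7970 at stock clocks
headroom_pct = (ceiling / 550 - 1) * 100   # versus the measured 550 MH/s

print(f"ceiling: {ceiling:.0f} MH/s, headroom: {headroom_pct:.1f}%")
```

Under those assumptions the ceiling comes out at roughly 561MH/s, which is where the "about 2% left to gain" figure comes from.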
To give you an idea of how efficient the 7970 is at computing hashes, we can compare its efficiency (in cycles per hash) with a 6970, which has 1536 stream processors, a stock frequency of 880MHz, and a highest reported hashrate of 370MH/s at that frequency (from the mining hardware comparison chart):
Cycles/Hash = (Stream Processor Count x GPU Frequency) / (Hashes per second)
            = (1536 x 880MHz) / (370MH/s)
            = ~3653 cycles
At an estimate of 3653 cycles, a 6970 stream processor takes ~6% more cycles per hash than a 7970 stream processor at the same frequency, and ~8% more than the ideal 1 instruction per cycle processor.
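Those two percentages can be reproduced with a few lines of Python. Again this is just a sketch using the figures quoted in this post; `percent_more` is a made-up helper name.

```python
def percent_more(a, b):
    """How many percent larger a is than b."""
    return (a / b - 1) * 100


IDEAL = 3375                  # estimated cycles per hash at one op per cycle
hd7970 = 2048 * 925 / 550     # ~3444 cycles per hash
hd6970 = 1536 * 880 / 370     # ~3653 cycles per hash

vs_7970 = percent_more(hd6970, hd7970)
vs_ideal = percent_more(hd6970, IDEAL)

print(f"6970 vs 7970:  {vs_7970:.1f}% more cycles per hash")
print(f"6970 vs ideal: {vs_ideal:.1f}% more cycles per hash")
```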
Now let's compare to a 5870, which has a highest reported hashrate of 379MH/s with its 1600 stream processors and a stock speed of 850MHz:
Cycles/Hash = (Stream Processor Count x GPU Frequency) / (Hashes per second)
            = (1600 x 850MHz) / (379MH/s)
            = ~3588 cycles
This makes the 5870 roughly 2% more efficient (in cycles per hash) than the 6970, but it still uses ~4% more cycles per hash than the 7970, and ~6% more than the ideal processor. So we can conclude that AMD's GCN is already making ~98% efficient use of its stream processors for hashing, which is better than the VLIW4 (6970) and VLIW5 (5870) architectures of the previous two generations and close to the ideal. This more efficient stream processor usage, along with the increased number of stream processors and the higher stock frequency, explains the increased hashing performance compared to the previous generations of GPUs.
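For anyone who wants all three cards side by side, here is a small Python sketch that computes cycles per hash and the resulting efficiency (ideal cycles divided by measured cycles). It relies on the same assumptions as the rest of this post: the ~3375 operation estimate and the reported hashrates at stock clocks. The `CARDS` table and `efficiency_pct` are made-up names.

```python
IDEAL = 3375  # estimated cycles per hash at one operation per cycle

# card: (stream processors, stock clock in MHz, reported hashrate in MH/s)
CARDS = {
    "7970 (GCN)":   (2048, 925, 550),
    "6970 (VLIW4)": (1536, 880, 370),
    "5870 (VLIW5)": (1600, 850, 379),
}


def efficiency_pct(stream_processors, clock_mhz, hashrate_mhs):
    """Share of stream processor cycles doing useful hashing work, in percent."""
    cycles = stream_processors * clock_mhz / hashrate_mhs
    return IDEAL / cycles * 100


results = {name: efficiency_pct(*spec) for name, spec in CARDS.items()}

for name, eff in results.items():
    cycles = IDEAL / eff * 100
    print(f"{name:<13} {cycles:6.0f} cycles/hash  {eff:5.1f}% efficient")
```

By this measure the 7970 lands at about 98%, the 5870 at about 94%, and the 6970 at about 92%.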
Disclaimer: I'm not a GPU programming expert (yet), so please take my answer with a grain of salt. But for what it's worth, I develop HPC software for a living that solves problems running on thousands of nodes in parallel.