Well, the scrypt-chacha kernel is a curious thing.
I think performance is mostly tied to memory. Ideally you want to access main memory as little as possible, even though the scrypt algorithm forces you to do so. Large register files and caches help with that. I haven't looked at the exact specs of all the chips, but there may be some differences there.
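To make "access main memory as little as possible" a bit more concrete, here is a minimal Python sketch of scrypt's ROMix loop with a lookup gap. This is the time-memory trade-off exposed as a lookup-gap option by common scrypt GPU miners: only every gap-th scratchpad entry is stored, and a missing entry is recomputed from the nearest stored one. That shrinks the scratchpad and the writes to it by the gap factor, at the cost of up to gap-1 extra mix steps per lookup. It is an illustration only, not code from either OpenCL kernel. The mix() function is a hypothetical SHA-256 stand-in for the real Salsa20/8 (scrypt) or ChaCha-based (scrypt-chacha) BlockMix, and integerify() is simplified.

```python
# Sketch of scrypt's ROMix loop with a "lookup gap" (time-memory trade-off).
# ASSUMPTION: mix() is a hypothetical stand-in built on SHA-256; real scrypt
# uses Salsa20/8 BlockMix and scrypt-chacha uses a ChaCha-based mix.
import hashlib


def mix(x: bytes) -> bytes:
    """Hypothetical placeholder for BlockMix: 32-byte state in, 32 bytes out."""
    return hashlib.sha256(x).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def integerify(x: bytes, n: int) -> int:
    # Real scrypt interprets the last 64-byte sub-block as a little-endian
    # integer; this sketch uses the first 8 bytes of the placeholder state.
    return int.from_bytes(x[:8], "little") % n


def romix(block: bytes, n: int, gap: int = 1) -> tuple:
    """Return (output, number of stored scratchpad entries). gap=1 is plain ROMix."""
    x = block
    v = []  # scratchpad: only states whose index is a multiple of gap are kept
    for i in range(n):
        if i % gap == 0:
            v.append(x)
        x = mix(x)
    for _ in range(n):
        j = integerify(x, n)
        # v[j // gap] holds the state at index j - j % gap;
        # re-apply mix() j % gap times to recover the state at index j.
        y = v[j // gap]
        for _ in range(j % gap):
            y = mix(y)
        x = mix(xor(x, y))
    return x, len(v)


out_full, kept_full = romix(bytes(32), 1024, gap=1)
out_gap4, kept_gap4 = romix(bytes(32), 1024, gap=4)
print(out_full == out_gap4, kept_full, kept_gap4)
```

The last line prints `True 1024 256`: the result is identical, but only a quarter of the scratchpad is stored. Whether such a trade pays off on a given GPU depends on how the extra ALU work compares with the saved memory traffic.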
Wikipedia has a nice table with the specs for the HD 7000 series:
http://en.wikipedia.org/wiki/Radeon_HD_7000_Series
The chip on the 7870 XT is code-named Tahiti LE. It has a 256-bit memory bus, which puts it behind the 7900 series (384-bit) in memory bandwidth, and it also has fewer cores than the 7900 series. I would assume AMD didn't change the Tahiti design but simply disabled some cores and a few memory controllers. Compared to the 7970, they have probably disabled 33% of the memory controllers and 25% of the cores. So memory bandwidth is most likely the biggest issue with your card.
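The 33% and 25% figures follow from simple ratios, and the same numbers give a rough bandwidth comparison. The Python sketch below assumes Tahiti's 64-bit memory controllers, 2048 vs 1536 stream processors, and reference GDDR5 data rates of 5.5 Gbps (7970) and 6.0 Gbps (7870 XT). Those data rates are assumed reference values, and factory-overclocked cards will differ.

```python
# Rough comparison of a full Tahiti (HD 7970) and Tahiti LE (HD 7870 XT).
# ASSUMPTIONS: 64-bit memory controllers and reference GDDR5 data rates;
# retail cards may ship with different memory clocks.
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    bus_bits: int           # memory bus width in bits
    stream_processors: int  # shader "cores"
    data_rate_gbps: float   # effective GDDR5 data rate per pin (assumed)

    @property
    def memory_controllers(self) -> int:
        # Tahiti's memory controllers are 64 bits wide each
        return self.bus_bits // 64

    @property
    def bandwidth_gb_s(self) -> float:
        # GB/s = bus width (bits) * per-pin data rate (Gbit/s) / 8 bits per byte
        return self.bus_bits * self.data_rate_gbps / 8


def disabled_fraction(full: int, cut: int) -> float:
    """Fraction of units removed when going from `full` to `cut`."""
    return 1 - cut / full


hd7970 = Gpu("HD 7970", 384, 2048, 5.5)
hd7870xt = Gpu("HD 7870 XT", 256, 1536, 6.0)

mc = disabled_fraction(hd7970.memory_controllers, hd7870xt.memory_controllers)
cores = disabled_fraction(hd7970.stream_processors, hd7870xt.stream_processors)
print(f"memory controllers disabled: {mc:.0%}")
print(f"cores disabled: {cores:.0%}")
print(f"bandwidth: {hd7970.bandwidth_gb_s:.0f} vs {hd7870xt.bandwidth_gb_s:.0f} GB/s")
```

Under these assumptions it prints 33%, 25% and 264 vs 192 GB/s. The slightly faster memory on the 7870 XT recovers only part of the bandwidth lost with the narrower bus.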
Hanzac did a pretty good job with the OpenCL kernel in his initial release. I only improved it by taking some ideas from the normal scrypt kernel. So it's a mix of new ideas and old ones.
I think it may be possible to apply some of Hanzac's tricks to the normal scrypt kernel.
The scrypt-chacha kernel certainly seems to behave differently. For example, on my 7790 there doesn't seem to be a "sweet spot" for the combination of engine and memory clock. Admittedly, I didn't test this thoroughly, but the kernel always seemed to simply run faster at higher clocks.