I think it is now evident that the mining algorithm probably should have been memory hard, not just computationally hard. That is, the mining algorithm should have been designed to require at least 128 MB of RAM per hashing instance. That would have kept CPU and GPU mining practical and on a roughly even footing, and it would have made ASIC/FPGA mining impractical.
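To make concrete what "memory hard" means here, below is a rough sketch of a scrypt-style construction: fill a large buffer with chained hashes, then read it back in a data-dependent order so the whole buffer has to stay resident. The function name, the SHA-256 building block, and the 128 MB default are purely illustrative assumptions, not any algorithm that was actually proposed; pure Python would also be far too slow for real mining, this only shows the structure.

```python
import hashlib
import struct

def memory_hard_hash(header: bytes, mem_bytes: int = 128 * 1024 * 1024) -> bytes:
    """Illustrative scrypt-like memory-hard hash (hypothetical, not a real PoW)."""
    block = hashlib.sha256(header).digest()          # 32-byte working block
    n_blocks = mem_bytes // len(block)
    buf = bytearray(mem_bytes)

    # Phase 1: sequential fill -- each block depends on the previous one,
    # so the buffer cannot be produced any faster than the hash chain.
    for i in range(n_blocks):
        buf[i * 32:(i + 1) * 32] = block
        block = hashlib.sha256(block).digest()

    # Phase 2: data-dependent reads -- the next index comes from the running
    # hash, so accesses cannot be predicted, prefetched, or streamed; the
    # full buffer must be kept in fast memory.
    acc = block
    for _ in range(n_blocks):
        j = struct.unpack_from("<I", acc)[0] % n_blocks
        acc = hashlib.sha256(acc + bytes(buf[j * 32:(j + 1) * 32])).digest()
    return acc
```

The point of the second phase is that a hashing core with only a little on-chip memory gains nothing: it still has to talk to 128 MB of RAM per instance, which is exactly the property meant to keep commodity CPUs and GPUs competitive.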
Why? We're only seeing ongoing optimization toward exactly what the algorithm asks for.
We would see exactly the same thing happening with a "memory hard" algorithm. Probably the trend would be toward large physical-footprint chips with few logic resources, interfacing with many high-density DRAM chips simultaneously. Memory octopuses, so to speak, keeping all the DRAM busy at all times. CPUs and GPUs would look pale against those, too, and you could THEN argue that a "CPU hard" algorithm should have been chosen instead... (?!)