Arguing for ASIC-resistance based on the perceived high cost of 1MB is silly though.
(Gridseed LTC ASICs already have 512KB on board.)
The original claim doesn't even mention cost. It should elaborate on what is
meant by "ASIC pipeline" though, as that has me confused...
There is something to be said for a cache-oriented proof-of-work though,
of which there are not many examples. There are several concerns:
1) Cache sizes slowly grow over time (Moore's law). Currently, high-end x86 has 2.5MB / core.
The proof-of-work should use as much of this as possible.
You want each core running its own instance while fully utilizing its cache.
Ideally, the dynamic difficulty adjustment should be able to increase the memory requirement,
so as to keep up with hardware improvements.
2) There must be no easy memory-time trade-off (like scrypt has).
Memory accesses must depend on earlier ones, so that they must necessarily be done
in sequential order. Otherwise, a GPU/FPGA/ASIC will gain a lot from increased parallelism.
3) The amount of computation per memory access must be minimized in order for the core
to have a very high frequency of (cache) memory accesses. The point is that ASICs can
always do computation much faster, but they cannot do random memory access much faster.
If you get these things right, then a GPU will not be competitive.
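To make item 2 concrete, here is a toy illustration of the kind of memory-time trade-off scrypt allows. This is only a sketch, not real scrypt: SHA-256 stands in for scrypt's actual mixing function, and the names (romix_full, romix_sparse, stride) are made up for this example. The point is that the table is filled deterministically and never written again, so an implementation can keep only every stride-th entry and recompute the missing ones on demand, trading memory for extra hashing while producing the same result.

```python
import hashlib

def H(x: bytes) -> bytes:
    # Stand-in mixing function (assumption: real scrypt uses a Salsa20/8-based BlockMix).
    return hashlib.sha256(x).digest()

def integerify(x: bytes, n: int) -> int:
    # Simplified index derivation from the current state.
    return int.from_bytes(x[:8], "little") % n

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def romix_full(seed: bytes, n: int) -> bytes:
    # Store all n table entries.
    x = H(seed)
    v = []
    for _ in range(n):
        v.append(x)
        x = H(x)
    for _ in range(n):
        j = integerify(x, n)
        x = H(xor(x, v[j]))
    return x

def romix_sparse(seed: bytes, n: int, stride: int) -> bytes:
    # Store only every stride-th entry; recompute the others when needed.
    x = H(seed)
    kept = []
    for i in range(n):
        if i % stride == 0:
            kept.append(x)
        x = H(x)
    for _ in range(n):
        j = integerify(x, n)
        vj = kept[j // stride]
        for _ in range(j % stride):  # extra hashing replaces the missing memory
            vj = H(vj)
        x = H(xor(x, vj))
    return x
```

With stride 4, romix_sparse keeps a quarter of the table and still returns the same value as romix_full. A design that also writes back to the table during the second loop makes this kind of shortcut much harder.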
Certainly, CryptoNight is an improvement on scrypt on all 3 counts above
(for item 3, mostly due to its smaller block size, 16 bytes vs 128 bytes).
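The three points above can be sketched as a toy loop. This is not CryptoNight or any existing algorithm, just a minimal illustration under my own assumptions: the function name toy_cache_pow, its parameters, and the multiplier constant are all hypothetical, and the sizes are kept small so that it runs quickly in Python. A real design would size the scratchpad to the per-core cache and run the inner loop in native code.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1
ODD_MULT = 0x9E3779B97F4A7C15  # arbitrary odd 64-bit constant (assumption)

def toy_cache_pow(seed: bytes, scratch_bytes: int = 1 << 16, steps: int = 1 << 12) -> bytes:
    words = scratch_bytes // 8
    assert words > 0 and words & (words - 1) == 0, "word count must be a power of two"
    bits = words.bit_length() - 1

    # Item 1: a scratchpad whose size is a parameter, so it could grow with cache sizes.
    raw = hashlib.shake_256(seed).digest(scratch_bytes)
    pad = list(struct.unpack("<%dQ" % words, raw))

    state = int.from_bytes(hashlib.sha256(seed).digest()[:8], "little")
    for _ in range(steps):
        # Item 2: the next address depends on the previous read, forcing sequential order.
        idx = state >> (64 - bits)
        value = pad[idx]
        # Item 3: very little computation per memory access (one xor, one multiply).
        state = ((state ^ value) * ODD_MULT) & MASK64
        # Writing back means the scratchpad cannot simply be recomputed from the seed.
        pad[idx] = state
    return hashlib.sha256(state.to_bytes(8, "little")).digest()
```

A miner would call this with a block header plus nonce as the seed and compare the digest against the target. The write-back in the loop is what distinguishes it from a read-only table like the one scrypt fills.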