... GPUs, however, also have lots of very fast DRAM, and there now exist fairly fast GPU implementations of scrypt.
Scrypt was designed so that the memory-per-thread requirement can be customized. A quad-core CPU using its 8 GB of RAM has 2 GB/core. A GPU with 2 GB of RAM for 1600 cores has only 1-2 MB/core.
The goal of scrypt was to tailor the computational problem so that massively parallel processors cannot run too many threads simultaneously. I don't know exactly what the designer had in mind, but I bet GPUs were considered. And it will be a long time before that memory-per-core gap is closed, if ever.
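As a rough illustration of how the memory requirement caps parallelism, the sketch below estimates how many scrypt computations fit in memory at once. It uses the fact that scrypt's main working array takes about 128 * N * r bytes per computation. The parameters N = 2^14 and r = 8 are only an example choice, the helper names are hypothetical, and the estimate ignores time-memory trade-offs and memory bandwidth.

```python
GIB = 1024 ** 3  # assumption: "GB" in the text is treated as 2**30 bytes

def scrypt_memory_bytes(n, r):
    """Approximate size of scrypt's main working array: 128 * N * r bytes."""
    return 128 * n * r

def max_concurrent_hashes(total_bytes, cores, n, r):
    """Hashes that can run at once, limited by cores and by memory."""
    return min(cores, total_bytes // scrypt_memory_bytes(n, r))

N, R = 2 ** 14, 8  # example parameters only: about 16 MiB per hash

cpu_parallel = max_concurrent_hashes(8 * GIB, 4, N, R)
gpu_parallel = max_concurrent_hashes(2 * GIB, 1600, N, R)

print(f"CPU: {cpu_parallel} of 4 cores busy")
print(f"GPU: {gpu_parallel} of 1600 cores busy")
```

With these example numbers the CPU keeps all 4 cores busy, while the GPU runs out of memory after 128 concurrent hashes and leaves most of its 1600 cores idle.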