Because it's a 12x-ish additional work factor, that's all. Is 12x a killer? Depends on how specifically you've constructed it.
Try to put it all on one big device and you run into IO/memory bottlenecks much more easily: your 100 million midstates is over 64 Gbit/sec of data, and if that device goes down, your farm is down. The device is also highly non-covert, so you cannot keep what you are doing secret from your facilities staff. It is much more straightforward to use local FPGAs to generate the collisions. And from what I'm told, using local devices to generate them is how the unreleased Spondoolies design worked.
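For anyone who wants to sanity-check the bandwidth figure, here is a rough back-of-envelope sketch. It assumes the 100 million figure is a per-second rate, and the per-item sizes are my own assumptions, not something stated above: a bare SHA-256 midstate is 32 bytes, while something around 80 bytes per work item (the size of a full block header) is what it takes to reach 64 Gbit/sec. The function name is made up for illustration.

```python
def link_gbit_per_sec(items_per_sec: float, bytes_per_item: int) -> float:
    """Raw payload bandwidth in Gbit/sec, ignoring any framing or protocol overhead."""
    return items_per_sec * bytes_per_item * 8 / 1e9


# Assumed: 100 million work items per second leaving the central device.
RATE = 100e6

# Assumed per-item payload sizes (hypothetical; the post does not give one).
for label, size in [("bare 32-byte midstate", 32), ("80 bytes per work item", 80)]:
    print(f"{label}: {link_gbit_per_sec(RATE, size):.1f} Gbit/sec")
```

Either way you are pushing tens of gigabits per second out of a single box, and that is before any framing overhead.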
Consider: today miners could use centralized devices to generate midstates, but instead the S9/R3 miners have fairly expensive FPGAs with 16MB of attached dual channel DDR2133 to generate the ~2000 midstates per second they need. (When will Trezor ship with such an incredible FPGA? Never, I assume.) They could use a centralized device to do so, but they don't. (Well, it looks like privately Bitmain runs two S9s per controller, so twice that.)
If you've adopted a design with distributed generation, then you can't escape the cost-- perhaps you could deal with it, but that doesn't mean the infrastructure anyone has already built does deal with it.