The bit interleave part actually has the opposite effect in terms of ASIC resistance. A fixed bit interleave costs literally nothing in an ASIC, since it is just wiring. Having been an ASIC designer myself for almost 10 years, I would say this is less effective than the dark/quark algorithms in terms of ASIC resistance.
The worst part of the bit interleave is that the difficulty target maps directly onto each hash function, which makes parallel calculation possible and simple. If any one hash result does not meet its partial target, the remaining calculations can be skipped.
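To illustrate why the target splits per hash, here is a minimal Python sketch under several assumptions: four hashlib digests stand in for the real hash functions (hashlib has no hefty1, Keccak-512, Groestl or BLAKE-512), the round-robin bit layout in interleave() is hypothetical rather than the coin's actual layout, and the target is a power of two, i.e. "top k bits zero". All function names are made up for the example.

```python
import hashlib

def component_digests(data: bytes) -> list[int]:
    """Four 256-bit digests of `data` as big-endian integers.
    Stand-ins (assumption) for the four real hash functions."""
    return [
        int.from_bytes(hashlib.sha256(data).digest(), "big"),
        int.from_bytes(hashlib.sha3_256(data).digest(), "big"),
        int.from_bytes(hashlib.blake2b(data, digest_size=32).digest(), "big"),
        int.from_bytes(hashlib.sha512(data).digest()[:32], "big"),
    ]

def interleave(digests: list[int]) -> int:
    """Hypothetical fixed layout: bit j of the result (counting from the MSB)
    is bit j // 4 of digest j % 4 (also counting from the MSB).
    In hardware this is a fixed permutation of wires, with no logic."""
    out = 0
    for j in range(256):
        bit = (digests[j % 4] >> (255 - j // 4)) & 1
        out = (out << 1) | bit
    return out

def full_ok(digests: list[int], k: int) -> bool:
    """Full check: interleaved hash below the target 2**(256 - k)."""
    return interleave(digests) < (1 << (256 - k))

def partial_ok(digest: int, i: int, k: int) -> bool:
    """Digest i's share of a 'top k bits zero' target: it supplies result
    bits i, i+4, i+8, ... below k, so its own top n bits must be zero."""
    n = len(range(i, k, 4))
    return n == 0 or (digest >> (256 - n)) == 0

d = component_digests(b"example header")
print(full_ok(d, 8), [partial_ok(d[i], i, 8) for i in range(4)])
```

With a power-of-two target the four per-hash checks are exactly equivalent to the full check. With a general target they are only a necessary condition, which is still enough to reject a nonce early.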
If I were to implement this in an FPGA, I would do it in 4 stages: hefty1+keccak, sha256, Blake, and groestl. The overall hash rate is determined by the hefty1+keccak stage. Having reviewed the source code, I estimate that hefty1 is of the same order of complexity as sha256, so the overall throughput would be similar to that of sha256. Considering the cost of all the hashes, 1/10 of the hash throughput of existing Bitcoin FPGA miners is very easy to achieve.
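A rough model of why the first stage sets the hash rate: if each stage only forwards candidates that pass its partial target, a stage that sees a fraction f of all nonces and hashes at rate r can keep up with r / f nonces per second, and the tightest of those bounds wins. The rates and the 1/256 survival fraction per stage below are hypothetical, normalized numbers chosen for illustration, not measurements of any real design.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    rate: float      # candidates/s the stage can hash when fed continuously
    fraction: float  # fraction of all nonces that survive to reach this stage

def pipeline_throughput(stages: list[Stage]) -> float:
    """Nonces/s the whole pipeline sustains: a stage that sees only `fraction`
    of the nonces can keep up with rate / fraction nonces/s, and the
    tightest bound wins."""
    return min(s.rate / s.fraction for s in stages)

# Hypothetical, normalized numbers: the first stage runs at 1.0, the later
# stages are built 100x smaller, and each stage passes 1/256 of what it sees.
with_early_reject = [
    Stage("hefty1+keccak", 1.0, 1.0),
    Stage("sha256", 0.01, 2.0 ** -8),
    Stage("blake", 0.01, 2.0 ** -16),
    Stage("groestl", 0.01, 2.0 ** -24),
]
# Same hardware, but every nonce is pushed through every stage.
without_early_reject = [Stage(s.name, s.rate, 1.0) for s in with_early_reject]

print(pipeline_throughput(with_early_reject))     # set by the first stage
print(pipeline_throughput(without_early_reject))  # set by the small stages
```

Because so few candidates survive the first check, the later stages can be made much smaller without lowering the overall rate.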
Wait a minute... So I can compute keccak, and if certain bits are set, I can skip computing the other hashes because I already know I cannot reach the given difficulty?
Right. Pick the fastest of the 4 hashes, compute it and compare it against the corresponding difficulty bits, then decide whether to proceed with the remaining hashes. On GPUs, sha256 is the fastest; I get 6 Mhash/s on an R9 280X.
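The benefit of checking the fastest hash first can be expressed as a simple expected-cost model. Assume every hash lets a nonce through with the same independent probability p (for a "top k bits zero" target split evenly over 4 hashes, p is about 2^-(k/4)). Under that assumption, running the hashes in ascending cost order minimizes the expected work per nonce. The relative costs below are hypothetical placeholders, not benchmarks.

```python
from itertools import permutations

def expected_cost(costs, pass_prob):
    """Expected work per nonce when the hashes run in the order given and each
    check lets a nonce continue with probability pass_prob (assumed independent)."""
    total, reach = 0.0, 1.0  # reach = probability the nonce gets this far
    for c in costs:
        total += reach * c
        reach *= pass_prob
    return total

def best_order(costs, pass_prob):
    """Brute-force the evaluation order with the lowest expected cost."""
    return min(permutations(costs), key=lambda order: expected_cost(order, pass_prob))

# Hypothetical relative costs of the four hashes (not measured values).
COSTS = [4.0, 1.0, 3.0, 2.0]
P = 2.0 ** -8  # e.g. each hash must have its top 8 bits zero

print(best_order(COSTS, P))             # ascending cost order
print(expected_cost(sorted(COSTS), P))  # barely more than the cheapest hash alone
print(expected_cost(COSTS, 1.0))        # no early rejection: all four every time
```

With equal pass probabilities the weights 1, p, p^2, p^3 are decreasing, so the cheapest hash belongs in the first slot.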
The dev's intention was good. Had they applied the bit interleave at the input or to an intermediate state, it would have been more effective at slowing GPUs down.