This idea may have been discussed elsewhere, but what if someone produced a purely combinational logic circuit for calculating SHA-256 hashes? Such a circuit would be relatively small (low power consumption, easy to keep cool, easy to develop) and ridiculously fast, since you would get outputs as fast as you give it inputs. However, you'd need to develop one circuit for hashing the block header (which includes the nonce) and another for hashing the resulting hash, since the two inputs have different lengths. This isn't an issue if both circuits are combined, though it does double the initial work needed to design the circuit. Am I making some unreasonable assumptions here, or would it be possible to achieve a multi-GH/s device?
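To make the two input lengths concrete, here is a minimal Python sketch of the double hash, assuming the standard 80-byte Bitcoin block header. The names `padded_blocks` and `double_sha256` are made up for illustration, and the all-zero header is only a placeholder, not real block data.

```python
import hashlib

def padded_blocks(message_len: int) -> int:
    """Number of 64-byte SHA-256 blocks needed for a message of message_len bytes."""
    # SHA-256 padding appends one 0x80 byte, zero fill, and an 8-byte length field.
    return (message_len + 1 + 8 + 63) // 64

def double_sha256(header: bytes) -> bytes:
    """Hash the header, then hash the 32-byte result."""
    first = hashlib.sha256(header).digest()   # stage 1: 80-byte input
    return hashlib.sha256(first).digest()     # stage 2: 32-byte input

header = bytes(80)  # placeholder header (assumption: 80 bytes, nonce included)
print("stage 1 blocks:", padded_blocks(len(header)))
print("stage 2 blocks:", padded_blocks(32))
print("digest length:", len(double_sha256(header)))
```

Under these assumptions the first stage spans two 64-byte blocks after padding and the second stage spans one, so the two stages do need differently shaped logic even though they share the same compression function.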
You still need clocking. See here.
The idea of an ASIC (or FPGA, which is like an ASIC without the AS part) is to implement the logic such that all processing for each round is done in parallel, thus one round per clock. You can then add pipelining, giving each round its own stage that works on a different input, to get an effective throughput of multiple rounds per clock.
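A rough software model of that idea, as a sketch rather than a hardware design: `sha256_round` is a pure function standing in for one round of combinational logic, `compress_iterative` reuses it once per clock, and `compress_pipelined` models 64 stage registers. All function names are hypothetical. As a simplification, the message schedule is computed up front and carried along with each item, whereas real hardware would pipeline it too. Only single-block messages are handled.

```python
import hashlib
import math

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def icbrt(n):
    """Integer cube root by binary search (exact, no floating point)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

PRIMES = [p for p in range(2, 400) if all(p % d for d in range(2, math.isqrt(p) + 1))][:64]
K = [icbrt(p << 96) & MASK for p in PRIMES]           # round constants
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # initial hash value

def pad_single_block(msg):
    """Pad a message of at most 55 bytes into one 64-byte block."""
    assert len(msg) <= 55
    return msg + b"\x80" + bytes(55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")

def schedule(block):
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def sha256_round(state, k, w):
    """One round: a pure function of its inputs, like a block of combinational logic."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def finish(state):
    return b"".join(((x + y) & MASK).to_bytes(4, "big") for x, y in zip(H0, state))

def compress_iterative(block):
    """One round unit reused every clock: 64 clocks per block."""
    w, state, clocks = schedule(block), tuple(H0), 0
    for t in range(64):
        state = sha256_round(state, K[t], w[t])
        clocks += 1
    return finish(state), clocks

def compress_pipelined(blocks):
    """64 round units with a register after each; a new block can enter every clock."""
    stages, pending, results, clocks = [None] * 64, list(blocks), [], 0
    while pending or any(s is not None for s in stages):
        if stages[63] is not None:                 # item leaving the last stage
            results.append(finish(stages[63][0]))
        for t in range(63, 0, -1):                 # every stage advances on the same clock
            prev = stages[t - 1]
            stages[t] = None if prev is None else (sha256_round(prev[0], K[t], prev[1][t]), prev[1])
        if pending:
            w = schedule(pending.pop(0))
            stages[0] = (sha256_round(tuple(H0), K[0], w[0]), w)
        else:
            stages[0] = None
        clocks += 1
    return results, clocks

msgs = [bytes([i]) * 32 for i in range(5)]         # five 32-byte inputs, like second-stage hashes
digests, clocks = compress_pipelined([pad_single_block(m) for m in msgs])
print(all(d == hashlib.sha256(m).digest() for d, m in zip(digests, msgs)), clocks)
```

In this model the iterative version spends 64 clocks on every block, while the pipelined version pays the 64-stage fill latency once and then retires one block per clock. Either way a clock is still needed to move values between stage registers.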