Cryddit gave an estimate of the number of standard gate building blocks (adders, logic gates) required for a Bitcoin ASIC.
However, adders require more area than OR gates, so the gate count will generally be dominated by the adders. Also, adders can be implemented in several ways, with different delay/area trade-offs, so even if there were a theoretical minimum number of gates, practically all implementations would use many more gates to reduce the delay.
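For a rough sense of scale, a back-of-the-envelope comparison (assuming a textbook full adder built from 2 XOR, 2 AND and 1 OR gate; real standard-cell libraries weigh these differently, so the numbers are only illustrative):

```python
GATES_PER_FULL_ADDER = 5  # assumed: 2 XOR + 2 AND + 1 OR per textbook full adder

def ripple_carry_adder_gates(width: int) -> int:
    """Two-input gate count of a width-bit ripple-carry adder."""
    return width * GATES_PER_FULL_ADDER

def bitwise_or_gates(width: int) -> int:
    """A width-bit bitwise OR is just one 2-input OR gate per bit."""
    return width

print(ripple_carry_adder_gates(32))  # 160
print(bitwise_or_gates(32))          # 32
```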
More interestingly, you can:
- Compute SHA^2 (Bitcoin's double SHA-256) approximately, and get a practically better SHA^2 ASIC for mining. See https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders (a carry-save sketch follows this list).
- Compute SHA^2 asynchronously (e.g. using asynchronous adders).
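To give a flavor of the carry-reduced idea (this is a generic carry-save sketch, not necessarily the exact circuit from the linked post): a carry-save adder keeps an intermediate sum in redundant form as two words whose eventual sum is the real value, so no carry chain has to propagate until one final ordinary addition.

```python
MASK = 0xFFFFFFFF  # 32-bit words, as in SHA-256

def carry_save_add(s, c, x):
    """Add word x into the redundant pair (s, c) without propagating carries.

    Every bit position is an independent full adder: XOR yields the sum
    bit, majority yields the carry bit, which is shifted one position
    left but NOT rippled any further.
    """
    sum_bits = s ^ c ^ x
    carry_bits = ((s & c) | (s & x) | (c & x)) << 1
    return sum_bits & MASK, carry_bits & MASK

def resolve(s, c):
    """One ordinary carry-propagating add, deferred to the very end."""
    return (s + c) & MASK

# Accumulate several addends with only one full carry propagation:
s, c = 0, 0
for x in (0xdeadbeef, 0x12345678, 0x0badf00d):
    s, c = carry_save_add(s, c, x)
assert resolve(s, c) == (0xdeadbeef + 0x12345678 + 0x0badf00d) & MASK
```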
Last, it has not been proven that a complete SHA^2 evaluation is required on average to check that a changing header's SHA^2 hash is below the target value. In fact, several widely known optimizations disprove it: since only part of the final digest matters for a first rejection test, the last rounds of the second compression can be skipped for most candidates.
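As a concrete example of such an optimization, here is a pure-Python sketch (the `early_reject` helper and its interface are illustrative, not taken from any real miner): in the SHA-256 compression, the final digest word H[7], whose bytes are the most significant ones under Bitcoin's little-endian reading of the hash, equals IV[7] plus the working variable e after round 61, so rounds 62-64 and seven of the eight final additions can be dropped for a quick rejection test. The input is assumed to be the already-padded single 64-byte block of the second hash.

```python
import struct

# Standard SHA-256 round constants and initial state (FIPS 180-4).
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]
IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
M = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & M

def early_reject(block64: bytes) -> bool:
    """Run only 61 of the 64 rounds of one SHA-256 compression and test
    whether the final H[7] would be nonzero.  Since h_64 = e_61, the
    final H[7] = (IV[7] + e_61) mod 2^32 is known three rounds early.
    A nonzero H[7] means the hash cannot meet any target requiring at
    least 32 leading zero bits (real targets have required far more for
    years), so the candidate can be rejected without finishing."""
    w = list(struct.unpack(">16I", block64))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & M)
    a, b, c, d, e, f, g, h = IV
    for t in range(61):                      # stop three rounds early
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & M
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & M
        a, b, c, d, e, f, g, h = (t1 + t2) & M, a, b, c, (d + t1) & M, e, f, g
    return (IV[7] + e) & M != 0              # final H[7] without rounds 62-64
```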
It is hard to guess what the original intention of the question was: a mathematical/algebraic minimum, or some sort of minimum for a given implementation technology.
The typical goal for an implementation in a silicon CMOS process would be timing closure, i.e. minimizing the time to compute the result. This is why everyone uses carry-lookahead adders that add 32 bits in parallel in a single clock cycle.
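To make the carry-lookahead idea concrete, a toy model in Python (real designs flatten the logic into a few gate levels and usually apply lookahead across blocks too; the 4-bit-block structure here is just the classic textbook arrangement):

```python
def cla4(a, b, c0):
    """One 4-bit carry-lookahead block: every carry is flat two-level
    logic in the generate (g) and propagate (p) signals plus the carry
    in c0, so no carry ripples through the block."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    s = sum((p[i] ^ ci) << i for i, ci in enumerate((c0, c1, c2, c3)))
    return s, c4

def cla32(a, b):
    """32-bit adder from eight 4-bit lookahead blocks (the carry still
    ripples between blocks here; a full CLA would look ahead there too)."""
    s, c = 0, 0
    for blk in range(8):
        nib_s, c = cla4((a >> (4 * blk)) & 0xF, (b >> (4 * blk)) & 0xF, c)
        s |= nib_s << (4 * blk)
    return s, c

assert cla32(0xFFFFFFFF, 1) == (0, 1)
```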
The other approach would be to use serial adders that add 32 bits in 32 clock cycles. Such an adder is little more than a pair of XOR gates, plus the carry logic and a carry flip-flop. The problem with this approach is that the required clock rate is too high for practical silicon-based semiconductor manufacturing processes. But if somebody considers e.g. GaAs manufacturing, then a very deeply pipelined serial implementation is a viable alternative.
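A minimal model of such a bit-serial adder, with one loop iteration standing in for one clock cycle of the carry flip-flop:

```python
def serial_add(a: int, b: int, width: int = 32):
    """Bit-serial addition: one full adder plus a carry flip-flop,
    clocked once per bit; `carry` plays the role of the flip-flop."""
    s, carry = 0, 0
    for cycle in range(width):
        abit = (a >> cycle) & 1
        bbit = (b >> cycle) & 1
        s |= (abit ^ bbit ^ carry) << cycle               # sum: two XOR gates
        carry = (abit & bbit) | (carry & (abit ^ bbit))   # carry logic
    return s, carry

assert serial_add(0xFFFFFFFF, 1) == (0, 1)
```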