1) Is asicboost a hardware or software optimisation?
As I understand it, both, but the gains are larger in hardware. Even with hardware designed for asicboost, you still need corresponding software to perform the preprocessing.
According to my current understanding, asicboost optimizes mining by forming a special kind of merkle root. That sounds like it should be implemented in software? Does the hardware need to be designed in a special way to take advantage of optimized merkle roots? If so, can such special hardware be used to mine regular, unoptimized blocks?
There are two forms of asicboost, overt and covert. The overt form involves changing bits in the block version number, which is obvious to anyone once the block is found, hence "overt". The covert form involves finding multiple merkle roots which collide in the last 32 bits (the last 4 bytes).
Asicboost takes advantage of the way sha256 processes data in fixed-size chunks. Part of the work done while hashing one candidate block header can be reused when hashing other candidate headers, which reduces the number of calculations that need to be made and in turn makes mining more efficient.
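To make the sharing concrete, here is a minimal sketch in Python. It assumes the standard 80-byte header layout (version, previous block hash, merkle root, time, bits, nonce) and the standard sha256 padding; the helper names `make_header`, `second_chunk` and `message_schedule` are made up for illustration. It shows that two headers whose merkle roots differ but share the same last 4 bytes have an identical second 64-byte chunk, and therefore an identical sha256 message schedule for that chunk. It is not a miner and does not compute full hashes.

```python
import struct

def _rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def message_schedule(chunk: bytes) -> list:
    """Expand one 64-byte sha256 chunk into its 64-word message schedule."""
    assert len(chunk) == 64
    w = list(struct.unpack(">16I", chunk))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & 0xFFFFFFFF)
    return w

def make_header(version, prev_hash, merkle_root, time, bits, nonce) -> bytes:
    """Serialize an 80-byte block header (hypothetical helper, toy inputs)."""
    assert len(prev_hash) == 32 and len(merkle_root) == 32
    return (struct.pack("<I", version) + prev_hash + merkle_root
            + struct.pack("<III", time, bits, nonce))

def second_chunk(header: bytes) -> bytes:
    """Return the padded second 64-byte chunk that sha256 processes."""
    assert len(header) == 80
    tail = header[64:]  # last 4 bytes of merkle root, then time, bits, nonce
    # Padding: 0x80, zero bytes, then the message length in bits (80 * 8 = 640).
    return tail + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)

# Two different merkle roots that collide in their last 4 bytes (toy values).
root_a = bytes(range(28)) + b"\xde\xad\xbe\xef"
root_b = bytes(range(100, 128)) + b"\xde\xad\xbe\xef"
prev = bytes(32)
hdr_a = make_header(0x20000000, prev, root_a, 1500000000, 0x1d00ffff, 7)
hdr_b = make_header(0x20000000, prev, root_b, 1500000000, 0x1d00ffff, 7)

first_chunks_differ = hdr_a[:64] != hdr_b[:64]
schedules_match = (message_schedule(second_chunk(hdr_a))
                   == message_schedule(second_chunk(hdr_b)))
```

Here `first_chunks_differ` and `schedules_match` are both true: the headers are different candidates, yet the schedule for the second chunk only has to be expanded once for both.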
2) Is asicboost (as the name suggests) applicable only to ASICs (not to GPUs, FPGAs)? If so - why?
The largest gains are found in ASICs, because the mining circuitry can be replaced with asicboost circuitry, which has fewer logic gates and is thus more efficient.
3) Does observing empty blocks from a miner increase our suspicion that asicboost is being used?
I guess that since they need a special merkle root, it may happen that an empty or almost-empty block provides the necessary merkle root, so they just keep mining such an empty block until it is mined or an optimized merkle root is found for a filled block, whichever happens first?
Possibly, but given that empty blocks still happen without asicboost, it isn't really an indicator of much.
4) If it's all about special merkle roots, why couldn't we detect asicboost optimized blocks just by analysing the blocks and their merkle roots?
The covert form of asicboost involves merkle roots: the miner finds and works with multiple merkle roots which collide in the last 32 bits. However, because miners only broadcast their final block, outside observers cannot know which other merkle roots the miner tried, or whether those collide. Since we don't know anything about the other merkle roots, we can't tell whether covert asicboost is being used.
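As a rough illustration of what "finding merkle roots which collide" means, here is a toy sketch in Python. The function `toy_merkle_root` is a made-up stand-in (a real miner would rebuild the coinbase transaction and the merkle tree), and `find_tail_collisions` is a hypothetical helper that simply groups candidate roots by their last bytes.

```python
import hashlib
from collections import defaultdict

def dsha256(data: bytes) -> bytes:
    """Double sha256, as used for Bitcoin hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def toy_merkle_root(extranonce: int) -> bytes:
    """Hypothetical stand-in for a merkle root that changes with the extranonce."""
    return dsha256(b"toy-coinbase" + extranonce.to_bytes(8, "little"))

def find_tail_collisions(n_candidates: int, tail_bytes: int = 4) -> list:
    """Group candidate roots that share the same last `tail_bytes` bytes."""
    buckets = defaultdict(list)
    for extranonce in range(n_candidates):
        root = toy_merkle_root(extranonce)
        buckets[root[-tail_bytes:]].append(root)
    # Only groups with at least two roots are useful collisions.
    return [roots for roots in buckets.values() if len(roots) > 1]

# Demo with a 1-byte tail so it runs instantly; the real target is 4 bytes.
groups = find_tail_collisions(300, tail_bytes=1)
```

All of this grouping happens privately on the miner's side. Only one root from one group ever ends up in a published block, which is why the collision cannot be seen from the outside.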
If they use overt asicboost, it is very obvious: the block version number will have unusual bits set.
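A minimal sketch of such a check in Python, assuming the 16-bit mask `0x1fffe000` that BIP 320 reserves in the version field for general-purpose use (that mask is my addition here, and the function names are hypothetical). Bits set inside the mask are consistent with version rolling, but they are not proof of asicboost on their own.

```python
# Assumption: the 16 general-purpose version bits reserved by BIP 320.
BIP320_MASK = 0x1FFFE000

def rolled_version_bits(version: int) -> int:
    """Return the bits of the block version that fall inside the BIP 320 mask."""
    return version & BIP320_MASK

def looks_version_rolled(version: int) -> bool:
    """True if any of the general-purpose version bits are set."""
    return rolled_version_bits(version) != 0

plain = looks_version_rolled(0x20000000)      # False: only the top version bits are set
signalling = looks_version_rolled(0x20000002) # False: a low signalling bit, outside the mask
rolled = looks_version_rolled(0x2FFFE000)     # True: bits inside the mask are set
```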