So why do we simply discard it?
Because that is what the default implementation does. But nothing stops you from keeping more blocks. For example, mining pools accept blocks that meet a lower difficulty, and these are called "shares". Centralized pools can then split the coinbase reward between miners, based on the shares each of them submitted.
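A minimal sketch of how a single hash can be sorted into "full block", "share", or "discarded". It assumes Bitcoin's rule that a header is valid when its double-SHA256 hash, read as a little-endian 256-bit integer, is at or below the target. `block_target` and `share_target` are hypothetical parameters; a lower share difficulty simply means a larger share target.

```python
import hashlib

def header_hash(header: bytes) -> int:
    """Double-SHA256 of a serialized block header, read as a little-endian integer."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")

def classify(hash_value: int, block_target: int, share_target: int) -> str:
    """Hypothetical pool-side check: shares use an easier (larger) target than blocks."""
    if share_target < block_target:
        raise ValueError("share target must be easier (larger) than the block target")
    if hash_value <= block_target:
        return "block"  # valid Bitcoin block, which also counts as a share
    if hash_value <= share_target:
        return "share"  # not a block, but the pool can still credit the miner
    return "none"       # this is what the default implementation just discards

# Usage with a dummy all-zero 80-byte header, only to exercise the functions.
h = header_hash(bytes(80))
print(classify(h, block_target=2**200, share_target=2**210))
```

The same hash is computed once; keeping it as a share costs only one extra comparison.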
And nothing stops you from keeping other kinds of hashes, for example ones starting with ones instead of zeroes, or ones forming any other pattern you want to track.
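As a sketch, counting leading zeros and leading ones of a hash is symmetric: the leading ones of a value are the leading zeros of its bitwise complement. This assumes the hash is treated as a 256-bit integer whose most significant bits are the leading hex digits shown by block explorers; the `interesting_patterns` filter and its threshold `k` are hypothetical.

```python
HASH_BITS = 256

def leading_zero_bits(h: int) -> int:
    """Number of leading zero bits of a 256-bit hash value."""
    return HASH_BITS - h.bit_length()

def leading_one_bits(h: int) -> int:
    """Number of leading one bits: the leading zeros of the bitwise complement."""
    return leading_zero_bits(h ^ ((1 << HASH_BITS) - 1))

def interesting_patterns(h: int, k: int) -> list[str]:
    """Hypothetical filter: keep a hash that starts with at least k zeros or k ones."""
    found = []
    if leading_zero_bits(h) >= k:
        found.append("zeros")
    if leading_one_bits(h) >= k:
        found.append("ones")
    return found
```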
Blocks with dual nonces would have priority as winner blocks.
In the current consensus, different candidate blocks at the same height are treated as equal, and the first-seen block is the one that gets built upon. But of course, if there are many candidate blocks, then you can pick whichever block you like, because you will have the same chance of mining a new block on top of it and broadcasting it to everyone.
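A simplified sketch of that tie-breaking rule, assuming each candidate tip is summarized by its cumulative chain work and the order in which the node received it. The `Tip` type and its fields are hypothetical, not a real node's data structures: the tip with the most work wins, and among equal-work tips the first-seen one is kept.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tip:
    block_id: str    # hypothetical identifier of a candidate block
    chain_work: int  # cumulative work of the chain ending at this block
    seen_order: int  # lower value = received earlier by this node

def choose_tip(candidates: list[Tip]) -> Tip:
    """Most cumulative work wins; ties at equal work go to the first-seen block."""
    if not candidates:
        raise ValueError("no candidate tips")
    return min(candidates, key=lambda t: (-t.chain_work, t.seen_order))

# Two blocks at the same height (equal work): the earlier-seen one is chosen.
print(choose_tip([Tip("a", 100, 2), Tip("b", 100, 1)]).block_id)  # prints "b"
```

A miner who prefers a different candidate at equal work only needs a different key function; the others will still accept a valid block built on top of it.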
Greater resistance against quantum attacks that directly target the blockchain’s infrastructure.
Your changes don't affect quantum resistance in any way.
Then, one could also consider possible extra rewards on the current Bitcoin network, or as a complementary incentive on another blockchain.
Well, you can simply have your own chain with extra blocks, like P2Pool did in the past. It used a 20 times lower difficulty to accept more blocks, but you can do similar things with other patterns, if you want to. Of course, if your valid block will never be a valid Bitcoin block, then it will extend only one of the chains, which makes interaction between the networks harder; but that is your choice, and technically, you can do it.
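As a rough worked estimate, assume hashes are uniformly distributed 256-bit integers $H$, and read "20 times lower difficulty" as a share target $T_{\text{share}}$ that is 20 times the block target $T_{\text{block}}$ (difficulty is inversely proportional to the target):

$$
P(H \le T) = \frac{T + 1}{2^{256}}, \qquad
\frac{P(H \le T_{\text{share}})}{P(H \le T_{\text{block}})} = \frac{T_{\text{share}} + 1}{T_{\text{block}} + 1} \approx 20
\quad \text{when } T_{\text{share}} = 20\,T_{\text{block}}
$$

So, on average, about 20 shares (the Bitcoin-valid block itself included) are found for every block that is also valid on the Bitcoin network.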
No single hash can satisfy both chains, preventing true merged mining.
If you mine based on the same block template, and just capture more cases, then you produce valid Bitcoin blocks and valid altcoin blocks with the same computing power. This means that you can use Merged Mining, but the hashes that only satisfy the altcoin rules will never contribute anything to the Bitcoin network (which is why it is pointless; technically, though, it can be done, if someone wants that kind of outcome).
I thought merged mining allows a single hash to secure both chains.
Yes, it does. But if you make your altcoin blocks strictly incompatible with the original network, then your computing power will mine both chains (so you will be using Merged Mining), while each valid block you hit secures only one of them (which is why nobody has done that before).
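A minimal sketch of that incompatible case. It assumes a hypothetical altcoin rule that accepts a hash only if its top `k` bits are all ones, while Bitcoin requires the hash to be at or below its target. The hash of one shared header is computed once and checked against both rules, but whenever the Bitcoin target is below 2**255 (real targets are far below that), a hash accepted by the altcoin rule must be at least 2**255, so no single hash can be valid on both chains.

```python
HASH_BITS = 256

def bitcoin_valid(h: int, target: int) -> bool:
    """Bitcoin's proof-of-work rule: the hash must not exceed the target."""
    return h <= target

def altcoin_valid(h: int, k: int) -> bool:
    """Hypothetical altcoin rule: the top k bits of the hash must all be ones."""
    return h >> (HASH_BITS - k) == (1 << k) - 1

def check_both(h: int, target: int, k: int) -> set[str]:
    """One hash of one shared header, checked against both chains' rules."""
    chains = set()
    if bitcoin_valid(h, target):
        chains.add("bitcoin")
    if altcoin_valid(h, k):
        chains.add("altcoin")
    return chains
```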
Miners would need to compute separate block headers for each chain and check hashes against both criteria.
No, I was thinking about working on the same block header, just with more criteria. In the same way, you can also check whether your block hash matches the binary representation of some approximation of pi.
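A minimal sketch of such a check, assuming the pattern is the start of pi's binary expansion, 11.001001000011111101101010100010001000..., taken from the well-known hexadecimal digits 3.243F6A8885A308D... and truncated to 62 bits. The number of leading bits `k` that must match is a hypothetical parameter.

```python
HASH_BITS = 256

# pi * 2**60, truncated: hex 3.243F6A8885A308D... = binary 11.0010 0100 0011 1111 ...
PI_PATTERN = 0x3243F6A8885A308D
PI_PATTERN_BITS = PI_PATTERN.bit_length()  # 62, since the leading hex digit 3 is binary 11

def matches_pi_prefix(h: int, k: int) -> bool:
    """True if the top k bits of a 256-bit hash equal the first k bits of pi in binary."""
    if not 1 <= k <= PI_PATTERN_BITS:
        raise ValueError(f"k must be between 1 and {PI_PATTERN_BITS}")
    hash_prefix = h >> (HASH_BITS - k)
    pi_prefix = PI_PATTERN >> (PI_PATTERN_BITS - k)
    return hash_prefix == pi_prefix
```

Because the hash of the shared header is already computed, each extra pattern like this costs only a shift and a comparison, not any additional hashing.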
Miners must work on separate block headers for each chain, as transactions and merkle roots differ.
If you use different headers, then it won't be Merged Mining. But if you use the same header, then you can enjoy some of the benefits of that model.