You don't need to recompute the whole merkle tree, though, just one branch of it: the branch that connects the transaction containing the extra nonce to the root. Recalculating that branch requires hashing the transaction with the updated extraNonce, plus 64 bytes (the two 32-byte child hashes) at each level of the tree, i.e. 64*(height of merkle tree) bytes. So updating the merkle root by modifying the 20 kB coinbase transaction in a block with 2000 transactions would require hashing
20,000 + 64*ceil(log2(2000)) = 20,000 + 64*11 = 20,704 bytes ≈ 20.7 kB
But if the extraNonce is in a 300-byte transaction instead, updating the root only requires hashing
300 + 64*ceil(log2(2000)) = 300 + 704 = 1,004 bytes ≈ 1 kB
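A minimal Python sketch of the branch update, assuming Bitcoin-style double SHA-256 for both txids and internal nodes, and the usual rule of duplicating the last hash on an odd-length level. The transaction contents, the 8-byte extraNonce placed at the end of the coinbase, and the helper names (`make_coinbase`, `merkle_branch`, `root_from_branch`) are hypothetical illustrations, not Bitcoin Core's code. The byte count uses the same accounting as above: the transaction plus one 64-byte concatenation per level (ignoring SHA-256 padding and the second 32-byte pass of double SHA-256).

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves):
    """Full recomputation: hash every level, duplicating the last node on odd levels."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_branch(leaves, index):
    """Collect the sibling hash at each level on the path from leaves[index] to the root."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[index ^ 1])  # sibling: flip the lowest bit of the index
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return branch


def root_from_branch(leaf, branch, index):
    """Rebuild the root from one leaf and its branch: one 64-byte hash per level."""
    h = leaf
    for sibling in branch:
        # An odd index means our node is the right child.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h


def make_coinbase(extra_nonce: int, size: int = 300) -> bytes:
    # Hypothetical coinbase layout: zero padding followed by an 8-byte extraNonce.
    return bytes(size - 8) + extra_nonce.to_bytes(8, "little")


# Hypothetical block: coinbase at index 0 plus 1999 distinct 300-byte dummy transactions.
others = [i.to_bytes(4, "little") + bytes(296) for i in range(1, 2000)]
leaves = [dsha256(make_coinbase(0))] + [dsha256(tx) for tx in others]
branch = merkle_branch(leaves, 0)  # computed once per block template

# Bump the extraNonce and rebuild only the coinbase's branch.
new_coinbase = make_coinbase(1)
new_root = root_from_branch(dsha256(new_coinbase), branch, 0)
assert new_root == merkle_root([dsha256(new_coinbase)] + leaves[1:])

bytes_hashed = len(new_coinbase) + 64 * len(branch)
print(len(branch), bytes_hashed)  # prints: 11 1004
```

The branch only has to be collected once per block template; after that, each new extraNonce costs one transaction hash plus 11 small hashes, and the assertion checks that the shortcut agrees with a full recomputation.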
Either way, I haven't seen any numbers showing that a CPU can't recalculate merkle roots as fast as an ASIC exhausts the full 2^32 nonce range. Is that actually an issue?
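To put that question in numbers: a device hashing at H hashes per second exhausts the 32-bit nonce in 2^32 / H seconds, so it needs H / 2^32 fresh merkle roots per second. A small sketch of that arithmetic, where the 1 TH/s hashrate is a hypothetical figure chosen only for illustration and the per-root byte counts are the ones worked out above:

```python
NONCE_RANGE = 2 ** 32  # the block header's nonce field is 32 bits


def roots_per_second(hashrate: float) -> float:
    """Fresh merkle roots needed per second so the device never runs out of nonces."""
    return hashrate / NONCE_RANGE


def hashing_load(hashrate: float, bytes_per_root: int) -> float:
    """Bytes per second of SHA-256 input the CPU must process to keep up."""
    return roots_per_second(hashrate) * bytes_per_root


HASHRATE = 1e12  # hypothetical 1 TH/s device, for illustration only

for label, per_root in [("20 kB coinbase", 20_704), ("300-byte transaction", 1_004)]:
    print(f"{label}: {roots_per_second(HASHRATE):.1f} roots/s, "
          f"{hashing_load(HASHRATE, per_root) / 1e3:.0f} kB/s to hash")
```

At 1 TH/s that works out to about 233 new roots per second, i.e. roughly 234 kB/s of hashing with the 300-byte transaction or about 4.8 MB/s with the 20 kB coinbase. Those figures can be compared against a given CPU's measured SHA-256 throughput (for example with `openssl speed sha256`).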
I agree that it would be better to have the extra nonce right in the header to avoid recalculating the merkle root entirely. But while I find the extended-header idea interesting, I'm not sure the need is great enough to justify all the extra work and support it would take.
Related:
http://bitcoin.stackexchange.com/questions/32603/why-didnt-satoshi-make-the-nonce-space-larger