How does that help with consensus?
You still have to pick one Bitcoin block and one Namecoin block to build on. This is what forces consensus (and avoids the "nothing at stake" issue that POS systems have).
Maybe Namecoin didn't get that right; it looks like they allow multiple altcoins to be mined at the same time. This might allow mining of two forks at the same time.
There's some information put into the Bitcoin blockchain. Ok. This makes more sense now. I assume that this "small amount of data" is the hash of the Namecoin block?
Yes, it is the hash of the Namecoin header. That hash can be anything; it doesn't affect the POW.
Unlike Bitcoin block hashes, Namecoin block hashes don't have lots of leading zeros.
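To make the point above concrete, here is a toy sketch in Python. It is not the real header format or the real AuxPOW proof: the byte strings, the 12-bit difficulty and the helper names are all made up for illustration. It only shows that the work is done on the parent hash, while the embedded Namecoin hash is just carried along as data.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin applies to block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine_parent(payload: bytes, target: int, max_nonce: int = 2**32):
    """Toy miner: find a nonce such that dsha256(payload + nonce) <= target."""
    for nonce in range(max_nonce):
        digest = dsha256(payload + nonce.to_bytes(4, "little"))
        # Bitcoin compares the hash as a little-endian 256-bit integer.
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    raise RuntimeError("no nonce found")

# Hypothetical stand-in for a Namecoin header (real headers are structured).
aux_header = b"toy namecoin header"
aux_hash = dsha256(aux_header)  # any 32 bytes; no leading zeros required

# Toy parent "header" that simply contains the aux hash. Real Bitcoin commits
# to it indirectly: coinbase transaction -> merkle root -> header.
parent_payload = b"toy bitcoin header|" + aux_hash

# Assumed toy difficulty: roughly 1 hash in 4096 meets this target.
TOY_TARGET = (1 << (256 - 12)) - 1

nonce, parent_hash = mine_parent(parent_payload, TOY_TARGET)
print("aux hash    :", aux_hash[::-1].hex())
print("parent hash :", parent_hash[::-1].hex())  # this one shows the leading zeros
```

Swapping in a different `aux_hash` changes which nonce wins, but not how hard the search is, which is the sense in which the embedded hash "doesn't affect POW".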
Merged mining works like this: you have two totally separate block chains. They are not related in any way, nor does either contain any data from the other.
That is either a misunderstanding or simply unclear. The hash of the Namecoin header is placed in the coinbase transaction, in the input script, in the part usually called the "extraNonce".
It is literally intended to be another nonce. You could set it to anything you want and the coinbase is still valid.
BIP 34 added the requirement that the first bytes of the extraNonce are set to the height of the block, so you don't have full flexibility. Other than that rule, you can set it to any byte array of 100 bytes or fewer.
The point is that you are just embedding the hash in the blockchain, but it doesn't actually have any effect on the Bitcoin system.
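As a rough illustration of the rules described above, the following Python sketch builds a coinbase input script from a block height followed by an arbitrary hash. The function names and the toy header are hypothetical, the layout is simplified (real merged mining wraps the hash in extra framing that is omitted here), and the height encoder only handles heights above 16.

```python
import hashlib

MIN_COINBASE_SCRIPT = 2    # minimum coinbase input script length, in bytes
MAX_COINBASE_SCRIPT = 100  # maximum coinbase input script length, in bytes

def encode_height(height: int) -> bytes:
    """Serialize a block height as a minimal little-endian script push.

    Simplification: only heights above 16 are handled here.
    """
    if height <= 16:
        raise ValueError("this sketch only handles heights above 16")
    raw = height.to_bytes((height.bit_length() + 7) // 8, "little")
    if raw[-1] & 0x80:
        raw += b"\x00"  # extra byte so the number is not read as negative
    return bytes([len(raw)]) + raw  # length byte doubles as the push opcode

def build_coinbase_script(height: int, extra: bytes) -> bytes:
    """Height first, then whatever bytes the miner likes, within the size limit."""
    script = encode_height(height) + extra
    if not MIN_COINBASE_SCRIPT <= len(script) <= MAX_COINBASE_SCRIPT:
        raise ValueError(f"coinbase script is {len(script)} bytes, must be 2..100")
    return script

# Hypothetical stand-in for the hash of a Namecoin header.
aux_hash = hashlib.sha256(hashlib.sha256(b"toy namecoin header").digest()).digest()

script = build_coinbase_script(300000, aux_hash)
print(script.hex())
```

Nothing in this check looks at what the trailing 32 bytes mean. As far as Bitcoin is concerned they are just more nonce space.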