(putting this in it's own thread as I'm not really talking about sidechains anymore)
We shouldn't add extra data into Bitcoin's blockchain. It grows very fast. Imagine how much data it'll be necessary to transmit when Bitcoin usage reach 100.000 transaction per day.
Adding extra data to the coinbase transaction found in every block is just fine. That is how merged mining is accomplished today, and it is a scalable method of adding unlimited amounts of data to each block.
I've heard this argument come up multiple times when stuff like timestamping or side-chains comes up. The problem is adding extra data is only possible at all if you are a miner, and if you want your data to be added in a reasonable amount of time, a pool op. Given that limitation I suspect both timestamping and sidechain concepts will wind up stuffing data into transactions, bloating up the blockchain. I've mentioned before how this can be done right now, without a special transaction type.
So would pool ops be at all willing to put a merkle tree head into the coinbase's they mine?
From their point of view we can make this pretty easy for them to do technically speaking. Lets suppose I wrote some software that would act as a merkle tree server. It'd accept hashes from anyone, build up a full merkle tree, and provide the top hash to pool ops for inclusion. The interface for them can be just a HTTP GET-able top-level hash that they put directly into their coinbase. My software could even look in coinbases of blocks mined to figure out what ones got into the blockchain. If they aren't willing to put the hash in the coin base directly, or want to stuff it into an existing merge-mine merkle tree, they'd have to additionally send back the merkle leaves required to get to the coinbase by either HTTP RPC back to the sender, or even just publishing the associated data automatically on their website. For redundancy the latter option is probably better anyway, because that way it's fine if multiple people are operating these servers, with pool ops determining what server's they're going to accept top-level hashes from. These merkle servers can submit hashes to each other too. If any server gets a hash into a coinbase, they all do, modulo the longer path get to the top hash. Equally, nothing is stopping a pool op from running the merkle server themselves.
Of course, this kinda looks like we're allowing arbitrary people to use pool hash power to do their own merge mining. I suggest that before a pool op includes the top-level hash into the coinbase or merge-mine tree they do a final "poisoning" step that consists of taking that hash, converting it to hex, pre-pending a known and fixed string, ("This hash is not a proof of work: <hash>") and then hashing that whole string with sha256d. Since the poison step is always the same timestamp and similar uses don't have to record that extra bit of data in signatures, but at the same time it unambiguously distinguishes such merkle trees from legit merge mined ones.
I've been thinking about this stuff in terms of timestamps and have been recently working on a very compact, and extendable, way of storing the resulting merkle paths. The goal here is to be able to timestamp a file and save a path to one or more notaries who guarantee that a hash existed at a given time, with the option of later adding additional paths to other notary signatures as they become available. Notary being a PGP signature, Bitcoin block, commercial timestamping service, dead-tree publication etc. It's a field that doesn't seem to have much open-source software available, probably because the problem isn't really a software problem... but Bitcoin changes that. I've got some Python code (actually Cython) I'm working on in a really early stage; nothing set in stone yet. One key thing I want to do is ensure that servers running the timestamping version of this software widely witness each other with multiple notary methods; the blockchain is an excellent way of doing that if you include Bitcoin block hashes into your own merkle trees.
Regarding scalability lets suppose we would up with a smallish number of people running servers, 8 or 16 or so, that pool ops collectively were querying. If all 16 got into a block the merkle tree is now 4 levels deep. Lets assume each timestamp server operator is running their server on Amazon EC2's free tier, 15GB bandwidth out a month. With 1KiB out/request they can serve up to 6 timestamps/second (3600 in 10minutes) without running out of bandwidth. 2^12=4096 so the path is 16 long, 512bytes + block header overhead. Rate-limiting, if even required at all, can be reasonably done with API keys, CAPTCHAs and asking nicely, with the intention being that heavy users run their own servers. (they probably have their own notaries anyway, with bitcoin as a backup verification) Heck, maybe even monetary payments to deter abuse; if only there was a way to easily accept such payments...
In any case, if pool ops aren't interested, this kind of thing can equally work by dumping the data into specially marked transactions, although having the option of something centralized would be better provided enough pool ops are interested that getting your timestamp/sidechain in a reasonable amount of time is possible. The worst case is if timestamping and sidechains is a common use case, and everyone rolls their own transaction-based system bloating up the chain. As it is with the 1MiB block size limit and an average transaction size of around 256 bytes block's aren't going to have more than 4096 transactions in them anyway, so applications trying to limit the size of their trees are often going to have an incentive to use transactions directly, modulo transaction fees.