1. TxOut scripts are not evaluated until they are spent, so those are probably unspendable TxOuts.
Fair enough. I was wondering whether there was a reason to believe they were spendable, in which case I would need to add something to my scripting code to accommodate them. I'm really just looking for a sanity check. It sounds like there's no action needed here, at least not until someone demonstrates that they are spendable and that my code would have failed.
2. The inputs must be valid (you're looking at coinbase txns with no inputs though). Again, TxOuts aren't evaluated until they are used as inputs in another transaction; as long as they deserialize properly they'll be accepted.
Looking back, I see that the transactions are coinbase, but the non-std scripts are in the TxOuts, which means they could have been put on any transaction and are not specific to coinbase. So my response here is the same as in #1: I'll just assume they are unspendable and that I don't need to accommodate anything new in my script engine. You make a good point that TxOut scripts can be anything, so I'll always assume they are unspendable until I see evidence otherwise.
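To make the "parsed but not evaluated" distinction concrete, here is a minimal Python sketch that only walks a script's opcodes and push data without executing anything. The function name parse_script is hypothetical, and this is not the reference client's code. It only assumes the standard push encodings (0x01-0x4b for direct pushes, plus OP_PUSHDATA1/2/4).

```python
from typing import List, Optional, Tuple

OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4c, 0x4d, 0x4e
# Width in bytes of the length field that follows each PUSHDATA opcode.
LENGTH_WIDTH = {OP_PUSHDATA1: 1, OP_PUSHDATA2: 2, OP_PUSHDATA4: 4}


def parse_script(script: bytes) -> List[Tuple[int, Optional[bytes]]]:
    """Split a raw script into (opcode, pushed_data) pairs without running it.

    Hypothetical helper: raises ValueError if a push runs past the end of
    the script, which is the only kind of failure checked here.
    """
    ops: List[Tuple[int, Optional[bytes]]] = []
    i, n = 0, len(script)
    while i < n:
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4b:
            size = op  # the opcode itself is the number of bytes to push
        elif op in LENGTH_WIDTH:
            width = LENGTH_WIDTH[op]
            if i + width > n:
                raise ValueError("truncated push length")
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            ops.append((op, None))  # non-push opcode, recorded but not executed
            continue
        if i + size > n:
            raise ValueError("push runs past end of script")
        ops.append((op, script[i:i + size]))
        i += size
    return ops
```

A script can pass this kind of walk and still be unspendable, since nothing here checks whether any input could ever make it evaluate to true.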
3. I don't know of any other bugs in the script ops, but I don't know that anybody has written thorough unit tests for them (for anybody looking for a good get-your-feet-wet project, that could be a good one to tackle; there are already unit tests for CHECKMULTISIG in the repository...).
It sounds like arbitrary scripts have been run through the client software successfully in the past (such as these testnet scripts), but there hasn't been any rigorous effort to check that the engine is robust. I would volunteer, but I'm not sure how to isolate the ref client's scripting engine and still include things like OP_CHECKSIG evaluations, which require more than just the script itself (such as the whole Tx and the ECDSA verification methods).
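One possible way around the OP_CHECKSIG dependency, sketched in Python under stated assumptions: have the evaluator take the signature check as a callback, so unit tests can stub it out and exercise the stack logic without a Tx or an ECDSA library. This is a toy evaluator, not the reference client's engine. The names eval_script and check_sig are hypothetical, only a handful of opcodes are handled, and "true" is simplified to "non-empty top of stack".

```python
from typing import Callable, List

OP_DUP = 0x76
OP_EQUAL = 0x87
OP_CHECKSIG = 0xac

# Hypothetical callback type: (signature, pubkey) -> bool.
# A real one would hash the spending Tx and run ECDSA verification.
SigChecker = Callable[[bytes, bytes], bool]


def eval_script(script: bytes, stack: List[bytes], check_sig: SigChecker) -> bool:
    """Run a tiny subset of script opcodes against `stack`.

    Returns False on any unsupported opcode or stack underflow.
    """
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4b:  # direct push of `op` bytes
            if i + op > len(script):
                return False
            stack.append(script[i:i + op])
            i += op
        elif op == OP_DUP:
            if not stack:
                return False
            stack.append(stack[-1])
        elif op == OP_EQUAL:
            if len(stack) < 2:
                return False
            b, a = stack.pop(), stack.pop()
            stack.append(b"\x01" if a == b else b"")
        elif op == OP_CHECKSIG:
            if len(stack) < 2:
                return False
            pubkey, sig = stack.pop(), stack.pop()  # pubkey is on top
            stack.append(b"\x01" if check_sig(sig, pubkey) else b"")
        else:
            return False  # opcode not covered by this sketch
    # Simplification: treat any non-empty top element as true.
    return bool(stack) and stack[-1] != b""


# Usage: a stub checker lets a test cover OP_CHECKSIG with no crypto at all.
stub = lambda sig, pubkey: (sig, pubkey) == (b"sig", b"key")
script = bytes([3]) + b"sig" + bytes([3]) + b"key" + bytes([OP_CHECKSIG])
result = eval_script(script, [], stub)
```

The same tests could later be pointed at a real checker that wraps the Tx hashing and ECDSA verification, without changing the opcode tests themselves.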