Look at this example:
https://mempool.space/tx/0301e0480b374b32851a9462db29dc19fe830a7f7d7a88b81612b9d42099c0ae
OP_PUSHBYTES_32 aa7c645e1ee2a6b9fcd113e98f057cbc70eec92cb0d18d7f138c178f625bbc76
OP_CHECKSIG
OP_0
OP_IF
OP_PUSHBYTES_3 "ord"
OP_PUSHBYTES_1 01
OP_PUSHBYTES_10 "image/jpeg"
OP_0
OP_PUSHDATA2 <520-byte block>
OP_PUSHDATA2 <520-byte block>
OP_PUSHDATA2 <520-byte block>
...
OP_PUSHDATA2 <520-byte block>
OP_PUSHDATA2 <520-byte block>
OP_PUSHDATA2 <520-byte block>
OP_ENDIF
See? Everything is just split into 520-byte blocks.
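To make the splitting concrete, here is a minimal Python sketch of how a script body like the one above could be assembled. It assumes the layout shown in the example; the helper names (push, chunks, envelope) are made up for illustration and are not taken from any real tool.

```python
MAX_PUSH = 520  # maximum size of a single stack element in Script


def push(data: bytes) -> bytes:
    """Serialize one data push with the push opcode that fits its length."""
    n = len(data)
    if n > MAX_PUSH:
        raise ValueError("a single push cannot exceed 520 bytes")
    if n == 0:
        return b"\x00"                                # OP_0
    if n <= 75:
        return bytes([n]) + data                      # OP_PUSHBYTES_n
    if n <= 255:
        return b"\x4c" + bytes([n]) + data            # OP_PUSHDATA1
    return b"\x4d" + n.to_bytes(2, "little") + data   # OP_PUSHDATA2


def chunks(payload: bytes) -> list:
    """Split the payload into pieces of at most 520 bytes."""
    return [payload[i:i + MAX_PUSH] for i in range(0, len(payload), MAX_PUSH)]


def envelope(content_type: bytes, payload: bytes) -> bytes:
    """Build the OP_0 OP_IF ... OP_ENDIF part of the example script."""
    OP_0, OP_IF, OP_ENDIF = b"\x00", b"\x63", b"\x68"
    body = b"".join(push(c) for c in chunks(payload))
    return (OP_0 + OP_IF + push(b"ord") + push(b"\x01")
            + push(content_type) + OP_0 + body + OP_ENDIF)


# A 1300-byte placeholder payload becomes pushes of 520, 520 and 260 bytes.
script = envelope(b"image/jpeg", bytes(1300))
print([len(c) for c in chunks(bytes(1300))], len(script))
```

Because OP_IF sees OP_0 on the stack, nothing between OP_IF and OP_ENDIF is executed. The pushes are only carried along as data.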
But I thought that the Taproot upgrade simply enables people to embed a merkle root on chain by tweaking the public key to create a modified address.
Tweaking the key is one thing. But how do you know whether the TapScript is valid or not? You have to see it. And Ordinals just abuse the fact that the whole TapScript in the spent leaf is always revealed. That means that even if there is a way to make things smaller without sacrificing security, people can still choose the least optimal way, just because they can.
For example: if you have a single public key, then you can use this script: "<pubkey> OP_CHECKSIG". It is simple. But anyone can abuse the Script and write this instead: "OP_2 OP_2 OP_ADD OP_4 OP_EQUALVERIFY <pubkey> OP_CHECKSIG". The "2+2=4" part is useless, but it is still there, and it has to be revealed if you want to spend by TapScript.
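A small Python sketch of that point: the prefix leaves the stack exactly as it found it, yet it adds bytes to the script. The evaluator below is a toy that only understands the opcodes from the example, and the all-zero pubkey is a placeholder, not a real key.

```python
# Byte values of the opcodes used in the example.
OPCODES = {"OP_2": 0x52, "OP_4": 0x54, "OP_ADD": 0x93,
           "OP_EQUALVERIFY": 0x88, "OP_CHECKSIG": 0xAC}

PREFIX = ["OP_2", "OP_2", "OP_ADD", "OP_4", "OP_EQUALVERIFY"]


def run_prefix(ops, stack):
    """Toy evaluator for the few opcodes in PREFIX (not a full interpreter)."""
    stack = list(stack)
    for op in ops:
        if op in ("OP_2", "OP_4"):
            stack.append(int(op[3:]))      # push the small number
        elif op == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "OP_EQUALVERIFY":
            b, a = stack.pop(), stack.pop()
            if a != b:
                raise ValueError("OP_EQUALVERIFY failed")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


def serialize(ops, pubkey):
    """Opcode bytes for ops, followed by <pubkey> OP_CHECKSIG."""
    body = bytes(OPCODES[op] for op in ops)
    return body + bytes([len(pubkey)]) + pubkey + bytes([OPCODES["OP_CHECKSIG"]])


pubkey = bytes(32)  # placeholder 32-byte x-only key
print(run_prefix(PREFIX, ["<sig>"]))                               # ['<sig>']
print(len(serialize([], pubkey)), len(serialize(PREFIX, pubkey)))  # 34 39
```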
Even then, the merkle root isn't actually stored on chain. You just have to know it to spend coins from that address, right?
Yes, but you don't have to pick the best way. If you want to make a condition like "Alice or Bob can spend those coins", then you can create two branches: "<pubkeyAlice> OP_CHECKSIG" and "<pubkeyBob> OP_CHECKSIG", and reveal only one of them. But you can also abuse the TapScript and make a single branch with both conditions: "OP_IF <pubkeyAlice> OP_CHECKSIG OP_ELSE <pubkeyBob> OP_CHECKSIG OP_ENDIF". Then everything is always revealed whenever you spend by TapScript.
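A rough Python sketch of the difference in what gets revealed. The two keys are recognisable placeholder byte strings, not real public keys, and only the script itself is modelled here, not the rest of the witness.

```python
OP_IF, OP_ELSE, OP_ENDIF, OP_CHECKSIG = b"\x63", b"\x67", b"\x68", b"\xac"

# Hypothetical 32-byte x-only keys, used only as recognisable placeholders.
ALICE = b"\xa1" * 32
BOB = b"\xb0" * 32


def checksig_leaf(pubkey: bytes) -> bytes:
    """Serialize "<pubkey> OP_CHECKSIG" (0x20 is the 32-byte push opcode)."""
    return b"\x20" + pubkey + OP_CHECKSIG


leaf_alice = checksig_leaf(ALICE)
leaf_bob = checksig_leaf(BOB)

# The single-branch variant that carries both conditions in one script.
combined = OP_IF + leaf_alice + OP_ELSE + leaf_bob + OP_ENDIF

# Alice spends: what script does she have to reveal in each design?
print(len(leaf_alice), BOB in leaf_alice)  # 34 False
print(len(combined), BOB in combined)      # 71 True
```

With two leaves, Alice's spend still carries a 32-byte hash of Bob's leaf as part of the proof, but that hash does not show Bob's key or his script.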
I know my understanding is wrong, because if all you needed to know to spend coins from the script path was the merkle root, you wouldn't need to know the hash of any of the leaves. So can you guys fill in the gaps in my understanding?
You have to prove that the revealed TapScript is committed to your address. That means you provide a Merkle proof, similar to the SPV proof that a given transaction is part of a block, and then you reveal the whole TapScript in that leaf. After that, it all comes down to what users picked: they could, for example, always spend by public key and reveal their TapScripts outside Bitcoin. Or they could put their Ordinals into a separate leaf, which could begin with OP_RETURN, so it would never be pushed on-chain but would still be committed to their address.
So, to sum up, there are many ways to make the whole system secure without pushing everything on-chain. But users just decided to put their data there anyway, even though there is no need to do so. For that reason, I am also thinking about making a pubkey-only system, which could be resistant to that kind of attack. Or at least about making a lightweight node, which would only process signatures, without storing data, so it would be similar to OP_CHECKSIGFROMSTACK.
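The proof part can be sketched in Python, following the tagged-hash rules from BIP341 as far as I understand them. The function names and the placeholder keys are mine. The last step, checking that the output key equals the internal key tweaked with this root, needs secp256k1 point arithmetic and is left out.

```python
import hashlib


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def leaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    """Hash of one leaf: leaf version, script length, then the full script."""
    return tagged_hash("TapLeaf",
                       bytes([leaf_version]) + compact_size(len(script)) + script)


def root_from_proof(script: bytes, path: list) -> bytes:
    """Recompute the merkle root from the revealed script and sibling hashes."""
    node = leaf_hash(script)
    for sibling in path:
        lo, hi = sorted((node, sibling))   # children are hashed in sorted order
        node = tagged_hash("TapBranch", lo + hi)
    return node


# Two placeholder leaves; the 32-byte keys are not real public keys.
leaf_alice = b"\x20" + b"\xa1" * 32 + b"\xac"
leaf_bob = b"\x20" + b"\xb0" * 32 + b"\xac"

root_a = root_from_proof(leaf_alice, [leaf_hash(leaf_bob)])
root_b = root_from_proof(leaf_bob, [leaf_hash(leaf_alice)])
print(root_a == root_b)  # True: both spenders prove the same commitment
```

The spender reveals one full script plus one 32-byte hash per level of the tree. The other leaves stay hidden behind those hashes.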