Couldn't the pool validate your shares without knowing the full merkle tree? The miner would provide the PoW via the block header, plus the coinbase transaction and its merkle branch, which together prove that the pool is paid in the coinbase transaction.
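To make the merkle branch idea concrete, here is a rough Python sketch of what a pool-side check could look like. The function names and the `share_target` parameter are made up for illustration, not taken from any pool software. It assumes the standard 80-byte block header layout (merkle root at bytes 36 to 68), double SHA-256 hashing, and legacy (non-witness) transaction serialization. Parsing the coinbase outputs to confirm they pay the pool is left out.

```python
import hashlib
from typing import Sequence


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_from_coinbase(coinbase_txid: bytes, branch: Sequence[bytes]) -> bytes:
    """Fold the branch into the coinbase txid to rebuild the merkle root.

    The coinbase is always the leftmost leaf, so at every level the
    sibling hash is concatenated on the right.
    """
    h = coinbase_txid
    for sibling in branch:
        h = dsha256(h + sibling)
    return h


def share_is_valid(header: bytes, coinbase_tx: bytes,
                   branch: Sequence[bytes], share_target: int) -> bool:
    """Hypothetical share check that never sees the other transactions."""
    if len(header) != 80:
        return False
    # 1. The coinbase must be committed to by the header's merkle root.
    root = merkle_root_from_coinbase(dsha256(coinbase_tx), branch)
    if header[36:68] != root:
        return False
    # 2. The header hash, read as a little-endian integer, must meet the target.
    return int.from_bytes(dsha256(header), "little") <= share_target
```

The branch is only about log2(n) hashes long for n transactions, so the pool can validate a share without receiving the rest of the block.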
To submit a block, the pool has to have the entire raw block. So if the block is 500KB, the miner would have to upload all 500KB to the pool, plus a bit of overhead for the JSON markup, assuming the miner was actually involved in building the block.
There are *some* ways to shrink this. If the pool dictates which transactions can be selected and gives each one an integer tx-id (instead of the full hash), the miner could just upload a list of the transactions they included in the block, or a list of the ones they excluded. But once the miner introduces their own transactions from their local node that the pool wasn't originally aware of, this is no longer possible, since the pool needs the raw transactions.
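A rough sketch of the include/exclude idea, in Python. This is a made-up encoding for illustration, not a field of any real mining protocol: the integer ids are simply positions in the pool's transaction list, and the message format (`mode`, `ids`) is an assumption. It also assumes the miner keeps the pool's transaction order.

```python
def encode_selection(template_size, included):
    """Miner side: send whichever index list is shorter."""
    included = sorted(set(included))
    chosen = set(included)
    excluded = [i for i in range(template_size) if i not in chosen]
    if len(included) <= len(excluded):
        return {"mode": "include", "ids": included}
    return {"mode": "exclude", "ids": excluded}


def decode_selection(template_txs, msg):
    """Pool side: rebuild the transaction list from the index list.

    This only works because the pool already holds every raw transaction
    in template_txs. A transaction the pool has never seen has no index
    and would have to be uploaded in full.
    """
    ids = set(msg["ids"])
    if msg["mode"] == "include":
        return [tx for i, tx in enumerate(template_txs) if i in ids]
    return [tx for i, tx in enumerate(template_txs) if i not in ids]
```

So a miner who drops one transaction out of a few thousand uploads a single integer instead of the whole block.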
Reading this, only understanding 30% of it
Sincere question; maybe it was covered here but I couldn't understand it. What is the real-world chance that one day in the near future (say, the next 12 months) somebody will do something like this 51% attack thing, or some fault will be discovered, that creates big problems for bitcoin? Is there a high risk of that happening?
A 51% attack is extremely unlikely. Even the largest pools now represent less than 30% of the actual network hash rate. Luck spikes can push their 24-hour block percentage above 30%, but this can't be predicted and is not reliable. The threat of a 51% attack is having enough power to guarantee that you will eventually produce the longest chain, as long as you keep trying. If the largest pools were DDoS'd, most of that hash rate would filter down to backup pools, so the argument that you can just DDoS the largest pools to make a 51% attack easier is not quite accurate.
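To put a number on "guarantee eventually", here is a small Python sketch of the catch-up formula from the Bitcoin whitepaper (section 11). It uses the simplified random-walk model only: `q` is the attacker's share of the hash rate, `z` is how many blocks behind they start, and the function name is my own. It ignores network latency and the cost of sustaining the attack.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-rate share q ever catches up
    from z blocks behind the honest chain."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q  # honest share of the hash rate
    if q >= p:
        # With half or more of the hash rate, catching up is certain
        # in this model if the attacker keeps trying.
        return 1.0
    return (q / p) ** z
```

For example, `catch_up_probability(0.3, 6)` comes out to roughly 0.6%, while any `q` of 0.5 or more returns 1.0 no matter how far behind the attacker starts. That is the difference between a 30% pool on a lucky day and an actual majority.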