1: If the solution is to randomize which chain to work on in a tie, why wouldn't the selfish pool try to create multiple chains by having subpools search for hashes on separate blocks?
That doesn't work. Each subpool only has part of the pool's hashing power, so they would need double the hashing power. If they had that much, the selfish mining strategy would work much better on a single chain. Basically, the subpools would attack each other.
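One way to put rough numbers on this is the closed-form revenue for a single selfish pool from Eyal and Sirer's 2014 paper "Majority is not Enough", which is also where the random tie-breaking fix was proposed. This is an assumption: the answer above doesn't cite a specific model. `gamma` is the share of honest miners that build on the pool's block during a tie, so random tie-breaking means `gamma = 0.5`, and the profitability threshold becomes 25% of the hashrate. If each subpool is treated as a separate selfish miner (a simplification that ignores the subpools orphaning each other's blocks), each half has to clear 25% on its own, i.e. about 50% in total. The function names are hypothetical.

```python
def selfish_revenue(alpha: float, gamma: float) -> float:
    """Share of main-chain blocks a selfish pool earns (model valid for 0 <= alpha < 0.5).

    alpha: the pool's share of total hashrate.
    gamma: share of honest miners that mine on the pool's block during a tie;
           random tie-breaking gives gamma = 0.5.
    Closed form taken from Eyal and Sirer (2014).
    """
    numerator = alpha * (1 - alpha) ** 2 * (4 * alpha + gamma * (1 - 2 * alpha)) - alpha ** 3
    denominator = 1 - alpha * (1 + (2 - alpha) * alpha)
    return numerator / denominator


def profit_threshold(gamma: float) -> float:
    """Smallest hashrate share for which selfish mining beats honest mining."""
    return (1 - gamma) / (3 - 2 * gamma)


if __name__ == "__main__":
    print(profit_threshold(0.5))       # threshold with random tie-breaking
    print(selfish_revenue(0.30, 0.5))  # one pool with 30% earns more than 30%
    print(selfish_revenue(0.15, 0.5))  # a 15% half on its own earns less than 15%
```

With `gamma = 0.5` the threshold prints 0.25: a 30% pool mining on one chain earns somewhat more than its fair share, while a 15% subpool earns less than its fair share even before the two halves start orphaning each other.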
2: In general, holding on to a block for some period after finding it looks like a potential advantage, since the miner gets a head start on the next block. Why not have all nodes do this as normal operation?
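Holding a single block for a while is not good for a miner: if someone else publishes a competing block in the meantime, the withheld block may end up orphaned, and selfish miners do lose some blocks this way. What makes selfish mining profitable is that once they are two blocks ahead, they are guaranteed to win. They can wait until the rest of the network comes within one block of them and then publish their chain, orphaning every block the honest miners produced in the meantime.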
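A rough Monte Carlo sketch of the strategy described above, assuming the state machine from Eyal and Sirer's "Majority is not Enough" paper (an assumption: the answer doesn't name a specific algorithm). `alpha` is the selfish pool's share of hashrate and `gamma` the share of honest miners that build on the pool's block during a tie; the function and parameter names are hypothetical. Note where the pool loses blocks (only when it loses a tie race) and that once its lead is two or more, every one of its blocks ends up in the main chain.

```python
import random


def simulate_selfish_mining(alpha: float, gamma: float, n_found: int, seed: int = 0):
    """Monte Carlo sketch of the selfish-mining strategy.

    alpha:   selfish pool's share of total hashrate.
    gamma:   share of honest miners that build on the pool's block during a tie.
    n_found: total number of blocks found by everyone.
    Returns (pool_main, honest_main, pool_orphaned): blocks each side ends up
    with in the main chain, plus pool blocks lost in tie races.
    """
    rng = random.Random(seed)
    lead = 0      # how far the pool's private chain is ahead of the public one
    tie = False   # two equal-length branches are racing
    pool_main = honest_main = pool_orphaned = 0

    for _ in range(n_found):
        if rng.random() < alpha:
            # The selfish pool finds a block.
            if tie:
                # It extends its own branch, publishes, and wins the race.
                pool_main += 2
                tie = False
            else:
                lead += 1  # keep the new block private
        else:
            # An honest miner finds a block.
            if tie:
                if rng.random() < gamma:
                    # Built on the pool's branch: one block for each side.
                    pool_main += 1
                    honest_main += 1
                else:
                    # Built on the honest branch: the pool's block is orphaned.
                    honest_main += 2
                    pool_orphaned += 1
                tie = False
            elif lead == 0:
                honest_main += 1
            elif lead == 1:
                # Publish the withheld block right away and race for it.
                lead = 0
                tie = True
            elif lead == 2:
                # Two ahead: publish everything, orphaning the honest block.
                pool_main += 2
                lead = 0
            else:
                # Still comfortably ahead: reveal one block, keep the rest.
                # The honest block will be orphaned as well.
                pool_main += 1
                lead -= 1

    return pool_main, honest_main, pool_orphaned


if __name__ == "__main__":
    pool, honest, lost = simulate_selfish_mining(alpha=0.3, gamma=0.5, n_found=200_000)
    print(pool / (pool + honest), lost)  # main-chain share, pool blocks lost in ties
```

With these settings the pool's main-chain share comes out a little above its 30% hashrate even though it orphans some of its own blocks; with a much smaller `alpha` the same strategy earns less than honest mining, which is why ordinary nodes gain nothing by withholding blocks.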
A selfish mining attack is also clearly visible: it produces forks that are several blocks long, so Bitcoin users can no longer trust confirmed transactions. A miner with enough hashing power to mount such an attack should realize that the damage to Bitcoin and the resulting price drop would reduce their earnings and make their huge investment in mining hardware almost worthless. Even renting the hardware and trying to cash out quickly wouldn't work: the attacker only profits after the next difficulty adjustment, which happens every 2016 blocks (nominally about two weeks). Until then, the attack only costs them money.
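The difficulty argument can be sketched as simple arithmetic. Assumptions, all hypothetical: total hashrate stays constant, the old difficulty had the whole network finding about one block per 10-minute interval, and the attack starts right at a retarget boundary. Before the adjustment, orphaned blocks are pure waste, so the pool can never earn more than its hashrate share; after it, the difficulty drops until the main chain again grows by one block per interval, and the pool's share of that chain is what it earns. Because the main chain grows more slowly during the attack, the first adjustment also comes later than two weeks.

```python
TARGET_MINUTES = 10     # Bitcoin's target block interval
RETARGET_BLOCKS = 2016  # main-chain blocks between difficulty adjustments


def income_per_interval(pool_main: int, honest_main: int, found: int):
    """Pool's main-chain blocks per target interval as (before, after) the retarget.

    Assumes the old difficulty made the network find `found` blocks in
    `found` target intervals and that total hashrate stays constant.
    """
    # Old difficulty: ~1 block found per interval, but orphans are wasted.
    before = pool_main / found
    # New difficulty: the main chain again grows ~1 block per interval.
    after = pool_main / (pool_main + honest_main)
    return before, after


def days_until_retarget(pool_main: int, honest_main: int, found: int) -> float:
    """Days needed to accumulate 2016 main-chain blocks at the old difficulty."""
    main_chain_per_interval = (pool_main + honest_main) / found
    intervals = RETARGET_BLOCKS / main_chain_per_interval
    return intervals * TARGET_MINUTES / (60 * 24)


if __name__ == "__main__":
    # Made-up illustrative counts, not measured data: 1000 blocks found, 150 orphaned.
    print(income_per_interval(pool_main=300, honest_main=550, found=1000))
    print(days_until_retarget(pool_main=300, honest_main=550, found=1000))
```

With these made-up counts the pool earns about 0.30 blocks per interval before the adjustment and about 0.35 after it, and the adjustment arrives after roughly 16.5 days instead of 14. If the pool's hashrate share were, say, 32% (also made up), honest mining would have paid 0.32 per interval the whole time, so the attacker is behind until the difficulty drops.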