But anyway some more thoughts: because it's no longer a first past the post race
Mining is [...] NOT a "first past the post race". There is no upper bound on the number of blocks solved per unit time. When a new block is found on the network you simply switch to extending the new chain.
While the effect is the same, I disagree: the race to claim the transaction fees and reward is a first past the post race, because orphan blocks do not get to keep any of the fees or the reward (in the single winning chain approach). The fact that miners start a new race as soon as they learn that the previous race has been won doesn't mean they are not engaged in a first past the post race (it just means they enjoy racing and immediately enter the next race ;)
The reason bitcoin mining is fair, despite the first past the post race, is that hashcash-based proof-of-work is power-fair.
Hashcash proof of work is power-fair because, as you alluded, it has no memory (it's like a coin toss: there is no progress within the work, and all sequences of nonce choices take the same expected amount of work). Most other proof of work functions do not have this power-fairness property (e.g. client-puzzles, amortizable hashcash, time-lock, and maybe Dwork-Naor pricing functions). Scrypt is power-fair, I think. If scrypt turned out not to have the power-fair property, that would be a security bug, and people with fast processors would be able to get a disproportionate advantage.
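To make the "no memory" point concrete, here is a toy hashcash-style search in Python. It is only a sketch: the names `check` and `mint`, the 8-byte nonce encoding, and the use of a single SHA-256 are my assumptions for illustration (real hashcash stamps use SHA-1 and a text format, and bitcoin compares a double SHA-256 against a target). The point is that every nonce is an independent coin toss, so where you start searching, or how long you have already searched, tells you nothing about how much work remains.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def check(challenge: bytes, nonce: int, bits: int) -> bool:
    """True if SHA-256(challenge || nonce) has at least `bits` leading zero bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits


def mint(challenge: bytes, bits: int, start: int = 0):
    """Try nonces from `start` upwards; return (nonce, tries).

    Each try succeeds with probability 2**-bits regardless of earlier failures,
    so the search carries no progress from one try to the next.
    """
    nonce = start
    tries = 1
    while not check(challenge, nonce, bits):
        nonce += 1
        tries += 1
    return nonce, tries


if __name__ == "__main__":
    # Different starting points are all equally good places to search.
    for start in (0, 1_000_000, 2_000_000):
        nonce, tries = mint(b"example-challenge", 12, start)
        print(f"start={start} nonce={nonce} tries={tries}")
```

The number of tries varies a lot from run to run (it is geometrically distributed with mean 2**bits), which is exactly the high variance that keeps slow miners in the race in proportion to their power.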
However, power-fairness is only needed in the proof-of-work function because of the choice of a first past the post race. For other, cooperative race types it is not needed.
A way to see why power-fairness is needed in first past the post (and that bitcoin is first past the post) is to imagine the bitcoin proof of work tweaked to use a simple non-power-fair proof such as amortizable hashcash, e.g. 256 smaller proofs of work with the same expected 10 minute total time, i.e. about 2.34 seconds per sub-challenge. (Amortizable here just means the challenge is to collect 256 sub-challenges.) This achieves 16x lower standard deviation (the square root of 256), which is potentially desirable because it is achieved without incurring extra network traffic, either on the main chain or on a p2pool chain. With this approach you can see there is work-progress, so it is no longer power-fair. I.e. a fast node is going to win races disproportionately often, even accounting for its power.
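A rough simulation of the thought experiment above, as a sketch rather than a model of real mining. My assumptions: each sub-challenge takes an exponentially distributed time, there are just two miners with 60% and 40% of the power, and `MEAN_TOTAL`, `solve_time`, `relative_stdev` and `fast_win_fraction` are names made up for the example. With `parts=1` this is ordinary hashcash; with `parts=256` it is the amortizable variant.

```python
import random
import statistics

MEAN_TOTAL = 600.0  # expected seconds for a hypothetical miner with power 1.0


def solve_time(rng, power, parts):
    """Time to collect `parts` sub-proofs, each exponentially distributed.

    The expected total is MEAN_TOTAL / power for any value of `parts`.
    """
    rate = power * parts / MEAN_TOTAL
    return sum(rng.expovariate(rate) for _ in range(parts))


def relative_stdev(parts, trials=2000, seed=1):
    """Standard deviation of the solve time divided by its mean."""
    rng = random.Random(seed)
    times = [solve_time(rng, 1.0, parts) for _ in range(trials)]
    return statistics.pstdev(times) / statistics.mean(times)


def fast_win_fraction(parts, fast=0.6, slow=0.4, trials=2000, seed=2):
    """Fraction of first-past-the-post races won by the faster miner."""
    rng = random.Random(seed)
    wins = sum(
        solve_time(rng, fast, parts) < solve_time(rng, slow, parts)
        for _ in range(trials)
    )
    return wins / trials


if __name__ == "__main__":
    for parts in (1, 256):
        print(
            f"parts={parts:3d} "
            f"relative stdev={relative_stdev(parts):.3f} "
            f"fast miner wins={fast_win_fraction(parts):.3f}"
        )
```

Under these assumptions the single-proof race gives the 60% miner roughly 60% of the wins, while the 256-part race cuts the relative standard deviation to about 1/16 and hands the faster miner nearly every race.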
I made a racing car analogy for reduced variance in https://bitcointalk.org/index.php?topic=182252.msg1911750#msg1911750
A loose analogy: imagine currently bitcoin miners are race cars. Some are fast (ferrari) and some are slow (citroen 2cv) but they are all very very unreliable. So who wins the race? The ferrari mostly, but the 2cv still has a fair chance relative to its speed because the ferrari is really likely to break down. With low variance coins, you have well maintained cars, and they very rarely break down. So the ferrari wins almost always. Now if you have a line of 20 cars of varying speeds, well maintained (low variance), the first 5 that are going to get past the post are almost certainly going to be the 5 fastest. Hardly anyone else stands a chance.
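The 20-car version of the analogy can be sketched the same way. Everything specific here is an assumption picked for illustration: the speeds (each car 25% faster than the next slower one), the gamma-distributed finish times, and the function names. `parts=1` stands for the unreliable cars (exponential finish time), `parts=256` for the well maintained ones.

```python
import random


def finish_time(rng, speed, parts):
    """Finish time with mean 1/speed and relative std dev 1/sqrt(parts).

    parts=1 is the exponential, "very unreliable car" case.
    """
    return rng.gammavariate(parts, 1.0 / (parts * speed))


def top5_are_fastest(parts, trials=2000, seed=3):
    """Fraction of races in which the first 5 finishers are exactly the 5 fastest cars."""
    rng = random.Random(seed)
    speeds = [1.25 ** i for i in range(20)]  # hypothetical line of 20 cars
    fastest = set(range(15, 20))             # indices of the 5 fastest cars
    hits = 0
    for _ in range(trials):
        times = [finish_time(rng, s, parts) for s in speeds]
        first5 = set(sorted(range(20), key=times.__getitem__)[:5])
        hits += first5 == fastest
    return hits / trials


if __name__ == "__main__":
    print("unreliable cars (parts=1):       ", top5_are_fastest(1))
    print("well maintained cars (parts=256):", top5_are_fastest(256))
```

With these made-up speeds, the unreliable cars only rarely finish with the 5 fastest in the first 5 places, while the well maintained cars do so almost every time.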
You make some more points:
Controlling the time between blocks is also important for minimizing bandwidth and computation, especially for SPV nodes. Amiller had made a nice suggestion regarding merging orphans for the purpose of making the block time dynamically adapt to the diameter, though that doesn't itself address keeping the network usable by SPV nodes.
FWIW, "P2pool" does solve the variance nicely— including allowing miners variable difficulty work (though confined to not result in shares faster than six per minute, to control the cost and prevent convergence problems)— without burdening the perpetually stored Bitcoin network with frequent tiny blocks.
Your points about increasing number of packets and slight bandwidth increase are valid downsides.
(I think the bandwidth increase would not have to be too large, as nodes could refer to other variable cost blocks by block hash; they would only need to add any transactions they have seen that are missing.)
I think I need to re-read p2pool a 2nd time to comment on the other bit.
If the time between blocks becomes small relative to diameter then the network will start having convergence failures and large reorgs (even absent an attacker).
Btw that sounds like a separate argument against alt-coins that shorten the block time interval.
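A back-of-envelope way to see the effect, as a sketch under simplifying assumptions: blocks arrive as a Poisson process with the given mean interval, and a single propagation delay stands in for the network diameter. The function name and the example delay of 10 seconds are hypothetical.

```python
import math


def stale_probability(propagation_delay_s: float, block_interval_s: float) -> float:
    """Chance that a competing block is found somewhere else while a new block
    is still propagating, assuming Poisson block arrivals."""
    return 1.0 - math.exp(-propagation_delay_s / block_interval_s)


if __name__ == "__main__":
    delay = 10.0  # hypothetical network-wide propagation delay in seconds
    for interval in (600.0, 150.0, 60.0, 10.0):
        p = stale_probability(delay, interval)
        print(f"block interval {interval:5.0f}s -> competing block chance {p:.3f}")
```

As the interval shrinks towards the propagation delay the chance of a competing block climbs steeply, which is where the convergence problems and reorgs come from.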
Adam