it would be impossible to reach a consensus on which transactions are to be included in the "block pool"
Of course all of them, as it is today (up to mempool limit around 300 MB, as it is today). If the transaction is totally new, it will just begin with very easy proof of work. And when some node will see the same transaction with higher proof of work, it will be replaced. And if there will be two blocks having the same transaction, the block with lower hash will win.
Miners should have the say in which transaction they want to include since they're the ones collecting the fees, not the users.
Of course, they can pick any transaction, even if it is non-standard or has no fee, if they really want. The users are not mining their transactions, so they can only observe what miners are telling them, they cannot change it without having significant computing power or enormous amount of luck.
Changing the coinbase transaction would change the entire merkle root.
Yes, and it have to be changed anyway, because in other case there is no progress in mining. It does not matter if the order of transactions is changed, some transactions are replaced with other transactions, or if the coinbase is changed. There have to be some change, but it does not really matter what will be changed as long as produced block is valid and has lower difficulty than the block being replaced.
Users shouldn't trust any transactions until they are put into the blockchain.
But they are and they will put more and more trust in 0-conf transactions. For example, the whole lightning network is based on 0-conf transactions.
Even if you can use a huge amount of hashrate to back your own transaction, the final say lies within whatever is put into the blockchain.
Yes, the final block will always have the last word on what is valid and what is not. But as long as something is unconfirmed, it just may be convenient for user to have some feedback instead of just knowing that some transaction is still unconfirmed.
If you want faster confirmation, the best method is to reduce the block intervals.
Reducing the block intervals will reduce difficulty, assuming the same network power. There is no difference between 30 confirmations on chain with 20 seconds block time and 1 confirmation on chain with 10 minutes block time, assuming everything else remains the same. Doing this at mempool level is what can be achieved by soft-fork, but to reduce block intervals you will need some hard-fork.