So how exactly is a block added to the chain? Why do some blocks have very few transactions while others have many? What does it mean that "a new block is discovered"?
I still did not understand the relationship between a miner and transaction confirmation.
When you send a transaction you broadcast it to any peers you are connected to. Those peers validate the transaction and then re-broadcast it to any peers they are connected to. Those peers validate the transaction and re-broadcast it, and so on. Every peer that hears about the transaction keeps track of it in a memory pool so they can transmit it to any new peers that haven't heard about it yet.
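The relay process described above can be sketched as a simple flood over a graph of peers. This is only an illustration: the `propagate` function, the peer names, and the `tx_is_valid` flag are made up for the example, and a real node does far more (full validation, relay policies, and so on).

```python
from collections import deque

def propagate(peers, origin, tx_is_valid=True):
    """Flood a transaction through a peer graph.

    `peers` maps each peer name to the list of peers it is connected to.
    Returns the set of peers that end up holding the transaction in
    their memory pool. Validation is reduced to a single boolean here.
    """
    holders = set()
    if not tx_is_valid:
        return holders  # an invalid transaction is never relayed
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        if peer in holders:
            continue  # this peer has already heard about the transaction
        holders.add(peer)  # keep it in the memory pool
        for neighbour in peers[peer]:
            if neighbour not in holders:
                queue.append(neighbour)  # re-broadcast to connected peers
    return holders

# Hypothetical five-node network; "miner" is only reachable through "dave".
network = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "miner"],
    "miner": ["dave"],
}
reached = propagate(network, "alice")
```

Even though "alice" is not connected to the miner directly, the transaction reaches every peer in this toy network, including the miner.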
Eventually some of the peers that hear about the transaction are miners. Miners choose unconfirmed transactions from the list they have in memory of all currently unconfirmed transactions. The protocol allows miners to use any criteria they want for deciding which transactions to choose. They can choose just a few if they want, or they can choose enough to completely fill a block (there is currently a 1 megabyte limit to the maximum size of a block). The miner also creates a special transaction called a "coinbase" transaction. The coinbase transaction pays the block reward to an address (or addresses) of the miner's choosing. This block reward is allowed by the protocol to be equal to the sum of the block subsidy (currently 25 BTC) and all transaction fees from all transactions that the miner has selected for the block.
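As a rough sketch of one possible selection policy, the example below picks transactions greedily by fee per byte and then computes the largest reward the coinbase transaction may claim. The `Tx` type, the function names, and the greedy strategy are assumptions for illustration only; as noted above, miners are free to use any criteria. The 1 megabyte limit and the 25 BTC subsidy are the figures quoted above, and the sketch ignores the space taken by the header and the coinbase transaction itself.

```python
from dataclasses import dataclass

MAX_BLOCK_BYTES = 1_000_000          # 1 megabyte limit mentioned above
SUBSIDY_SATOSHI = 25 * 100_000_000   # 25 BTC subsidy mentioned above

@dataclass(frozen=True)
class Tx:
    txid: str
    size_bytes: int
    fee_satoshi: int

def select_transactions(mempool, max_bytes=MAX_BLOCK_BYTES):
    """Greedy pick by fee rate; just one policy a miner might use."""
    chosen, used = [], 0
    by_fee_rate = sorted(
        mempool, key=lambda t: t.fee_satoshi / t.size_bytes, reverse=True
    )
    for tx in by_fee_rate:
        if used + tx.size_bytes <= max_bytes:
            chosen.append(tx)
            used += tx.size_bytes
    return chosen

def max_block_reward(chosen, subsidy=SUBSIDY_SATOSHI):
    """Subsidy plus the fees of every selected transaction."""
    return subsidy + sum(tx.fee_satoshi for tx in chosen)

# Tiny hypothetical mempool and an artificially small 700-byte budget.
mempool = [Tx("a", 250, 10_000), Tx("b", 500, 5_000), Tx("c", 400, 40_000)]
chosen = select_transactions(mempool, max_bytes=700)
reward = max_block_reward(chosen)
```

With the 700-byte budget, "c" and "a" fit and "b" is left in the memory pool for a later block, which is one reason some blocks carry fewer transactions than others.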
Once the miner has determined which of the unconfirmed transactions they want to include in the block they are creating, they generate a special hash of the transactions (called a "merkle root"), and include it in a block header that they create. If any transaction changes (or the order of the transactions changes), then the merkle root will be an entirely different value. As such, any peer can later recompute the merkle root from the transactions in the block and check that it matches the one in the block header, which shows that the header truly represents the exact list of transactions in the block. There are various other attributes stored in the block header that the miner is creating (such as a date/time stamp, the hash of the most recently solved block on the network, the current difficulty target in encoded form, etc).
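To make the merkle root idea concrete, here is a minimal sketch that hashes pairs of transaction hashes together until one value remains, duplicating the last hash when a level has an odd count. The helper names and the placeholder transactions are assumptions for the example, and byte-order conventions used by real software are left out.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes):
    """Combine transaction hashes pairwise until a single hash remains."""
    if not tx_hashes:
        raise ValueError("a block needs at least one transaction (the coinbase)")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Placeholder "transactions"; real ones would be serialized transaction data.
txs = [double_sha256(name) for name in (b"coinbase", b"tx-a", b"tx-b")]
root = merkle_root(txs)
reordered_root = merkle_root([txs[0], txs[2], txs[1]])
```

Swapping the order of two transactions gives a different root, so a header that commits to `root` cannot be paired with any other list or ordering of transactions.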
Once the miner has finished creating a block header they hash it twice with SHA-256 (that is, they take the SHA-256 hash of the SHA-256 hash of the header) and compare the result to the current difficulty target determined by the protocol. If the value of the hash is low enough then the block is "solved". If it is not low enough, then the miner increments a special field in the block header called a nonce (the only purpose of which is to provide a fast, easy way for the miner to alter the contents of the header so that a new hash value will result). The miner then calculates the double SHA-256 hash of the block header again and compares the resulting value to the difficulty target. The miner repeats this process until they either find a hash with a low enough value, or receive a newly solved block relayed to them from a peer.
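The nonce loop can be sketched in a few lines. This is a toy version: the header is an arbitrary byte string rather than a real 80-byte header, the target is far easier than anything on the real network, the byte order used to turn the digest into a number is a simplification, and the `mine` function is a made-up name. It also leaves out the "stop when a peer relays a solved block" branch.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_without_nonce: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until the double hash is at or below the target.

    Returns (nonce, digest) on success, or None if the nonce range is
    exhausted without finding a low enough hash.
    """
    for nonce in range(max_nonce):
        header = header_without_nonce + nonce.to_bytes(4, "little")
        digest = double_sha256(header)
        if int.from_bytes(digest, "big") <= target:
            return nonce, digest  # low enough: the block is "solved"
    return None

# Toy target: roughly 1 in 4096 hashes qualifies, so this finishes quickly.
easy_target = 2**244
result = mine(b"toy header", easy_target)
```

Lowering `easy_target` makes a qualifying hash rarer and the search proportionally longer, which is how the difficulty target controls how much work a block represents.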
If they receive a newly solved block from a peer, they check all the values of the block to ensure it is valid. Then they add it to their own copy of the blockchain, remove from their memory all the previously unconfirmed transactions that are in the block they just received, discard the block they've been working on, and begin working on a new block per the steps outlined above.
If they find a hash of a low enough value before they receive a newly solved block from a peer, then they are lucky enough to be the miner that has just solved a new block. They broadcast the block header and list of transactions to all the peers that they are connected to. Each peer checks all the values of the block to ensure it is valid. Then they add it to their own copy of the blockchain, remove from their memory all the previously unconfirmed transactions that are in the block that they just received, and re-broadcast the block header and transactions to each peer they are connected to. This process repeats until all peers on the network have heard about the new block and added it to their own copy of the blockchain.
The first confirmation on a transaction simply means that a miner has chosen the transaction for a block they were working on and successfully "solved" the block by finding a nonce that results in a hash of low enough value and broadcasting it.
Any change to the contents of a block will result in a different hash value for the block. This means that nobody can convince any peer on the network to change the contents of a block without first going through the effort of finding a hash of low enough value to be considered "solved". Finding such a hash takes the combined hashing power of the entire bitcoin network approximately 10 minutes on average. Meanwhile the rest of the network is busy working on the next block. Since the hash value of a block is always included in the following block, someone attempting to modify a transaction in an older block must recreate (and re-solve) every single block that has occurred since, and they have to do it faster than the rest of the network is solving blocks or they'll never catch up.
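A small sketch may help show why changing an old block breaks everything after it. Blocks here are plain dictionaries with a text payload, a single SHA-256 stands in for the real block hash, and there is no proof of work at all; the function names are made up for the example.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, payload: str) -> str:
    """Hash of a toy block: depends on the previous hash and the contents."""
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

def build_chain(payloads):
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        h = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": h})
        prev = h
    return chain

def first_invalid_block(chain):
    """Index of the first block that fails verification, or None."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        recomputed = block_hash(block["prev"], block["payload"])
        if block["prev"] != prev or recomputed != block["hash"]:
            return i
        prev = block["hash"]
    return None

chain = build_chain(["a pays b", "b pays c", "c pays d"])

# The attacker rewrites block 0 and even recomputes its hash...
forged = [dict(b) for b in chain]
forged[0]["payload"] = "a pays mallory"
forged[0]["hash"] = block_hash(forged[0]["prev"], forged[0]["payload"])
broken_at = first_invalid_block(forged)
```

The forged block 0 is internally consistent, but block 1 still points at the old hash, so verification fails there. On the real network each of those later blocks would also have to be re-solved.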
Therefore, each new block that is added to the blockchain makes it exponentially more difficult for an attacker to go back and modify a transaction that was added to an older block. For this reason every new block added to the blockchain is considered to be an additional "confirmation" on top of every transaction already in the blockchain.
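One way to put a number on "exponentially more difficult" is the simplified gambler's-ruin estimate from the original Bitcoin whitepaper: if an attacker controls a share q of the hashing power and the honest network has p = 1 - q, the chance of ever catching up from z blocks behind is (q/p)^z when q < p. The formula is not part of the explanation above; it is brought in here as an illustration, and the function name is an assumption.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance an attacker ever catches up from `blocks_behind` blocks back.

    Simplified model: (q / p) ** z when the attacker has less than half
    of the hashing power, and 1.0 otherwise.
    """
    if not 0 <= attacker_share <= 1:
        raise ValueError("attacker_share must be between 0 and 1")
    if blocks_behind < 0:
        raise ValueError("blocks_behind must not be negative")
    q = attacker_share
    p = 1 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    return (q / p) ** blocks_behind

# An attacker with 10% of the hashing power, at 1 to 6 confirmations.
odds = [catch_up_probability(0.10, z) for z in range(1, 7)]
```

Each extra confirmation multiplies the attacker's chance by the same factor q/p, which is the exponential drop-off described above.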
TL;DR:
Miners select transactions and create a set of them which is then called a "block".
Miners do some work that requires significant processing power to come up with a value that proves they did the work.
Miners then broadcast the selected transactions and the proof that they did the work.
If a transaction is in the set selected by the miner that successfully finds the value, then the transaction is considered "confirmed".
This proof of work makes it difficult for an attacker to modify the transaction history.