I have some questions
Lots of questions there. I'll try to break them down into pieces.
The entire system is peer-to-peer. Everyone runs a client program that verifies that all the information it receives matches the rules of the protocol, and refuses to forward any information that isn't valid. This means that a single person (or group of people) can't add false information into the system, because it won't get relayed.
how the total number of bitcoins is ultimately limited to 21 million
Satoshi started the block reward (given to the miners for "solving a block") at 50 BTC. The protocol that every client uses makes sure that the block reward is cut in half every 210,000 blocks. If you add up all the bitcoins that can be created as the reward gets smaller and smaller, you'll find that it will never exceed 21 million.
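The sum described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not client code: the function and constant names are made up for illustration, and it assumes amounts are counted in satoshis (1 BTC = 100,000,000 satoshis) with each halving rounded down to a whole satoshi.

```python
# Sketch of the issuance schedule: a 50 BTC reward, halved every 210,000 blocks.
# Names here are illustrative, not taken from any Bitcoin client.

SATOSHIS_PER_BTC = 100_000_000
BLOCKS_PER_HALVING = 210_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC


def total_supply_satoshis() -> int:
    """Add up every block reward until the reward rounds down to zero."""
    reward = INITIAL_REWARD
    total = 0
    while reward > 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2  # halve, discarding any fraction of a satoshi
    return total


if __name__ == "__main__":
    total = total_supply_satoshis()
    print(total / SATOSHIS_PER_BTC, "BTC in total")
```

Because the reward eventually rounds down to zero, the loop stops and the total comes out just under 21 million BTC.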
I vaguely understand that new blocks are created through "solving a cryptographic problem" and that this takes an immense amount of computing power. But what I don't understand is what the cryptographic problem they solve is
The miners gather up all the unconfirmed transactions that they want to include in a block. They package these together along with some headers and other useful information. Then they calculate a 256-bit hash of it (Bitcoin applies SHA-256 twice). This is actually a VERY fast and VERY easy (for a computer) thing to calculate. The hash is essentially a random number between 0 and 1.158e+77 (that is, 2^256, a very big number). The "difficulty" set by the protocol corresponds to a target number that the hash has to be less than. If the hash isn't small enough, the miner increments a value in the block called a nonce by 1 and recalculates the hash. The miner repeats this process until the generated hash is small enough.
Because of the difficulty, it can take on average about 13,200,000,000,000,000 hash calculations (today, with the current difficulty) to find a hash that is lower than the target. On average it takes about 10 minutes for some miner somewhere on the network to happen across one.
One reason that a hash is so useful is that when the block is solved and published to the connected peers (and then relayed throughout the network), it is very fast and easy for each node to recalculate the hash with the given nonce to verify that it is the correct hash and that the value is low enough. Another reason is that there is no known way to predict what the hash value is without calculating it. There is no way to know which nonce will give you a low enough hash without trying each one.
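The nonce search can be sketched in Python as follows. This is a toy, not real mining code: `block_data` stands in for the real block header (whose actual byte layout is not reproduced here), the hash is read as a big-endian integer for simplicity, and the target in the usage note is deliberately enormous compared to the real one so the loop finishes in a moment. The function names are made up for illustration.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def hash_as_int(block_data: bytes, nonce: int) -> int:
    """Hash the toy block contents plus a 4-byte nonce; return it as a number."""
    candidate = block_data + nonce.to_bytes(4, "little")
    return int.from_bytes(double_sha256(candidate), "big")


def mine(block_data: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces 0, 1, 2, ... until the hash is below the target."""
    for nonce in range(max_nonce):
        if hash_as_int(block_data, nonce) < target:
            return nonce
    return None  # no nonce in range worked


def verify(block_data: bytes, nonce: int, target: int) -> bool:
    """Checking a claimed solution takes one hash, not a whole search."""
    return hash_as_int(block_data, nonce) < target
```

For example, `mine(b"toy block", 2**244)` needs a few thousand tries on average (about 1 in 4,096 hashes falls below that target), while `verify` confirms the result with a single hash. That asymmetry is what lets every node cheaply check work that was expensive to produce.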
how it was made part of the blockchain.
Once a miner solves a block, they tell all the connected peers. Those peers just validate that the block meets all the rules of the protocol, append the block to the end of the blockchain file that they have, and then relay it on to all the peers that they are connected to. Those peers do the same, and so on until every node has an updated blockchain.
How exactly do the blocks fit together?
Each new block is appended to the end of the blockchain file. I mentioned earlier that each block has some other "useful information"; one of those pieces of information is the hash of the previous block. This means that the client can keep track of the "chain" by comparing the hash stored in each new block with the hash value of the previous block. It also means that nobody can change any value at all in any earlier block (and convince the entire network that their new value is valid) without first finding a new, sufficiently low hash for that block and for every block that has come since. Since the "valid" blockchain is still being extended by the rest of the network, you'd be chasing a moving target, and you'd need more hashing power than all of the bitcoin miners in the whole world combined to regenerate blocks faster than they are generating new ones.
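The linking can be illustrated with a small Python sketch. It assumes a drastically simplified block (just a previous-block hash and a text payload) and leaves out the proof-of-work entirely; the `Block` class and helper names are hypothetical, chosen only to show why editing an old block breaks every link after it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    prev_hash: str  # hash of the block that comes before this one
    payload: str    # stand-in for the transactions and other header fields

    def hash(self) -> str:
        data = (self.prev_hash + self.payload).encode()
        return hashlib.sha256(data).hexdigest()


def build_chain(payloads):
    """Create blocks in order, each one storing the hash of its predecessor."""
    chain = []
    prev_hash = "0" * 64  # placeholder for the first block, which has no parent
    for payload in payloads:
        block = Block(prev_hash, payload)
        chain.append(block)
        prev_hash = block.hash()
    return chain


def is_linked(chain) -> bool:
    """Check that every block's stored hash matches its predecessor's hash."""
    return all(cur.prev_hash == prev.hash() for prev, cur in zip(chain, chain[1:]))
```

Building a chain with `build_chain(["a", "b", "c"])` gives `is_linked(...) == True`. Changing the payload of the first block afterwards changes its hash, so the second block no longer points at it and `is_linked` returns `False`. In the real system, repairing that link also means redoing the proof-of-work for every later block.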
What is the information that is analyzed in order to create new blocks
Each unconfirmed transaction that a miner wants to include in a block is checked to make sure that it meets the requirements of the protocol. Then the nonce is incremented, and each resulting hash is calculated and compared to the target set by the current network difficulty.
What are blocks checked against to know they are valid?
Blocks are checked for valid transactions, headers, and hashes. ("Valid" means meeting the requirements of the protocol.)
Could the entire 21 million BTC blockchain technically be solved only from information that was in the very first block or whatever algorithm is used for BTC?
No. The blockchain contains all the bitcoin transactions that have ever occurred. You'd have to have a list of those transactions and which block each was in to re-create the blockchain. Even then, unless you also reproduced every other value in each block exactly (the nonces, for example), the hashes would come out different, so your re-created blockchain wouldn't match the current one.
Would it be possible (in theory) for all of the information to be known by the creator/Satoshi by having a private key or something that forms the basis for the blockchain?
The transactions and the protocol are the basis of the blockchain. It is not possible to know in advance which nonce will produce a low enough hash. The only way to figure that out is to iterate through nonce values and keep trying.