One of the most important questions in my opinion is:
What "difficult mathematical problem" is being computed?
I mean not in the "baby language" - but let's say - in understanable way for a so called "non-technical" person.
To start with you need to understand (or at least accept) that there are certain mathematical calculations that are very easy&fast to do in one direction and very difficult&slow to do in the opposite direction.
One example of this would be multiplying large prime numbers.
If I give you 2 prime numbers:
9931 and 9803
You can quickly and easily multiply them together and determine that the result is:
97,353,593
On the other hand if I give you a number:
93,603,659
And ask you which two prime numbers it is the product of, you would find it difficult and time consuming to determine that the answer is:
9923 & 9433
Now that we have established that there are mathematical problems that can be easy&fast in one direction and difficult&slow in the other. . .
The particular mathematical problem being solved during mining is called a SHA-256 hash. This algorithm takes a set of data and calculates a 256 bit number from that data. It is very fast and easy for a computer to calculate a SHA-256 hash from a set of data. It is extremely slow and difficult to go the other way, finding a set of data that will exactly match a given hash. It is so slow and difficult that it can be considered impossible.
Without performing the SHA-256 algorithm, it isn't possible to know ahead of time (with current mathematics) what the resulting 256 bit number will be, so the result is essentially random. Changing even a single bit in the original data changes the resulting 256 bit number in an unpredictable way, the entire SHA-256 algorithm must be re-calculated to determine the new result. If you are given a SHA-256 hash, there is no known fast or easy way to determine what original set of data will result in that particular hash.
The miner packages up all the transactions that they want to include in a block. Then they perform a SHA-256 hashing process on all the transactions and include the resulting number (often referred to as a Merkle Root) in the block header. Changing any information in any transaction will change this Merkle Root, and therefore change the header of the block.
The miner then calculates a SHA-256 hash of the block header. To force miners to prove that they've done a certain amount of mathematical work, the bitcoin protocol sets a target value for this SHA-256 hash. If the hash that the miner calculated is less than the target value, then the block is "solved" and ready to be published to the network. If the hash is greater than the target value, the miner changes a special field in the header called a nonce. This field exists solely for the purpose of modifying the resulting hash value. The miner then recalculates the SHA-256 hash of the block header (with this new nonce value). The miner repeats this process over and over until the resulting hash is less than the target value. There is no known way to predict what nonce will be needed ahead of time. The miner has to just keep trying until they get a low enough hash value.
As the numbers (and speeds) of miners working on this hashing problem increases, they begin discovering a hash that is less than the target value faster. The protocol is then designed to require an even lower target value to make it more difficult to find an appropriate hash. If miners start turning off their mining systems, discovering a hash that is less than the target value will start taking longer. The protocol is then designed to require a higher target value to make it easier to find an appropriate hash. The protocol adjusts this target value every 2016 blocks to keep the average amount of time to solve a block at 10 minutes. This forces the entire network to put "on average" 10 minutes of hashing calculations towards each block.
If someone wanted to change any information in an already solved block, it would change the value of the Merkle Root in the header. This would change the hash of the header. The person attempting to modify the block would have to find a hash of their new header that was lower than the necessary target in order to get any of the peers on the network to accept it as valid.
Each new block includes the exact hash of the previous block in its header. This means that the person attempting to modify data in a solved block has to not only find a low enough hash for the block that they are attempting to modify, but that they have to find low enough hashes for every block that has been created since then. While the person attempting to change an old block is trying to solve all these hashes of low enough value, the rest of the network is working on a new block. This means that the person attempting to modify the blockchain history has to catch up with the rest of the honest network that is trudging on ahead of them. The only way to realistically do this is to have more hashing power than the entire combined honest network.
Note that once a header (with its nonce and resulting hash) is broadcast to the network, it is very fast for each peer to verify that the previous work by the miner was valid. They only need to calculate a single hash of the broadcast header to see if the resulting hash is the broadcast hash, and then compare that hash to the target value to confirm that it is low enough.
Was that too technical? Or did I manage to describe it in "an understandable way for a so called non-technical person"