My attempt:
1) Mining 'creates' coins by solving a puzzle. If the puzzle is solved, the miner broadcasts this to the network, and claims a certain amount of Bitcoin. The network then checks if the miner really solved the puzzle, and if they agree with the solution and the amount of Bitcoin claimed, then the miner gets that amount of Bitcoin. So the coins are 'created out of thin air', but because it's a consensus model - i.e. the majority of the Bitcoin network has to agree - a miner can't just claim 1,000,000 Bitcoin.
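2) The coins are generated in a special transaction (the coinbase transaction of the block) - they aren't actually 'spent from' another address. Addresses themselves are an abstraction for a list of transaction outputs, each of which can be spent only if a different kind of puzzle is solved (typically: providing a valid signature for the matching key).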
3) For mining, the puzzle that needs to be solved is: given a hash function and a certain input+nonce, find a nonce that causes the value of the hash to be below a certain threshold. Real world analogies (most of them involving coin tosses) are difficult, but a popular chemistry class example might work somewhat: phenolphthalein is an indicator used to show whether a solution is acidic or basic. You have to add the base one droplet at a time (titration), and only when the acid has been neutralized does the solution become and remain pink. This titration takes a long time. But once you have done this, you can share the same solution with somebody else and simply tell them how many ml you added, and they can then quickly verify that you are correct by adding the same amount - the solution should turn pink - and then adding a droplet of the original solution again - the solution should turn clear. The analogy is still flawed because for hashing, only a single hash operation is required to verify the value. If you never had this demonstration in chemistry (or didn't take chemistry), hit up YouTube.
4) The difficulty adjusts every 2016 blocks based on the amount of time the last 2016 (well, 2015) blocks took to mine. Ideally this should take exactly 2 weeks. If the last 2016 blocks actually took 12 days, then the network was fast: the blocks took (1-(12/14))*100% ~= 14.3% less time than intended. The threshold (target) is therefore lowered by that same ~14.3%, which makes the puzzle 14/12 ~= 1.167 times as hard, i.e. ~16.7% more difficult to solve, so that ideally the next 2016 blocks take 2 weeks again. This is all done via the code and, once again, via consensus - so there's no single person, organization or company that changes the difficulty.
5) Yes, the mining method, where the hash of an input and a nonce is taken, is very much like brute-forcing. The outcome of the hashing function is unpredictable (or so we assume with regard to SHA-256), so a miner has to try a lot of nonces to find a combination of input+nonce that results in a desired hash value.
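To make points 3 and 5 a bit more concrete, here is a rough toy sketch in Python. It is not Bitcoin's actual block header format: the names `pow_hash`, `mine`, `verify` and `TOY_TARGET` are made up for illustration, the input is an arbitrary byte string, and the threshold is chosen so that roughly 1 in 4096 hashes qualifies (the real threshold is vastly lower). It does use double SHA-256, as Bitcoin does.

```python
import hashlib

# Hypothetical toy threshold: about 1 in 4096 hash values falls below it.
TOY_TARGET = 1 << 244


def pow_hash(data: bytes, nonce: int) -> int:
    """Double SHA-256 of data + nonce, interpreted as a 256-bit integer."""
    payload = data + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(data: bytes, target: int) -> int:
    """Brute force: try nonces 0, 1, 2, ... until the hash is below target."""
    nonce = 0
    while pow_hash(data, nonce) >= target:
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, target: int) -> bool:
    """Checking a claimed solution costs a single call to pow_hash."""
    return pow_hash(data, nonce) < target


if __name__ == "__main__":
    found = mine(b"toy block", TOY_TARGET)
    print("nonce:", found, "valid:", verify(b"toy block", found, TOY_TARGET))
```

The asymmetry is the whole point: `mine` loops thousands of times on average, while `verify` hashes once. That is the 'slow titration, quick check' idea from the analogy, minus its flaw.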
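The arithmetic in point 4 can be sketched in the same way. This is a simplified illustration, not the consensus code: `retarget` and `TARGET_TIMESPAN` are names I picked, the threshold is handled as a plain integer rather than in Bitcoin's compact encoding, and the cap on the maximum threshold is left out. The limit of a factor of 4 per adjustment in either direction is part of the real rules.

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # two weeks, in seconds


def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the threshold by actual/expected time, limited to a factor of 4."""
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    return old_target * clamped // TARGET_TIMESPAN


if __name__ == "__main__":
    old_target = 1 << 224          # arbitrary example threshold
    twelve_days = 12 * 24 * 60 * 60
    new_target = retarget(old_target, twelve_days)
    # A lower threshold means a harder puzzle: difficulty scales as old/new.
    print("target scaled by:", new_target / old_target)      # about 0.857
    print("difficulty scaled by:", old_target / new_target)  # about 1.167
```

So blocks that arrive in 12 days instead of 14 shrink the threshold by about 14.3%, which is the same thing as raising the difficulty by about 16.7%.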