The purpose of the selection is to maximize the income of the miner within the limits of the consensus rules. You could mine with a different objective-- such as maximizing the number of times the string "Jesus" is included in the blockchain-- but given the competitive nature of mining all objectives except the income maximizing one will be eventually driven into bankruptcy... and if you did have some other objective you could augment the fees by the amounts you're willing to subsidize transactions in order to achieve it and then use fees as an objective. So it's sufficient to just consider fee maximizing as the objective of transaction selection.
Maximizing the fee income of a block at first glance looks like a classical knapsack problem: You have a sack of a particular size, and you want to maximize the value of the sack by filling it with items of various sizes and values.
One solution is to sort items by their value density (their fees divided by the weight capacity they use up; also called feerate in bitcoin) and include items until no more fit. If it happens that the last included item exactly fills the sack then this is an optimal solution. Typically the last item that fits doesn't exactly fill the sack so the optimal solution is much harder but simply filling until you can't gets you very close to optimal: The loss vs optimal is no more than the remaining unfilled sizes times the feerate of the best non-included txn. Rather than leave empty space the software just walks down the options in feerate order and tries to include any txn it can until it runs out of txn or space.
In Bitcoin the problem is somewhat harder because transactions have dependencies and sometimes there is are higher feerate transactions you can't include until after you've included its lower feerate parents. The mining algorithm in Bitcoin Core right now is smart enough to consider transactions along with their ancestors as a 'package' and consider including it. This approach however doesn't consider the fact that one low fee ancestor might enable multiple *sibling* children, and only effectively prioritizes the parents if the best of the children would justify including it. But the specific case of "someone paid me with low feerate, so now I'll spend the unconfirmed input with higher fees to get it confirmed" is something people have long done and so the code has accommodated finding those solutions for a long time.
The full problem considering dependencies is actually a version of the
maximum weight closure problem (MWCP) which has been extensively studied in, ironically, open pit mining (e.g. the mineral extraction business, rather than cryptocurrency mining). In open pit mining you can't mine a chunk of earth without removing the chunks around the one above it, so you may have to extract unprofitable blocks to get to the good stuff and the best mining sequence can be much more profitable than one that wasn't carefully designed.
There is ongoing work to replace the current approach in Bitcoin Core with one based on optimal solutions to the MWCP as part of the cluster mempool project:
https://github.com/bitcoin/bitcoin/issues/30289 which ought to be completely merged soon, if you're going to spend a lot of time learning how it works I'd recommend studying the cluster mempool work instead of the current behavior because the current behavior will soon be obsolete. For Bitcoin Core this work is more important than just making the most profitable blocks, since when mempools reach their size limits the *worst* transactions according to the mining sequence are the best ones to drop and they're also the least important to relay. Similarly, transaction replacement is only incentive compatible when it improves the income of blocks mined including the replacement. So making good decisions about the mining order of transactions is important to nodes even when they're not mining.
w/ cluster mempool the normal behavior of Bitcoin Core will produce optimal feerate maximizing blocks except in so far as they won't optimally pack the last bit of the block when the optimal selection doesn't fit exactly (it will just do a best effort fill of whatever fits), that it won't consider alternative replacements or conflicting transactions, or if a interlinked cluster of transactions is >64 tx in size or takes to long to find an optimal ordering, and in the rare case that the other limits (like sigops limit) are meaningful it will make suboptimal selections with respect to them.
There are also generic solver tools to generate good solutions to the bitcoin block problem, which could even solve for things like mutually exclusive replacement sets, accurately handle multiple limits, etc... but those are not incremental and online and so aren't optimized for the other things Bitcoin Core needs to know the mining order for, or-- for that matter-- producing a new block template ASAP after a block was found.