You are confused as to how it works.
Segwit actually does _eliminate_ the size limit. The weight limit is constructed in a way which is compatible with the old limit, such that pre-segwit nodes will not think their limit is violated under any condition.
There are not two distinct limits. Avoiding that was a design _requirement_, because multiple limits would require multidimensional optimization in mining, which would be a serious computational burden, and because it would make accurate fee estimation intractable: the fees you would need to pay would depend on the relative contention of the various limits, which in turn depends on the composition of the transactions that arrive after you author your own.
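Weight = 3 x witness-stripped-size + size, where size is the full serialized size including witness data, and the limit is that the weight must not exceed 4 million. Old nodes receive witness-stripped blocks, and because the weight is always at least 4 x witness-stripped-size, a stripped block can never be larger than 1 MB, so they always accept the blocks under their own limits.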
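To make the arithmetic concrete, here is a minimal Python sketch of the weight formula and of why a block that passes the weight check also fits the old limit. The function names are made up for illustration and are not taken from any Bitcoin implementation; the constants assume the 4 million weight cap (treated as "at most", as in BIP 141) and the 1,000,000 byte legacy limit.

```python
MAX_BLOCK_WEIGHT = 4_000_000       # weight cap (assumed "at most", per BIP 141)
LEGACY_MAX_BLOCK_SIZE = 1_000_000  # byte limit enforced by pre-segwit nodes


def block_weight(stripped_size: int, total_size: int) -> int:
    """Weight = 3 x witness-stripped size + full size (witness included)."""
    return 3 * stripped_size + total_size


def within_weight_limit(stripped_size: int, total_size: int) -> bool:
    """The single check a segwit-aware node applies."""
    return block_weight(stripped_size, total_size) <= MAX_BLOCK_WEIGHT


def legacy_node_accepts(stripped_size: int) -> bool:
    """Old nodes only ever see the witness-stripped block."""
    return stripped_size <= LEGACY_MAX_BLOCK_SIZE


# With no witness data the two sizes are equal, so weight = 4 x size and
# the check reduces to the old 1 MB rule.
print(block_weight(1_000_000, 1_000_000))

# A block carrying witness data: 600,000 stripped bytes, 2,000,000 in total.
print(block_weight(600_000, 2_000_000), within_weight_limit(600_000, 2_000_000))
```

Since the full size is never smaller than the stripped size, weight >= 4 x stripped size, so any block with `within_weight_limit(...)` true also has `legacy_node_accepts(...)` true.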
(And, FWIW, this is how Bitcoin has done all the calculations since 0.13-- the results are the same as the old logic when there are no segwit txn-- so the size limit is already gone, just witness tx are not yet in use).
Selecting transactions by highest fee per weight is the unique income-maximizing solution; no other priority order can produce more fee income (+/- small knapsack boundary effects-- e.g. you might skip a higher-rate transaction in order to fill the block more completely).
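As a rough illustration of why a single limit keeps selection simple, here is a hedged Python sketch of greedy selection by fee per weight. It is not how any real miner or node is implemented: the `Tx` type and `select_by_feerate` are hypothetical names, and the sketch ignores dependencies between transactions and the space reserved for the coinbase.

```python
from typing import List, NamedTuple


class Tx(NamedTuple):
    txid: str    # hypothetical identifier, for illustration only
    fee: int     # fee in satoshis
    weight: int  # weight units, assumed > 0


def select_by_feerate(mempool: List[Tx], max_weight: int = 4_000_000) -> List[Tx]:
    """Take transactions in descending fee-per-weight order while they fit."""
    chosen: List[Tx] = []
    used = 0
    for tx in sorted(mempool, key=lambda t: t.fee / t.weight, reverse=True):
        if used + tx.weight <= max_weight:
            chosen.append(tx)
            used += tx.weight
        # A transaction that does not fit is skipped and lower-rate ones are
        # still tried: this is the small knapsack boundary effect.
    return chosen


# Toy example with a tiny limit of 100 weight units.
mempool = [Tx("a", 50, 50), Tx("b", 54, 60), Tx("c", 40, 50)]
picked = select_by_feerate(mempool, max_weight=100)
print([tx.txid for tx in picked], sum(tx.fee for tx in picked))
```

In the toy example "b" has a higher rate than "c" but does not fit after "a", so the block is filled with "a" and "c" instead. With two independent limits there would be no single sort key like `fee / weight` to order by.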
Ok, thank you for the thorough response. It makes sense now and I understand segwit a bit better.