It is complicated, and it is probably better to look at the code directly. Explaining it in English at a high level abstracts some of it away, so while this gets you closer it may still leave some parts ambiguous. The second image is closer to the correct scenario, but remember that a tx can have multiple inputs as well.
A hash of a simplified transaction consisting of a SINGLE INPUT and ALL outputs is created. This hash is then signed with the private key for that input, and the result is stored in the "sig & pubkey" portion of that tx input. See here:
https://en.bitcoin.it/w/images/en/e/e1/TxBinaryMap.png
Bitcoin has no concept of "owners" or "wallets", so the change output isn't special; it is an output just like any other. If you are going to create an illustration to explain this, using only a single input is a poor choice, because the next question becomes "how do I handle n inputs where n > 1?" An example with 2 inputs is better, because 3, 20, or 4,000 inputs are handled exactly the same as 2.
The other thing you need to divorce yourself from is the idea that you are sending funds from a public key. You aren't; you are using a SPECIFIC unspent output of a prior tx as an input. Yes, in conversation we may say "he paid me from address 123...", but when trying to explain how Bitcoin works that falls apart, and understanding anything becomes impossible. An input is a specific unspent output. It is referred to (in the tx input) by the tx hash (of the prior tx) and an index (which output of that tx is being used here).
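To make the "an input is a specific unspent output" point concrete, here is a minimal sketch in Python. The class names OutPoint and TxIn and the hash values are made up for illustration; they are not taken from the Bitcoin code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Reference to one specific output of a prior transaction."""
    prev_tx_hash: str  # hex hash of the prior tx (placeholder values below)
    index: int         # which output of that prior tx is being spent


@dataclass
class TxIn:
    outpoint: OutPoint
    script_sig: bytes = b""  # the "sig & pubkey" portion, empty until signed


# Two outputs of the same prior tx are still two distinct inputs.
a = TxIn(OutPoint("aa" * 32, 0))
b = TxIn(OutPoint("aa" * 32, 1))
print(a.outpoint != b.outpoint)  # True
```

Note that nothing in the input names an address or a public key; it only points at a prior output.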
So, in more general form: let's assume you have a tx which spends 3 unspent outputs A, B, and C (which become the inputs of this tx) and creates two new outputs, Y and Z. It doesn't matter whether one of these is a change output or you just happened to pay two different receivers with the exact amount of coins; an input is an input and an output is an output.
First, the output scripts are created. The "normal" Bitcoin tx is a PayToPubKeyHash, so the user has likely provided you with a Bitcoin address. The address is decoded back into the PubKeyHash, and a script is created which locks that output, of a specific value, to that specific PubKeyHash. The outputs are arranged in a random sequence (to avoid leaking which one is the "change"). Let's say Z becomes TxOut0 and Y becomes TxOut1. The inputs are then also arranged in a random sequence. Let's say A becomes TxIn0, B becomes TxIn1, and C becomes TxIn2.
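The address-to-script step can be sketched roughly like this in Python. It assumes an old-style Base58Check address with version byte 0x00; the function names are my own, and the address used is the well-known genesis block address, picked only as a convenient example.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def address_to_pubkey_hash(address):
    """Decode a Base58Check address back into its 20-byte PubKeyHash."""
    num = 0
    for ch in address:
        num = num * 58 + B58.index(ch)
    # Each leading '1' in the address encodes one leading zero byte.
    pad = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    payload, checksum = raw[:-4], raw[-4:]
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("bad checksum")
    if len(payload) != 21 or payload[0] != 0x00:
        raise ValueError("not a version-0 PayToPubKeyHash address")
    return payload[1:]


def p2pkh_script(pubkey_hash):
    """OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG"""
    return bytes([0x76, 0xA9, 0x14]) + pubkey_hash + bytes([0x88, 0xAC])


pubkey_hash = address_to_pubkey_hash("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
script = p2pkh_script(pubkey_hash)
print(len(pubkey_hash), len(script))  # 20 25
```

The random ordering of outputs and inputs is not shown; something like random.shuffle on each list would do for a sketch.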
We now have a tx which looks like this:
TxIn0
TxIn1
TxIn2
TxOut0
TxOut1
A simplified version of the tx is created (TxIn0, TxOut0, TxOut1) and hashed. The hash is signed by the private key for A and stored in the Sig & PubKey section for TxIn0.
A simplified version of the tx is created (TxIn1, TxOut0, TxOut1) and hashed. The hash is signed by the private key for B and stored in the Sig & PubKey section for TxIn1.
A simplified version of the tx is created (TxIn2, TxOut0, TxOut1) and hashed. The hash is signed by the private key for C and stored in the Sig & PubKey section for TxIn2.
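The three steps above are really one loop over the inputs. Here is a rough Python sketch of that loop, following the simplified description in this post and not Bitcoin's real serialization. The byte strings are placeholders, and placeholder_sign is an HMAC stand-in so the example runs with the standard library only; real Bitcoin signs with ECDSA. As far as I know, the real signature hash (with the default SIGHASH_ALL) also covers the references to the other inputs, with their scripts emptied, so check the code for the exact layout.

```python
import hashlib
import hmac


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def simplified_digest(tx_in, tx_outs):
    # One input plus ALL outputs, as in the simplified description above.
    return double_sha256(tx_in + b"".join(tx_outs))


def placeholder_sign(private_key, digest):
    # NOT ECDSA: an HMAC stand-in used only to keep the sketch runnable.
    return hmac.new(private_key, digest, hashlib.sha256).digest()


tx_ins = [b"TxIn0", b"TxIn1", b"TxIn2"]  # placeholder bytes for A, B, C
tx_outs = [b"TxOut0", b"TxOut1"]         # placeholder bytes for Z, Y
keys = [b"key-A", b"key-B", b"key-C"]    # placeholder private keys

script_sigs = []
for tx_in, key in zip(tx_ins, keys):
    digest = simplified_digest(tx_in, tx_outs)
    script_sigs.append(placeholder_sign(key, digest))

print(len(script_sigs))  # 3
```

Each input gets its own hash and its own signature, but every one of those hashes covers all of the outputs.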
One last thing which may not be obvious: there is no 1:1 relationship between inputs and private keys. For example, if you have two unspent outputs sent to the same address (say A and C), then they share the same private key. However, they are still unique and discrete inputs, so the tx above would still have 3 inputs.
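A tiny Python illustration of that last point, with made-up labels for the prior transactions and addresses:

```python
# Hypothetical labels: inputs A and C spend outputs locked to the same address.
inputs = [
    {"name": "A", "outpoint": ("prior_tx_1", 0), "address": "addr_1"},
    {"name": "B", "outpoint": ("prior_tx_2", 1), "address": "addr_2"},
    {"name": "C", "outpoint": ("prior_tx_3", 0), "address": "addr_1"},
]

keys_needed = {tx_in["address"] for tx_in in inputs}  # one key per distinct address
signatures_needed = len(inputs)                       # one signature per input

print(len(keys_needed), signatures_needed)  # 2 3
```

Two private keys are enough, but the tx still has three inputs and needs three signatures.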
I am likely overlooking or misstating something here, so I would double-check all of this against the code.