As piotr_n pointed out, Bitcoin expects the highest bit of r and s to be zero, so if it isn't, an extra zero byte is prepended. This means that r & s will almost always be 32 or 33 bytes. Don't quote me, but I believe this comes from the DER format that OpenSSL produces rather than from Bitcoin itself: a DER integer is signed, so a value with the high bit set would be read as negative. Everyone has to keep doing it the same way to remain compatible.
You can't think of a signature as a curve point (x,y), right? A signature is a pair of 256-bit numbers (r,s), not a point on secp256k1.
Correct. A point is computed in the creation of the signature (k x G), but it is not the signature itself. r & s are not an x & y value.
Did I mark-up the image correctly (r comes before s)?
Yes.
If I know r and s as 256-bit integers (big numbers), how exactly do I DER encode them?
You will also need to know the sighash. Given r,s, and sighash they are arranged in the following order:
<len_sig><sequence = 0x30><len_rs><integer = 0x02><len_r><r_value><integer = 0x02><len_s><s_value><sighash>
All elements are 1 byte except r & s which will be 32 or 33 bytes. Also be sure to read up carefully on sighash because it is "moved" (for reasons that are beyond me). Another Satoshi-ism I guess.
r_value = Convert r into a big endian byte array. If the leading bit is not zero then prepend a zero value byte.
s_value = Convert s into a big endian byte array. If the leading bit is not zero then prepend a zero value byte.
len_r = number of bytes in r_value (32 or 33, i.e. 0x20 or 0x21)
len_s = number of bytes in s_value (32 or 33, i.e. 0x20 or 0x21)
sequence = always 0x30
integer = always 0x02
len_rs = len_r + len_s + 4 (four extra bytes: the two integer bytes and the two length bytes len_r and len_s)
len_sig = len_rs + 3 (three extra bytes for the len_rs byte, the sequence byte and the sighash byte)
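If it helps, here is a rough Python sketch of the layout above. The function names der_int and encode_sig are made up for illustration and do not come from any library. It assumes r and s are positive Python ints, that the values are written big endian (which is what DER uses), and that the default sighash byte is 0x01. It uses the shortest possible byte array for each value, so in the rare case that r or s has leading zero bytes the result is shorter than 32 bytes.

```python
def der_int(n: int) -> bytes:
    """Encode a positive integer as <0x02><len><value> (hypothetical helper)."""
    # Big endian, using the fewest bytes that hold n (at least one byte).
    value = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    # If the leading bit is set, prepend a zero byte so the value
    # is not interpreted as a negative number.
    if value[0] & 0x80:
        value = b"\x00" + value
    return b"\x02" + bytes([len(value)]) + value


def encode_sig(r: int, s: int, sighash: int = 0x01) -> bytes:
    """Build <len_sig><0x30><len_rs><r part><s part><sighash> (a sketch)."""
    body = der_int(r) + der_int(s)              # both integer elements
    der = b"\x30" + bytes([len(body)]) + body   # sequence byte + len_rs
    sig = der + bytes([sighash])                # sighash goes after the DER part
    return bytes([len(sig)]) + sig              # len_sig covers everything after it


if __name__ == "__main__":
    # Example values only: r has its high bit set, s does not.
    r = 2**255 + 1
    s = 2**254 + 1
    print(encode_sig(r, s).hex())
```

The lengths are taken from the actual byte arrays instead of being hard-coded, so the 32 versus 33 byte cases fall out automatically.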
What follows next is whatever is needed to complete the script which is encumbering the outputs. The PubKey then follows when redeeming Pay2PubKeyHash outputs but is not universally present in other output types (e.g. Pay2PubKey).
An aside: Besides DER being the standard format of OpenSSL, is there really any benefit to DER encoding? It seems to me we could simply pack r and s and have all signatures exactly 64 bytes long.
Correct. There is very little reason to use DER encoding other than that Satoshi did it that way. A new version of the tx format could be created which is more space efficient. The major advantage of DER encoding is sharing information between incompatible systems. It is excessively verbose in order to facilitate data interchange. Putting a DER signature inside proprietary data makes no sense. Knowing DER doesn't allow you to decode a Bitcoin tx, and if you can decode a Bitcoin tx you could just follow explicit rules to decode r, s, sighash, pubkeys, etc., just like you need to follow explicit rules to decode the tx version, number of inputs, sequence, etc.
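To make the size comparison concrete, here is a small Python sketch of the fixed 64-byte packing suggested in the question. This is a hypothetical layout for illustration only, not the signature format described above, and the names pack_compact and unpack_compact are invented. It assumes r and s each fit in 256 bits and are stored big endian.

```python
from typing import Tuple


def pack_compact(r: int, s: int) -> bytes:
    """Pack r and s as two fixed 32-byte big endian fields (hypothetical)."""
    # to_bytes raises OverflowError if a value does not fit in 32 bytes.
    return r.to_bytes(32, "big") + s.to_bytes(32, "big")


def unpack_compact(sig: bytes) -> Tuple[int, int]:
    """Split a 64-byte blob back into (r, s)."""
    if len(sig) != 64:
        raise ValueError("expected exactly 64 bytes")
    return int.from_bytes(sig[:32], "big"), int.from_bytes(sig[32:], "big")


if __name__ == "__main__":
    # Example values only.
    r, s = 2**255 + 1, 12345
    blob = pack_compact(r, s)
    print(len(blob), unpack_compact(blob) == (r, s))
```

Because both fields are fixed width, no sequence, integer or length bytes are needed, and no zero byte has to be prepended.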