So 2 bytes indicate the size of the data?
It is "up to" 2 bytes. It can be 0, 1 or 2.
For example it could be OP_RETURN OP_2 (which passes
IsPushOnly()).
Do I need to specify OP_PUSHDATA on these 2 bytes?
All data has to be pushed with its appropriate size byte(s).
* Any byte that can translate to a number should use the OP_number opcodes ({0} should use OP_0 instead of 0x0100, {16} should use OP_16 instead of 0x0110)
* Any data with length smaller than 0x4c (76) is pushed with 1 byte equal to the size (byte[10] -> 10 + byte[10]; byte[70] -> 70 + byte[70])
* Any data with length bigger than or equal to 0x4c is pushed using 0x4c (ie. OP_PUSHDATA1) followed by the length, followed by the data (byte[80] -> OP_PUSHDATA1 + 80 + byte[80])
* Any data with length bigger than 255 uses 0x4d (OP_PUSHDATA2) followed by the 2-byte length
* Any data with length bigger than 65535 (0xffff) uses 0x4e (OP_PUSHDATA4) followed by the 4-byte length
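The rules above can be sketched in Python (the function name minimal_push is my own; opcode byte values are the standard Bitcoin script ones):

```python
# Minimal-push encoding as described in the list above.
# OP_0 = 0x00, OP_1..OP_16 = 0x51..0x60,
# OP_PUSHDATA1 = 0x4c, OP_PUSHDATA2 = 0x4d, OP_PUSHDATA4 = 0x4e.

def minimal_push(data: bytes) -> bytes:
    if len(data) == 0:
        return b'\x00'                                           # OP_0
    if len(data) == 1 and 1 <= data[0] <= 16:
        return bytes([0x50 + data[0]])                           # OP_1 .. OP_16
    if len(data) < 0x4c:
        return bytes([len(data)]) + data                         # length byte + data
    if len(data) <= 0xff:
        return b'\x4c' + bytes([len(data)]) + data               # OP_PUSHDATA1
    if len(data) <= 0xffff:
        return b'\x4d' + len(data).to_bytes(2, 'little') + data  # OP_PUSHDATA2
    return b'\x4e' + len(data).to_bytes(4, 'little') + data      # OP_PUSHDATA4

print(minimal_push(b'\x10').hex())  # -> '60' (OP_16)
print(minimal_push(b'2021').hex())  # -> '0432303231'
```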
6A # OP_RETURN code
04 # Size of the data in bytes
32 30 32 31 # the data
This is the correct one.
Data length is 4 bytes, and 4 is smaller than 76, so 0x04 is used as the length byte, followed by the data itself.
As for converting the "2021" string to bytes, bitcoin doesn't care what encoding is used (UTF-8, ASCII, Unicode,...); it is up to the extra protocol (like the side chain) or the user's preference to decide. For instance you could use Unicode and convert emojis to a byte stream that you push after OP_RETURN.
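Putting the two steps together, the script from the hex dump above can be assembled like this (op_return_script is a name I made up for the sketch):

```python
# Build the OP_RETURN script shown above for the string "2021".
OP_RETURN = 0x6a

def op_return_script(text: str, encoding: str = 'utf-8') -> bytes:
    data = text.encode(encoding)  # bitcoin doesn't care which encoding you pick
    assert len(data) < 0x4c       # small data: a single length byte is enough
    return bytes([OP_RETURN, len(data)]) + data

print(op_return_script('2021').hex())  # -> '6a0432303231'
```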
Basically it is as impossible as sending bitcoin through a 0-conf transaction. That means anyone who makes a transaction with multiple OP_RETURN outputs, or an OP_RETURN where the data is more than 80 bytes, must contact a pool/miner and hope they agree to include the transaction.
Yes.
I just remembered there are variations of UTF-8 from 1 to 4 bytes; do you mean 1-byte UTF-8?
I don't know much about UTF-8 but I don't think there is such a thing as "1-byte UTF-8".
AFAIK the way this encoding works is that you read one byte at a time and decide based on that byte how many more bytes you need to create the first character.
For example if the first byte is 0xxxxxxx (in binary where 0 is zero and x can be 1 or 0) then that byte is the character itself.
If the first byte is 110xxxxx (in binary) then you have to read another byte and the two bytes represent the character (the second byte also has to use 10xxxxxx format).
Similarly 1110xxxx for 3 bytes and 11110xxx for 4.
That means you can end up reading 4 bytes from the stream to be able to represent 1 character. For example 0xF0, 0x90, 0x8D, 0x88 -> 𐍈 (Hwair), since the binary is
11110000
10010000
10001101
10001000
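The leading-byte rule described above can be sketched like this (utf8_char_len is my own helper name, not a standard library function):

```python
# How many bytes a UTF-8 character occupies, judged from its first byte.
# The continuation bytes must all have the form 10xxxxxx.

def utf8_char_len(first_byte: int) -> int:
    if first_byte >> 7 == 0b0:      # 0xxxxxxx -> 1 byte (plain ASCII)
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx -> 2 bytes
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx -> 3 bytes
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx -> 4 bytes
        return 4
    raise ValueError('continuation or invalid byte')

hwair = bytes([0xF0, 0x90, 0x8D, 0x88])
print(utf8_char_len(hwair[0]))  # -> 4
print(hwair.decode('utf-8'))    # -> '𐍈'
```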
You can check the Wikipedia link:
https://en.wikipedia.org/wiki/UTF-8#Encoding
Or here is the .NET source code in C# where it converts the bytes to string when it can't be mapped 1 byte to ASCII chars:
https://source.dot.net/#System.Private.CoreLib/Utf8Utility.Validation.cs,250