Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: jose_see on August 14, 2024, 11:41:15 AM



Title: Script Decoding: Simple Specification
Post by: jose_see on August 14, 2024, 11:41:15 AM
Hello y’all,

Happy to be a new member of this forum.

My goal is to create a program to decode any script in a bitcoin UTXO (even if it has been spent, that is not the issue here).

I am looking for an unambiguous, complete and detailed specification of a script payload. I looked here and there and could not find anything that looks 100% up-to-date or clear enough.

Also, if you could outline the high-level steps to decode a payload, I would be very happy, because then I could try to write an implementation myself.

There are a couple of issues I am facing:
- I know there are plenty of script types out there, but is the specification generic enough to allow a simple implementation?
- How can I decode the public key and turn it into a base58 Bitcoin address?
- Is it really possible to have a generic implementation that decodes all possible scripts?

My primary language is Python, so if you know of libraries that match these requirements (preferably up to date with the latest Bitcoin protocol changes, such as native segwit), that would be awesome.

Thanks for reading, and for your input.


Title: Re: Script Decoding: Simple Specification
Post by: pooya87 on August 14, 2024, 01:28:57 PM
Start here: https://en.bitcoin.it/wiki/Script
Then read the interpreter class https://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp
The evaluation starts on line #L406 (https://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp#L406)

Quote
- I know there are plenty of script types out there, but is the specification generic enough to allow a simple implementation?
There are only a handful of standard output script types in UTXOs:
Code:
P2PK   <pubkey> OP_CHECKSIG
P2PKH  OP_DUP OP_HASH160 <160-bit hash> OP_EQUALVERIFY OP_CHECKSIG
P2SH   OP_HASH160 <160-bit hash> OP_EQUAL
P2WPKH OP_0 <160-bit hash>
P2WSH  OP_0 <256-bit hash>
P2TR   OP_1 <256-bit tweaked pubkey>
NULL   OP_RETURN <up to 80 bytes>
If this is all you want, it is very easy to implement. You just have to know which byte represents which OP code and how to read the script as bytes.
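As a rough illustration of how little is needed for these templates, here is a minimal Python sketch that matches a raw scriptPubKey against the patterns listed above. The function name classify_script_pubkey and the returned labels are made up for this example, and it only checks byte patterns (for instance, it does not verify that a P2PK key is a valid curve point).

```python
from typing import Tuple


def classify_script_pubkey(script: bytes) -> Tuple[str, bytes]:
    """Return (type label, payload) for the standard templates, else ("unknown", script)."""
    n = len(script)
    # P2PKH: OP_DUP OP_HASH160 <push 20> ... OP_EQUALVERIFY OP_CHECKSIG
    if n == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return "p2pkh", script[3:23]
    # P2SH: OP_HASH160 <push 20> ... OP_EQUAL
    if n == 23 and script[:2] == b"\xa9\x14" and script[22:] == b"\x87":
        return "p2sh", script[2:22]
    # P2WPKH: OP_0 <push 20>
    if n == 22 and script[:2] == b"\x00\x14":
        return "p2wpkh", script[2:]
    # P2WSH: OP_0 <push 32>
    if n == 34 and script[:2] == b"\x00\x20":
        return "p2wsh", script[2:]
    # P2TR: OP_1 <push 32>
    if n == 34 and script[:2] == b"\x51\x20":
        return "p2tr", script[2:]
    # P2PK: <push 33 or 65> <pubkey> OP_CHECKSIG
    if n in (35, 67) and script[0] == n - 2 and script[-1] == 0xAC:
        return "p2pk", script[1:-1]
    # Null data: OP_RETURN followed by whatever was pushed (returned undecoded)
    if n >= 1 and script[0] == 0x6A:
        return "nulldata", script[1:]
    return "unknown", script
```

Anything that falls through to "unknown" (bare multisig, non-standard scripts) would need a real script parser rather than fixed-offset checks.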

Quote
- How can I decode the public key and turn it into a base58 Bitcoin address?
If you mean a P2PKH address: take the public key (33-byte compressed is common, but it can be 65-byte uncompressed), compute its SHA256 hash, then compute the RIPEMD160 hash of that result. Prepend the version byte (0 for MainNet) and feed the resulting 21 bytes to a Base58Check encoder, which appends a 4-byte checksum and returns the address.
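The steps above can be sketched in Python using only the standard library. The helper names (sha256d, hash160, base58check_encode, p2pkh_address) are chosen for this example and are not from any particular library. One assumption to be aware of: hashlib only offers ripemd160 when the underlying OpenSSL build enables it, so hash160 may raise ValueError on some systems.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256d(data: bytes) -> bytes:
    """Double SHA256, used for the Base58Check checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def hash160(pubkey: bytes) -> bytes:
    """RIPEMD160(SHA256(pubkey)); assumes the local OpenSSL exposes ripemd160."""
    return hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()


def base58check_encode(version: int, payload: bytes) -> str:
    """Prepend the version byte, append a 4-byte checksum, then Base58-encode."""
    raw = bytes([version]) + payload
    raw += sha256d(raw)[:4]
    n = int.from_bytes(raw, "big")
    encoded = ""
    while n > 0:
        n, rem = divmod(n, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte is represented by a literal '1'
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def p2pkh_address(pubkey: bytes, version: int = 0) -> str:
    """Address for a 33-byte or 65-byte serialized public key (MainNet by default)."""
    return base58check_encode(version, hash160(pubkey))


# A hash of 20 zero bytes encodes to the well-known address below
print(base58check_encode(0, bytes(20)))  # 1111111111111111111114oLvT2
```

Calling base58check_encode directly with a 20-byte hash is handy when you already have the hash from a P2PKH script and do not need the public key at all.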

Quote
- Is it really possible to have a generic implementation that decodes all possible scripts?
Yes but that is harder to implement since you'd have to implement the entire interpreter class I posted above.
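Short of a full interpreter, a generic decoder can at least split any script into opcodes and data pushes without evaluating it. The sketch below does that in Python. The function name tokenize and the small OPCODE_NAMES table are assumptions made for this example; the table only covers the opcodes mentioned in this thread, and every other byte is shown as raw hex.

```python
from typing import List

# Partial opcode table (only the opcodes used in this thread)
OPCODE_NAMES = {
    0x00: "OP_0", 0x51: "OP_1", 0x6A: "OP_RETURN", 0x76: "OP_DUP",
    0x87: "OP_EQUAL", 0x88: "OP_EQUALVERIFY", 0xA9: "OP_HASH160",
    0xAC: "OP_CHECKSIG",
}
# OP_PUSHDATA1/2/4: opcode -> width of the little-endian length prefix
PUSHDATA_WIDTH = {0x4C: 1, 0x4D: 2, 0x4E: 4}


def tokenize(script: bytes) -> List[str]:
    """Split a script into opcode names and hex-encoded data pushes."""
    tokens = []
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            size = op  # the opcode itself is the number of bytes to push
        elif op in PUSHDATA_WIDTH:
            width = PUSHDATA_WIDTH[op]
            if i + width > len(script):
                raise ValueError("truncated push length")
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            tokens.append(OPCODE_NAMES.get(op, "0x%02x" % op))
            continue
        data = script[i:i + size]
        if len(data) != size:
            raise ValueError("truncated push data")
        tokens.append(data.hex())
        i += size
    return tokens


# A script made of two pushes: 4 bytes, then 2 bytes
print(tokenize(bytes.fromhex("042f931d1a028000")))  # ['2f931d1a', '8000']
```

This only disassembles; deciding whether a script is spendable, or what it means, still requires the evaluation logic in the interpreter.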


Title: Re: Script Decoding: Simple Specification
Post by: jose_see on August 14, 2024, 02:44:53 PM
Thank you, sir, for the very clear explanation. It is very helpful and much appreciated.

I realized that I have been trying to extract an address from a P2PK script. The transaction in question is https://live.blockcypher.com/btc/tx/bfa5b2de068fbb1b963e479138b3b5db0670f584d60d45cf4ee50a85b4e1f483/

If you decode it at https://live.blockcypher.com/btc/decodetx/, you get:
Code:
{
    "addresses": [
        "1EgVG6jkqUTpq9oZtXzSiYuMYDKsYLzrqL"
    ],
    "block_height": -1,
    "block_index": -1,
    "confirmations": 0,
    "double_spend": false,
    "fees": 0,
    "hash": "bfa5b2de068fbb1b963e479138b3b5db0670f584d60d45cf4ee50a85b4e1f483",
    "inputs": [
        {
            "age": 0,
            "output_index": -1,
            "script": "042f931d1a028000",
            "script_type": "empty",
            "sequence": 4294967295
        }
    ],
    "outputs": [
        {
            "addresses": [
                "1EgVG6jkqUTpq9oZtXzSiYuMYDKsYLzrqL"
            ],
            "script": "4104827404c816fe73adfe0f02020c8097dc9c0cafb304ca9089c80011825573f61465c17388b76a2ece60121c84773fe9f7c7b2870566dbbdfbce4aaceda35d8c41ac",
            "script_type": "pay-to-pubkey",
            "value": 5008000000
        }
    ],
    "preference": "low",
    "received": "2024-08-14T14:31:27.281306319Z",
    "relayed_by": "3.231.209.196",
    "size": 135,
    "total": 5008000000,
    "ver": 1,
    "vin_sz": 1,
    "vout_sz": 1,
    "vsize": 135
}

Now I’ve found this message: https://bitcointalk.org/index.php?topic=5265034.msg54884971#msg54884971

Also, some code on the internet:

Code:
def address(script_pub_key: Octets, network: str = "mainnet") -> str:
    """Return the bech32/base58 address from a script_pub_key."""
    if script_pub_key:
        script_type, payload = type_and_payload(script_pub_key)
        if script_type in ("p2pkh", "p2sh"):
            return b58.address_from_h160(script_type, payload, network)
        if script_type in ("p2wsh", "p2wpkh"):
            return b32.address_from_witness(0, payload, network)
        if script_type == "p2tr":
            return b32.address_from_witness(1, payload, network)

    # not script_pub_key
    # or
    # script_type in ("p2pk", "p2ms", "nulldata", "unknown")
    return ""

So a P2PK script does not seem to have an address at all, which is what was confusing me.
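For the segwit branches of the snippet above, the address is a bech32 (witness version 0, BIP173) or bech32m (version 1 and up, BIP350) encoding of the witness program. Below is a minimal self-contained Python sketch of that encoding. It is not the code of the library the snippet comes from, and the function name segwit_address and the private helpers are invented for this example.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1            # checksum constant for witness version 0 (BIP173)
BECH32M_CONST = 0x2BC830A3  # checksum constant for version 1+ (BIP350)
GENERATOR = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)


def _polymod(values):
    """BCH checksum over a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def _hrp_expand(hrp):
    """Spread the human-readable part into the checksum input."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def _to_5bit(data):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the last group."""
    acc = bits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def segwit_address(witver: int, program: bytes, hrp: str = "bc") -> str:
    """Encode a witness version and program as a segwit address."""
    const = BECH32_CONST if witver == 0 else BECH32M_CONST
    data = [witver] + _to_5bit(program)
    polymod = _polymod(_hrp_expand(hrp) + data + [0] * 6) ^ const
    checksum = [(polymod >> (5 * (5 - i))) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)


# Example P2WPKH program from BIP173
program = bytes.fromhex("751e76e8199196d454941c45d1b3a323f1433bd6")
print(segwit_address(0, program))  # bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4
```

The sketch does no validation of the program length or version range, which a real implementation would need.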

Actually, it seems BlockCypher takes the public key, hashes it with HASH160 (https://learnmeabitcoin.com/technical/keys/public-key/hash/#hash160-tool), prepends a zero version byte, appends a 4-byte checksum and Base58-encodes the result. Starting from the public key
Code:
04827404c816fe73adfe0f02020c8097dc9c0cafb304ca9089c80011825573f61465c17388b76a2ece60121c84773fe9f7c7b2870566dbbdfbce4aaceda35d8c41
this gives the value
Code:
>>> base58.b58encode(bytes.fromhex("00961172bf46d12fc3b3ff3d3bf4473d08870079a8f29b9fcf"))
b'1EgVG6jkqUTpq9oZtXzSiYuMYDKsYLzrqL'

Not sure if this address has any practical use?
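To check such an address in the other direction, a Base58Check decoder can split it back into the version byte and the 20-byte hash while verifying the checksum. This is a minimal Python sketch with no external dependency; the function name base58check_decode is made up for this example.

```python
import hashlib
from typing import Tuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256d(data: bytes) -> bytes:
    """Double SHA256, used for the Base58Check checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58check_decode(address: str) -> Tuple[int, bytes]:
    """Return (version byte, payload); raise ValueError on a bad character or checksum."""
    n = 0
    for ch in address:
        n = n * 58 + B58_ALPHABET.index(ch)  # .index raises ValueError on invalid characters
    # Each leading '1' stands for one leading zero byte
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(raw) < 5:
        raise ValueError("too short for version byte plus checksum")
    body, checksum = raw[:-4], raw[-4:]
    if sha256d(body)[:4] != checksum:
        raise ValueError("checksum mismatch")
    return body[0], body[1:]


version, payload = base58check_decode("1111111111111111111114oLvT2")
print(version, payload.hex())  # 0 0000000000000000000000000000000000000000
```

Running it on the address above should return version 0 and the 20-byte hash that sits between the leading 00 and the last four bytes of the hex string, provided the values in this post were transcribed accurately.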


Title: Re: Script Decoding: Simple Specification
Post by: pooya87 on August 14, 2024, 04:58:39 PM
That's a weird thing block explorers have been doing for as long as I can remember. They convert P2PK outputs into P2PKH and show the balance locked in the P2PK script as the balance of the P2PKH address. Maybe it's because they wanted to make searching easier, since there is no human-readable format (aka an address) defined for P2PK scripts; and since the private key is the same, it makes sense to some extent.


Title: Re: Script Decoding: Simple Specification
Post by: amaclin1 on August 14, 2024, 07:11:31 PM
Quote
Maybe it's because they wanted to make searching easier
Imagine how many questions there would be like "What is the address of Satoshi Nakamoto?"


Title: Re: Script Decoding: Simple Specification
Post by: NotATether on August 15, 2024, 06:42:52 AM
ZPyWallet contains a module for decoding scripts here: https://github.com/ZenulAbidin/zpywallet/blob/master/zpywallet/transactions/decode.py

It is almost self-contained; if you want it in a single file, just copy and paste the relevant imports into it.

In the scripts/ folder you will also find decoding functions for every single opcode.

Disclosure: I made this library. I have tested this part of the code, but there are some areas that are way outside this file that still need some unit testing. Feel free to use it if you want.