Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: kabab on July 11, 2018, 06:31:20 AM



Title: Using references in order to compress the byte size of transactions
Post by: kabab on July 11, 2018, 06:31:20 AM
I was thinking of ways a blockchain's format might be designed so that its transaction records can be made more compact. One piece of low-hanging fruit appears to be the following.

Transaction inputs involve previous outputs on the chain. If in place of recording these subsequent inputs by value, they were instead only referenced in transaction records, then perhaps their byte size could be reduced significantly.

The idea would work something like this: whenever a new output is recorded on the chain, an unused (monotonically increasing) numeric ID is recorded next to it so that subsequent blocks may reference that value by ID. Since the space of IDs so defined can probably fit in something like 8-10 bytes, it seems like a considerable savings.
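
A minimal sketch of this numbering scheme, assuming outputs are numbered in the order their blocks are recorded. The `OutputIndex` class and its method names are hypothetical, made up only for illustration; nothing like it exists in Bitcoin.

```python
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]  # (txid as hex, 0-based output index)


class OutputIndex:
    """Hypothetical registry that numbers outputs in confirmation order."""

    def __init__(self) -> None:
        self._by_id: List[Outpoint] = []      # position in list == numeric ID
        self._ids: Dict[Outpoint, int] = {}   # reverse lookup

    def record_block(self, txs: List[Tuple[str, int]]) -> None:
        """txs is a list of (txid, number_of_outputs) in block order."""
        for txid, n_outputs in txs:
            for vout in range(n_outputs):
                # The next unused ID is simply the current length of the list,
                # so IDs are monotonically increasing.
                self._ids[(txid, vout)] = len(self._by_id)
                self._by_id.append((txid, vout))

    def id_of(self, txid: str, vout: int) -> int:
        return self._ids[(txid, vout)]

    def resolve(self, output_id: int) -> Outpoint:
        return self._by_id[output_id]


index = OutputIndex()
index.record_block([("aa" * 32, 2), ("bb" * 32, 1)])  # outputs get IDs 0, 1, 2
index.record_block([("cc" * 32, 3)])                  # outputs get IDs 3, 4, 5

print(index.id_of("cc" * 32, 0))  # ID a later input would use for this output
print(index.resolve(2))           # the output that ID 2 points back to
```

Note that an ID only exists once the output's block is recorded, so this sketch has no way to refer to an output that is still unconfirmed.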

Am I overlooking something? (Is/was this something already considered?)


Title: Re: Using references in order to compress the byte size of transactions
Post by: odolvlobo on July 11, 2018, 07:05:52 PM
How would two different wallets agree on the IDs of the outputs when they construct their transactions? If the IDs are set when the transaction is included in a block, then how would an output that is not yet in a block be referenced?


Title: Re: Using references in order to compress the byte size of transactions
Post by: aliashraf on July 11, 2018, 08:03:40 PM
Quote from: odolvlobo on July 11, 2018, 07:05:52 PM
How would two different wallets agree on the IDs of the outputs when they construct their transactions? If the IDs are set when the transaction is included in a block, then how would an output that is not yet in a block be referenced?

For reasons other than reducing transaction size, I like the idea of using references to blockchain data instead of the real addresses. For example, by referencing a transaction in a block, a wallet implies its adherence to the state of the blockchain at least as of the referenced block.

Your objection could easily be resolved by supporting both the legacy transaction format and the compact form proposed by OP.


Title: Re: Using references in order to compress the byte size of transactions
Post by: achow101 on July 11, 2018, 10:04:37 PM
Quote from: kabab on July 11, 2018, 06:31:20 AM
Transaction inputs involve previous outputs on the chain. If in place of recording these subsequent inputs by value, they were instead only referenced in transaction records, then perhaps their byte size could be reduced significantly.

They already are referred to by reference, not by value. The reference is known as the outpoint, which is the hash of the transaction containing the output and the 0-based index of the output. This is a reference to where a node can find the output, it is not the output itself.
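
For illustration, a minimal sketch of how such an outpoint is laid out in a serialized transaction: the 32-byte transaction hash followed by the output index as a 4-byte little-endian integer, 36 bytes in total. The function name `serialize_outpoint` and the sample txid are made up for this example.

```python
import struct


def serialize_outpoint(txid_hex: str, vout: int) -> bytes:
    """Return the 36-byte outpoint: 32-byte tx hash + 4-byte output index."""
    txid = bytes.fromhex(txid_hex)
    if len(txid) != 32:
        raise ValueError("a txid must be exactly 32 bytes")
    # Txids are conventionally displayed byte-reversed relative to the
    # order used inside a serialized transaction, hence the [::-1].
    return txid[::-1] + struct.pack("<I", vout)


sample_txid = "00" * 31 + "ff"            # made-up txid, not a real transaction
outpoint = serialize_outpoint(sample_txid, 1)
print(len(outpoint))                      # size of the reference in bytes
print(outpoint.hex())
```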

Quote from: aliashraf on July 11, 2018, 08:03:40 PM
For reasons other than reducing transaction size, I like the idea of using references to blockchain data instead of the real addresses.

There is no such thing as "real addresses" and Bitcoin already works by using references in many places.


Title: Re: Using references in order to compress the byte size of transactions
Post by: aliashraf on July 12, 2018, 11:03:27 AM
Quote from: achow101 on July 11, 2018, 10:04:37 PM
They already are referred to by reference, not by value. The reference is known as the outpoint, which is the hash of the transaction containing the output and the 0-based index of the output. This is a reference to where a node can find the output, it is not the output itself.
There is no such thing as "real addresses" and Bitcoin already works by using references in many places.

You should be more cautious about calling a transaction input (the outpoint field) a reference; technically, it is not.

A reference should carry valuable information about an external event or piece of data, which is not the case with the current Bitcoin implementation of input addresses. Let's examine it more closely:

First, we have a transaction whose output(s) are tweaked (properly prefixed and padded) RIPEMD-160 hash(es) of the respective public key(s).
Please note: once this transaction has propagated through the network and sits in the mempool, it is no more valuable than when it was created.

Now, suppose that for some reason we want to 'refer' to one of the outputs by saying that it is the nth output of the transaction (using its hash/id), instead of using the original wallet address of the output (this is how Bitcoin actually implements inputs, as you have correctly mentioned). Is it really a reference?

No! It is not. The hash of a value is not a reference to it. Hashes don't contain any information; they don't lead you to new data or information.
Ideally speaking, the hash of a value is nothing other than the value itself, a version of it. Having the id of a transaction won't give you any new information about it; you still need access to the original, and to get it you have to find it, probably by searching a data structure.

This is why you can conventionally call Bitcoin transaction inputs 'real addresses': they are real, an alternative version of the reality but with the same information value, obtainable from the transaction.

I'm not questioning this technique. It is actually good for many reasons, but it is neither the only option nor the best one.

What OP suggests is a true reference to the blockchain data. To make it more formal:

A blockchain can be understood and treated as an immutable ordered list of transactions. One can trivially derive an ordered list of outputs from the transaction list.
Using the absolute position of an output in this derived list as the input could be considered an alternative to the current approach of using the original transaction's hash.

My first evaluations suggest wider consequences than just compression; e.g., it helps security by rendering bootstrap/long-range attacks orders of magnitude more difficult.



Title: Re: Using references in order to compress the byte size of transactions
Post by: pebwindkraft on July 12, 2018, 11:20:52 AM
Quote from: aliashraf on July 12, 2018, 11:03:27 AM
...
You should be more cautious about calling a transaction input a reference; technically, it is not.

A reference should carry valuable information about an external event or piece of data, which is not the case with the current Bitcoin implementation of input addresses. Let's examine it more closely:

First, we have a transaction whose output(s) are tweaked (properly prefixed and padded) RIPEMD-160 hash(es) of the respective public key(s).
Please note: once this transaction has propagated through the network and sits in the mempool, it is no more valuable than when it was created.

Now, suppose that for some reason we want to 'refer' to one of the outputs by saying that it is the nth output of the transaction (using its hash/id), instead of using the original wallet address of the output (this is how Bitcoin actually implements inputs, as you have correctly mentioned). Is it really a reference?

I don't get the mixture between hashes and index. Maybe I am missing something?

The previous transaction is found by giving its hash, and the outpoint to spend from is given as a number starting from zero. So both are pointers into a previous tx, as opposed to providing the whole data structure of a previous tx, which saves a lot of space... and one could call this a reference?

On addresses it is clear that we are talking about data representation. The bitcoin address is derived from the public key via some conversions and mathematical functions, which are not (as far as is known today) reversible.

So when it comes to data for the input, it can be considered a reference, whereas bitcoin addresses don't appear in a tx, only the pubkey or its hash. They are not references in this sense.


Title: Re: Using references in order to compress the byte size of transactions
Post by: aliashraf on July 12, 2018, 12:35:15 PM
Quote from: pebwindkraft on July 12, 2018, 11:20:52 AM
...
I don't get the mixture between hashes and index. Maybe I am missing something?

The previous transaction is found by giving its hash, and the outpoint to spend from is given as a number starting from zero. So both are pointers into a previous tx, as opposed to providing the whole data structure of a previous tx, which saves a lot of space... and one could call this a reference?

No. It is not a reference. It is just a hash. A hash is not an index or a pointer (pointers are indexes into an ordered list of bytes, i.e. memory), although it is common practice, when there is no ambiguity, to 'treat' it like a reference.

When you give someone a hash of some data, you have just offered an alternative version of the same data. It is temporarily useless, though once the data is revealed it can be used for comparison and security purposes. Yet in the end, no value is added to the data.

By comparison, a reference to the data (or to its hash, as is the case here) yields both the data (or its hash) and its location in a list. References are richer than the raw data. They are processed data, i.e. information. Even a simple pointer to a data structure is more valuable than a copy of it.

In the context of this discussion, an index to a confirmed transaction on the blockchain (a reference to one of its outputs, to be precise) is far more information-rich than a hash of the same transaction. E.g., by embedding it as an input, the user is acknowledging the state of the blockchain at the height of the containing block.


Title: Re: Using references in order to compress the byte size of transactions
Post by: odolvlobo on July 12, 2018, 05:58:55 PM
Another issue is that any fork in the block chain might result in transactions that are valid on one branch being considered invalid on the other branch because the references to the outputs in the competing blocks would be different (assuming that the reference is determined by the miner).
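
A small sketch of the problem, assuming IDs are simply assigned in the order outputs appear in blocks. The `assign_ids` helper and the transaction names are hypothetical.

```python
def assign_ids(blocks):
    """Number outputs in block order; blocks is a list of lists of (txid, n_outputs)."""
    ids = {}
    next_id = 0
    for block in blocks:
        for txid, n_outputs in block:
            for vout in range(n_outputs):
                ids[(txid, vout)] = next_id
                next_id += 1
    return ids


common = [[("tx_a", 2)]]                           # history shared by both branches
branch_1 = common + [[("tx_b", 1), ("tx_c", 1)]]   # one miner's block
branch_2 = common + [[("tx_c", 1), ("tx_b", 1)]]   # competing block, other order

ids_1 = assign_ids(branch_1)
ids_2 = assign_ids(branch_2)

# The same numeric ID names a different output on each branch, so an input
# that spends "ID 2" is only meaningful on the branch it was built for.
print(ids_1[("tx_b", 0)], ids_2[("tx_b", 0)])
```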


Title: Re: Using references in order to compress the byte size of transactions
Post by: aliashraf on July 12, 2018, 07:40:22 PM
Quote from: odolvlobo on July 12, 2018, 05:58:55 PM
Another issue is that any fork in the block chain might result in transactions that are valid on one branch being considered invalid on the other branch because the references to the outputs in the competing blocks would be different (assuming that the reference is determined by the miner).

It is rather an advantage (the one I'm in love with ;) ). It helps security, gives wallets more power in securing the blockchain, and should be considered a disruptive improvement with a wide range of socio-economic and political consequences that deserve in-depth investigation and analysis.

It is worth mentioning that referencing data on the blockchain by using the index of an output as the input is a brilliant idea, but in its naive form it doesn't help security significantly, because an adversary can rewrite the blockchain so that he would be able to "steal" the transactions.

To avoid this, we have to remember the 28 bytes of valuable space that would become available! 8 bytes suffice to address more than 18 billion billion (2^64, about 1.8 × 10^19) transactions, almost an infinity, so with the current 36-byte outpoint field we are left with 28 bytes to make it really hard for the adversary: say, by adding a strong checksum or trimmed reference to the containing block hash and, even more, another checksum for the latest block the wallet is willing to help finalize, typically a much "younger" block.
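
A rough sketch of the arithmetic and of one possible layout, assuming the whole 28 spare bytes hold a truncated hash of the containing block (the second checksum for a younger block is left out for brevity). The function names, the field layout and the fake block hashes are all assumptions made for illustration.

```python
import hashlib
import struct

OUTPOINT_SIZE = 36                         # current outpoint: 32-byte hash + 4-byte index
INDEX_SIZE = 8                             # proposed position index
SPARE_SIZE = OUTPOINT_SIZE - INDEX_SIZE    # 28 bytes left over

max_ids = 2 ** (8 * INDEX_SIZE)            # 2^64, about 1.8 * 10^19 distinct positions


def make_reference(output_id: int, block_hash: bytes) -> bytes:
    """8-byte little-endian position followed by a 28-byte truncated block hash."""
    return struct.pack("<Q", output_id) + block_hash[:SPARE_SIZE]


def reference_matches(ref: bytes, block_hash: bytes) -> bool:
    """True if the reference was built against this block hash."""
    return ref[INDEX_SIZE:] == block_hash[:SPARE_SIZE]


# Stand-ins for real block hashes; any two different 32-byte values would do.
honest_block = hashlib.sha256(b"block on the honest chain").digest()
rewritten_block = hashlib.sha256(b"block on a rewritten chain").digest()

ref = make_reference(3, honest_block)
print(len(ref), max_ids)
print(reference_matches(ref, honest_block), reference_matches(ref, rewritten_block))
```

With this layout a reference is the same size as today's outpoint, but it no longer resolves on a chain where the containing block has been replaced.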

OP is not interested in this subject and is just thinking about compacting stuff. I don't care about compression. The main point is user contribution to security, a totally new horizon which I'm already embracing and adopting in my personal work, PoCW, Proof of Contributive Work (https://bitcointalk.org/index.php?topic=4438334.0).