Bitcoin Forum
December 11, 2024, 03:41:08 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 [2]  All
  Print  
Author Topic: Max block size should also consider the size of UTXO set  (Read 3320 times)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
May 09, 2015, 06:00:40 PM
 #21

All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript.IsUnspendable() is true unless the script starts with OP_RETURN.

I think the disabled opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable.  The next easiest would be if any of the disabled opcodes (or OP_RETURN) occur outside an OP_IF OP_ENDIF portion of the script.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 09, 2015, 06:23:47 PM
 #22

All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript.IsUnspendable() is true unless the script starts with OP_RETURN.

I think the disabled opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable.  The next easiest would be if any of the disabled opcodes (or OP_RETURN) occur outside an OP_IF OP_ENDIF portion of the script.

I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following codes will fail the script, even if they occur in an unexecuted OP_IF branch

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
May 09, 2015, 06:48:42 PM
 #23

I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following codes will fail the script, even if they occur in an unexecuted OP_IF branch

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff

Looking at the source code.

The sequence is

fExec is set (true means next opcode is executed)

opcode fails to parse (only fails for double sized opcodes) -> FAIL

last push to stack to large -> FAIL

check number of non-push opcodes > maximum -> FAILS

OP_<disabled> -> FAIL

if (fExec || opcode is an IF-type opcode)
   // No curly brackets unfortunately
   case
   OP_VERIFY:  FAIL if not true
   OP_RETURN:  FAIL
   (and all the normal opcodes)

The disabled opcodes are hard FAILS no matter what happens with OP_IF.  OP_RETURN only fails if executed.

Easy rule would be that a script is unspendable if it contains a disabled opcode or OP_RETURN before the first OP_IF.  It is much easier to just check if the first opcode is OP_RETURN.

Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually Smiley.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 09, 2015, 06:59:35 PM
 #24


Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually Smiley.

There is no point to "re-enable" a code because that's a hardfork. If we are going to re-introduce codes like OP_CAT, the most likely way is to softfork with P2SHv2 or OP_EVALv2

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4284
Merit: 8816



View Profile WWW
May 09, 2015, 07:02:09 PM
Last edit: May 09, 2015, 07:19:17 PM by gmaxwell
 #25

since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whos spendability can't be decided without insanely large work, if they want. (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY ... does the all zeros sha1 have a preimage?)

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
Thats not much below the current system. Value and outpoint index are compressed and just take a couple bytes, the scriptpubkey is template compressed and takes 20 bytes for most scriptpubkeys. The version, height, coinbase flag, and txid is shared among all outputs with the same id. Smiley

Quote
It wouldn't even be required to store the entire hash.  A node could salt the hashes and collisions would be very unlikely.
I'd previously suggested something similar (using a permutation) to encrypt the utxo data locally; to reduce problems with virus data in the UTXO triggering AV and the risk of committing a strict liability crime storing someone elses data in the UTXO set.  Especially considering how small a compact encoding is, I'm not sure that doing something where one must repeat the scriptpubkey across the network (to provide the preimage) is a real win. Bandwidth seems likely to be in shorter supply than even fast storage.

I don't think collisions are even harmful; so long as they're unpredictable by an attacker.  Well I suppose I could just flood you with invalid transactions knowing that after enough one would collide for sure and you'd allow it. Then you'll go mine a bad block and fork yourself off-- but that scales exponentially with the data size, it's a multiway second preimage problem. So even a 64 bit value is pretty hard to hit. Tricky, against 10M UTXO, with a 64bit hash, the expected number of bad transactions to get a hit is 10,000 bad transactions per second for about 2135 days.  So long as you're sending the preimages though the amount stored doesn't need to be normative; increasing the size to 72 bits makes it look quite a bit less feasible.

I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set a fees per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighed sum of the relevant costly thing we'd hope to limit?  (this wouldn't prevent having worse case limits in any of the dimensions too; if really needed).


TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
May 09, 2015, 07:09:30 PM
 #26

Unless P == NP, unspendability is undecidable in the general case

There needs to be agreement on what counts as definitely unspendable for UTXO set commitments. 

Unless the script starts with OP_RETURN, it is entered into the UTXO set would be fine.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 09, 2015, 07:20:50 PM
 #27

since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whos spendability can't be decided without insanely large work, if they want. (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY ... does the all zeros sha1 have a preimage?)

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
Thats not much below the current system. Value and outpoint index are compressed and just take a couple bytes, the scriptpubkey is template compressed and takes 20 bytes for most scriptpubkeys. The version, height, coinbase flag, and txid is shared among all outputs with the same id. Smiley

Quote
It wouldn't even be required to store the entire hash.  A node could salt the hashes and collisions would be very unlikely.
I'd previously suggested something similar (using a permutation) to encrypt the utxo data locally; to reduce problems with virus data in the UTXO triggering AV and the risk of committing a strict liability crime storing someone elses data in the UTXO set.  Especially considering how small a compact encoding is, I'm not sure that doing something where one must repeat the scriptpubkey across the network (to provide the preimage) is a real win. Bandwidth seems likely to be in shorter supply than even fast storage.

I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set a fees per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighed sum of the relevant costly thing we'd hope to limit?  (this wouldn't prevent having worse case limits in any of the dimensions too; if really needed).


So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4284
Merit: 8816



View Profile WWW
May 09, 2015, 07:28:16 PM
Last edit: May 09, 2015, 07:39:51 PM by gmaxwell
 #28

So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (something else would be the sigops limit; though sigops actually have a kind of quadratic cost, due to the cost of rehashing the transaction and all its inputs; but all this only need to be very roughly correct in order to get the behavioral incentives (don't use frivolous sigops!) right).
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 09, 2015, 07:47:39 PM
 #29

So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (something else would be the sigops limit).

I think the relationship is:

block size => bandwidth and storage cost
utxo size => RAM cost
sigops => CPU cost

Then the formula has to be reviewed regularly (i.e. having hardfork regularly), depending on each component's development in the real world.

However, my impression is that the CPU cost is negligible comparing with bandwidth, storage and RAM and the sigop limit is there just as an anti-DOS mechanism

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4284
Merit: 8816



View Profile WWW
May 09, 2015, 08:08:26 PM
Last edit: May 09, 2015, 09:09:59 PM by gmaxwell
 #30

block size => bandwidth and storage cost
utxo size => RAM cost
No the UTXO isn't in ram, it's just more storage; it's more costly because every verifier must have it online where as the majority of nodes can forget all the rest once its burred; and certainly doesn't need to access it. I'd suggest thinking about it as storage which is necessarily online for the UTXO and storage which is mostly offline/nearline for the history, once you're past the most recent couple hundred blocks or so.
Quote
sigops => CPU cost
However, my impression is that the CPU cost is negligible comparing with bandwidth, storage and RAM and the sigop limit is there just as an anti-DOS mechanism
It's not exactly negligible; especially when one considers the cost of catching up-- there its the dominating cost currently for many people due to sig operations syncing the chain only runs at about 11Mbit/sec or so on a 3.2GHz quad core... (Well, libsecp256k1 will make this considerably faster-- but still slower than high speed consumer broadband that is widely available)  It only doesn't impact block propagation because virtually all signatures are cached from the initial txn relay now.

I don't really think it needs to be adjusted often (or hopefully ever); in that it can be set pretty conservatively and the most important point is getting a situation where "I will pay less fees if I do bar, so if I'm otherwise neutral I should choose to do so."-- partially this is because you're right that it's mostly an anti-dos mechanism.
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 10, 2015, 03:56:55 AM
 #31


My formula works with bytes all around, though there is some scaling because signatures tend to be much bigger than outputs and limits because I'm trying to avoid some edge cases like miners bloating the utxo set with cheaply spendable outputs in order to store-and-forward their blocksize.

Do you document your formula somewhere?

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
May 10, 2015, 07:49:29 AM
 #32

Do you document your formula somewhere?

It was on the mailing list.

Quote from: gmaxwell
An example would be tx_size = MAX( real_size >> 1,  real_size + 4*utxo_created_size - 3*utxo_consumed_size).   The reason for the MAX is so that a block which cleaned a bunch of big UTXO could not break software by being super large, the utxo_consumed basically lets you credit your fees by cleaning the utxo set; but since you get less credit than you cost the pressure should be downward but not hugely so. The 1/2, 4, 3 I regard as parameters which I don't have very strong opinions on which could be set based on observations in the network today (e.g. adjusted so that a normal cleaning transaction can hit the minimum size).

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
solex
Legendary
*
Offline Offline

Activity: 1078
Merit: 1006


100 satoshis -> ISO code


View Profile
May 10, 2015, 11:30:09 AM
 #33

Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

Considering how the utxo set is expanding so fast, would it be wise to look at changing the fundamental basis for free tx, from days-destroyed to negative delta uxto size? This could be implemented much sooner than changes associated directly with the block limit.

TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
May 10, 2015, 01:26:37 PM
 #34

I created a draft BIP relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
jl2012 (OP)
Legendary
*
Offline Offline

Activity: 1792
Merit: 1111


View Profile
May 10, 2015, 03:43:17 PM
 #35

I created a draft BIP relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.

I think it's a good idea and should have an new thread for that.

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
Pages: « 1 [2]  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!