Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: jl2012 on March 15, 2013, 03:45:52 PM



Title: Max block size should also consider the size of UTXO set
Post by: jl2012 on March 15, 2013, 03:45:52 PM
This is obsolete. See my new proposal below: https://bitcointalk.org/index.php?topic=153401.msg11329252#msg11329252

We should encourage "good transactions": those that are 1. small in size, 2. have few outputs, and 3. have practically spendable outputs.

Targets 2 and 3 are important for keeping the UTXO set at a reasonable size.

The current block size restriction, however, considers only target 1. Miners will accept polluting transactions (those with lots of practically unspendable outputs) as long as enough fee is paid, yet every full node has to maintain the inflated UTXO set.

If the block size limit is to be increased, it could be determined by more factors than absolute size alone.

I have a rough idea:

Let's denote:

S0, S1, ..., Sn: amounts in satoshis of outputs 0, 1, ..., n
Size: size of the transaction in kB

The adjusted transaction size is defined as:

Size * (1/(log2(S0)+1) + 1/(log2(S1)+1) + ... + 1/(log2(Sn)+1))

The value of 1/(log2(Sn)+1) grows rapidly as the output value decreases: it is 1 for 1 satoshi, 0.5 for 2 satoshis, 0.13 for 1 uBTC, 0.057 for 1 mBTC, and 0.036 for 1 BTC.

The adjusted block size is defined as the sum of the adjusted transaction sizes.

If the real block size is < 1MB, the adjusted block size is not considered. If the real block size is > 1MB, the adjusted block size must be smaller than a certain constant.
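A minimal Python sketch of the proposed formula (the function and parameter names are mine, not part of the proposal):

```python
import math

def adjusted_tx_size(size_kb, output_values_sat):
    """Adjusted size = Size * sum over outputs of 1 / (log2(value) + 1)."""
    return size_kb * sum(1.0 / (math.log2(v) + 1.0) for v in output_values_sat)
```

For example, a 1 kB transaction paying a single 1-satoshi output counts as a full 1 kB of adjusted size, while the same transaction paying 1 BTC counts as only about 0.036 kB.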

Many problems are solved with a system like this:

1. Block space is still scarce. Below 1MB this is equivalent to the current limit; above 1MB, "good transactions" are prioritised
2. Miners will have an incentive to exclude dust outputs, because they increase the adjusted block size
3. Miners will favour transactions with fewer outputs, so the UTXO set could shrink
4. People trying to send dust outputs and/or inflate the UTXO set have to pay higher miner fees to compensate for their pollution
5. The raw block size, which costs bandwidth and disk space, is still accounted for

Since there must be a hard fork when lifting the max block size, adding extra rules like these won't make the change more complicated.


Title: Re: Max block size should also consider the size of UTXO set
Post by: justusranvier on March 15, 2013, 03:50:44 PM
We should
Start a mining pool. You can implement any rule you want with regards to which transactions get included and which do not. Convince other miners that your rules are the best and they will vote with their hashing power.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on March 15, 2013, 04:25:17 PM
We should
Start a mining pool. You can implement any rule you want with regards to which transactions get included and which do not. Convince other miners that your rules are the best and they will vote with their hashing power.

No, this is a hard fork.


Title: Re: Max block size should also consider the size of UTXO set
Post by: justusranvier on March 15, 2013, 04:29:30 PM
No, this is a hard fork.
Granted that your block still needs to be valid for the rest of the network to accept it.

But your goal of "encouraging good transactions" does not require any changes to the protocol. You can reject any non-good transactions in your pool and just ask other miners to support you by mining in your pool.

Isn't that a much better solution than putting mandatory fee selection rules in the protocol?


Title: Re: Max block size should also consider the size of UTXO set
Post by: markm on March 15, 2013, 04:33:09 PM
Oh come now, surely it is obvious that other people not being forced to obey "the" rules is doubleplus-ungood?

-MarkM-


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on March 15, 2013, 04:47:05 PM
No, this is a hard fork.
Granted that your block still needs to be valid for the rest of the network to accept it.

But your goal of "encouraging good transactions" does not require any changes to the protocol. You can reject any non-good transactions in your pool and just ask other miners to support you by mining in your pool.

Isn't that a much better solution than putting mandatory fee selection rules in the protocol?

This is actually a response to the other thread: https://bitcointalk.org/index.php?topic=153133.0

Using this proposal, we can increase the max block size without bloating the UTXO set


Title: Re: Max block size should also consider the size of UTXO set
Post by: d'aniel on March 15, 2013, 08:11:48 PM
Using this proposal, we can increase the max block size without bloating the UTXO set
This.

The argument that a block size limit is necessary to prevent excessive centralization applies equally well to a limit on the per block expansion of the utxo set.  So I think this justifies the creation of such a limit, if a block size limit is to be maintained.

Also, I don't see why we would need to have a single metric to describe usage of the two scarce resources, block space and utxo set space.  Wouldn't it be simpler to just have separate limits for both?  They consume distinct physical resources - bandwidth and storage, respectively - and so these parameters should be somewhat orthogonal.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on March 16, 2013, 12:56:37 AM
Using this proposal, we can increase the max block size without bloating the UTXO set
This.

The argument that a block size limit is necessary to prevent excessive centralization applies equally well to a limit on the per block expansion of the utxo set.  So I think this justifies the creation of such a limit, if a block size limit is to be maintained.

Also, I don't see why we would need to have a single metric to describe usage of the two scarce resources, block space and utxo set space.  Wouldn't it be simpler to just have separate limits for both?  They consume distinct physical resources - bandwidth and storage, respectively - and so these parameters should be somewhat orthogonal.

Agreed.

We may have a UTXO index, which is the total number of outputs in a block minus the total number of inputs in a block, and have a hard limit for it
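Such an index could be computed per block as in this sketch (the dict layout is a stand-in for illustration, not a real Bitcoin Core structure; the coinbase input, which spends nothing, would be excluded in practice):

```python
def utxo_index(block_txs):
    """Net UTXO growth of a block: outputs created minus outputs spent.
    Each tx is assumed to be a dict {"inputs": [...], "outputs": [...]}."""
    return sum(len(tx["outputs"]) - len(tx["inputs"]) for tx in block_txs)
```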


Title: Re: Max block size should also consider the size of UTXO set
Post by: Ari on March 16, 2013, 01:07:41 AM
There was a thread on bitcoin-development about this too...

http://sourceforge.net/mailarchive/forum.php?thread_name=CABOyFfrPTYeq-g5tgte2HWfvBiBcRLw_Bvyk_X2hXMWVoW3dgQ%40mail.gmail.com&forum_name=bitcoin-development


Title: Re: Max block size should also consider the size of UTXO set
Post by: solex on May 09, 2015, 11:01:03 AM
...
If the real block size is < 1MB, the adjusted block size is not considered. If the real block size is > 1MB, the adjusted block size must be smaller than a certain constant.

Many problems are solved with a system like this:

1. Block space is still scarce. Below 1MB this is equivalent to the current limit; above 1MB, "good transactions" are prioritised
2. Miners will have an incentive to exclude dust outputs, because they increase the adjusted block size
3. Miners will favour transactions with fewer outputs, so the UTXO set could shrink
4. People trying to send dust outputs and/or inflate the UTXO set have to pay higher miner fees to compensate for their pollution
5. The raw block size, which costs bandwidth and disk space, is still accounted for

Since there must be a hard fork when lifting the max block size, adding extra rules like these won't make the change more complicated.

Apologies for necro-ing this thread, but the OP does seem particularly relevant to Gavin's concerns about UTXO bloat: http://gavinandresen.ninja/utxo-uhoh

and Gregory's recent post:
Quote
Another related point which has been tendered before but seems to have
been ignored is that changing how the size limit is computed can help
better align incentives and thus reduce risk.  E.g. a major cost to the
network is the UTXO impact of transactions, but since the limit is blind
to UTXO impact a miner would gain less income if substantially factoring
UTXO impact into its fee calculations; and without fee impact users have
little reason to optimize their UTXO behavior.   This can be corrected
by augmenting the "size" used for limit calculations.   An example would
be tx_size = MAX( real_size >> 1,  real_size + 4*utxo_created_size -
3*utxo_consumed_size).   The reason for the MAX is so that a block
which cleaned a bunch of big UTXO could not break software by being
super large, the utxo_consumed basically lets you credit your fees by
cleaning the utxo set; but since you get less credit than you cost the
pressure should be downward but not hugely so. The 1/2, 4, 3 I regard
as parameters which I don't have very strong opinions on which could be
set based on observations in the network today (e.g. adjusted so that a
normal cleaning transaction can hit the minimum size).
http://sourceforge.net/p/bitcoin/mailman/message/34097489/
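Gregory's example formula can be transcribed directly (the function and argument names here are mine):

```python
def effective_size(real_size, utxo_created_size, utxo_consumed_size):
    """tx_size = MAX(real_size >> 1,
                     real_size + 4*utxo_created_size - 3*utxo_consumed_size).
    The MAX floor stops a block that cleans a lot of UTXO from becoming
    'super large'; consuming UTXO earns a fee credit, but less than it cost."""
    return max(real_size >> 1,
               real_size + 4 * utxo_created_size - 3 * utxo_consumed_size)
```

A transaction that consumes far more UTXO bytes than it creates bottoms out at half its real size, so cleaning the set is rewarded but never free.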

Aligning the demand for larger blocks with pressure to maintain a cleaner UTXO set may be more constructive, more beneficial long-term, and a more immediate priority than concerns about further pressure for the fee market.


Title: Re: Max block size should also consider the size of UTXO set
Post by: gmaxwell on May 09, 2015, 11:46:58 AM
Yea, this has been in several different threads before (I'm not even sure if I was aware of this one). It seems obvious and correct to account for UTXO impact; though the exact parameters seem less clear (though almost any would be workable; though I'd rather not go implement a separate fixed point log for this.)

It also hasn't gone anywhere, though these things are trivial to implement, because going without them is more or less tolerable at the current limits (they'd be nice; utxo is still growing at a somewhat alarming rate, and the dust limits are kinda ugly hacks); and most of the push for larger blocks starts with the premise that there isn't major risk or trade-off worth addressing, and by that metric additional complexity doesn't sound justified-- and indeed, almost anything is more complex than "change a 1 to a 20". :)

I don't like the framing "Miners will have an incentive to exclude dust" ... rather, transactions have a cost, the change makes the economic cost better aligned with the true costs, and the change means transactions will have to pay in proportion to their costs-- in my suggestion it's set so that creating outputs effectively prepays much of the cost of spending them;  but there is no discrimination against "dust", miners make no judgement-- there is a cost and it's up to the users to justify it. :)

I don't like having txout values in the terms because that has an implicit scaling factor between the value of coins and the value of bytes, and because the value is actually unrelated to the costs to the network. My formula works with bytes all around, though there is some scaling because signatures tend to be much bigger than outputs and limits because I'm trying to avoid some edge cases like miners bloating the utxo set with cheaply spendable outputs in order to store-and-forward their blocksize.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 12:06:11 PM
I wrote this a long time ago, when I didn't know Bitcoin very well. But I have always thought this has to be done if we increase the max block size.

I would like to rewrite it as follows:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

txid is the txid of the utxo
txoIndex is the index of the utxo in the tx, recorded as varInt
scriptPubKey length is the length of the scriptPubKey of the utxo, a varInt
scriptPubKey is the scriptPubKey of the utxo
value is the utxo value in Satoshis
coinbase is an indicator showing whether the utxo is a coinbase reward
if it is a coinbase reward, 4 more bytes are needed to record the block height

I think this is the minimal amount of data needed to store a utxo. The utxo set needs to show whether a utxo is from a coinbase because of the 100-block maturity rule.

However, if the utxo contains ANY one of the invalid OP_CODEs (e.g. OP_RETURN, OP_VERIF, OP_CAT), which guarantees the script will fail, the utxoSize is set to zero.

For each block, we need to calculate the net change in UTXO size, which is

Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)

With utxoDiff, we may have a soft-fork rule of one of the followings:

  • 1. Put a hardcoded cap on utxoDiff for each block, so we can be sure the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, allow a higher MAX_BLOCK_SIZE

That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.
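A sketch of this accounting in Python (it treats only a leading OP_RETURN as provably unspendable, a simplification of the "invalid OP_CODE" rule; the helper names are mine):

```python
OP_RETURN = 0x6a

def varint_len(n):
    """Length in bytes of Bitcoin's variable-length integer encoding of n."""
    if n < 0xfd:
        return 1
    if n <= 0xffff:
        return 3
    if n <= 0xffffffff:
        return 5
    return 9

def utxo_size(script_pubkey, txo_index, is_coinbase):
    """txid(32) + varint(index) + varint(script len) + script + value(8)
    + coinbase flag(1) + optional height(4); zero if provably unspendable."""
    if script_pubkey[:1] == bytes([OP_RETURN]):
        return 0
    return (32 + varint_len(txo_index) + varint_len(len(script_pubkey))
            + len(script_pubkey) + 8 + 1 + (4 if is_coinbase else 0))

def utxo_diff(created, spent):
    """Net change in stored UTXO bytes for a block."""
    return (sum(utxo_size(*u) for u in created)
            - sum(utxo_size(*u) for u in spent))
```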


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 02:29:22 PM
I think this is the minimal amount of data needed to store a utxo. The utxo set needs to show whether a utxo is from a coinbase because of the 100-block maturity rule.

A UTXO entry only really needs to contain a hash of the outpoint (and height). The spending transaction should include all the information required when spending.

That avoids storing the information in RAM; the owner of the transaction stores the info instead.

It wouldn't even be required to store the entire hash. A node could salt the hashes, and collisions would be very unlikely.

The database would be a map of

Hash(key_salt | txid | n) maps to Hash(value_salt | OutPoint | height | ... )

If each hash were reduced to its lower 8 bytes and there were 10 million entries in the UTXO set, the odds of a collision would still be tiny (around 1 in 180,000).

To falsely spend a UTXO, an attacker would need 2^63 attempts, and even that requires knowing the target node's value_salt.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
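A sketch of that scheme (the hash function choice, salt sizes, and field packing are mine, for illustration only; the salts are per-node secrets):

```python
import hashlib
import os
import struct

KEY_SALT = os.urandom(16)    # per-node secrets; never revealed
VALUE_SALT = os.urandom(16)

def short_hash(salt, data):
    """First 8 bytes of a salted SHA-256."""
    return hashlib.sha256(salt + data).digest()[:8]

def utxo_entry(txid, n, outpoint_info, height):
    """16 bytes per UTXO: an 8-byte salted key mapped to an 8-byte salted value."""
    key = short_hash(KEY_SALT, txid + struct.pack("<I", n))
    value = short_hash(VALUE_SALT, outpoint_info + struct.pack("<I", height))
    return key, value
```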


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 03:06:57 PM
I think this is the minimal amount of data needed to store a utxo. The utxo set needs to show whether a utxo is from a coinbase because of the 100-block maturity rule.

A UTXO entry only really needs to contain a hash of the outpoint (and height). The spending transaction should include all the information required when spending.


How about the scriptPubKey and value of the utxo? These are not part of the spending tx.


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 03:16:44 PM
How about the scriptPubKey and value of the utxo? These are not part of the spending tx.

I meant that the info isn't strictly needed in the UTXO database (in general).  An extended transaction format could include the scriptPubKey and the value (and also the height, for coinbase spends).

This moves the cost of storing that info from RAM in every node to whoever wants to spend the transaction. P2SH outputs already effectively do this (well, not the value), but there is no reason all transactions couldn't support it.

During the transition, tx messages could be converted into etx messages by full nodes. Eventually, tx messages could be deprecated. Wallet software would have to manage the info for any coins held in its wallet.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 03:32:57 PM
How about the scriptPubKey and value of the utxo? These are not part of the spending tx.

I meant that the info isn't strictly needed in the UTXO database (in general).  An extended transaction format could include the scriptPubKey and the value (and also the height, for coinbase spends).

This moves the cost of storing that info from RAM in every node to whoever wants to spend the transaction. P2SH outputs already effectively do this (well, not the value), but there is no reason all transactions couldn't support it.

During the transition, tx messages could be converted into etx messages by full nodes. Eventually, tx messages could be deprecated. Wallet software would have to manage the info for any coins held in its wallet.

ok, but I don't understand the meaning of your: Hash(key_salt | txid | n) maps to Hash(value_salt | OutPoint | height | ... )

Why not just use an index of Hash(salt|txid|n|height|scriptPubKey|value)? The etx message will provide all this information (except the salt, which is a secret of the node).

Also, the etx message won't need to provide the scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. A byte indicating the type is enough.


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 04:16:50 PM
ok, but I don't understand the meaning of your: Hash(key_salt | txid | n) maps to Hash(value_salt | OutPoint | height | ... )

Why not just use an index of Hash(salt|txid|n|height|scriptPubKey|value)? The etx message will provide all this information (except the salt, which is a secret of the node).

I was thinking you need to be able to look it up in the database, so (txid | n) needs to be a key. However, you are right: a hash set is sufficient if the etx carries all the required additional info.

This drops things to 8 bytes per UTXO output for serialization, and slightly more in RAM due to hash-set overhead.

There could be an eblock too.  This allows nearly complete block verification without needing any info outside the eblock.  The only check is that the UTXOs all actually exist.

I think this would help with the consensus library.  You pass it an eblock message and it returns a list of UTXOs that must exist for the block to be valid and also a list of new UTXO hashes.

This means that (almost) all the complexity of block verification (script and crypto) is handled internally in the library. The only thing the outside client needs to do is manage the UTXO digests.

It might be better to use hash(salt | hash(<info>)), so that the salting can be done outside the consensus library.  The consensus library would use 32 byte digests.

Quote
Also, the etx message won't need to provide scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. Just use a byte to indicate the type is enough.

It still needs to provide the info to generate the scriptPubKey (address or serialized scriptPubKey).


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 04:38:21 PM
Quote
Also, the etx message won't need to provide scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. Just use a byte to indicate the type is enough.

It still needs to provide the info to generate the scriptPubKey (address or serialized scriptPubKey).

You don't need any extra info for P2PKH or P2SH because you can reconstruct the scriptPubKey from the scriptSig.

(For P2PK and BIP11 multi-sig, however, the etx has to provide the scriptPubKey)


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 04:47:07 PM
You don't need any extra info for P2PKH or P2SH because you can reconstruct the scriptPubKey from the scriptSig.

Yes, of course, sorry.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 04:48:54 PM
I wrote this a long time ago, when I didn't know Bitcoin very well. But I have always thought this has to be done if we increase the max block size.

I would like to rewrite it as follows:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

txid is the txid of the utxo
txoIndex is the index of the utxo in the tx, recorded as varInt
scriptPubKey length is the length of the scriptPubKey of the utxo, a varInt
scriptPubKey is the scriptPubKey of the utxo
value is the utxo value in Satoshis
coinbase is an indicator showing whether the utxo is a coinbase reward
if it is a coinbase reward, 4 more bytes are needed to record the block height

I think this is the minimal amount of data needed to store a utxo. The utxo set needs to show whether a utxo is from a coinbase because of the 100-block maturity rule.

However, if the utxo contains ANY one of the invalid OP_CODEs (e.g. OP_RETURN, OP_VERIF, OP_CAT), which guarantees the script will fail, the utxoSize is set to zero.

For each block, we need to calculate the net change in UTXO size, which is

Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)

With utxoDiff, we may have a soft-fork rule of one of the followings:

  • 1. Put a hardcoded cap on utxoDiff for each block, so we can be sure the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, allow a higher MAX_BLOCK_SIZE

That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

If we can make the size of a utxo entry constant for any type of utxo (as suggested by TierNolan), this could be simplified to

Code:
utxoDiff = (# of new potentially spendable UTXOs) - (# of spent UTXOs)
where
(# of new potentially spendable UTXOs) = (# of new UTXOs) - (# of new UTXOs with invalid OP_CODE)

All UTXOs are considered potentially spendable unless proven otherwise.
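With constant-size entries the accounting reduces to counting, as in this sketch (again treating only a leading OP_RETURN as provably unspendable, a simplification):

```python
OP_RETURN = 0x6a

def utxo_count_diff(new_script_pubkeys, num_spent):
    """Count of new potentially-spendable outputs minus spent outputs."""
    spendable = sum(1 for spk in new_script_pubkeys
                    if spk[:1] != bytes([OP_RETURN]))
    return spendable - num_spent
```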


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 06:00:40 PM
All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript::IsUnspendable() is only true if the script starts with OP_RETURN.

I think the disabled (https://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp#L279) opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable. The next easiest would be to treat a script as unspendable if any of the disabled opcodes (or OP_RETURN) occurs outside an OP_IF...OP_ENDIF portion of the script.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 06:23:47 PM
All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript::IsUnspendable() is only true if the script starts with OP_RETURN.

I think the disabled (https://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp#L279) opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable.  The next easiest would be if any of the disabled opcodes (or OP_RETURN) occur outside an OP_IF OP_ENDIF portion of the script.

I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following opcodes will fail the script even if they occur in an unexecuted OP_IF branch:

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 06:48:42 PM
I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following codes will fail the script, even if they occur in an unexecuted OP_IF branch

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff

Looking at the source code (https://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp#L265).

The sequence is:

fExec is set (true means the next opcode is executed)

opcode fails to parse (only fails for double-sized opcodes) -> FAIL

last push to the stack too large -> FAIL

number of non-push opcodes > maximum -> FAIL

OP_<disabled> -> FAIL

if (fExec || opcode is an IF-type opcode)
   // No curly brackets unfortunately
   case
   OP_VERIFY:  FAIL if not true
   OP_RETURN:  FAIL
   (and all the normal opcodes)

The disabled opcodes are hard FAILS no matter what happens with OP_IF.  OP_RETURN only fails if executed.

An easy rule would be that a script is unspendable if it contains a disabled opcode or OP_RETURN before the first OP_IF. It is much easier still to just check whether the first opcode is OP_RETURN.
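That rule could be sketched as below. The parser is deliberately simplified: it handles direct pushes and OP_PUSHDATA1 but not OP_PUSHDATA2/4, so it is an illustration rather than a consensus-grade check.

```python
OP_PUSHDATA1 = 0x4c
OP_IF = 0x63
OP_NOTIF = 0x64
OP_RETURN = 0x6a
# OP_CAT..OP_RIGHT, OP_INVERT..OP_XOR, OP_2MUL/OP_2DIV, OP_MUL..OP_RSHIFT
DISABLED = {0x7e, 0x7f, 0x80, 0x81, 0x83, 0x84, 0x85, 0x86,
            0x8d, 0x8e, 0x95, 0x96, 0x97, 0x98, 0x99}

def provably_unspendable(script):
    """True if a disabled opcode or OP_RETURN appears before the first
    OP_IF/OP_NOTIF (simplified: no OP_PUSHDATA2/4 handling)."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 1 <= op <= 75:            # direct push of `op` data bytes
            i += op
        elif op == OP_PUSHDATA1:     # next byte gives the push length
            if i >= len(script):
                return True          # truncated script cannot be spent
            i += 1 + script[i]
        elif op in (OP_IF, OP_NOTIF):
            return False             # past this point, execution may branch
        elif op == OP_RETURN or op in DISABLED:
            return True
    return False
```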

Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually :).


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 06:59:35 PM

Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually :).

There is no point in "re-enabling" an opcode, because that would be a hardfork. If we are going to re-introduce opcodes like OP_CAT, the most likely way is a softfork with P2SHv2 or OP_EVALv2.


Title: Re: Max block size should also consider the size of UTXO set
Post by: gmaxwell on May 09, 2015, 07:02:09 PM
since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whose spendability can't be decided without insanely large work, if they want (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY... does the all-zeros sha1 have a preimage?).

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
That's not much below the current system. The value and outpoint index are compressed and take just a couple of bytes; the scriptPubKey is template-compressed and takes 20 bytes for most scriptPubKeys. The version, height, coinbase flag, and txid are shared among all outputs with the same id. :)

Quote
It wouldn't even be required to store the entire hash.  A node could salt the hashes and collisions would be very unlikely.
I'd previously suggested something similar (using a permutation) to encrypt the utxo data locally, to reduce problems with virus data in the UTXO set triggering AV and the risk of committing a strict-liability crime by storing someone else's data in the UTXO set. Especially considering how small a compact encoding is, I'm not sure that a scheme where one must repeat the scriptPubKey across the network (to provide the preimage) is a real win. Bandwidth seems likely to be in shorter supply than even fast storage.

I don't think collisions are even harmful, so long as they're unpredictable by an attacker. Well, I suppose I could just flood you with invalid transactions, knowing that after enough of them one would collide for sure and you'd allow it. Then you'd go mine a bad block and fork yourself off-- but the cost scales exponentially with the data size; it's a multiway second-preimage problem, so even a 64-bit value is pretty hard to hit. Tricky: against 10M UTXOs with a 64-bit hash, getting a hit would take about 10,000 bad transactions per second for about 2135 days. So long as you're sending the preimages, the amount stored doesn't need to be normative; increasing the size to 72 bits makes it look quite a bit less feasible.
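The arithmetic behind that estimate (my reconstruction of the figures in the post):

```python
# Multi-target second-preimage work against salted 64-bit UTXO hashes.
utxo_count = 10_000_000           # entries in the UTXO set
attempts = 2 ** 64 / utxo_count   # expected tries to hit any one of them
rate = 10_000                     # attacker's bad transactions per second
days = attempts / rate / 86_400
print(round(days))                # about 2135 days
```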

I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set fees per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighted sum of the relevant costly things we'd hope to limit? (This wouldn't prevent having worst-case limits in any of the dimensions too, if really needed.)




Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 09, 2015, 07:09:30 PM
Unless P == NP, unspendability is undecidable in the general case

There needs to be agreement on what counts as definitely unspendable for UTXO set commitments. 

"Unless the script starts with OP_RETURN, it is entered into the UTXO set" would be a fine rule.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 07:20:50 PM
since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whose spendability can't be decided without insanely large work, if they want (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY... does the all-zeros sha1 have a preimage?).

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
That's not much below the current system. The value and outpoint index are compressed and take just a couple of bytes; the scriptPubKey is template-compressed and takes 20 bytes for most scriptPubKeys. The version, height, coinbase flag, and txid are shared among all outputs with the same id. :)

Quote
It wouldn't even be required to store the entire hash.  A node could salt the hashes and collisions would be very unlikely.
I'd previously suggested something similar (using a permutation) to encrypt the utxo data locally, to reduce problems with virus data in the UTXO set triggering AV and the risk of committing a strict-liability crime by storing someone else's data in the UTXO set. Especially considering how small a compact encoding is, I'm not sure that a scheme where one must repeat the scriptPubKey across the network (to provide the preimage) is a real win. Bandwidth seems likely to be in shorter supply than even fast storage.

I wrote this a long time ago, when I didn't know Bitcoin very well. But I have always thought this has to be done if we increase the max block size.

I would like to rewrite it as follows:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.
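The utxoSize and utxoDiff bookkeeping above can be sketched as follows (a rough Python illustration of the definitions in this post; function names are mine):

```python
def varint_size(n: int) -> int:
    """Size in bytes of Bitcoin's variable-length integer (CompactSize) encoding."""
    if n < 0xfd:
        return 1
    if n <= 0xffff:
        return 3
    if n <= 0xffffffff:
        return 5
    return 9

def utxo_size(script_pubkey: bytes, txo_index: int, coinbase: bool) -> int:
    """utxoSize as defined above: txid + txoIndex + scriptPubKey length
    + scriptPubKey + value + coinbase flag (+ block height for coinbase)."""
    return (32                              # txid
            + varint_size(txo_index)        # txoIndex
            + varint_size(len(script_pubkey))
            + len(script_pubkey)
            + 8                             # value
            + 1                             # coinbase flag
            + (4 if coinbase else 0))       # coinbase block height

def utxo_diff(created, spent):
    """utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs).
    `created` and `spent` are lists of (script_pubkey, txo_index, coinbase)."""
    return (sum(utxo_size(*u) for u in created)
            - sum(utxo_size(*u) for u in spent))
```

A block full of transactions that spend more UTXO bytes than they create has a negative utxoDiff, which under rule 2 would earn it a larger allowed block size.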

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set a fee per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighted sum of the relevant costly things we'd hope to limit?  (This wouldn't prevent having worst-case limits in any of the dimensions too, if really needed.)


So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?


Title: Re: Max block size should also consider the size of UTXO set
Post by: gmaxwell on May 09, 2015, 07:28:16 PM
So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (the something else would be the sigops limit; though sigops actually have a kind of quadratic cost, due to the cost of rehashing the transaction and all its inputs; but all this only needs to be very roughly correct in order to get the behavioral incentives right (don't use frivolous sigops!)).


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 09, 2015, 07:47:39 PM
So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (something else would be the sigops limit).

I think the relationship is:

block size => bandwidth and storage cost
utxo size => RAM cost
sigops => CPU cost

Then the formula has to be reviewed regularly (i.e. regular hardforks), depending on how each component's cost develops in the real world.

However, my impression is that the CPU cost is negligible compared with bandwidth, storage, and RAM, and that the sigop limit is there just as an anti-DoS mechanism.


Title: Re: Max block size should also consider the size of UTXO set
Post by: gmaxwell on May 09, 2015, 08:08:26 PM
block size => bandwidth and storage cost
utxo size => RAM cost
No, the UTXO set isn't in RAM; it's just more storage. It's more costly because every verifier must keep it online, whereas the majority of nodes can forget all the rest once it's buried, and certainly don't need to access it. I'd suggest thinking about it as storage which is necessarily online for the UTXO set, versus storage which is mostly offline/nearline for the history, once you're past the most recent couple hundred blocks or so.
Quote
sigops => CPU cost
However, my impression is that the CPU cost is negligible comparing with bandwidth, storage and RAM and the sigop limit is there just as an anti-DOS mechanism
It's not exactly negligible, especially when one considers the cost of catching up: there it's currently the dominating cost for many people, since due to sig operations syncing the chain only runs at about 11 Mbit/s on a 3.2 GHz quad core. (Well, libsecp256k1 will make this considerably faster, but still slower than the high-speed consumer broadband that is widely available.)  It only doesn't impact block propagation because virtually all signatures are now cached from the initial txn relay.

I don't really think it needs to be adjusted often (or hopefully ever), in that it can be set pretty conservatively; the most important point is getting to a situation where "I will pay less in fees if I do bar, so if I'm otherwise neutral I should choose to do so." Partially this is because you're right that it's mostly an anti-DoS mechanism.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 10, 2015, 03:56:55 AM

My formula works with bytes all around, though there is some scaling (because signatures tend to be much bigger than outputs) and limits (because I'm trying to avoid some edge cases, like miners bloating the UTXO set with cheaply spendable outputs in order to store-and-forward their blocksize).

Do you document your formula somewhere?


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 10, 2015, 07:49:29 AM
Do you document your formula somewhere?

It was on the mailing list (http://sourceforge.net/p/bitcoin/mailman/message/34097489/).

Quote from: gmaxwell
An example would be tx_size = MAX( real_size >> 1,  real_size + 4*utxo_created_size - 3*utxo_consumed_size).   The reason for the MAX is so that a block which cleaned a bunch of big UTXO could not break software by being super large, the utxo_consumed basically lets you credit your fees by cleaning the utxo set; but since you get less credit than you cost the pressure should be downward but not hugely so. The 1/2, 4, 3 I regard as parameters which I don't have very strong opinions on which could be set based on observations in the network today (e.g. adjusted so that a normal cleaning transaction can hit the minimum size).
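The example cost function quoted above can be written out as a small sketch (Python; the 1/2, 4, and 3 are the tunable parameters gmaxwell mentions, not settled values):

```python
def adjusted_tx_size(real_size: int,
                     utxo_created_size: int,
                     utxo_consumed_size: int) -> int:
    """tx_size = MAX(real_size >> 1,
                     real_size + 4*utxo_created_size - 3*utxo_consumed_size)

    The MAX term floors the cost at half the real size, so a transaction
    that cleans up many large UTXOs cannot claim an arbitrarily small
    (or negative) size and break software with a super-large block.
    Creating UTXO bytes costs 4x while consuming them only credits 3x,
    so the net pressure on the UTXO set is downward, but not hugely so.
    """
    return max(real_size >> 1,
               real_size + 4 * utxo_created_size - 3 * utxo_consumed_size)
```

A UTXO-bloating transaction thus pays for more than its wire size, while a cleaning transaction can pay as little as half of it.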


Title: Re: Max block size should also consider the size of UTXO set
Post by: solex on May 10, 2015, 11:30:09 AM
Some further thoughts for comment...

Given a delta utxo size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

Considering how fast the utxo set is expanding, would it be wise to look at changing the fundamental basis for free txs from days-destroyed to negative delta utxo size? This could be implemented much sooner than changes tied directly to the block limit.


Title: Re: Max block size should also consider the size of UTXO set
Post by: TierNolan on May 10, 2015, 01:26:37 PM
I created a draft BIP (https://github.com/TierNolan/bips/blob/extended_transactions/bip-etx.mediawiki) relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta utxo size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.


Title: Re: Max block size should also consider the size of UTXO set
Post by: jl2012 on May 10, 2015, 03:43:17 PM
I created a draft BIP (https://github.com/TierNolan/bips/blob/extended_transactions/bip-etx.mediawiki) relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta utxo size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.

I think it's a good idea and it should have a new thread of its own.