Partial validating nodes

jl2012 (OP)

Legendary

Offline

Activity: 1792
Merit: 1097

Partial validating nodes

June 29, 2015, 10:55:38 AM

#1

With increase in block size, it will become more difficult to run a full node. However, SPV nodes may still contribute by validating some, but not all blocks.

Based on my estimation with not-so-random sampling of blocks, I assume:
3 inputs per transaction on average
550 bytes per transaction on average

Also, 1kB = 1000 bytes, 1MB = 1000kB.

With 8MB block, that would be 14545 tx / block. To locate a tx with 14545 tx / block that would require 14 levels of Merkle hash, i.e. 14*256/8 = 448 bytes / input

So an SPV proof for each input would take 550 + 448 = 998 bytes. Lets round it to 1kB.

With 14545 tx / block, that means 43636 inputs / block. Therefore, for a partial validating node to validate a 8MB block (assuming inputs are valid), it has to download 8MB + 43.636MB = 51.636MB of data.

We currently have around 5900 full nodes. Since partial validating nodes does not need to store the block data permanently, we may have at least 10,000 partial validating nodes in the future. With each node validating only 1 block per day, each of the 144 blocks in a day will be independently validated by 69 nodes.

--------------------------------------

To further extend the idea, partial validating nodes may sign the validated blocks. With some kind of web-of-trust system people may validate the blockchain collaboratively

--------------------------------------

Even mobile devices may become partial validating nodes when they are connected to AC power and WIFI. It may take a few days for them to validate a single block but it still contributes to the network security as (hopefully) there will be many mobile devices with SPV wallets.

--------------------------------------

What happens if an invalid tx is found? A fraud warning system is tricky because that could become a DoS vector. We may require fraud warning to be signed by bitcoin address with a certain amount of deposit. This would be a different topic.

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

June 29, 2015, 11:08:08 AM

#2

Quote from: jl2012 on June 29, 2015, 10:55:38 AM

With increase in block size, it will become more difficult to run a full node. However, SPV nodes may still contribute by validating some, but not all blocks.

Based on my estimation with not-so-random sampling of blocks, I assume:
3 inputs per transaction on average
550 bytes per transaction on average

Also, 1kB = 1000 bytes, 1MB = 1000kB.

With 8MB block, that would be 14545 tx / block. To locate a tx with 14545 tx / block that would require 14 levels of Merkle hash, i.e. 14*256/8 = 448 bytes / input

So an SPV proof for each input would take 550 + 448 = 998 bytes. Lets round it to 1kB.

With 14545 tx / block, that means 43636 inputs / block. Therefore, for a partial validating node to validate a 8MB block (assuming inputs are valid), it has to download 8MB + 43.636MB = 51.636MB of data.

That is sufficient to show that the transactions in the block are all correctly signed. It doesn't protect against double spending.

Archive nodes could store each block including that extra meta-data.

Each partial node could validate and then store those blocks. It could then serve the 50MB of block data directly to any other node that wants to do validation.

Quote

What happens if an invalid tx is found? A fraud warning system is tricky because that could become a DoS vector. We may require fraud warning to be signed by bitcoin address with a certain amount of deposit. This would be a different topic.

Fraud proofs cannot be used as a DOS system. Fraud proofs should be relative short and quick to check.

If a block has 100 invalid transactions, the fraud proof for the block only needs to broadcast for one of them.

Nodes would only forward fraud proofs for the main chain.

With UTXO set commitments, even double spending can be protected against.

I think having entire blocks as the 'atom' for partial verification is a good idea. This means that inflation doesn't need an extra check.

The disadvantage is that an inflation fraud proof is the size of the entire block.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

June 29, 2015, 11:24:03 AM
Last edit: November 13, 2015, 12:15:36 AM by TierNolan

#3

You need to enumerate all the required fraud proofs

(This doesn't necessarily include them all)

- block to large
-- This requires a Merkle sum-tree for tx size

- Invalid merkle sum-tree for sizes

- inflation
-- This requires a Merkle sum-tree for fees

- Invalid merkle sum-tree for fees

- double spending
-- This requires the two spending transactions and merkle paths
-- Already supported

- false minting (spending non-existant output)
-- This requires UTXO commitments [edit] or commitments for every tx input in the block[/edit]

- Invalid UTXO commitments

- invalid signature
-- source and spend transactions and merkle paths
-- Already supported

- sum(outputs) > sum(inputs) transaction
-- All input transactions, overspending transaction and merkle paths
-- Already supported

Even limiting transactions to 100kB doesn't necessarily prevent things from getting to large. An overspending 100kB transaction with lots of 100kB inputs could give a massive fraud proof.

A consensus rule could be added that the total size of all inputs into a transaction cannot exceed 200kB and transactions cannot exceed 100kB. This keeps the fraud proof limited (though still very large).

It would be necessary to go through the entire set of consensus rules and create a fraud proof for every check that is performed.

It is also necessary to create a check of anything that is used for checking. If UTXO commitments are added, then fraud proofs are needed for the UTXO set commitment tree.

Ideally, there would be a guarantee that the maximum size of a fraud proof has a finite limit. Some of the elements of the fraud proof scale with O(log(N)) so it can't be guaranteed entirely, but it should be possible to guarantee in practice.

Fraud proof don't protect against miners withholding some info. You can't prove a block is invalid if you only have 99% of the transactions in the block.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

jl2012 (OP)

Legendary

Offline

Activity: 1792
Merit: 1097

Re: Partial validating nodes

June 29, 2015, 03:07:22 PM

#4

My proposal proofs these types of fraud: block too large (also too many sigop, etc.), inflation, invalid signature, sum(outputs) > sum(inputs) transaction

For spending non-existing output and miners withholding info, people will assume miners are guilty unless it is proven otherwise. Logically speaking, there is no reason to withhold info if there is nothing wrong

The real problem seems to be double spending. With current protocol, only a full node may discover a double spending.

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

June 29, 2015, 05:36:46 PM

#5

You can cover double spending and false minting with UTXO set commitments.

You need to include a merkle path through the UTXO set for each input and output.

I created a thread with the specification.

This means that a block can be verified entirely locally.

Assuming that there are 32 million elements in the UTXO set, then each path would be 25 digests long. If there are 8000 inputs and 8000 outputs, then the first 13 digests in each path would occur multiple times.

The block size would be

standard block: 1MB
repeated digests: 16000 * 32 = 512kB
remaining paths: 16000 * (25 - 13) * 32 = 6MB

Total: 7.5MB

This is much higher than the standard block, but it is completely self contained. You can prove that there has been no double spending and that the UTXO root node is updated correctly.

Quote from: jl2012 on June 29, 2015, 03:07:22 PM

For spending non-existing output and miners withholding info, people will assume miners are guilty unless it is proven otherwise. Logically speaking, there is no reason to withhold info if there is nothing wrong

The real problem seems to be double spending. With current protocol, only a full node may discover a double spending.

The problem is that it is hard to prove to other nodes that data is missing. If your node is only checking 1% of the blocks, and I claim that a portion of block 234,567 is missing. How can I prove it to you? It is wide open to a DOS attack. Even if you immediately get the block when you request it, you can't be sure that I wasn't under an isolation attack and honestly couldn't get the block.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

jl2012 (OP)

Legendary

Offline

Activity: 1792
Merit: 1097

Re: Partial validating nodes

June 29, 2015, 06:27:10 PM

#6

Quote from: TierNolan on June 29, 2015, 05:36:46 PM

Quote from: jl2012 on June 29, 2015, 03:07:22 PM

For spending non-existing output and miners withholding info, people will assume miners are guilty unless it is proven otherwise. Logically speaking, there is no reason to withhold info if there is nothing wrong

The real problem seems to be double spending. With current protocol, only a full node may discover a double spending.

The problem is that it is hard to prove to other nodes that data is missing. If your node is only checking 1% of the blocks, and I claim that a portion of block 234,567 is missing. How can I prove it to you? It is wide open to a DOS attack. Even if you immediately get the block when you request it, you can't be sure that I wasn't under an isolation attack and honestly couldn't get the block.

This is why I say fraud proof could be a DOS attack vector.

We can solve this by a soft fork:

1. Redefine OP_NOP9 as OP_FRAUDPROOF
2. A validator will deposit some bitcoins to this scriptPubKey: <validator's public key> OP_FRAUDPROOF OP_2DROP (The OP_2DROP is needed for complying with BIP62)
3. The validator can always get his money back any time with this scriptSig: OP_0 <validator's sig>. In this case, the "validator's sig" is same as a signature for OP_CHECKSIG
4. The bitcoins can also be spent by anyone, if the validator signs a fraudproof. The scriptSig will be <hash of block X><validator's sig testifying fraud in block X>. However, this scripSig is valid only in the descendants of block X.
5. If the block X is invalid, the validator will lose nothing because the descendants of X won't be valid anyway.
6. If the block X is valid, someone (most likely a miner) will collect the bitcoin and the blockchain goes on.

Problems:
1. Why would a voluntary validator agree to lock his bitcoin this way, just for charity?
2. Validator has to keep the private key in a hot wallet
3. When there is a hardfork, non-upgrading validators will lose money.

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

June 29, 2015, 06:52:10 PM

#7

Quote from: jl2012 on June 29, 2015, 06:27:10 PM

This is why I say fraud proof could be a DOS attack vector.

The fraud proof itself can't be used as a DOS attack. Fraud proofs are self-proving. You just need the header chain + fraud proof to verify that it is a valid fraud proof and then you can forward it.

Only the first fraud proof for a given block should be forwarded as that causes the block to be blacklisted.

The only DOS attack is claims that data is missing.

Quote

We can solve this by a soft fork:

1. Redefine OP_NOP9 as OP_FRAUDPROOF
2. A validator will deposit some bitcoins to this scriptPubKey: <validator's public key> OP_FRAUDPROOF OP_2DROP (The OP_2DROP is needed for complying with BIP62)
3. The validator can always get his money back any time with this scriptSig: OP_0 <validator's sig>. In this case, the "validator's sig" is same as a signature for OP_CHECKSIG
4. The bitcoins can also be spent by anyone, if the validator signs a fraudproof. The scriptSig will be <hash of block X><validator's sig testifying fraud in block X>. However, this scripSig is valid only in the descendants of block X.
5. If the block X is invalid, the validator will lose nothing because the descendants of X won't be valid anyway.
6. If the block X is valid, someone (most likely a miner) will collect the bitcoin and the blockchain goes on.

The only claim required is "transaction X in block Y is missing". I don't really see how this helps with that.

Validators could honestly not realise that the data is available.

Quote

Problems:
1. Why would a voluntary validator agree to lock his bitcoin this way, just for charity?
2. Validator has to keep the private key in a hot wallet
3. When there is a hardfork, non-upgrading validators will lose money.

Maybe, validators could get a share of the fees somehow.

The fundamental problem is ensuring that a majority of miners are willing to broadcast their blocks quickly. I think proof of publication would be helpful here. Under proof of publication, blocks are just lists of byte arrays. This is easy to verify, so may reduce the centralisation pressure.

Proof of publication combined with something like p2pool might be enough to ensure that data isn't withheld. Once all nodes have consistent access to block data, fraud proofs can handle everything else.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

Realpra

Hero Member

Offline

Activity: 815
Merit: 1000

Re: Partial validating nodes

June 30, 2015, 08:20:07 PM

#8

Quote from: TierNolan on June 29, 2015, 06:52:10 PM

The fundamental problem is ensuring that a majority of miners are willing to broadcast their blocks quickly.

Since you covered everything else nicely I will just comment on this.

A network of partially validating nodes would be very fast since each node only needs to deal with a small subset of data.

So if you configure them so that a block isn't valid until say 3-10 nodes from each validator subset says they are + no fraud proofs are handed in - miners would have very little incentive to withhold data - their block would just be orphaned.

(Validator subsets would remember the full history of certain address scripts and get all other data from their peers)

As for selfish mining that is a different topic that I believe is easily solvable.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

June 30, 2015, 09:15:32 PM

#9

Quote from: Realpra on June 30, 2015, 08:20:07 PM

A network of partially validating nodes would be very fast since each node only needs to deal with a small subset of data.

So if you configure them so that a block isn't valid until say 3-10 nodes from each validator subset says they are + no fraud proofs are handed in - miners would have very little incentive to withhold data - their block would just be orphaned.

The validators only need to verify that the block was actually published. If that is guaranteed then fraud will be found. The problem is that there is no way to punish the validator if that wasn't true. 1 hour later, they can just publish the data to prove that they shouldn't be punished, since the data was published.

This means that validators need to actually validate what they are checking. If the data they certify is incorrect, they lose their deposit.

You need to break down blocks into small pieces that can be verified individually. I think 1MB is a reasonable chunk of data. A block would consist of sub-blocks. Each sub-block would have the UTXO commitments and any additional information required for proving. This UTXO commitment would include the hash of the scriptPubKey and the value. This keeps everything self-contained.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

kanzure

Newbie

Offline

Activity: 13
Merit: 4

Re: Partial validating nodes

November 12, 2015, 08:39:26 PM

#10

Quote from: TierNolan on June 29, 2015, 11:24:03 AM

- false minting (spending non-existant output)
-- This requires UTXO commitments

- Invalid UTXO commitments

UTXO commitments are not required for proving one of those kinds of fraud, apparently:

https://botbot.me/freenode/bitcoin-core-dev/msg/54014301/

"You do them without a UTXO commitment by instead committing to the input block and offset that the input came from. Then fraud can be proven by simply showing whatever is at that location, if its not the thing its supposed to be."

TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Partial validating nodes

November 13, 2015, 12:14:35 AM

#11

Quote from: kanzure on November 12, 2015, 08:39:26 PM

"You do them without a UTXO commitment by instead committing to the input block and offset that the input came from. Then fraud can be proven by simply showing whatever is at that location, if its not the thing its supposed to be."

Ahh ok. You include the proof that the output actually exists as part of the transaction.

It does mean that you need commitments though. You have to include commitments for every input in the block.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

gmaxwell

Moderator
Legendary

Offline

Activity: 4172
Merit: 8419

Re: Partial validating nodes

November 13, 2015, 07:55:30 AM

#12

Quote from: TierNolan on November 13, 2015, 12:14:35 AM

Quote from: kanzure on November 12, 2015, 08:39:26 PM

"You do them without a UTXO commitment by instead committing to the input block and offset that the input came from. Then fraud can be proven by simply showing whatever is at that location, if its not the thing its supposed to be."

Ahh ok. You include the proof that the output actually exists as part of the transaction.

It does mean that you need commitments though. You have to include commitments for every input in the block.

No different from having to commit to input values for fee computation inflation avoidance though agreement or size for size limits though. And the commitments are small, e.g. could be sent in just a couple bytes. They don't even need to be sent to a full node that has the data already. (though it's better to do so, so the block hash can be checked before looking up the inputs)