Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: mustyoshi on July 15, 2013, 09:25:33 PM



Title: What is stopping pruning from being added to the qt client?
Post by: mustyoshi on July 15, 2013, 09:25:33 PM
Can't the client just download and parse the entire blockchain removing the unspent outputs as they get spent?

I don't understand why it hasn't already been implemented. I imagine it would be immensely useful for miners who don't want to store the entire 10GB blockchain and just want the roughly 0.5GB of block headers plus unspent outputs that would result from the parsing.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 15, 2013, 09:27:57 PM
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: mustyoshi on July 15, 2013, 09:31:13 PM
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 15, 2013, 09:36:30 PM
it does remove unspent outputs as they get spent
it doesn't remove the block data - because then it would not be able to serve blocks to clients that are fresh and need to download the chain.
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.
Indeed. https://bitcointalk.org/index.php?topic=247110.msg2683001#msg2683001
But you know, they are very busy now with "tackling the low lying fruit" - whatever useless feature it is :)


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 16, 2013, 02:09:29 AM
But if transactions have already been embedded in the blockchain, new nodes don't need the data anyways.
They do, in fact. Otherwise miners could just "embed" in the blockchain a bunch of transactions stealing people's coin or creating additional coin out of thin air and your node would happily accept it.  Part of the check and balance of Bitcoin is that the ability of miners to cheat is reduced to very narrow cases, making it unprofitable for them to do so.

Bitcoin is based on the concept of autonomous validation— The Bitcoin algorithm is zero trust as much as is possible. It would be great if the whole thing could be completely autonomous, but there appears to be no way to autonomously achieve consensus ordering, so Bitcoin uses a vote of a scarce resource just to determine ordering. But this requires actually checking the historical data, not just trusting it blindly.  There could be a lot of flexibility in how nodes go about doing this— no need to delay the initial usage for it, for example, but it's not a pointless activity.

piotr_n, It's really frightening to see someone claiming to write Bitcoin software for other people to use who doesn't understand this, and proposing changes which would reduce the security model with the implication that they involve no trade-off and that the only reason they haven't been done is that someone was lazy or stupid.  I'll take comfort in the view that you're probably making these claims not because you believe them but because they serve to further your inexplicable insulting conduct.

An additional point of correction, which is obvious to anyone who is familiar with the development of the reference software but which might not be known to readers of the thread:  Bitcoin 0.8 featured a complete rewrite of the entire storage engine used in Bitcoin, which was the necessary first step to implementing pruning.  In addition to the massive short-term performance improvements, this risky and laborious rewrite also achieved the complete decoupling of the historical data and the data needed for validation... prior to 0.7 Bitcoin needed random access to the blocks even during normal operation, which would have made pruning pretty difficult to implement.  Right now all that's required is solving the P2P issues related to announcing and discovering historical blocks in a network where not every full node has them.  Mistakes made in database structure changes or mistakes made in the P2P changes could all cause network-wide failures, loss of funds, etc. So these actions must be undertaken carefully.



Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 16, 2013, 08:38:19 AM
piotr_n, It's really frightening to see someone claiming to write Bitcoin software for other people to use who doesn't understand this, and proposing changes which would reduce the security model with the implication that they involve no trade-off and that the only reason they haven't been done is that someone was lazy or stupid.  I'll take comfort in the view that you're probably making these claims not because you believe them but because they serve to further your inexplicable insulting conduct.
Excuse me?
What don't I understand about writing bitcoin software that you do?
You have this patronizing attitude all the time, but I don't think you are smarter than me, so I am not going to bend before you just because you have been here longer.
If I have an idea, I am always open to criticism - as long as it is technical and specific (not like in your quote above).

Are you saying that I don't understand how the concept of checkpoints reduces bitcoin's security model?
Oh, trust me that I do. But I am not the one who invented them and put them into the code, in the first place - I am not the one who broke the security model already.
I am only the one who proposed taking further advantage of checkpoints in order to implement an ultimate purging solution.

Maybe you just don't understand something here?
Maybe you don't understand that if the only way for a bitcoin software to work is by validating a chain up to a checkpoint that is fixed inside the very same code that validates the chain - then validating the chain up to that point is just a big waste of time, bandwidth, CPU and storage.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: Mike Hearn on July 16, 2013, 09:10:06 AM
Checkpoints are optional. If you don't want to trust them or suspect they might be wrong in some way, delete them and the code will still work in a zero trust manner.

However if old data gets thrown away completely and all you have is a checkpoint, then they aren't optional any more. That's the key difference.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 16, 2013, 09:16:06 AM
Coming from this angle, when you want to start disabling things in the source code, then everything becomes optional. :)

But seriously, I'm not criticizing checkpoints here.
Checkpoints are needed - for the very reasons that they were put in, in the first place.
I mean, they should probably be replaced with some other mechanism, something that would not be a value fixed in the source code, something which does not require upgrading the client in order to move a checkpoint - but eventually we still need the protection that they add, no argument here.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: Mike Hearn on July 16, 2013, 10:04:56 AM
Well, I'm not sure they're needed exactly. They're an additional safety measure on the grounds that defence in depth is a good idea. But Bitcoin would still work fine without them.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 16, 2013, 10:05:19 AM
Mistakes made in database structure changes or mistakes made in the p2p changes could all cause network wide failures, loss of funds, etc. So these actions must be undertaken carefully.

Actually that is a potential side/sub-project.

What about creating a new node type "block-server".  These nodes maintain historical data only.

You can ask them for blocks (and the block header chain), but not for transactions.

These nodes would download all new blocks and verify the headers and POW.  They wouldn't even have to check any of the internal data.  Any block that is part of the chain up to the most recent checkpoint is stored, and any block (including orphans) that can be traced back to the checkpoint would be stored as long as it meets the POW requirements.  Keeping the checkpoint reasonably recent makes it harder to overload these servers.

Writing a node like that from the ground up wouldn't be that difficult.

Once there was a reasonable number of those nodes on the network, then block pruning would be much less potentially disastrous.  Even if there is a bug and it causes a "prune" of historical data for the entire network, there would still be some nodes with everything.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 16, 2013, 10:17:00 AM
Well, I'm not sure they're needed exactly. They're an additional safety measure on the grounds that defence in depth is a good idea. But Bitcoin would still work fine without them.
Bitcoin would work, but the node's storage could easily get stuffed with bogus blocks that hook very early in the chain, thus are extremely cheap to create.
And it's almost guaranteed that as soon as you'd switch them off, such attacks would appear.

Moreover, since we're talking about block purging.
Block purging implies that there must be some kind of judgement as to whether an orphaned block buried deep inside the chain is already old enough to purge - you cannot have any purging without some sort of checkpoint that tells you how old an orphan needs to be to be considered too old.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 16, 2013, 10:53:32 AM
Bitcoin would work, but the node's storage could easily get stuffed with bogus blocks that hook very early in the chain, thus are extremely cheap to create.
And it's almost guaranteed that as soon as you'd switch them off, such attacks would appear.

Without checkpoints, it could still be reasonably safe.

The process is
- download headers first to get an estimate of the longest chain
- only store forks that fork within 5k of the longest chain
- total storage limit of 1MB * (current time - genesis time) / 2.5 mins

Block deletion could happen but have a timeout of say 60 days.  If a block was stored more than 60 days ago and has not been part of the main chain at any point in that time, it can be deleted.

However, it is unlikely to be actually needed.
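
Roughly, in Python (just a sketch of the rules above; the helper names and block fields are made up for illustration):

Code:
import time

GENESIS_TIME = 1231006505        # timestamp of the genesis block
FORK_WINDOW = 5000               # only keep forks this close to the longest chain
QUOTA_RATE = 1_000_000 / 150.0   # 1MB per 2.5 minutes, in bytes per second
PRUNE_AGE = 60 * 24 * 3600       # the 60-day deletion timeout, in seconds

def storage_quota(now=None):
    """Total bytes this node is willing to spend on block storage."""
    now = now if now is not None else time.time()
    return (now - GENESIS_TIME) * QUOTA_RATE

def should_store(fork_height, best_height):
    """Only store blocks on forks that fork within FORK_WINDOW of the longest chain."""
    return best_height - fork_height <= FORK_WINDOW

def can_delete(stored_at, last_on_main_chain, now=None):
    """A block stored more than 60 days ago, which has not been part of the
    main chain at any point in that time, may be deleted."""
    now = now if now is not None else time.time()
    return now - stored_at > PRUNE_AGE and now - last_on_main_chain > PRUNE_AGE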


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 16, 2013, 11:01:27 AM
The process is
- download headers first to get estimate of the longest chain
- only store forks that fork within 5k of the longest chain
- total storage limit of 1MB * (current time - genesis time) / 2.5 mins
But this is just a different way of doing checkpoints - a moving one (current height minus 5K blocks), instead of values fixed in the code.
You basically define a rule that your node does not accept any fork older than 5k blocks.
Which is of course a much better solution than the current checkpoints scheme.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 16, 2013, 11:43:43 AM
But this is just a different way of doing checkpoints - a moving one (current height minus 5K blocks), instead of values fixed in the code.

No, it isn't a checkpoint.  It is a filter to decide if you should store a block.  The idea is to prevent storing forks that are unlikely to be part of a valid chain.

Say I have stored blocks 0 to 50k of a chain and then I find another chain that is 100k blocks long.  If I have checkpointed at 45k on the first chain, then I can't recover.

However, with a filter, I would immediately be able to store all the blocks on the new chain.

The disk space rule might actually kick in though.

However, the quota increases by 1MB every 2.5 mins.  This means that once I find the main chain (and stay locked to it), it will generate 1MB every 10 mins but the quota will increase by 4MB in that time, so the "window" will open.

It does raise the question of how to find the longest chain in a hostile environment.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 17, 2013, 07:01:28 AM
What don't I understand about writing bitcoin software that you do?
You have this patronizing attitude all the time,
I suggest you review this thread before you continue to complain about patronizing (https://bitcointalk.org/index.php?topic=256986.msg2737463#msg2737463).

When someone says "new nodes don't need the data anyways." and you respond "Indeed.", it's a little hard to not get the impression that you actually might have no clue what the Bitcoin security model is,  even when you do follow it up by, "But you know, they are very busy now with "tackling the low lying fruit" - whatever useless feature it is"— suggesting that you're just agreeing with a confused complaint and trivializing the actual nuance and complexity of the issue for the sake of accusing people actually doing work on the reference client of working on useless things.

One of these explanations has lower entropy than the other, I suppose. But the trolls in the mining subforum seem to have dulled my Occam's razor.  Feel free to provide your own alternative psychoanalysis. In the meantime, I'm stuck providing the detailed technical discussion that you declined to provide, whatever your reasons.

Quote
Are you saying that I don't understand how the concept of checkpoints reduces bitcoin's security model?
You were suggesting that historic data would be made unavailable and non-validated. That isn't what checkpoints do.

Quote
Maybe you don't understand that if the only way for a bitcoin software to work is by validating a chain up to a checkpoint that is fixed inside the very same code that validates the chain - then validating the chain up to that point is just a big waste of time, bandwidth, CPU and storage.
That simply isn't correct.  If the history is validated then it cannot be forged and replaced with a false history which invents additional coins out of thin air.  Without validation any of this cheating is undetectable.  People often make crazy claims like "Satoshi embedded a billion of his own bitcoin into the system!", it's excellent that we can point out that every copy of the software verifies that this isn't so, and that any user with the source can audit it to confirm that it does. The fidelity of Bitcoin's lifetime conformance to the understood rules is check-able trivially by anyone (or by their designees). Checkpoints don't harm this, but checkpoints without the possibility of validation would destroy it.

[And here I'm writing for the sake of people who randomly find this thread in searches, I don't expect you to learn anything here you don't know:]

Checkpoints serve three main purposes:

(1)  Prevents DOS attacks against node synchronization.  Under the original synchronization behavior an attacker can send a node hundreds of gigabytes of difficulty 1 blocks (which thanks to FPGAs and asics can be very cheaply produced).  This difficulty 1 chain might— once synced far enough in _theory_ eventually have more sum diff than the real chain, so a checkpointless node can be forced to evaluate it.  Checkpoints kill this attack dead, though they aren't the only way to kill it, they're a simple and reliable one.

(2) Prevents isolation attacks against newly started nodes.  People most often talk about the Bitcoin security assumption being that most-hashpower-is-honest, but equally important is the "information is hard to stifle" assumption: Bitcoin is not secure if a network attacker can isolate you from the honest network.  A newly started node might instead be intercepted by a local network attacker which feeds it a low difficulty fantasy hashchain and tricks it into accepting payments on this alternative network, even though the attacker only has a single small GPU miner with about 0.0000001% of the network hashrate.  Checkpoints mean that a user which was reliably able to obtain the software (E.g. via another network or PKI/WOT authenticated signatures) will not be tricked by an attacker which has not put in an enormous amount of energy creating a simulated network.

(3) Simplifies discussions about the risk of mass-chain-rewrites.  Many new users in the past fixated on the risk of 'doomsday rewrites' of the chain. There is no particular reason to be concerned about these: under any situation where they are possible the system is otherwise doomed, but people worry anyway, and understanding why this is not a likely attack requires sitting down and multiplying out the costs. With checkpoints this concern is thoroughly moot. In particular, a new argument against doomsday rewrites arises: any would-be rewriter takes the risk that a checkpoint is adopted while he is preparing his rewrite in secret, losing all his work.  The compromise, of course, is that it creates a risk of consensus failure if ever a checkpoint is ambiguously set. But the introduction of checkpoints is transparent and they are never set close to the current time or controversially, so no practical parties are worried about this.

Quote
when you want to start disabling things in the source code

If you do not like checkpoints, the reference software has a supported and advertised option to disable them, -checkpoints=0; you don't need to modify the source or delete them or anything.  I, and a number of other folks, run supervised nodes in this state, and it works fine— though do check to make sure you're not isolated. :)

Quote
Checkpoints are needed - for the very reasons that they were put in, in the first place.
They really aren't. We have a number of other alternatives.

I'd personally like to remove the checkpoints: Difficulty checkpointing and headers-first synchronization are superior replacements, though they are slightly more complicated to implement.  They are superior because they provide protection even far away from the checkpoints and when they have not been recently updated. Eliminating the rigid checkpoints would also allow for some better performance optimizations.

The loss of (3) would be unfortunate, but in recent times the community of users seems to be less worried about rewrites, and the benefit would be that researchers would no longer see 'checkpoints' and come to a totally erroneous conclusion about the security model.

It does raise the question of how to find the longest chain in a hostile environment.
Sync headers first, this is only 80 bytes of data per header. Set a minimum difficulty of 100k past the first point where that was achieved (which can easily be maintained by a dozen ASICs now, so there should be no prospect of a viable Bitcoin falling below that ever again) to eliminate cheap header flooding. Add a minimum sum difficulty hint to prevent isolation from being anything but a DOS unless it is also a high hashpower attack.  Even better robustness to header spamming could come from reverse header syncing (https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync) but it really wouldn't be important.
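
As a rough sketch of the header-acceptance side of that, in Python (the constants and helper names are illustrative only, not anything from the reference client):

Code:
# Illustrative constants; the real numbers would be chosen by the implementer.
MIN_DIFFICULTY_FLOOR = 100_000   # per-header floor, once the chain first reached it
MIN_SUM_DIFF_HINT = 10**12       # hard-coded hint of minimum total chain difficulty

def accept_header_chain(headers, floor_active_from_height):
    """headers: list of (height, difficulty) pairs already connected by prev-hash.
    Difficulty is proportional to expected work per header, so its sum is a
    fine proxy for total chain work in this sketch."""
    total_diff = 0
    for height, difficulty in headers:
        # Once the chain has ever reached the 100k floor, never accept headers
        # below it again - this kills cheap difficulty-1 header flooding.
        if height >= floor_active_from_height and difficulty < MIN_DIFFICULTY_FLOOR:
            return False
        total_diff += difficulty
    # A chain whose total difficulty is below the shipped hint can only be an
    # isolation attempt (or a DOS), so never treat it as the best chain.
    return total_diff >= MIN_SUM_DIFF_HINT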





Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 17, 2013, 08:24:49 AM
Quote
Are you saying that I don't understand how the concept of checkpoints reduces bitcoin's security model?
You were suggesting that historic data would be made unavailable and non-validated. That isn't what checkpoints do.

Not quite.
First of all, "what checkpoints do", among all the bullshit that you've lectured us about, is to make sure that a node has the exact content of the UTXO db at a specific block height.
Second, all I suggested was extending the protocol with the possibility of automatically mining into a block a hash that points to moving snapshots of the UTXO db. I call such a hash a "checkpoint", but feel free to call it whatever you like.
There is no better chain compression (and bootstrapping time improvement) than this.

In my concept, it would be totally up to the user whether he wants to get a valid UTXO set by doing it the old-fashioned way (starting from the genesis block and verifying through all the 250k blocks), or whether he prefers to enter a hash of a block that he knows is trusted and, using the reference mined into that block, fetch the entire content of the UTXO set within a few minutes.

Your great achievement, the oh-so-advanced SPV clients that nobody wants to use, isn't any more secure than the solution I proposed would have been.
The only difference is that the latter would at least be useful - very useful, since we see people complaining all the time about bootstrapping times and blockchain size (aka pruning).


Quote
Maybe you don't understand that if the only way for a bitcoin software to work is by validating a chain up to a checkpoint that is fixed inside the very same code that validates the chain - then validating the chain up to that point is just a big waste of time, bandwidth, CPU and storage.
That simply isn't correct.  If the history is validated then it cannot be forged and replaced with a false history which invents additional coins out of thin air.
False history that ends up at the exact same hash? Hmmm.. Do you even have an idea what a 256-bit number is?

Stop dreaming and get real. It would be easier to brute-force the private keys, since there are far more of them, than to find a matching tx history leading to the same hash and trick the entire network into distributing that copy instead of the original one.

If you think a 256-bit hash is not secure enough, I have no problem with you making it 512 bits, but bear in mind that if anyone manages to reverse SHA256^2 then you can start looking for another job, because the whole of Bitcoin is totally broken then - compressed or not.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: d'aniel on July 17, 2013, 08:48:15 AM
We've got a rabid one here.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 17, 2013, 08:52:09 AM
You were suggesting that historic data would be made unavailable and non-validated. That isn't what checkpoints do.

Isn't that what happens with the main client at the moment?  Script is not verified until the client passes the last checkpoint?

Quote
(1)  Prevents DOS attacks against node synchronization.  Under the original synchronization behavior an attacker can send a node hundreds of gigabytes of difficulty 1 blocks (which thanks to FPGAs and asics can be very cheaply produced).  This difficulty 1 chain might— once synced far enough in _theory_ eventually have more sum diff than the real chain, so a checkpointless node can be forced to evaluate it.  Checkpoints kill this attack dead, though they aren't the only way to kill it, they're a simple and reliable one.

That is a 51% attack.  POW builds directly with hashing power.  Building a 1 difficulty chain with more POW than the main chain is just as hard as building a shorter chain that has more POW.

Quote
(2) Prevents isolation attacks against newly started nodes.

This could also be achieved by setting a lower bound on the total POW rather than adding a checkpoint.

I wrote in another thread (https://bitcointalk.org/index.php?topic=257298.msg2741149#msg2741149) about a change to the getheaders message.  It would add a step field, so that a node could ask for every 20000th block header.  This allows much faster detecting of dishonest nodes.

Quote
(3) Simplifies discussions about the risk of mass-chain-rewrites.  

It certainly does that.  There is a risk that a disagreement could cause a fork.

If/when there are more clients, then they should make sure that they only ever add checkpoints after they have been agreed (either by adding them to the reference client or through some other consensus system).

Maybe there could be an "advisory" BIP on the topic.


Quote
I'd personally like to remove the checkpoints: Difficulty checkpointing and headers-first synchronization are superior replacements, though they are slightly more complicated to implement.  They are superior because they provide protection even far away from the checkpoints and when they have not been recently updated. Eliminating the rigid checkpoints would also allow for some better performance optimizations.

It seems that rigid checkpointing gives a performance boost as you can skip script checking.

Quote
Sync headers first, this is only 80 bytes of data per header.

As I said in the other thread, this can be improved by allowing a header step.

Ask for every 20160th block header.

Pick one at random and ask for 10 headers in steps of 2016 starting from there.  In fact, maybe a count field could also be added.

Repeat with steps of 201, 20, and 1.

That is very little data.  However, it allows the node to estimate the difficulty of the chain and also detect hostile nodes that lie about their chain.
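
Roughly, in Python (get_headers with a step field is the proposed extension and doesn't exist in the protocol today; consistent_with stands in for the prev-hash/difficulty/timestamp checks, and headers are assumed to carry a height field):

Code:
import random

def spot_check_peer(get_headers, consistent_with, claimed_height):
    """Sample a peer's claimed chain coarse-to-fine; a node that lies about
    its chain should fail the consistency check at some level of detail."""
    # Coarse pass: one header every 20160 blocks across the whole claimed chain.
    coarse = get_headers(start=0, step=20160, count=claimed_height // 20160 + 1)

    # Drill down around a randomly chosen sample with ever finer steps.
    anchor = random.choice(coarse).height
    for step in (2016, 201, 20, 1):
        fine = get_headers(start=anchor, step=step, count=10)
        if not consistent_with(coarse, fine):
            return False            # the peer is lying about its chain
        anchor = random.choice(fine).height
    return True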

Quote
Even better robustness to header spamming could come from reverse header syncing (https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync) but it really wouldn't be important.

That works too.  However, I think my suggestion eliminates the spam problem.  Effectively, the other node sends a claim and then your node replies with a challenge.

Also, by having fixed steps from the start, all nodes will agree.  You aren't asking one node for 245000 and another for 245001.  This allows partitioning of the nodes into sets which agree with each other.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: jgarzik on July 17, 2013, 08:58:06 AM
What about creating a new node type "block-server".  These nodes maintain historical data only.

That is an existing proposal known as "archive node."



Title: Re: What is stopping pruning from being added to the qt client?
Post by: Peter Todd on July 17, 2013, 11:54:43 AM
We've got a rabid one here.

Sorry about that - piotr_n is a bot I wrote to get all the good material out of gmaxwell's brain to plagiarize for my upcoming book on crypto-coin theory, "Proof of Work Consensus" (yes I see the irony of the title)

I'll see if I can tweak the settings a bit.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: Peter Todd on July 17, 2013, 11:58:09 AM
What about creating a new node type "block-server".  These nodes maintain historical data only.

That is an existing proposal known as "archive node."

I've got another one, "partial node": http://sourceforge.net/mailarchive/message.php?msg_id=31178377


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 17, 2013, 01:23:58 PM
We've got a rabid one here.

Sorry about that - piotr_n is a bot I wrote to [...]
well - then congratulations, man!
as the first person in the world, you have managed to write a bot that is smarter than its creator.
that's indeed an achievement one could be proud of - a breakthrough in the field of AI, I would say.

but so as not to get this post deleted for being off-topic (which it should be), let me add something on topic.
I've tried to analyze different approaches to calculating a reliable hash of any UTXO database, and it seems to be pretty straightforward.
obviously one would want to be able to calculate the checksum while having only the UTXO db (without a need to re-parse the entire chain).
obviously one would also want to quickly apply differential changes to the checksum that come after applying each block.
both can be achieved simply by serializing each UTXO record, then hashing it - and XOR-ing all the hashes together to get the ID of a UTXO snapshot.
extending the hash size to e.g. 512 bits might be a reasonable option as well, though it would be worth having some numbers to support it.
when an output is being spent, its hash is XOR-ed with the checksum again - this way it gets "removed" from the UTXO db ID.
maybe you don't even need to serialize and hash the record - maybe it would be just equally safe to use (TxID ^ Vout) as the value - EDIT: sorry, what was I thinking :)
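
a minimal Python sketch of the idea (the record serialization here is made up just for illustration; any canonical serialization of a UTXO would do):

Code:
import hashlib

def utxo_hash(txid, vout, amount, script):
    """Hash of one serialized unspent output (illustrative serialization)."""
    record = txid + vout.to_bytes(4, 'little') + amount.to_bytes(8, 'little') + script
    return hashlib.sha256(hashlib.sha256(record).digest()).digest()

class UtxoSetId:
    """Order-independent ID of the whole UTXO set: the XOR of all record hashes.
    The same value can be recomputed from a UTXO snapshot alone, without the chain."""
    def __init__(self):
        self.checksum = bytes(32)

    def _mix(self, h):
        self.checksum = bytes(a ^ b for a, b in zip(self.checksum, h))

    def add(self, txid, vout, amount, script):     # output created by a block
        self._mix(utxo_hash(txid, vout, amount, script))

    def spend(self, txid, vout, amount, script):   # output spent: XOR-ing again removes it
        self._mix(utxo_hash(txid, vout, amount, script))

applying a block is then just a handful of add/spend calls against the running checksum.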

as always, feel welcome to slap me with a constructive criticism. just please make sure to double check first what "constructive" means.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 17, 2013, 05:12:58 PM
Isn't that what happens with the main client at the moment?  Script is not verified until the client passes the last checkpoint?
No, not quite. The expensive scriptsig validations are skipped, but inflation prevention/fees/etc are all preserved. All of it is run if -checkpoints=0 is set from the command line.  My preference, after switching to headers-first sync, is to randomly validate signatures in old blocks in proportion to the computation burying them... and to launch rockets if there is a failure.

Quote
Quote
(1)  Prevents DOS attacks against node synchronization.  Under the original synchronization behavior an attacker can send a node hundreds of gigabytes of difficulty 1 blocks (which thanks to FPGAs and asics can be very cheaply produced).  This difficulty 1 chain might— once synced far enough in _theory_ eventually have more sum diff than the real chain, so a checkpointless node can be forced to evaluate it.  Checkpoints kill this attack dead, though they aren't the only way to kill it, they're a simple and reliable one.
That is a 51% attack.  POW builds directly with hashing power.  Building a 1 difficulty chain with more POW than the main chain is just as hard as building a shorter chain that has more POW.
I suppose I was unclear; I'm pointing out that we must evaluate a competing chain with lots of diff 1 blocks in order to determine if it gets ahead. With the current way sync works, doing this opens up a DOS attack because the data is fetched even when the chain never comes close to getting ahead.

Quote
This could also be achieved by setting a lower bound on the total POW rather than adding a checkpoint.
As my message points out…

Quote
This allows much faster detecting of dishonest nodes.
I don't see how it does. It would make your sync fail to produce connectable results in the natural and nonmalicious case of having peers on different branches and would require a bunch of code to switch back to scalar fetching from a single peer once an unconnectable result was detected.  Might be useful, but I'm not sure it would be worth the client's complexity.

Quote
If/when there are more clients, then they should make sure that they only ever add checkpoints after they have been agreed (either by adding them to the reference client or through some other consensus system).
Certainly there should be none placed where there is a risk of an honest competing fork existing.

Quote
It seems that rigid checkpointing gives a performance boost as you can skip script checking.
Nah, you don't need checkpoints in order to do that. Though doing randomized checking completely requires proof of treachery.

Quote
However, it allows the node to estimate the difficulty of the chain and also detect hostile nodes that lie about their chain.
Have you seen Amiller's high hash highway stuff?


Title: Re: What is stopping pruning from being added to the qt client?
Post by: justusranvier on July 17, 2013, 05:21:39 PM
All nodes could become archive nodes which only store a fraction of the historical data.

1) Interpret the transaction hash key space as being circular.
2) Pick a random hash value, call this the target hash.
3) Design a probability function with the shape of a bell curve which produces a value of 1 at the target hash and on average would match 1% (or any other desired fraction) of the key space.
4) Apply the probability function to the hash of any prunable transaction the node is aware of. Permanently store transactions for which the function is true and discard the rest (alternately, save all of them but use the probability function to selectively discard transactions when disk space is low).
5) As long as the total number of nodes is a sufficiently high multiple of 1/(desired fraction), and the nodes have sufficient disk space on average to hold their desired fraction of the key space, then all of the historical transactions will probably remain retrievable.
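
A rough Python sketch of steps 2-4 (the bell curve shape and the 1% fraction are just the figures above; circular_distance and the exact fall-off are illustrative):

Code:
import math
import random

KEY_SPACE = 2**256     # transaction hashes interpreted as points on a circle
FRACTION = 0.01        # fraction of the key space each node aims to keep

def circular_distance(a, b):
    """Distance between two hashes on the circular key space."""
    d = abs(a - b) % KEY_SPACE
    return min(d, KEY_SPACE - d)

def keep_probability(tx_hash, target_hash, fraction=FRACTION):
    """Bell curve: 1 at the target hash, falling off so that on average
    about `fraction` of the key space is matched."""
    sigma = fraction * KEY_SPACE / math.sqrt(2 * math.pi)
    d = circular_distance(tx_hash, target_hash)
    return math.exp(-(d * d) / (2 * sigma * sigma))

def should_store(tx_hash, target_hash):
    """Permanently keep a prunable transaction with this probability."""
    return random.random() < keep_probability(tx_hash, target_hash)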


Title: Re: What is stopping pruning from being added to the qt client?
Post by: piotr_n on July 17, 2013, 05:25:15 PM
@justusranvier, I think it's a good idea, but rather one for bitcoin 3.0.
IMHO, there is no way you could introduce such a system in the foreseeable future.
It just seems too complex. And people won't like it.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 17, 2013, 05:46:51 PM
All nodes could become archive nodes which only store a fraction of the historical data.

1) Interpret the transaction hash key space as being circular.
2) Pick a random hash value, call this the target hash.
[...]
That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic it would also require multihop probing to find nodes that actually store a particular block.  Please think about the actual access patterns which are needed to perform useful processing on the data. For historical data, stuff beyond a few weeks old, "Pick a random starting point at least $storage from the end, keep $storage worth of block data starting there" is ~optimal considering the access patterns and a lot simpler, and still maintains the relation that so long as the aggregate storage is great enough the probability of being unable to find some of the data is negligible.
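
i.e. something as simple as (a sketch; STORAGE_BLOCKS is just whatever the node's disk budget allows):

Code:
import random

STORAGE_BLOCKS = 20000   # illustrative: how many historical blocks this node can keep

def pick_archive_range(chain_height):
    """Pick a random starting point at least STORAGE_BLOCKS from the end and
    keep a contiguous run of STORAGE_BLOCKS blocks starting there."""
    start = random.randint(0, max(0, chain_height - STORAGE_BLOCKS))
    return range(start, start + STORAGE_BLOCKS)

The contiguous range is what gives the good locality: anyone replaying history only has to find a handful of nodes, not one per transaction.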

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: justusranvier on July 17, 2013, 06:18:46 PM
That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic it would also require multihop probing to find nodes that actually store a particular block.
I was suggesting this as a way of storing prunable transactions, not blocks. Why is a multihop probe unreasonable when it comes to retrieving archival data? Also, it's possible to improve the locality with path folding.

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.
Of course it's easier, because you don't need to include strong anonymity in the feature set. Still, their data storage, which is deployed and working in practice with a ~30TB capacity, solves a lot of the same problems.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 17, 2013, 06:59:41 PM
I was suggesting this as a way of storing prunable transactions, not blocks.
There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query that's not needed for anything in the system.

Quote
Why is a multihop probe unreasonable when it comes to retrieving archival data?
Because it lowers reliability and makes DOS attack resistance hard. It's part of why the sybil problem is not solved in DHTs on anonymous networks. (By anonymous there I mean that anyone can come or go or connect at any point; freenet opennet is actually insecure— at least in theory.)  Moreover, it's unreasonable because it doesn't actually improve anything over a simpler solution.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: justusranvier on July 17, 2013, 08:33:17 PM
There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query that's not needed for anything in the system.
When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, might as well subdivide at the transaction level.

In addition I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means if they wanted to download old blocks the only data they should need to fetch from the network is the pruned transactions from those blocks.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: gmaxwell on July 17, 2013, 08:56:45 PM
When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, might as well subdivide at the transaction level.
Without debating the efficiency of parallel fetch (which is, in fact, debatable)— it's unrelated to talking about archival information— if you want hunks of the latest blocks from your peers you can simply ask them for them. If you want to think of it as a hashtable it's a D1HT, but it seems unnecessary to think of it that way. Besides, prefetching the transaction list requires a synchronous delay.  Simply asking your peers for a blind orthogonal 1/Nth of block $LATEST is more efficient.

As a toy example, if you have 8 peers and hear about a block, you could ask each peer to give you transaction n where (n+x)%8==0, where x is their peer index. If not all 8 peers have received the block you can go back and re-request their fraction from the ones that do.  Additionally, peers can filter their responses based on transactions they've already sent you or seen you advertise. By grabbing blind hunks like this you avoid transmitting a transaction list (which is a substantial fraction of the total size of the transactions) and avoid round trips both to get the list and negotiate who has/doesn't have what.  Though this is something of a toy example, it's fairly close to the functionality our bloom filtered getblock already provides.
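
A Python sketch of that toy example (request_block_slice and has_block are hypothetical peer methods, not anything in the current protocol):

Code:
def fetch_block_in_parallel(peers, block_hash):
    """Blind 1/Nth fetch: peer x is asked for every transaction n of the
    block where (n + x) % N == 0; no transaction list is ever transferred."""
    n_peers = len(peers)   # e.g. 8 connected peers
    slices = []
    for x, peer in enumerate(peers):
        part = peer.request_block_slice(block_hash, modulus=n_peers, offset=x)
        if part is None:
            # This peer hasn't seen the block yet; re-request its fraction
            # from one that has (assumes at least one peer has the block).
            donor = next(p for p in peers if p.has_block(block_hash))
            part = donor.request_block_slice(block_hash, modulus=n_peers, offset=x)
        slices.append(part)

    # Reassemble in order: transaction i came from peer x = (-i) % n_peers,
    # at position i // n_peers within that peer's slice.
    total = sum(len(s) for s in slices)
    return [slices[(-i) % n_peers][i // n_peers] for i in range(total)]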

Quote
In addition I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means if they wanted to download old blocks the only data they should need to fetch from the network is the pruned transactions from those blocks.
If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 17, 2013, 08:59:46 PM
Have you seen Amiller's high hash highway stuff?

No I hadn't, that's interesting.  

From what I can see, his proposal is (effectively) to store the lowest hash ever produced by the network.

In much the same way that having 10 zeros proves you have done about 1024 hashes, the lowest-ever hash for the network shows that you have done that work.
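
Roughly (a one-off estimate, not his actual scheme):

Code:
def leading_zero_bits(h):
    """Number of leading zero bits in a 256-bit hash given as an int."""
    return 256 - h.bit_length()

def expected_hashes(best_hash):
    """A hash with z leading zero bits turns up about once per 2**z attempts,
    so the single best (lowest) hash ever seen is rough evidence of ~2**z work."""
    return 2 ** leading_zero_bits(best_hash)

# e.g. a best hash with 10 leading zero bits implies roughly 1024 hashes of work
assert expected_hashes(1 << 245) == 1024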


Title: Re: What is stopping pruning from being added to the qt client?
Post by: justusranvier on July 17, 2013, 09:00:34 PM
If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?
It could be a new node which has just entered the network by downloading the UTXO set, and wants to fully validate the chain back to some point in the past to increase its assurance that it really is operating on the correct branch.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 18, 2013, 12:20:39 AM
How much flexibility do ASIC chips have for header changes?  Does the header have to be 80 bytes with the nonce in exactly the right location, or did they make them more general?

The current scheme is

The 80 bytes are expanded to 128 bytes (two 64-byte blocks)

The header is broken up (indexes are in bytes)

Block 1: <version[3:0]> <prev block[31:0]> <merkle[31:4]> = 64 bytes
Block 2: <merkle[3:0]> <timestamp[3:0]> <bits[3:0]> <nonce[3:0]> <zeros[39:0]> <length[7:0]> = 64 bytes

The first block doesn't contain the nonce, so doesn't change as the counter runs.

This means that they need to perform the following.

First SHA256 pass

sha256(block one) (can be pre-calculated)
sha256(block two)

result: 32 bytes

Second SHA256 pass
sha256(32 bytes rounded up to 1 block)
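
A Python sketch of that layout (hashlib applies this padding internally; building it by hand just shows where the two 64-byte blocks fall):

Code:
import hashlib

def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def padded_header_blocks(header):
    """SHA-256 padding of the 80-byte header: a 0x80 byte, 39 zero bytes,
    then the 64-bit message length in bits - two 64-byte blocks in total."""
    assert len(header) == 80
    padded = header + b'\x80' + b'\x00' * 39 + (80 * 8).to_bytes(8, 'big')
    # Block 1: version, prev block hash, first 28 bytes of the merkle root.
    # Block 2: last 4 merkle bytes, timestamp, bits, nonce, padding, length.
    return padded[:64], padded[64:]

# Only block 2 contains the nonce, so a miner precomputes the midstate of
# block 1 once and re-runs only block 2 plus the short second pass per attempt.
header = bytes(80)                      # dummy all-zero header for illustration
block1, block2 = padded_header_blocks(header)
# The block hash is double SHA-256 of the raw header (displayed byte-reversed).
print(double_sha256(header)[::-1].hex())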

They are unlikely to have the first block as flexible, since that would be unnecessary hardware for the SHA round.

For the 2nd block, there are a few ways to do it.

Allowing the counter to be moved means they would need to add multiplexors to slide the nonce.  This creates an incentive to not include that flexibility.  However, it might be worth it.

Changing the non-nonce inputs into the 2nd block would just require flip flops, so it is a low cost.  There might be some logic optimisations that can't be performed if those bits aren't hardwired to zero.  This would seem to be well worth it, since they are maintaining forward compatibility.

The second stage isn't affected by the exact nature of the header, so could support a header change.

There are 40 bytes remaining in the header, before the first hash would require 3 rounds.  That puts a maximum limit of 120 bytes for the block header and a rule preventing the nonce from being moved.  However, adding a nonce2 would probably be ok.  It would only be a problem if the communication speed from controller to ASIC was so slow that 4 billion hashes could be completed before the next nonce2 was sent.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: TierNolan on July 18, 2013, 04:26:10 PM
(Re: Amiller's high hash highway)

No I hadn't, that's interesting.  

Ok, reading more.  The rule is that you have 2 links, one to the previous block and one to the best (lowest) hash before previous that beats previous.  He uses the number of leading zeros rather than the hash value for some reason.  He has a big discussion that there is probably only 1 "best" hash in that context.  Using the full value means that you are pretty much guaranteed that.

If 100 billion hashes are performed, then each hash has an equal chance of being the best.  The fact that there is a difficulty threshold means some never become block headers, but the best ones always will, so it doesn't affect the result.

This means that the best hash so far (ignoring the previous block for some reason) will on average be 50% of the way back, in terms of POW.  It splits the chain into 2 equally important halves.  Each step backwards would decrease total POW by on average 50%, so it takes log(total POW / POW(genesis)) steps to get back to the genesis block.

Blocks that don't have the 2nd link would have to be ignored (so assumed to have a worthless (infinitely high) hash).  So, blocks after 270k wouldn't be allowed to link to blocks before 270k and 270k would link to the genesis block.  Effectively, the blocks from 1 to 269999 would be considered to have an infinite hash.

[edit]

Reading even more.  The reason for taking zeros is so that collisions can occur.

If the best hash has 50 zeros, then there will be 2 hashes with 49 zeros, 4 hashes with 48, and so on.

The "up" link goes to the most recent block with more zeros than this block (or the genesis block, if none).

If the current block had 47 zeros, then you could find all 48 zero blocks following the chain

Code:
while (block != genesis) {
  if (block.zeros == 48) {
    addToList(block);
    block = block.prev;
  } else if (block.zeros < 48) {
    block = block.up;
  } else {
    block = block.prev;
  }
}

If the current block has less than 48, you go upwards.  This will skip blocks with zeros less than or equal to the current block, so will never skip a 48 block.

If the current block has 48, then add it to the list and go to the previous block.

If the current block has more than 48, you need to go to the previous block too, since using the up link could cause 48's to be skipped.

This can be used to find a chain of each type of difficulty.  This gives more points, so less variance to work out total difficulty.

However, even just showing the 50 zero header would be strong evidence of the total hashing power.  The up link would point directly to the genesis block.  It would also show the minimum work against that genesis block.


Title: Re: What is stopping pruning from being added to the qt client?
Post by: mustyoshi on July 21, 2013, 05:38:43 AM
I am starting to think the only way to prune effectively is to create an archive node type, and a rule for light nodes that they always keep a connection or route to an archive node.

Transactions' inputs would only be pruned after the coinbase in their block became spendable; that would be the rule, so that all light nodes have the same pruned chain.

When a new node connected, it would either find an archive node and start the whole process, or query light nodes to determine a consensus on the legitimacy of a pruned block. Using the archival node would be preferred due to it being verifiable. To ensure archival nodes exist in sufficient quantities, perhaps there could be a system to buy access to them, thus making them an avenue for profit?