Author Topic: Pruning of old blocks from disk  (Read 1673 times)
domob (OP)
August 30, 2014, 04:19:54 PM  #1

I wonder whether it may be a good idea to give users the option (not mandatory and not by default) to remove old blocks from disk (the undo and block files).  This could allow them to run "full nodes" with much less disk space required - but still with the same trustlessness as a real full node.

From digging into the code, I found these places where the files are used/read:

Undo files:

  • ConnectBlock (written) and DisconnectBlock (read):  The obvious places, but they only need "recent" files (at most back to the last checkpoint).
  • VerifyDB at a high level:  Removing old blocks will obviously prevent us from verifying their integrity, but that is not a problem.

At least for the undo files, it should be straightforward to allow removing old ones without sacrificing any functionality at all.  It would save quite a bit of disk space.

Block files:

  • ConnectTip/DisconnectTip:  They only need recent ones, so removing "old" blocks (before the last checkpoint) should be fine.
  • -reindex and friends, -printblock, getblock RPC, PrintBlockTree:  These obviously need the block data, but they are "extra functionality" that is not necessary to run the node.  Users who want to use them should be sophisticated enough to know what they are doing when deciding to prune blocks from disk.
  • VerifyDB at a high level:  Same as with the undo files.
  • getrawtransaction (without tx index) and rescanning for wallet transactions:  This is also functionality that is not strictly necessary to run the node.  Again, users who prune blocks should know what they are doing and that this may compromise the ability to import private keys and the like.
  • Processing network requests for blocks:  See below.

In my opinion, the only "critical" functionality that is removed when old blocks are deleted is the ability to send those blocks to other nodes while they are trying to bootstrap.  If not too many nodes prune blocks, and/or not all blocks are pruned but only a random fraction, it should still be possible for other nodes to bootstrap.  In addition, more networking logic could be added (if asked for a block that has been removed, ask your peers for it and optionally store it again).

Another issue is how to actually remove blocks from disk.  The simplest way is probably to remove whole block files, since otherwise we would need a system that tracks empty space in the block files so it can be reused for later blocks.  That seems like overcomplication.

What do you think of these ideas - are they worthwhile or not?  Are there already ideas about how to do networking (bootstrapping) when nodes may remove blocks from disk?  I'm willing to work on this if the idea is appreciated and if the consensus is that such incomplete nodes would not be too bad for the network.  (After all, it may lead to more people running "full" nodes if less disk space is required!)

Allphonic
August 30, 2014, 05:30:16 PM  #2

This is a great idea; I don't like having all of that huge data on my computer.
But if everyone removes old blocks, only a few nodes will have the complete blockchain, and the initial sync will take ages (it's already too slow...).
gmaxwell
August 30, 2014, 06:09:59 PM  #3

The reason the blocks are accessed in so few places now is that 0.8 did most of the work for this change; it just didn't carry through all the way to the deletion part.

The deletion itself is already implemented in patches staged on GitHub which will probably be merged soon. It requires turning off the node-network service bit, as failing to do so would be very detrimental to the network (it would cause new nodes to be unable to bootstrap in a reasonable time). It's also important not to remove the most recent blocks, since they are potentially needed for reorganization.

What I hope to ultimately do here is have a knob where you can set the amount of space you'd like to use (subject to a minimum that is at least enough for the most recent blocks and the chainstate); the host would then pick a uniformly random sequence of block ranges to keep, up to that maximum. Bootstrapping could then continue normally even if no single peer were available which had all the blocks. The caching approach you mention doesn't seem to make a lot of sense to me, since access to very old blocks is uniformly probable... basing what you save on requests opens up stupid DoS attacks where someone goes and fetches block 66,666 from all nodes over and over again to make that one block super popular at the expense of all others. :)

If you'd like to work on this, I might suggest starting from 4481+4468 and working on making it so that the depth to which undo files are kept is different from that of the block files (e.g. keep undo data for 2016 blocks and blocks for 288), and making it so that if blocks are needed for a reorg beyond the block limit, the node can go re-fetch them.

Another path would be working on improving key-birthday support in the wallet— e.g. comprehensive support for dates on imported keys, and support for rescanning ranges where you don't have the blocks using bloom filtering.

A third path is working on the encoding for signaling sparse blocks— I'd like to see it so that nodes have a compact random seed (e.g. could just be 32 bits) where, knowing the current height, seed, and number of block ranges, you'd know which ranges a node has (e.g. working like how selecting N uniform random entries over large data doesn't need more than N storage). I've got one solution, but it requires O(n) computation with the current height per peer to figure out what ranges a peer currently has, and perhaps better is possible. My thinking here is that we'd allocate a service bit that says "I keep the last 2016 blocks plus some sparse range" and a network message to advertise the sparse range seed on connect. (Later, we'd create a new addr message that lets you just include the seed directly).
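To make that last idea a bit more concrete, here is a rough, self-contained sketch of the kind of seed-to-ranges mapping being described. Everything in it (the mixer, the constants, the function names) is invented for illustration; it is not actual Bitcoin Core code or a proposed standard.

Code:
// Toy sketch: derive a node's kept block ranges purely from
// (seed, chain height, number of ranges). Anyone who knows these three
// values can recompute the same set, which is the property described above.
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

static const uint32_t RANGE_LEN = 1000;  // blocks per kept range (arbitrary)

// Small deterministic 64-bit mixer; a real design would pick something
// standardized so all implementations agree bit-for-bit.
static uint64_t Mix(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Ranges [start, start + RANGE_LEN) that a node with the given seed keeps,
// assuming the chain currently has `height` blocks.
std::vector<std::pair<uint32_t, uint32_t>>
KeptRanges(uint32_t seed, uint32_t height, uint32_t n_ranges) {
    std::vector<std::pair<uint32_t, uint32_t>> ranges;
    if (height <= RANGE_LEN) return ranges;
    for (uint32_t i = 0; i < n_ranges; ++i) {
        // Hash (seed, i) into a start position; deterministic for everyone.
        uint32_t start = Mix(((uint64_t)seed << 32) | i) % (height - RANGE_LEN);
        ranges.emplace_back(start, start + RANGE_LEN);
    }
    return ranges;
}

int main() {
    // A peer advertising seed 0xdeadbeef and 4 ranges, at height 320000:
    for (const auto& r : KeptRanges(0xdeadbeefu, 320000, 4))
        std::printf("keeps blocks [%u, %u)\n", r.first, r.second);
}

Note that this naive version re-draws the start positions whenever the height changes, so the kept blocks would drift as the chain grows; keeping the selection stable and uniform while the chain grows is exactly what makes the real encoding problem (and the O(n) computation mentioned above) interesting.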



domob (OP)
August 31, 2014, 09:52:14 AM  #4

Quote
The reason the blocks are accessed in so few places now is that 0.8 did most of the work for this change; it just didn't carry through all the way to the deletion part.

Yep, I know - the switch to the UTXO set for tx verification removes the need to access block files except in the places mentioned.

Quote
The deletion itself is already implemented in patches staged on GitHub which will probably be merged soon. It requires turning off the node-network service bit, as failing to do so would be very detrimental to the network (it would cause new nodes to be unable to bootstrap in a reasonable time). It's also important not to remove the most recent blocks, since they are potentially needed for reorganization.

Thanks for pointing this out - part of my motivation to post here was exactly to find out whether someone is already working on this or not. :)  Do you know whether someone also intends to work on the later ideas you mention below? I don't want to duplicate any work, but I'm interested in working on this set of features since I believe it is both very useful and also interesting.

Quote
What I hope to ultimately do here is have a knob where you can set the amount of space you'd like to use (subject to a minimum that is at least enough for the most recent blocks and the chainstate); the host would then pick a uniformly random sequence of block ranges to keep, up to that maximum. Bootstrapping could then continue normally even if no single peer were available which had all the blocks. The caching approach you mention doesn't seem to make a lot of sense to me, since access to very old blocks is uniformly probable... basing what you save on requests opens up stupid DoS attacks where someone goes and fetches block 66,666 from all nodes over and over again to make that one block super popular at the expense of all others. :)

Yes, I know that the caching idea has lots of problems. I also thought about randomly removing blocks so that all nodes together will have the full history available even if each node only has a fraction - but I didn't go as far as you did with the "predictable" block ranges. I like this idea very much! If you are trying to bootstrap and currently have no connected peers which have a particular block, would you randomly reconnect to new peers until you get one? Or implement a method to explicitly ask the network "who has this block range"?

Quote
If you'd like to work on this, I might suggest starting from 4481+4468 and working on making it so that the depth to which undo files are kept is different from that of the block files (e.g. keep undo data for 2016 blocks and blocks for 288), and making it so that if blocks are needed for a reorg beyond the block limit, the node can go re-fetch them.

My opinion is that the undo files are a good starting point - they are only used for chain reorgs, so the networking issues are not present with them. If you keep undo data just for the last couple of blocks (or back to the last checkpoint), you should have exactly the same functionality as now, right? So why not remove undo files from before the last checkpoint without turning off the "network" service flag, or even do this always (not only when explicitly enabled)? This seems like the logical first step to me (unless there are problems I'm missing with that approach).

Quote
A third path is working on the encoding for signaling sparse blocks— I'd like to see it so that nodes have a compact random seed (e.g. could just be 32 bits) where, knowing the current height, seed, and number of block ranges, you'd know which ranges a node has (e.g. working like how selecting N uniform random entries over large data doesn't need more than N storage). I've got one solution, but it requires O(n) computation with the current height per peer to figure out what ranges a peer currently has, and perhaps better is possible. My thinking here is that we'd allocate a service bit that says "I keep the last 2016 blocks plus some sparse range" and a network message to advertise the sparse range seed on connect. (Later, we'd create a new addr message that lets you just include the seed directly).

This sounds like an interesting task to me.  What about this approach:  Use some seed to initialise a PRNG (might be something simple like a linear congruential generator), then create uniformly distributed (on some interval) random numbers N1, N2, ... and choose sizes for the block ranges S1, S2, ... (might be chosen such that Si blocks form one block file of 17 MiB).  Then store these ranges:

[N1, N1 + S1),
[N1 + S1 + N2, N1 + S1 + N2 + S2),
...

Is this the O(n) approach you mention?  I don't think that O(n) with n being the block height is too bad.  However, I think that it is in theory possible to choose the ranges for Ni as well as the values for Si in such a way that you get O(log n):

Ni randomly in [2^i, 2^(i+1)),
Si = 2^i

By tuning the base of the exponential or introducing some factors, it should be possible to tune the stored fraction of blocks (I've not done the math, this is just a rough idea). I don't know whether exponentially rising sizes of the ranges have any detrimental effect on the statistical properties of how often each block is kept across the network. Maybe something more clever is also possible.
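To make this concrete, here is a rough sketch of the membership test the exponential variant implies, assuming a simple LCG as the PRNG; the generator, the constants and the indexing are illustrative only, not a worked-out proposal.

Code:
// Toy sketch of the exponential scheme above: the i-th gap N_i is drawn
// uniformly from [2^i, 2^(i+1)) and the i-th kept range has size S_i = 2^i.
// Because the range starts grow geometrically, deciding whether a given
// height is kept needs only O(log height) steps.
#include <cstdint>
#include <cstdio>

// Simple linear congruential generator so the whole sequence is
// reproducible from the seed alone.
struct Lcg {
    uint64_t state;
    explicit Lcg(uint64_t seed) : state(seed) {}
    uint64_t Next() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return state;
    }
    // Uniform draw from [lo, hi); slightly biased, fine for a sketch.
    uint64_t Uniform(uint64_t lo, uint64_t hi) { return lo + Next() % (hi - lo); }
};

// True if `height` falls into one of the ranges kept by a node with `seed`.
bool IsKept(uint64_t seed, uint64_t height) {
    Lcg rng(seed);
    uint64_t start = 0;
    for (unsigned i = 1; ; ++i) {
        uint64_t size = 1ULL << i;                    // S_i = 2^i
        uint64_t gap  = rng.Uniform(size, 2 * size);  // N_i in [2^i, 2^(i+1))
        start += gap;                                 // skip the hole
        if (height < start) return false;             // height fell into a hole
        if (height < start + size) return true;       // inside [start, start + S_i)
        start += size;                                // move past this range
    }
}

int main() {
    // Rough check of the kept fraction for one seed over the first 300000 blocks.
    uint64_t kept = 0;
    for (uint64_t h = 0; h < 300000; ++h) kept += IsKept(42, h);
    std::printf("kept %llu of 300000 blocks\n", (unsigned long long)kept);
}

As described above, scaling the gaps (or the base of the exponential) is what would tune the kept fraction; this sketch leaves that knob out for brevity.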

TierNolan
August 31, 2014, 12:12:22 PM  #5

Quote
My thinking here is that we'd allocate a service bit that says "I keep the last 2016 blocks plus some sparse range" and a network message to advertise the sparse range seed on connect. (Later, we'd create a new addr message that lets you just include the seed directly).

Why not just store blocks at regular intervals?  What is the benefit of making it (pseudo) random?

For example, an offset and size could be included in the services field.  With 16 bits, you could specify an 8-bit size and an 8-bit offset.

A node would keep 256 blocks out of every (SIZE * 256) at an offset of (SIZE * OFFSET) (and also the last 2016 blocks).

That would allow nodes to indicate that they only store 1/255 of the chain.  Nodes that want to store less couldn't be supported.  Setting size to all zeros would mean that it doesn't store sparse blocks. 

At maximum size (255), the node would store 256 blocks out of every 65280.

This means that a node always stores blocks in 256-block "runs", which makes them easier to download.

Some bits could be saved by making the SIZE field a power of 2.  With 3 bits, it would still support all powers of 2 up to 256.
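To make the rule concrete, here is a small sketch of the keep-decision this encoding implies, using the power-of-2 reading (runs of 256 blocks; a node keeps every SIZE-th run at its randomly chosen OFFSET, plus the most recent 2016 blocks). The function name and constants are invented for illustration, not actual Bitcoin Core code.

Code:
#include <cstdint>
#include <cstdio>

static const uint32_t RUN_LEN = 256;   // blocks per contiguous run
static const uint32_t RECENT  = 2016;  // always keep the most recent blocks

// size   = number of runs per period (a power of two: 1 keeps every run,
//          2 keeps every second run, ..., 256 keeps 1/256 of the runs)
// offset = value the node picked once at random (0..255)
bool KeepsBlock(uint32_t height, uint32_t tip_height,
                uint32_t size, uint32_t offset) {
    if (height + RECENT > tip_height) return true;  // recent blocks: always kept
    uint32_t run = height / RUN_LEN;                // index of the 256-block run
    return (run % size) == (offset % size);         // run-aligned sparse rule
}

int main() {
    // A node keeping 1/4 of the runs (size = 4) with offset = 3, tip at 320000:
    std::printf("block 768:    %d\n", KeepsBlock(768, 320000, 4, 3));     // run 3, kept
    std::printf("block 100000: %d\n", KeepsBlock(100000, 320000, 4, 3));  // run 390, not kept
}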

The address manager could be updated so that, when discarding addresses, it makes sure it still has 3-4 nodes covering each block.  The "getaddr" message could be adjusted so you can request a particular block.

There is a BIP for custom services. 

The author also included a draft for custom service discovery, but never submitted it as a BIP.

You send the name of the service and then 1 bit per entry in the addr list.  If there are lots of entries in the addr message, then the cost of the service name is extremely low per entry.  It also supported RLE, but that is less likely to help here.  He proposed sending his custom message immediately prior to sending an addr message, so the addr message didn't have to be adjusted.

gmaxwell
September 02, 2014, 08:39:13 AM  #6

Quote
Why not just store blocks at regular intervals?  What is the benefit of making it (pseudo) random?

I have X gigabytes today that I want to contribute to the Bitcoin network. I'd like it to use all of that X as soon as possible.

But I don't want to be in a situation where, as the network grows, the distribution of blocks stored becomes non-uniform. E.g. because I used up my space and then more blocks came into existence, I'd end up over-heavy in old blocks... or my locality would become poor (creating connection overhead for peers having to fetch only a few blocks per connection).

I don't want to have to rely on trying to measure my peers to find out what sections are missing and need storing, since an attacker could falsely make some areas of the chain seem over-represented in order to make them end up unrepresented.

Eventually, if sparse nodes are to be useful for nodes in their IBD, you'll want to learn what ranges nodes serve before you try connecting to them, so I don't want to have to signal more than a few bytes of data to indicate what ranges I'm storing, or have to continually keep updating peers about what I'm storing as time goes on... as that would require a lot of bandwidth.

I want to be able to increase or decrease my storage without totally changing the blocks I'm storing or biasing the selection. (though obviously that would require some communication to indicate the change).

So those are at least the motivations I've mostly been thinking about there.

Quote
Do you know whether someone also intends to work on the later ideas you mention below? I don't want to duplicate any work, but I'm interested in working on this set of features since I believe it is both very useful and also interesting.
I don't think anyone is coding on it. There has been a bunch of discussion in the past, so at least for my part I have an idea of what I want to see done eventually (the stuff I outlined).

Pieter posted on bitcoin-development about service bits for this... though that wasn't talking about sparse block support, but just bits indicating that you can serve some amount of the most recent blocks. (He also did some measurements which showed that the request probability became pretty uniform beyond about 2000 blocks deep; obviously recent blocks were much more frequently requested due to nodes being offline for just a few days/weeks.)

Quote
Yes, I know that the caching idea has lots of problems. I also thought about randomly removing blocks so that all nodes together will have the full history available even if each node only has a fraction - but I didn't go as far as you did with the "predictable" block ranges. I like this idea very much! If you are trying to bootstrap and currently have no connected peers which have a particular block, would you randomly reconnect to new peers until you get one? Or implement a method to explicitly ask the network "who has this block range"?

My thinking is that initially you'll have to connect to a node to find out what ranges it has (e.g. we'll just add a handshake for it to the p2p protocol) and you'd just disconnect peers that weren't useful to you while also learning who has what (e.g. if you need blocks 1000-2000 but find 100000-110000 you'll know where to go later), but later the address message should be revised so that it can carry the data with it. So your peers will tell you which nodes are likely to have the blocks you need.
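For what it's worth, here is a purely hypothetical sketch of what such a handshake could carry; the message name and fields are invented for illustration and no such message exists in the Bitcoin p2p protocol. The idea is just that a sparse node announces its seed and range count once after the version exchange, and the receiving peer recomputes the ranges locally and decides whether the connection is useful for the blocks it still needs.

Code:
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical handshake payload (invented for illustration only), e.g. sent
// once right after the version/verack exchange.
struct SparseRangesMsg {
    uint32_t seed;      // compact seed identifying the block-range selection
    uint16_t n_ranges;  // how many sparse ranges this node keeps
};

// Receiver side: given the ranges recomputed locally from (seed, peer height,
// n_ranges), check whether this peer can serve a block we still need.
bool PeerUsefulForBlock(const std::vector<std::pair<uint32_t, uint32_t>>& peer_ranges,
                        uint32_t wanted_height) {
    for (const auto& r : peer_ranges)
        if (wanted_height >= r.first && wanted_height < r.second) return true;
    return false;
}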

Quote
My opinion is that the undo files are a good starting point - they are only used for chain reorgs, so the networking issues are not present with them. If you keep undo data just for the last couple of blocks (or back to the last checkpoint), you should have exactly the same functionality as now, right? So why not remove undo files from before the last checkpoint without turning off the "network" service flag, or even do this always (not only when explicitly enabled)? This seems like the logical first step to me (unless there are problems I'm missing with that approach).
We're (at least Sipa and I) planning on removing checkpoints almost completely after headers-first is merged. They are really damaging to people's reasoning about the security model, and headers-first removes practically all actual use of them. (Instead they'd be converted into a "chainwork sufficiency" number, e.g. how much work the best chain should have, and something to put the wallet into safe mode and throw an error if a large reorg happens.)

Undo files take up comparatively little space; all of them right now amount to only about two and a half gigs... and unlike the blocks, if you do want them again you can't just go fetch them.

In general I'm concerned about prescribing any particular depth beyond which a node should refuse to reorg— any at all, or at least any that is remotely short.  The risk is that if someone does create a reorg at that limit they can totally split the consensus (e.g. let half the nodes get one block that crosses the threshold, then announce the reorg).  Doubly so when you consider corner cases like the initial block download, e.g. where you get a node stuck on a bogus chain (though handling that would be part of the idea behind a sufficiency test).

Quote
Is this the O(n) approach you mention?  I don't think that O(n) with n being the block height is too bad.
Yep, that's basically it. The only reason I don't really like O(n) is that perhaps your addr database has 100,000 entries— 3/4 of which are bogus garbage put out by broken or malicious nodes. Now you need to do a fair amount of work just to figure out which peers you should be trying.

Quote
However, I think that it is in theory possible to choose the ranges for Ni as well as the values for Si in such a way that you get O(log n):

Ni randomly in [2^i, 2^(i+1)),
Si = 2^i

By tuning the base of the exponential or introducing some factors, it should be possible to tune the stored fraction of blocks (I've not done the math, this is just a rough idea). I don't know whether exponentially rising sizes of the ranges have any detrimental effect on the statistical properties of how often each block is kept across the network. Maybe something more clever is also possible.
Oh interesting. I got so stuck trying to come up with O(1) that I didn't think about log-shaped solutions. I'll give that some thought...

I think in general we want to be able to claim that if the node seeds are uniformly distributed, we expect the block distribution to approach uniform at all times as the node count approaches infinity, regardless of how the size is set. In practice I expect (and hope) that there will be a fair number of nodes with really large size settings (e.g. all the blocks) which will help even out any non-uniformity— really, in my mind most of the reason for supporting sparse nodes is to make sure everyone can contribute at whatever level they want, and to make sure that those who do choose to contribute a lot of storage aren't totally saturated with requests.

TierNolan
September 02, 2014, 10:10:11 PM  #7

Quote
Why not just store blocks at regular intervals?  What is the benefit of making it (pseudo) random?

I have X gigabytes today that I want to contribute to the Bitcoin network. I'd like it to use all of that X as soon as possible.

But I don't want to be in a situation where, as the network grows, the distribution of blocks stored becomes non-uniform. E.g. because I used up my space and then more blocks came into existence, I'd end up over-heavy in old blocks... or my locality would become poor (creating connection overhead for peers having to fetch only a few blocks per connection).

If SIZE were a power of 2, then you could stay within 50% of X GB at all times.  Once you have > X GB of data stored, you double SIZE.  This causes half the data to be deleted.

The node could select an 8-bit OFFSET at the start and just keep using that.

The OFFSET could be multiplied by 256 to give the actual offset.

SIZE = 256 means storing all blocks
SIZE = 512 means storing every second "run" of 256 blocks
SIZE = 1024 means storing every 4th "run" of 256 blocks

For example, with OFFSET set to 3 and SIZE set to 256 initially:

Run 0 (0 - 255)
Run 1 (256 - 511)
Run 2 (512 - 767)
Run 3 (768 - 1023)
....

Once SIZE is doubled, all even runs are deleted and not stored in the future, leaving only the odd runs:

Run 1 (256 - 511)
Run 3 (768 - 1023)
...

After the next doubling, only runs with index 3 mod 4 remain:

Run 3 (768 - 1023)
...

The maximum size would be 65536, which would mean storing one "run" of 256 out of every 65536 blocks (or 1/256 of all the blocks).

With 16 bits, the OFFSET could be increased to 12 bits and the SIZE stored as the power.  The runs could be increased to 4096 blocks, and nodes could indicate that they will store 1/4096 of the blocks.
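Here is a rough sketch of the doubling step this implies, under the same power-of-2 reading; SparseStore, DeleteRun() and the byte accounting are invented for illustration (not real Bitcoin Core code), and the point is only to show which runs survive each doubling.

Code:
#include <cstdint>
#include <set>

struct SparseStore {
    uint64_t budget_bytes;         // the X bytes the operator is willing to give
    uint32_t size = 256;           // period in blocks; 256 = keep every run
    uint32_t offset = 3;           // chosen once at random, 0..255
    std::set<uint32_t> kept_runs;  // indices of 256-block runs currently on disk

    // Called whenever disk usage is re-checked; stored_bytes is however much
    // the sparse runs currently occupy on disk.
    void MaybeDouble(uint64_t stored_bytes) {
        while (stored_bytes > budget_bytes && size < 65536) {
            size *= 2;  // halve the kept fraction
            for (auto it = kept_runs.begin(); it != kept_runs.end();) {
                // A run survives the doubling iff it still matches the new period.
                if ((*it % (size / 256)) != (offset % (size / 256))) {
                    // DeleteRun(*it);  // hypothetical: remove the run from disk
                    it = kept_runs.erase(it);
                } else {
                    ++it;
                }
            }
            stored_bytes /= 2;  // roughly half the data is gone
        }
    }
};

With OFFSET = 3 this reproduces the example above: after the first doubling only the odd runs remain, and after the second only runs with index 3 mod 4.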

Quote
I don't want to have to rely on trying to measure my peers to find out what sections are missing and need storing, since an attacker could falsely make some areas of the chain seem over-represented in order to make them end up unrepresented.

This would be random, based on the OFFSET being random, so there is no need to know what other nodes are doing.  Since there would be only 256 (or 4096) possible offsets, it is less likely that some blocks would be lost.

Quote
We're (at least Sipa and I) planning on removing checkpoints almost completely after headers-first is merged. They are really damaging to people's reasoning about the security model, and headers-first removes practically all actual use of them. (Instead they'd be converted into a "chainwork sufficiency" number, e.g. how much work the best chain should have, and something to put the wallet into safe mode and throw an error if a large reorg happens.)

Interesting.  Are you planning to have a maximum block count heuristic?  For example, a peer could still spam you with lots of low-difficulty headers.  This is obviously less problematic than when they could send 1MB block spam.

The rule could be that you don't accept more than 1 header chain per peer you connect to.  If they try to re-org before they reach the minimum chainwork, then they are probably either not fully caught up themselves or are spamming you, so you should download the header chain from another peer.

domob (OP)
September 08, 2014, 07:36:05 AM  #8

One issue with my proposal (and also the other one by TierNolan) is that in the simplest form, this allows you to configure the fraction of the chain you are willing to store, rather than an absolute storage size. I think gmaxwell prefers the latter, and I also think the latter is better from a user-experience point of view. I think that my proposal with logarithmic complexity can be tuned to give a configurable disk size (or at least a configurable absolute number of blocks to store) by making the numbers Si not only grow with i, but also decrease with the total chain height.

Another issue is that the exponentially growing size of both the block ranges kept and the "holes" left makes me somewhat "uneasy" - but it averages out to keeping the desired fraction (or maybe an absolute block count). Do you think there are practical disadvantages to having this structure with growing sizes Si?

Do you want me to do some math and work out more details about how to choose the Si and what statistical properties my proposal has?  Is it worth the effort, and is it a possible solution to implement?

TierNolan
September 09, 2014, 12:09:41 AM  #9

Quote
One issue with my proposal (and also the other one by TierNolan) is that in the simplest form, this allows you to configure the fraction of the chain you are willing to store, rather than an absolute storage size

The point is that you can just keep halving.  If you have a limit of 500MB and you are storing 25%, then once it hits 500MB, drop your coverage to 12.5%; you can immediately delete 50% of what you stored and drop the disk usage to 250MB.

Quote
Another issue is that the exponentially growing size of both the block ranges kept and the "holes" left makes me somewhat "uneasy" - but it averages out to keeping the desired fraction (or maybe an absolute block count). Do you think there are practical disadvantages to having this structure with growing sizes Si?

If you are only willing to store 1%, then there are going to be holes.  The reason to keep contiguous blocks is that they can be downloaded and verified as a unit.

domob (OP)
September 09, 2014, 05:19:59 AM  #10

Quote
One issue with my proposal (and also the other one by TierNolan) is that in the simplest form, this allows you to configure the fraction of the chain you are willing to store, rather than an absolute storage size

The point is that you can just keep halving.  If you have a limit of 500MB and you are storing 25%, then once it hits 500MB, drop your coverage to 12.5%; you can immediately delete 50% of what you stored and drop the disk usage to 250MB.

Quote
Another issue is that the exponentially growing size of both the block ranges kept and the "holes" left makes me somewhat "uneasy" - but it averages out to keeping the desired fraction (or maybe an absolute block count). Do you think there are practical disadvantages to having this structure with growing sizes Si?

If you are only willing to store 1%, then there are going to be holes.  The reason to keep contiguous blocks is that they can be downloaded and verified as a unit.

Yes.  I think these points are not big issues; I just wanted to point them out.
