Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: TierNolan on June 30, 2011, 11:08:43 AM



Title: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 11:08:43 AM
At the moment, the block chain is downloaded linearly.  This places a large load on the other node and it isn't possible for the downloader to pay back the other node.

A better method would be to download the blocks like in bittorrent.

Download all the headers. 

This should be fast, since it is 80 bytes per block. 

80 bytes every 10 minutes is around 4MB per year.

Download missing blocks

Since you have all the headers, you can ask for any 500 blocks that you want.

You could download blocks 1000-1500 from a seed and another peer could download 500-1000 from a seed.  You could then share the blocks with each other.  This greatly reduces the load on the seeding computers.

Blocks near the end of the chain are more valuable, since they are more likely to have active transactions.  The random selection could be biased towards trying to get those first.


Title: Re: P2P block download - use bit torrent like method
Post by: XIU on June 30, 2011, 11:21:33 AM
Isn't the reason that the blocks have to verified linear anyway? As in part of the hash is the result of the previous hash?


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 11:25:43 AM
Isn't the reason that the blocks have to verified linear anyway? As in part of the hash is the result of the previous hash?

Mostly, it is the block chain itself that needs to be verified.  Once you have that, you can distinguish fake blocks from real blocks.

Blocks that have inputs connected to blocks that you haven't downloaded could be tagged as pending until the corresponding input block is downloaded.

However, any block that is more than 100-200 blocks away from the last block in the chain is almost certain to be valid and doesn't really need to be checked.


Title: Re: P2P block download - use bit torrent like method
Post by: just_someguy on June 30, 2011, 12:49:55 PM
Quote
Mostly, it is the block chain itself that needs to be verified.  Once you have that, you can distinguish fake blocks from real blocks.

Unless you are doing simple payment verification you can't verify a block until you have all previous blocks.

You can't know if anything in the block you are examining is a double spend unless you possess the data where those double spends would exist.
If you are downloading the headers with the intention of downloading all the full blocks then you might as well just download the full blocks since you are going to do the same work twice.




Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 01:32:22 PM
If you are downloading the headers with the intention of downloading all the full blocks then you might as well just download the full blocks since you are going to do the same work twice.

That isn't entirely true.  You could download the chain and then download the last 2 weeks worth of blocks.  If anyone sent you money the last 2 weeks, you could verify the transaction directly.

By verify the block, I just meant confirm that the block is in the chain.

Also, the main proposal was to improve download speed and reduce strain on the seed nodes/miners.

Like in bittorrent, if one client downloads blocks 1-1000 and the other downloads 1001-2000, they can then swap blocks with each other and not download from the seed.  This means that the seed only has to send the chain once.

In fact, it would be equivalent to just using bit-torrent to download the chain and then rescanning it.  I am not sure if -rescan checks all the transactions though.


Title: Re: P2P block download - use bit torrent like method
Post by: gmaxwell on June 30, 2011, 02:03:09 PM
Also, the main proposal was to improve download speed and reduce strain on the seed nodes/miners.

Since the overwhelming majority of all the nodes already have all the blocks, it doesn't matter which blocks you fetch... except: making it sparse would add the  overhead of having to communicate what blocks you do/don't have.

The reasons that it's slow right now doesn't appear have anything to do with the downloading external to bitcoin.  One reason its slow, for example, is that downloading a run of 500 blocks is frequently enough to trigger a flood protection disconnect now. So a new node spends a lot of time being disconnected over and over again from its neighbors, and when it reconnects it wastes a lot of time doing addr.dat exchange.  It also may get itself flooded off all the competent neighbors...







Title: Re: P2P block download - use bit torrent like method
Post by: bitcool on June 30, 2011, 02:11:59 PM
I too think the speed of block chain downloading is an issue.
Even you want to verify all blocks in the chain, can this be a two step process -- download all blocks concurrently first, and then verify the chain afterward? Just wondering.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 02:17:42 PM
I too think the speed of block chain downloading is an issue.
Even you want to verify all blocks in the chain, can this be a two step process -- download all blocks concurrently first, and then verify the chain afterward? Just wondering.

Exactly.  300MB takes a few hours, no matter how fast your connection because it is p2p, but not torrent like.

However, if you download just the headers first (80 *130k = 10MB), you can at least verify that the blocks you get are real.

All nodes, including client nodes, would likely have the headers so you could p2p them too.

The main point is to download the chain itself from the last block and work backwards.  For most people, they will only need blocks from the last few weeks anyway.


Title: Re: P2P block download - use bit torrent like method
Post by: patvarilly on June 30, 2011, 03:41:48 PM
Isn't the reason that the blocks have to verified linear anyway? As in part of the hash is the result of the previous hash?

The hash is calculated solely based on the 80 bytes of the block header (which contains the hash of the previous block), and not the contents of the block (https://en.bitcoin.it/wiki/Protocol_specification#Block_Headers).  So you can verify the integrity of the block chain using only the headers.  If you get block headers for which you don't have the previous block's header, e.g., not linearly, they'll be stored in memory as orphan chains.  Eventually, you'll get a block that allows you to link the orphan chains to the main block chain.  As is pointed out in the posts, it might be much faster to download lots of small parts of the chain from lots of peers separately, and then join them together on the client side, because the individual peer-to-peer speed might be quite limited.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 03:46:59 PM
As is pointed out in the posts, it might be much faster to download lots of small parts of the chain from lots of peers separately, and then join them together on the client side, because the individual peer-to-peer speed might be quite limited.

Also, it means that there is no sharing, which is key to p2p (at least bit torrent).

The only people who want blocks from you are behind you in the chain.  You give them blocks, but they never have any blocks you need.

This means that it is best to just be a pure leecher.


Title: Re: P2P block download - use bit torrent like method
Post by: just_someguy on June 30, 2011, 05:15:22 PM
Quote
The hash is calculated solely based on the 80 bytes of the block header (which contains the hash of the previous block), and not the contents of the block

That's a bit misleading.
Part of the 80 bytes is the merkle root of the transactions which is essentially the contents of the block.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 05:49:42 PM
That's a bit misleading.
Part of the 80 bytes is the merkle root of the transactions which is essentially the contents of the block.

True, however, for the purposes of verifying that the headers form a valid chain, the Merkle root might as well be a random number.

It is used to confirm that the information contained in the blocks is valid.


Title: Re: P2P block download - use bit torrent like method
Post by: kjj on June 30, 2011, 07:02:21 PM
I'm pretty sure that verification time is the dominant factor when fetching a large number of blocks.  For each block, the client verifies each transaction.  That is a lot of work.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 09:43:15 PM
I'm pretty sure that verification time is the dominant factor when fetching a large number of blocks.  For each block, the client verifies each transaction.  That is a lot of work.

I could see that being true in the future, but it shouldn't be a massive deal at the moment.  The client should still try to download the blocks as fast as possible anyway.

I have noticed that the client uses a lot of swap space though.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 09:53:31 PM
I am downloading the chain again, and the process is at 90% CPU, so I guess I was wrong about the bottle-neck.


Title: Re: P2P block download - use bit torrent like method
Post by: just_someguy on June 30, 2011, 09:54:46 PM
Quote
True, however, for the purposes of verifying that the headers form a valid chain, the Merkle root might as well be a random number.

It is used to confirm that the information contained in the blocks is valid.

If you aren't verifying anything or looking for transactions that are relevant to you then you might as well just download the block chain all at once:
http://bitcoin.bluematt.me/bitcoin-nightly/blockchain-nightly/


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 09:56:01 PM
If you aren't verifying anything or looking for transactions that are relevant to you then you might as well just download the block chain all at once:
http://bitcoin.bluematt.me/bitcoin-nightly/blockchain-nightly/

Does the -rescan parameter just scan for transactions, or does it check the chain for validity?


Title: Re: P2P block download - use bit torrent like method
Post by: just_someguy on June 30, 2011, 10:11:36 PM
It will rescan for transactions.... but if the chain download lies to you and gives you an alternate chain you won't know.
It will become apparent though as soon as you get no new blocks.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on June 30, 2011, 10:25:09 PM
It will rescan for transactions.... but if the chain download lies to you and gives you an alternate chain you won't know.
It will become apparent though as soon as you get no new blocks.

Ahh, ok.  I think there needs to be a -verifychain command.


Title: Re: P2P block download - use bit torrent like method
Post by: deepceleron on July 14, 2011, 09:55:50 PM
I'm going to bump this because I used search and knew I couldn't be the first to think of this.

Bitcoin does need aggressive torrent-like out-of-order block downloading, especially on new client install or a restart after not being used for a while. If instead of requesting block 1, then block 2, etc, I request a simultaneous 500 blocks from 500 nodes I am going to get them much quicker. Think of this as a torrent-like look ahead. My client can start verifying it's way up the block chain as it starts assembling them and adding them to the block database, and if it doesn't get a few blocks after an expected time or finds invalid blocks, it can request them again from several clients.

Alternately, it could get bitcoin_blockchain_120000.zip or similar from a trusted source upon new installation.

Bittorrent can get me data as fast as my connection, why should it take half a day to get 200MB on Bitcoin?


Title: Re: P2P block download - use bit torrent like method
Post by: JoelKatz on July 14, 2011, 10:16:35 PM
Bittorrent can get me data as fast as my connection, why should it take half a day to get 200MB on Bitcoin?
Because you are performing multiple cryptographic operations to validate every single byte of those 200MB. For each block received, you must at least:

1) Check that the hash of the block meets the target.
2) Check that the block correctly links to the preceding block.
3) Calculate the Merkle tree for all the transactions and verify it against the block header.
4) Retrieve every transaction whose outputs are claimed by transactions in this block.
5) Check the difficulty against the block count and timestamps.
6) For every input of most transactions, you must check that the public key has the correct hash and that the signature is correct.

I picked a random block (136,301) and did the math, it's about 500 signatures and 800 hashes for that one block. A typical mid-range modern CPU can verify 800 signatures a second and compute 300,000 hashes per second. (Including the overhead to set up each operation. You can do them faster in a tight brainless loop where the code stays in caches, that's unrealistic.) So processing this one block would take about half a second. (In fairness, this was an unusually large block.)

Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on July 14, 2011, 10:58:20 PM
Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.

All other blocks could be verified by just checking that the hashes match.

If security is really required, then you could do the verification slowly once the entire chain has been downloaded.


Title: Re: P2P block download - use bit torrent like method
Post by: JoelKatz on July 15, 2011, 12:21:33 AM
Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.
That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

Quote
If security is really required, then you could do the verification slowly once the entire chain has been downloaded.
So then why bother to download the blocks really quickly? There's nothing you can do with an unverified block -- you don't know it's valid.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on July 15, 2011, 12:31:57 AM
That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

The block chain can be verified first without verifying the transactions contained in it.  Only the 80 byte block header is required to verify that each header is part of a chain.

Quote
Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

If you receive 100 different chains from 100 different peers, you can still check which one is the longest chain.  That gets you all the block headers for the strongest chain.

You can use that to verify where each block slots into the sequence of blocks.

If someone sends you money you can verify that it hasn't been double spent by seeing if it is included in the block chain, with 6 confirms, it isn't likely to be a double spend.


Title: Re: P2P block download - use bit torrent like method
Post by: deepceleron on July 15, 2011, 12:58:16 AM
Now you're trying to do this 136,000 times. That's 18 hours at 2 blocks per second.

You could just trust any block that is more than 1000 blocks behind the head of the chain.
That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

Quote
If security is really required, then you could do the verification slowly once the entire chain has been downloaded.
So then why bother to download the blocks really quickly? There's nothing you can do with an unverified block -- you don't know it's valid.


I'm suggesting having a deep queue of the next blocks ready for verification so the only bottleneck is processing them. If after verifying block #100 I have to hit the p2p network again, request block #101, and wait for the client I request a block from to respond (or not) and send it to me, that's adding that much more latency per block. Block #101 from the network is as likely to be valid if I get it on-demand, or if I've already downloaded blocks 102-500 too. If I've filled my queue and still have network speed to burn, I could have even requested block #101 three times and MD5'd them or have a pool of potential blocks in case I need to sort out which is the longest block chain (although this is silly, the chance of picking up in-the-wild invalid early blocks seems so low that we can almost assume everything we get is good and ignore optimizing the queue around this possibility).

A basic proof to see if there is utility would be to bootstrap a fresh client on a local-only network with several nodes and see how close you can peg CPU to 100% (this still has the client idle requesting the next block). How long does it take to get all the blocks? Then do the same on the P2P network. If it takes longer, there is room for improvement.



Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on July 15, 2011, 08:43:05 AM
I'm suggesting having a deep queue of the next blocks ready for verification so the only bottleneck is processing them. If after verifying block #100 I have to hit the p2p network again, request block #101, and wait for the client I request a block from to respond (or not) and send it to me, that's adding that much more latency per block. Block #101 from the network is as likely to be valid if I get it on-demand, or if I've already downloaded blocks 102-500 too.

The current system requests blocks in groups of up to 500 blocks at a time.  It possibly does download -> verify -> download -> verify.

The key point is that once you have the block headers, you can download the full blocks in any order.  You can verify any  block by checking that the header matches and that the merkle root is correct.  This proves that it is a valid part of the chain.  This can be done without checking any of the signatures.


Title: Re: P2P block download - use bit torrent like method
Post by: JoelKatz on July 15, 2011, 02:00:39 PM
That creates a chicken and egg problem. You can't know the correct block X until you know the correct block X-1. So saying "if it's 1,000 blocks behind a valid block, it's valid" won't help.

The block chain can be verified first without verifying the transactions contained in it.  Only the 80 byte block header is required to verify that each header is part of a chain.
Right, but so what? Just because the header is valid you cannot conclude the block is valid.

Quote
Quote
Suppose there are 136,000 blocks in the chain. You receive, from an untrusted peer, a block that purports to be block 53,000. How can you tell it's more than 1,000 blocks behind the block 136,000 that you don't even know exists yet because you have no way to verify it?

If you receive 100 different chains from 100 different peers, you can still check which one is the longest chain.  That gets you all the block headers for the strongest chain.
The "longest chain" test only applies among valid chains. A longer chain that is not valid is not strongest.

Quote
If someone sends you money you can verify that it hasn't been double spent by seeing if it is included in the block chain, with 6 confirms, it isn't likely to be a double spend.
Again, chicken and egg problem. You can't tell if it's included in the block chain with 6 confirms until you know which chain is the valid one, and that requires checking the transactions.


Title: Re: P2P block download - use bit torrent like method
Post by: TierNolan on July 15, 2011, 03:23:30 PM
Right, but so what? Just because the header is valid you cannot conclude the block is valid.

The further from the end of the chain that a block occurs, the more likely it is to be valid.  If you accept a transactions after 6 confirms, then you should accept a block that is 6+ back along the block chain.

To fake it, you need > 50% of the hashing power of the network since the point where you fork it.

Quote
The "longest chain" test only applies among valid chains. A longer chain that is not valid is not strongest.

The chain "timestamp"/certification process is secured by the hashing power of the network not by the digital signatures on the transactions.

If you were able to generate an alternative chain that was longer than the main chain, then you could just generate a chain which is 100% coinbase transactions all payable to you.

As long as a chain starts from the genesis block, has valid timestamps (increasing properly and not more than 2 hours in the future), correctly calculated difficulty updates and all hashes meet the difficulty target, then it can be assumed to be a valid chain.  This can be confirmed entirely by looking at the headers.