jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 25, 2010, 07:07:51 AM |
|
It appears that blk0001.dat, where bitcoin stores block chain information, is compatible across Windows, Linux, 32-bit and 64-bit.
Therefore, why not save new users some time by shipping blocks 1-74000 with each release?
Presumably, indexing and verifying a local file would be faster, and use fewer network resources, than downloading all those blocks via P2P.
|
Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own. Visit bloq.com / metronome.io Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
|
|
|
wumpus
|
|
November 25, 2010, 01:37:13 PM |
|
Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source? (also the reason why some gaming companies use bittorrent to distribute updates)
|
Bitcoin Core developer [PGP] Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through File → Backup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.
|
|
|
RHorning
|
|
November 25, 2010, 02:25:37 PM |
|
I have mixed feelings about this. Part of the problem is that there is perceived to be a free good, a network hosting service in the form of Source Forge, which will certainly allow those performing software releases to include considerably more data than is currently the case for Bitcoins. If somehow we were paying for this service as a community in terms of $$$ per MiB, I think it would be a no brainer that this should stay out of the distribution. Unfortunately for this consideration, it is a free good from the perspective of most users. The other issue is that the network bandwidth between nodes is also a free good. I've suggested in this thread that perhaps the presumption of network bandwidth may also not be considered a free good either. In fact, I believe that it shouldn't be the case, but that is a completely separate issue entirely. The network bandwidth for downloading the blocks is to me a wash either way, although a new client coming "on-line" trying to get the full block chain does suck up a whole bunch of blocks through the Bitcoin network and that impacts anybody who happens to be connected to those nodes. BTW, this is one of the reasons I think it would be incredibly useful to start "charging" for bandwidth as a means to discourage this behavior... and of course to earn a few extra Bitcoins on the side. If you can obtain blocks "free" from another source, some people might get more creative on how to get that accomplished including downloading a second package on some free file hosting service (perhaps included with the main client distributions) or coming up with a scheme on how to bootstrap new clients that impacts the network in a less obtrusive fashion. I guess what I'm saying is that while this is a simple solution to a complex problem, it doesn't solve all of the problems including perhaps clients which may store the block data in another format. 
There also isn't any apparent reason to necessarily encourage other software client distros to include this kind of data or for that matter to put in more than the bare minimum number of blocks. Still, raising the issue is useful here and I hope it raises a discussion about the problem.
Quote from wumpus: Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source? (also the reason why some gaming companies use bittorrent to distribute updates)
I agree it seems very odd that you would take something which is by its nature distributed through P2P channels and instead put it into a conventional client-server distribution model. Part of why I'm saying that more thought ought to go into this is to encourage a bittorrent distribution connection of some sort for a large collection of blocks if somebody has had their client off for a while, or some other kind of experimentation on how to solve this same problem. The problem is that new clients are demanding the whole block chain and really can't get into "mining" or confirming new transactions until they have that chain. Let's solve that problem, which is a larger issue. The other issue is that it seems like a waste of bandwidth to include these blocks in a client when all you are doing is updating the software. I would be just as worried that the block chain might get wiped out by the installation software with this "older" version of the chain, forcing older clients to update to the current block all over again, although this is certainly an installation bug. Just because it is a free good doesn't imply there are no other consequences to going this route.
|
|
|
|
wumpus
|
|
November 25, 2010, 02:31:34 PM |
|
Indeed, shipping the data with the client is just a kludge.
If the main reason is that the bitcoin P2P protocol is so inefficient at transferring large numbers of blocks, that should be fixed. I think that's because of the HDD syncing going on. Maybe the syncing should be held off during the initial download, or the protocol should be made more bittorrent-like for the blocks [0..last-10000], as they are basically set in stone.
|
|
|
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 7424
|
|
November 25, 2010, 05:51:39 PM |
|
It's not the downloading that takes the time, it's verifying and indexing it.
Bandwidthwise, it's more efficient than if you downloaded an archive. Bitcoin only downloads the data in blk0001.dat, which is currently 55MB, and builds blkindex.dat itself, which is 47MB. Building blkindex.dat is what causes all the disk activity.
During the block download, it only flushes the database to disk every 500 blocks. You may see the block count pause at ??499 and ??999. That's when it's flushing.
Doing your own verifying and indexing is the only way to be sure your index data is secure. If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
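The flush pattern described above can be illustrated with a small sketch. It only models the observation in this post (a flush every 500 blocks, with the displayed count pausing at values ending in 499 or 999); the function name and the way the counter is modelled are assumptions made for illustration, not the client's actual code.

```python
FLUSH_INTERVAL = 500  # blocks between database flushes, as stated in the post


def pause_counts(total_blocks, interval=FLUSH_INTERVAL):
    """Return the displayed block counts at which a pause would be visible.

    Hypothetical model: the counter stalls on the last count of each
    batch (..499, ..999) while the batch is flushed to disk.
    """
    return [count for count in range(total_blocks)
            if count % interval == interval - 1]


if __name__ == "__main__":
    # First few pauses during an initial download.
    print(pause_counts(1500))
    # Number of flushes over roughly 90,000 blocks (figure quoted later in the thread).
    print(len(pause_counts(90_000)))
```

Under this model a full download of about 90,000 blocks would show 180 such pauses.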
|
|
|
|
MrFlibble
Newbie
Offline
Activity: 25
Merit: 0
|
|
November 25, 2010, 11:19:55 PM |
|
My first reaction was "+1 for fast setup", but most of the 24hr delay I suffered was local disc. Disabling fsync (?) on the database while in catch-up mode would help the most.
Quote from wumpus: Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source? (also the reason why some gaming companies use bittorrent to distribute updates)
Good point. But since the sha-256 of the block is wired into the code, it is perfectly reasonable to ship the data too. When the blockchain is over 500meg, I think transfer efficiency will become important.
We have options,
- ship blockchain from SF until it's not politely within their AUP, then re-evaluate. I couldn't find a file size limit, even for the project website service (only a quick surf of their docs).
- ship 'small' binaries from SF, and 'large' releases with data via BitTorrent
- ship 'small' release, including the .torrent for the blockchain and a fetcher script. This looks for one of three popular command line BitTorrent clients for the platform and uses that to fetch the chain, or whinge if it can't.
http://sourceforge.net/apps/trac/sourceforge/wiki/Developer%20web says: "Note: All file releases should be a single file. Multiple files for the same release should be archived together (tar, deb, zip, etc.). We recommend using rsync for all uploads over 20 megabytes in size, as rsync allows for resuming canceled or interrupted transfers."
Hmm, shipping the blockchain for each binary arch would be perverse. Then, who provides the tracker & seed for the data? Someone with incentive or community spirit? Well, this forum+wiki seem to live on http://www.slicehost.com/ => min $20/month. It could probably share without hurting the website, and (I think) the seed could be severely throttled to make other BT seeds pull more weight.
|
|
|
|
jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 26, 2010, 01:47:43 AM |
|
Quote from satoshi: It's not the downloading that takes the time, it's verifying and indexing it.
This is not true of many novice users, who say things like "well it took several hours to catch all the 90 000 blocks but finally it arrived" (quoted from one new user, on IRC, today).
Quote from satoshi: Bandwidthwise, it's more efficient than if you downloaded an archive.
Agreed. Compressed in an archive, blk0001.dat is around 36MB.
Quote from satoshi: Bitcoin only downloads the data in blk0001.dat, which is currently 55MB, and builds blkindex.dat itself, which is 47MB. Building blkindex.dat is what causes all the disk activity. During the block download, it only flushes the database to disk every 500 blocks. You may see the block count pause at ??499 and ??999. That's when it's flushing.
It remains the download, not the verification, that has the highest variability of experience, where first time users see a delay of 30 minutes to several hours before the software is actually usable. Some P2P nodes may be extremely slow (I see high variability in latency and throughput for old blocks, and blocks larger than 512 bytes). End user bandwidth may be low, spotty or expensive. Firewalls are often a problem. I'm betting that the above complaint from a new user was due to a Microsoft firewall; but the point stands: large variance of network configuration and capability implies the P2P download impact may be far, far greater than impact of on-disk verification of 90,000 blocks.
Quote from satoshi: Doing your own verifying and indexing is the only way to be sure your index data is secure. If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.
Who said untrusted? The proposal is that you distribute blk0001.dat (and only blk0001.dat) in the bitcoin.org official client downloads. And of course the client will spend some time verifying blk0001.dat upon first use. This is unavoidable, and nobody has proposed changing or eliminating verification. Just shipping blk0001.dat with official bitcoin would eliminate several headaches that new bitcoin users continue to experience.
|
|
|
|
jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 26, 2010, 02:07:43 AM |
|
Quote from satoshi: Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
Which of the ACID properties do you need, while downloading?
Adding BDB records is simply appending to a log file, until you issue a checkpoint. The checkpoint then updates the main database file.
Under a normal BDB transaction, you are guaranteed that each log record will be sync'd to disk platter, before the transaction commit succeeds. This is very strict, but required for full ACID.
Enabling DB_TXN_NOSYNC still gives you a lot: "database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery". bitcoin can obviously recover if recent transactions are undone, so it seems useful for this flag to be set for 100% of the initial block download.
That leaves checkpointing, which is a balance between amount of work performed at checkpoint time -- number of records that must be copied from log to database file -- and wall clock time. Just gotta try some values and see what "feels" right -- maybe checkpoint every 10,000 blocks?
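To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It does not call Berkeley DB at all; it only counts how many forced disk syncs each policy would imply, assuming (my assumption, not something stated in the thread) one transaction per block and roughly 90,000 blocks to download.

```python
import math


def sync_count(blocks, blocks_per_sync):
    """Number of forced disk syncs needed to cover `blocks` blocks."""
    return math.ceil(blocks / blocks_per_sync)


BLOCKS = 90_000  # approximate chain length mentioned in this thread

POLICIES = [
    ("sync on every commit (full ACID)", 1),
    ("checkpoint every 500 blocks", 500),
    ("checkpoint every 10,000 blocks", 10_000),
]

if __name__ == "__main__":
    for label, interval in POLICIES:
        print(f"{label}: {sync_count(BLOCKS, interval)} syncs")
```

Fewer syncs means less waiting on the disk, but more log records to copy at each checkpoint and more work to redo after a crash, which is why the interval has to be tuned by experiment.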
|
|
|
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 7424
|
|
November 26, 2010, 05:32:01 PM |
|
I tested it on a slow 7 year old drive, where bandwidth and CPU were clearly not the bottleneck. Initial download took 1 hour 20 minutes. If it's taking a lot longer than that, certainly 24 hours, then it must be downloading from a very slow node, or your connection is much slower than around 15KB per sec (120kbps), or something else is wrong. It would be nice to know what appears to be the bottleneck when that happens.
Every 10 minutes or so when the latest block is sent, it should have the chance to change to a faster node. When the latest block is broadcast, it requests the next 500 blocks from other nodes, and continues the download from the one that sends it fastest. At least, that's how it should work.
Quote from satoshi: Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
Quote from jgarzik: Which of the ACID properties do you need, while downloading?
It may only need more read caching. It has to read randomly all over blk0001.dat and blkindex.dat to index. It can't assume the file is smaller than memory, although it currently still is. Caching would be effective, since most dependencies are recent.
Someone should experiment with different Berkeley DB settings and see if there's something that makes the download substantially faster. If something substantial is discovered, then we can work out the particulars.
Quote from jgarzik: Adding BDB records is simply appending to a log file, until you issue a checkpoint. The checkpoint then updates the main database file.
We checkpoint every 500 blocks.
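As a rough sanity check on the figures in this post, the sketch below computes the average data rate implied by an 80-minute initial download. It assumes the download size is the 55MB blk0001.dat figure quoted earlier in the thread and that 1MB = 1024KB; both are assumptions for illustration only.

```python
def average_rate_kb_s(size_mb, minutes):
    """Average transfer rate in KB/s for `size_mb` megabytes over `minutes`."""
    return size_mb * 1024 / (minutes * 60)


def kb_s_to_kbps(kb_per_second):
    """Convert kilobytes per second to kilobits per second."""
    return kb_per_second * 8


if __name__ == "__main__":
    rate = average_rate_kb_s(55, 80)  # 55MB in 1 hour 20 minutes
    print(f"implied average rate: {rate:.1f} KB/s")
    print(f"15 KB/s is {kb_s_to_kbps(15)} kbps")
```

The implied average is a little under 12 KB/s, which is in line with the "around 15KB per sec (120kbps)" threshold mentioned above.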
|
|
|
|
RHorning
|
|
November 26, 2010, 05:42:17 PM |
|
Quote from jgarzik: Who said untrusted? The proposal is that you distribute blk0001.dat (and only blk0001.dat) in the bitcoin.org official client downloads. And of course the client will spend some time verifying blk0001.dat upon first use. This is unavoidable, and nobody has proposed changing or eliminating verification. Just shipping blk0001.dat with official bitcoin would eliminate several headaches that new bitcoin users continue to experience.
My personal suggestion is to have the block data as a separate download, but strongly recommended. If you want to simplify the installation for Windows users and otherwise clueless computer users that can't take a block file of this nature and put it into the correct directory, perhaps setting up a formal installation file to put it where it needs to go would be more "user friendly", but all it really has to contain is just the block data. The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.
|
|
|
|
zipslack
Newbie
Offline
Activity: 43
Merit: 0
|
|
November 26, 2010, 06:08:40 PM |
|
Quote from RHorning: The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.
I'm not sure how it is for you, but when I upgrade Bitcoin I don't have to re-download any blocks. It just picks up right where it left off before the upgrade.
|
|
|
|
RHorning
|
|
November 26, 2010, 07:17:25 PM |
|
Quote from RHorning: The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.
Quote from zipslack: I'm not sure how it is for you, but when I upgrade Bitcoin I don't have to re-download any blocks. It just picks up right where it left off before the upgrade.
That is the point. If the blocks are included in the update it would also by definition include blocks you already have obtained via the network. This is why I'm suggesting that it ought to be a separate but strongly recommended download for new users instead of something combined in the normal distros.
|
|
|
|
jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 28, 2010, 02:33:29 AM |
|
Another new user on IRC, Linux this time, was downloading at a rate of 1 block every 4 seconds -- estimated total download time around 4 days.
Other commenters in this thread are correct that upgrading users don't need a block database... but something needs to be done to improve the initial block download experience for new users. Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.
We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.
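The "around 4 days" estimate above follows from simple arithmetic. A minimal sketch, assuming a chain of roughly 90,000 blocks (the figure quoted earlier in this thread) and a constant rate of one block every 4 seconds:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def download_days(blocks, seconds_per_block):
    """Estimated total download time in days at a constant per-block rate."""
    return blocks * seconds_per_block / SECONDS_PER_DAY


if __name__ == "__main__":
    days = download_days(90_000, 4)
    print(f"estimated download time: {days:.2f} days")
```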
|
|
|
|
tyler
Newbie
Offline
Activity: 56
Merit: 0
|
|
November 28, 2010, 07:23:04 AM |
|
Quote from jgarzik: Other commenters in this thread are correct that upgrading users don't need a block database... but something needs to be done to improve the initial block download experience for new users. Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.
*something* needs to be done, the block chain will be *huge* in the next year or so, correct?
|
|
|
|
jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 28, 2010, 07:33:55 AM |
|
Quote from jgarzik: Other commenters in this thread are correct that upgrading users don't need a block database... but something needs to be done to improve the initial block download experience for new users. Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.
Quote from tyler: *something* needs to be done, the block chain will be *huge* in the next year or so, correct?
Yes, correct. Presumably at some point there will be a lightweight client that only downloads block headers, but there will still be hundreds of thousands of those...
|
|
|
|
zipslack
Newbie
Offline
Activity: 43
Merit: 0
|
|
November 28, 2010, 08:53:00 AM |
|
Quote from RHorning: This is why I'm suggesting that it ought to be a separate but strongly recommended download for new users instead of something combined in the normal distros.
Sorry, I misunderstood you.
Quote from jgarzik: We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.
I suppose you are referring to the checkpoints? If so, as I understand it, they are only applied while verifying a block which has been downloaded. The contents of blk0001.dat and blkindex.dat are never checked by the client, because the client is designed to check that data before it gets written to those files. As satoshi indicated in this thread,
Quote from satoshi: Doing your own verifying and indexing is the only way to be sure your index data is secure. If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.
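To make the checkpoint idea concrete, here is a minimal sketch of how a height-to-hash table could be consulted while a block is being verified. The table contents, the placeholder hash, and the function name are all hypothetical; the real client's checkpoint code may look quite different.

```python
# Hypothetical checkpoint table: block height -> expected block hash (hex).
# The value below is a placeholder, NOT the real hash of block 74000.
CHECKPOINTS = {
    74000: "ab" * 32,
}


def passes_checkpoints(height, block_hash, checkpoints=CHECKPOINTS):
    """Return True unless `height` is a checkpoint and the hash differs."""
    expected = checkpoints.get(height)
    return expected is None or expected == block_hash


if __name__ == "__main__":
    print(passes_checkpoints(74000, "ab" * 32))  # matches the checkpoint
    print(passes_checkpoints(74000, "cd" * 32))  # conflicts with the checkpoint
    print(passes_checkpoints(74001, "cd" * 32))  # no checkpoint at this height
```

Note that a check like this only constrains blocks at the listed heights, and only if it is actually run on the block in question; it says nothing about data that is copied into place without passing through verification.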
|
|
|
|
jgarzik (OP)
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 28, 2010, 10:02:22 AM Last edit: November 28, 2010, 10:21:26 AM by jgarzik |
|
Quote from zipslack: I suppose you are referring to the checkpoints? If so, as I understand it, they are only applied while verifying a block which has been downloaded. The contents of blk0001.dat and blkindex.dat are never checked by the client, because the client is designed to check that data before it gets written to those files.
Not quite true. "-checkblocks" (CheckBlock()) performs quite a few checks on the contents of blk0001.dat / blkindex.dat. AcceptBlock() does a bit more, adding context, but not much more. But let's ignore that for the moment.
I think a more important point you're missing is that nobody is proposing that verification be skipped. The bitcoin code is quite capable of verifying and indexing untrusted blk0001.dat data. It would just need a few modifications to behave sensibly if blkindex.dat is missing. The proposal is simply: don't download massive amounts of uncompressed data using a protocol (bitcoin P2P) that wasn't designed for bulk data transfer.
Quote from zipslack: As satoshi indicated in this thread, Doing your own verifying and indexing is the only way to be sure your index data is secure. If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.
The client is clearly capable of verifying the cryptographic integrity of blk0001.dat from an untrusted source, because it does that for blocks coming in over the network, and blk0001.dat contains... serialized blocks originally received from untrusted sources over the network. It does not seem overly difficult to pass in blk0001.dat file position data to ProcessBlock(), and simply skip the WriteToDisk() storage call in downstream callee AcceptBlock().
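A rough sketch of the first half of that idea, walking a block file and recovering each serialized block together with its file position, might look like the following. It assumes a record layout of 4 magic bytes, a 4-byte little-endian length, then the serialized block; that layout and every name below are assumptions for illustration and are not taken from this thread. Verification itself is left out; each yielded block would still have to go through the client's normal checks.

```python
import struct

# Assumed record framing: magic bytes, then little-endian uint32 block size.
MAGIC = bytes.fromhex("f9beb4d9")


def iter_block_records(data):
    """Yield (offset, raw_block) for each record in a blk-file-like buffer.

    `offset` is the position of the serialized block itself, i.e. the kind
    of file position data that could be handed to a verification routine.
    Trailing bytes too short to hold a record header are ignored.
    """
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAGIC:
            raise ValueError(f"bad magic at offset {pos}")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + size
        if end > len(data):
            raise ValueError(f"truncated record at offset {pos}")
        yield start, data[start:end]
        pos = end


def make_record(raw_block):
    """Build one framed record (helper for the demo below)."""
    return MAGIC + struct.pack("<I", len(raw_block)) + raw_block


if __name__ == "__main__":
    sample = make_record(b"aaa") + make_record(b"bbbbb")
    for offset, block in iter_block_records(sample):
        print(offset, len(block))
```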
|
|
|
|
RHorning
|
|
November 28, 2010, 04:09:46 PM |
|
Quote from jgarzik: The client is clearly capable of verifying the cryptographic integrity of blk0001.dat from an untrusted source, because it does that for blocks coming in over the network, and blk0001.dat contains... serialized blocks originally received from untrusted sources over the network. It does not seem overly difficult to pass in blk0001.dat file position data to ProcessBlock(), and simply skip the WriteToDisk() storage call in downstream callee AcceptBlock()
Unless I'm mistaken here, what this implies is that at the moment the "official" client presumes that blk0001.dat contains validated data, so if you download that data from another source which may have been compromised, at the moment there is no way to verify this information. This is but a temporary danger to be aware of while the software attempts to cope with this particular issue. On the other hand, somebody could also put into the UI or as a command-line switch on bitcoind some sort of "reverifications" of the block data which would be performed locally. I think there are other applications for this including perhaps as a precaution against some virus on your computer manipulating data in the block chain where this would be useful anyway, but it seems like an option which ought to be added to the software. Since the verification code is already in the software, it is merely setting up the algorithm and triggering mechanism to perform that verification. Indeed if there is a particular block which is of concern during the verification process, an effort to "heal" the chain based upon block requests to peer nodes could be used to fix potential errors or even discard the whole chain. I hope such a feature eventually is added.
|
|
|
|
MoonShadow
Legendary
Offline
Activity: 1708
Merit: 1010
|
|
November 28, 2010, 04:25:15 PM |
|
My understanding was that the client already did a blockchain recheck upon startup if the index was missing. I did this when I first started, and it sure seemed like it was marching through the chain. Doesn't it require an index to function anyway? Why would it assume that the blockchain was valid upon startup? Anyone could have edited it. The genesis block is encoded into the client, isn't it? That and the blockchain checkpoints are the only parts that are assumed correct, or am I wrong? There is no good reason to prevent a blockchain download via other methods. In a future with the bitcoin network running close to its capacity, downloading the entire blockchain over the P2P network will be harmful.
Even a chain that has already been pruned of its merkle trees should be able to be verified from the start, otherwise what good is using a merkle tree at all?
|
"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."
- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
|
|
|
satoshi
Founder
Sr. Member
Offline
Activity: 364
Merit: 7424
|
|
November 28, 2010, 05:13:01 PM Last edit: November 28, 2010, 08:07:59 PM by satoshi |
|
Despite everything else said, the current next step is:
Quote from satoshi: Someone should experiment with different Berkeley DB settings and see if there's something that makes the download substantially faster. If something substantial is discovered, then we can work out the particulars.
In particular, I suspect that more read caching might help a lot.
Quote from jgarzik: Another new user on IRC, Linux this time, was downloading at a rate of 1 block every 4 seconds -- estimated total download time around 4 days.
Then something more specific was wrong. That's not due to normal initial download time. Without more details, it can't be diagnosed. If it was due to slow download, did it speed up after 10-20 minutes when the next block broadcast should have made it switch to a faster source? debug.log might have clues. How fast is their Internet connection? Was it steadily slow, or did it just slow down at one point?
Quote from jgarzik: We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.
The 74000 checkpoint is not enough to protect you, and does nothing if the download is already past 74000. -checkblocks does more, but is still easily defeated. You still must trust the supplier of the zipfile. If there was a "verify it" step, that would take as long as the current normal initial download, in which it is the indexing, not the data download, that is the bottleneck.
Quote from jgarzik: Presumably at some point there will be a lightweight client that only downloads block headers, but there will still be hundreds of thousands of those...
80 bytes per header and no indexing work. Might take 1 minute.
Quote from jgarzik: uncompressed data using a protocol (bitcoin P2P) that wasn't designed for bulk data transfer.
The data is mostly hashes and keys and signatures that are uncompressible. The speed of initial download is not a reflection of the bulk data transfer rate of the protocol. The gating factor is the indexing while it downloads.
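For readers wondering where the 80 bytes per header come from, here is a minimal sketch. The field layout (version, previous block hash, merkle root, time, bits, nonce) and the genesis header bytes are supplied from general knowledge of the block format rather than from this thread, and the helper names are made up for illustration.

```python
import hashlib
import struct

# Assumed header layout, little-endian, no padding:
# version (4) + previous block hash (32) + merkle root (32)
# + time (4) + bits (4) + nonce (4)
HEADER_FORMAT = "<I32s32sIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Serialized header of the genesis block, as commonly published.
GENESIS_HEADER = bytes.fromhex(
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)


def parse_header(raw):
    """Split a raw header into its six fields."""
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        HEADER_FORMAT, raw
    )
    return {
        "version": version,
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
    }


def header_hash(raw):
    """Double SHA-256 of the header, shown in the usual reversed hex form."""
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    return digest[::-1].hex()


def headers_total_bytes(header_count):
    """Total size of a headers-only download."""
    return header_count * HEADER_SIZE


if __name__ == "__main__":
    print(HEADER_SIZE)
    print(parse_header(GENESIS_HEADER))
    print(header_hash(GENESIS_HEADER))
    print(headers_total_bytes(90_000))
```

At roughly 90,000 blocks, a headers-only download would be about 7.2 million bytes, which is why it could plausibly finish in around a minute on a reasonable connection.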
|
|
|
|
|