Author Topic: RFC: ship block chain 1-74000 with release tarballs?  (Read 5013 times)
jgarzik (OP)
November 28, 2010, 06:59:49 PM
 #21

Quote
If there was a "verify it" step, that would take as long as the current initial download, in which it is the indexing, not the data download, that is the bottleneck.
[...]
The speed of initial download is not a reflection of the bulk data transfer rate of the protocol.  The gating factor is the indexing while it downloads.

Sorry, these users' disk and CPU were not at 100%.  It is clear the bottleneck is not the database or indexing, for many users.

Quote
The data is mostly hashes and keys and signatures that are uncompressible.

bzip2 saves about a third (33-34%), which takes tens of megabytes off a download:

Code:
[jgarzik@bd data]$ tar cvf /tmp/1.tar blk0001.dat
blk0001.dat

[jgarzik@bd data]$ tar cvf /tmp/2.tar blk*.dat
blk0001.dat
blkindex.dat

[jgarzik@bd data]$ bzip2 -9v /tmp/[12].tar
  /tmp/1.tar:  1.523:1,  5.253 bits/byte, 34.34% saved, 55439360 in, 36402074 out.
  /tmp/2.tar:  1.512:1,  5.291 bits/byte, 33.86% saved, 103690240 in, 68577642 out.

I wouldn't call 33% "uncompressible"
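
The three figures bzip2 prints per file can be re-derived from the byte counts alone. A minimal sketch in C++ (the struct and function names are made up for illustration and are not part of bitcoin or bzip2):

```cpp
#include <cstdint>

// Hypothetical helper: derives the statistics that `bzip2 -v` prints
// from the raw input and output byte counts.
struct CompressionStats {
    double ratio;         // input size / output size
    double percentSaved;  // share of the input that was removed
    double bitsPerByte;   // output bits spent per input byte
};

CompressionStats ComputeStats(std::uint64_t bytesIn, std::uint64_t bytesOut)
{
    CompressionStats s;
    s.ratio        = static_cast<double>(bytesIn) / static_cast<double>(bytesOut);
    s.percentSaved = 100.0 * (1.0 - static_cast<double>(bytesOut) / static_cast<double>(bytesIn));
    s.bitsPerByte  = 8.0 * static_cast<double>(bytesOut) / static_cast<double>(bytesIn);
    return s;
}
```

For the first archive, ComputeStats(55439360, 36402074) gives a ratio of about 1.523, about 34.34% saved and about 5.253 bits/byte, in line with the bzip2 output above.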

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
theymos
November 28, 2010, 10:34:53 PM
 #22

Quote
Sorry, these users' disk and CPU were not at 100%.  It is clear the bottleneck is not the database or indexing, for many users.

It seemed to me that it was some sort of disk problem or network condition on his end. Some selected quotes from my IRC log:
Quote
<manveru> also, when i woke up, there were thousands of entries in the debug.log that look like: trying connection  lastseen=-135.6hrs lasttry=-358582.4hrs
<theymos> How many connections do you have?
<manveru> 2 right now
<theymos> How many blocks do you have?
<manveru> blockcount and blocknumber are 29124
<theymos> How fast is that increasing?
<manveru> around 1 every 4 seconds
<jgarzik> manveru: 32-bit or 64-bit linux?
<manveru> 64
<manveru> now 'blkindex.dat flush' takes a few minutes :|
<manveru> still hangs on flush
<theymos> manveru: Are you on some network file system?
<manveru> no, just a normal harddisk
<manveru> it's only 5200 rpm though

Also, replacing the blocks might have prevented him from noticing a transaction:
Quote
<manveru> jgarzik: sent me the blocks, but it didn't change my balance
<MT`AwAy> manveru: in your getinfo you're at block 94236 ?
<manveru> yeah

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
jgarzik (OP)
November 29, 2010, 07:01:12 PM
Last edit: November 29, 2010, 08:10:15 PM by jgarzik
 #23

Quote
Building blkindex.dat is what causes all the disk activity.
[...]
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.

The following code in AddToBlockIndex(main.cpp) is horribly inefficient, and dramatically slows initial block download:

Code:
    CTxDB txdb;
    txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));

    // New best
    if (pindexNew->bnChainWork > bnBestChainWork)
        if (!SetBestChain(txdb, pindexNew))
            return false;

    txdb.Close();

This makes it impossible to use a standard technique for loading large amounts of records into a database (db4 or SQL or otherwise):  wrap multiple record insertions into a single database transaction.  Ideally, bitcoin would only issue a TxnCommit() for each 1000 blocks or so, during initial block download.  If a crash occurs, the database remains in a consistent state.

Furthermore, database open + close for each new block is incredibly expensive.  For each database-open and database-close operation, db4 must:
  • diagnose the health of the database to determine whether recovery is needed (this test may require data copying)
  • re-initialize memory pools
  • read database file metadata
  • acquire file locks
  • read and initialize b-tree or hash-specific metadata, and build hash table / b-tree roots
  • force a sync, even if transactions were called with DB_TXN_NOSYNC
  • fsync the memory pool

And, additionally, bitcoin forces a database checkpoint, pushing all transactions from log into main database.

That's right, that long list of operations is executed per-database (DB), not per-environment (DB_ENV), for a database close+open cycle.  To bitcoin, that means we do this for every new block.  Incredibly inefficient, and not how db4 was designed to be used.

Recommendations:

1) bitcoin should be opening databases, not just environment, at program startup, and closing database at program shutdown.  db4 is designed to handle crashes, if proper transactional use is maintained -- and bitcoin already uses db4 transactions properly.

2) For the initial block download, txn commit should occur once every N records, not every record.  I suggest N=1000.
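
Recommendation 2 can be sketched independently of db4. The class below is a hypothetical illustration: BatchCommitter and its callbacks are invented names, not bitcoin or Berkeley DB APIs. It opens a transaction lazily, commits after every N writes, and commits any remainder when Flush() is called at the end.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical sketch: groups record writes into transactions of at most
// batchSize records. The begin/commit callbacks stand in for whatever the
// real database layer provides (for example TxnBegin/TxnCommit wrappers).
class BatchCommitter {
public:
    BatchCommitter(std::size_t batchSize,
                   std::function<void()> begin,
                   std::function<void()> commit)
        : batchSize_(batchSize), begin_(std::move(begin)),
          commit_(std::move(commit)), pending_(0) {}

    // Perform one record write inside the current batch.
    void Write(const std::function<void()>& writeOp) {
        if (pending_ == 0)
            begin_();              // open a transaction only when needed
        writeOp();
        if (++pending_ >= batchSize_)
            Flush();               // batch is full: commit it
    }

    // Commit whatever is still pending; call once at the end of the import.
    void Flush() {
        if (pending_ == 0)
            return;
        commit_();
        pending_ = 0;
    }

private:
    std::size_t batchSize_;
    std::function<void()> begin_;
    std::function<void()> commit_;
    std::size_t pending_;
};
```

With N=1000, 2500 block writes followed by a final Flush() produce three commits instead of 2500. Whether fewer commits actually means fewer disk syncs depends on how the environment is configured, which this sketch does not model.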



EDIT:  Updated a couple minor details, and corrected some typos.

satoshi
November 29, 2010, 08:19:12 PM
Last edit: November 29, 2010, 08:53:12 PM by satoshi
 #24

It seems like you're inclined to assume everything is wrong more than is actually so.

Writing the block index is light work.  Building the tx index is much more random access per block.  I suspect reading all the prev txins is what's slow.  Read caching would help that.  It's best if the DB does that.  Maybe it has a setting for how much cache memory to use.

Quote
1) bitcoin should be opening databases, not just environment, at program startup, and closing database at program shutdown.
Already does that.  See CDB.  The lifetime of the (for instance) CTxDB object is only to support database transactions and to know if anything is still using the database at shutdown.

Quote
And, additionally, bitcoin forces a database checkpoint, pushing all transactions from log into main database.
If it was doing that it would be much slower.  It's supposed to be only once a minute or 500 blocks:

    if (strFile == "blkindex.dat" && IsInitialBlockDownload() && nBestHeight % 500 != 0)
        nMinutes = 1;
    dbenv.txn_checkpoint(0, nMinutes, 0);

Probably should add this:
    if (!fReadOnly)
        dbenv.txn_checkpoint(0, nMinutes, 0);
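
The throttling condition above can be restated as a small pure function. This is only a sketch: the function name and parameter list are assumptions for illustration, the globals are turned into parameters, and the starting value of 0 for nMinutes is assumed because the quoted snippet does not show it.

```cpp
#include <string>

// Hypothetical restatement of the condition quoted above. Returns the
// "minutes" argument that would be passed to txn_checkpoint. As far as the
// Berkeley DB documentation describes it, a non-zero value asks for a
// checkpoint only if more than that many minutes have passed since the
// last one.
int CheckpointMinutes(const std::string& strFile,
                      bool fInitialBlockDownload,
                      int nBestHeight)
{
    int nMinutes = 0;  // assumed default, not shown in the quoted snippet
    if (strFile == "blkindex.dat" && fInitialBlockDownload && nBestHeight % 500 != 0)
        nMinutes = 1;
    return nMinutes;
}
```

During initial download, CheckpointMinutes("blkindex.dat", true, 1751) returns 1 (at most one checkpoint per minute), while height 1500 returns 0, which matches the "once a minute or 500 blocks" description.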

Quote
2) For the initial block download, txn commit should occur once every N records, not every record.  I suggest N=1000.
Does transaction commit imply flush?  That seems surprising to me.  I assume a database op wrapped in a transaction would be logged like any other database op.  Many database applications need to wrap almost every pair of ops in a transaction, such as moving money from one account to another. (debit a, credit b)  I can't imagine they're required to batch all their stuff up themselves.

In the following cases, would case 1 flush once and case 2 flush twice?

case 1:
write
write
write
write
checkpoint

case 2:
begin transaction
write
write
commit transaction
begin transaction
write
write
commit transaction
checkpoint

Contorting our database usage will not be the right approach.  It's going to be BDB settings and caching.
jgarzik (OP)
November 29, 2010, 09:00:42 PM
 #25

Yeah, I missed the database-open caching buried in all the C++ constructors.  Major red herring, sorry about that.

db4 cache control is http://download.oracle.com/docs/cd/E17076_01/html/api_reference/CXX/dbset_cachesize.html

jgarzik (OP)
November 30, 2010, 12:06:57 AM
Last edit: November 30, 2010, 12:27:59 AM by jgarzik
 #26

I instrumented my import using the -initblock=FILE patch posted last night, putting printf tracepoints in TxnBegin, TxnCommit, TxnAbort, Read and Write:

Code:
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=000000005b5c1859db19  height=1751  work=7524897523416
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=00000000f396ab6b62ba  height=1752  work=7529192556249
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=000000000c6bcf972117  height=1753  work=7533487589082

So, it appears that we have a CDB::Write() that occurs outside of a transaction (vTxn is empty??).

txnid==NULL is perfectly legal for db4, but it does mean that callpath may be operating outside of the DB_TXN_NOSYNC flag that is set in ::TxnBegin().  Thus, a CDB::Write() outside of a transaction may have synchronous behavior (DB_TXN_SYNC) as governed by DB_AUTO_COMMIT database flag.
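
The observation about the stray write can be checked mechanically against the trace. A small sketch (CountWrites is a made-up helper, and it assumes the exact tracepoint strings shown in the log above) that tallies writes inside and outside a begin/commit pair:

```cpp
#include <string>
#include <vector>

struct WriteCounts {
    int inside = 0;   // CDB::Write() seen between txn_begin and txn_commit
    int outside = 0;  // CDB::Write() seen with no transaction open
};

// Hypothetical helper: scans tracepoint lines in order and classifies each
// write by whether a transaction is open at that point. Lines that are not
// one of the three tracepoints are ignored.
WriteCounts CountWrites(const std::vector<std::string>& lines)
{
    WriteCounts counts;
    bool inTxn = false;
    for (const std::string& line : lines) {
        if (line == "DB4: txn_begin") {
            inTxn = true;
        } else if (line == "DB4: txn_commit") {
            inTxn = false;
        } else if (line == "CDB::Write()") {
            if (inTxn)
                ++counts.inside;
            else
                ++counts.outside;
        }
    }
    return counts;
}
```

Fed the three blocks logged above, this reports 9 writes inside a transaction and 3 outside, one stray write per block.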

EDIT:  Wrapping WriteBlockIndex() inside a transaction does seem to speed up local disk import (-initblocks).

Code:
--- a/main.cpp
+++ b/main.cpp
@@ -1427,7 +1427,10 @@ bool CBlock::AddToBlockIndex(unsigned int nFile, unsigned
     pindexNew->bnChainWork = (pindexNew->pprev ? pindexNew->pprev->bnChainWork
 
     CTxDB txdb;
+    txdb.TxnBegin();
     txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));
+    if (!txdb.TxnCommit())
+        return false;
 


Of course that implies begin+commit+begin+commit in quick succession (SetBestChain), so maybe a less naive approach might be preferred (nested transactions, or wrap both db4 writes in the same transaction).

jgarzik (OP)
November 30, 2010, 07:28:00 AM
 #27

I timed two runs with clean data directories (no contents), -noirc, -addnode=10.10.10.1, Linux 64-bit.  Hardware: SATA SSD

Mainline, no patches:
     32 minutes to download 94660 blocks.

Mainline + TxnBegin/TxnCommit in AddToBlockIndex():
     25 minutes to download 94660 blocks.
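
To put the two timings on a common scale, a short sketch of the arithmetic (the helper names are invented for illustration):

```cpp
// Hypothetical helpers for comparing the two timed runs.

// Average import rate over a whole run.
double BlocksPerSecond(int nBlocks, double minutes)
{
    return nBlocks / (minutes * 60.0);
}

// Wall-clock time saved by the patched run, as a percentage of the baseline.
double PercentTimeSaved(double baselineMinutes, double patchedMinutes)
{
    return 100.0 * (baselineMinutes - patchedMinutes) / baselineMinutes;
}
```

With the numbers above, BlocksPerSecond(94660, 32) is about 49 blocks/s, BlocksPerSecond(94660, 25) is about 63 blocks/s, and PercentTimeSaved(32, 25) is about 22%.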


satoshi
December 01, 2010, 09:25:39 PM
 #28

That's a good optimisation.  I'll add that next time I update SVN.

More generally, we could also consider this:

        dbenv.set_lk_max_objects(10000);
        dbenv.set_errfile(fopen(strErrorFile.c_str(), "a")); /// debug
        dbenv.set_flags(DB_AUTO_COMMIT, 1);
+       dbenv.set_flags(DB_TXN_NOSYNC, 1);
        ret = dbenv.open(strDataDir.c_str(),
                         DB_CREATE     |
                         DB_INIT_LOCK  |
                         DB_INIT_LOG   |

We would then rely on dbenv.txn_checkpoint(0, 0, 0) in CDB::Close() to flush after wallet writes.
jgarzik (OP)
December 01, 2010, 11:56:40 PM
 #29

Quote
More generally, we could also consider this:

        dbenv.set_lk_max_objects(10000);
        dbenv.set_errfile(fopen(strErrorFile.c_str(), "a")); /// debug
        dbenv.set_flags(DB_AUTO_COMMIT, 1);
+       dbenv.set_flags(DB_TXN_NOSYNC, 1);
        ret = dbenv.open(strDataDir.c_str(),
                         DB_CREATE     |

Or DB_TXN_WRITE_NOSYNC.  Writes, but does not sync.  Should be fast since almost every OS (and hard drive!) has a writeback cache.

MrFlibble
December 03, 2010, 02:09:04 AM
 #30

Quote
I tested it on a slow 7 year old drive, where bandwidth and CPU were clearly not the bottleneck.  Initial download took 1 hour 20 minutes.

If it's taking a lot longer than that, certainly 24 hours, then it must be downloading from a very slow node, or your connection is much slower than around 15KB per sec (120kbps), or something else is wrong.  It would be nice to know what appears to be the bottleneck when that happens.

My ~24hr download was to (cheap) Compact Flash in a PATA adapter, with a ~4 Mbit/s ADSL line, port 8333 forwarding from the router, 700MHz Pentium3 and 256MiB RAM on Debian Lenny.  Download rate varied wildly.

I ran bonnie++ so we have some idea how this heap performs:
Code:
$ /usr/sbin/bonnie -d ~/bon/ -f -s 1500MiB -m clunker
[...noise...]
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
clunker       1500M            5396   6  5034   5           17658  11 601.4   5
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 10494  60 +++++ +++   980   3  7919  45 +++++ +++  1106   4
clunker,1500M,,,5396,6,5034,5,,,17658,11,601.4,5,16,10494,60,+++++,+++,980,3,7919,45,+++++,+++,1106,4

$ /usr/sbin/bonnie -d ~/bon/ -f -s 1500MiB -m clunker -b
[...noise...]
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
clunker       1500M            4921   6  5382   6           19630  12  33.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    68   0 +++++ +++    16   0    68   0 +++++ +++    17   0
clunker,1500M,,,4921,6,5382,6,,,19630,12,33.4,0,16,68,0,+++++,+++,16,0,68,0,+++++,+++,17,0

The second run uses -b ("no write buffering.  fsync() after every write.").  It doesn't change block writes much (random seeks drop about 18x, from 601.4/sec to 33.4/sec), but file creation and deletion are 60x - 150x slower.  %CP is CPU usage; +++ means "too fast to be meaningful".
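
The slowdown factors can be recomputed from the two result tables. A minimal sketch (the struct and function names are made up for illustration; the figures are copied from the bonnie++ output above):

```cpp
#include <string>
#include <vector>

// One bonnie++ metric measured in both runs (operations per second).
struct Metric {
    std::string name;
    double buffered;  // first run, normal write buffering
    double synced;    // second run, -b (fsync after every write)
};

// Hypothetical helper: how many times slower the -b run is for a metric.
double Slowdown(const Metric& m)
{
    return m.buffered / m.synced;
}

// Figures copied from the two result tables above.
std::vector<Metric> ClunkerMetrics()
{
    return {
        {"random seeks",      601.4,   33.4},
        {"sequential create", 10494.0, 68.0},
        {"sequential delete", 980.0,   16.0},
        {"random create",     7919.0,  68.0},
        {"random delete",     1106.0,  17.0},
    };
}
```

Applied to these figures, Slowdown gives roughly 18x for random seeks, 154x and 116x for sequential and random create, and 61x and 65x for sequential and random delete.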


I can compile & test new versions but my realtime-lag is quite high at the moment.
Is there ever anything in debug.log one might wish to redact / anonymise?