I thought people would be excited to hear that I finally got around to really investigating why the initial chain sync-up was so slow (at least for some users) and that I found an easily fixed problem.
It's well known that the reference software does a lot of unneeded synchronous disk IO during syncup (in addition to the large amount of needed async IO), so syncup is expected to be slow right now on systems without fast disks... but even on systems with super fast disks and CPUs, the syncup is still quite slow. People frequently make all kinds of wild network-related suggestions that they think will help syncup, which I've been discounting (https://bitcointalk.org/index.php?topic=53389.msg640824#msg640824), but I hadn't had a chance to sit down and work on any of the actual causes of slowness.
Well, it turns out that when wallet encryption was introduced, some usage of mlock() was added to keep private key data out of swap. This is a good thing, but the mlock is used naively, which is bad because it's slow (it results in a TLB flush). That wouldn't matter too much, except a side effect of the change was to mlock all the memory used by a very common data type used all through bitcoin. This is pointless because most of the usage doesn't contain private data (and it may even have adversely impacted the security of encrypted wallets on systems with limited mlockable memory).
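To make the pattern concrete, here is a minimal sketch of how an mlock-on-allocate allocator ends up costing one lock call per buffer once a widely used container type is built on it. This is not the actual bitcoin code: LockingAllocator, ByteVector, SecretVector, g_lock_calls and lock_calls_for are all made-up names for illustration, and it assumes a POSIX system that provides mlock() in <sys/mman.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>
#include <sys/mman.h>  // POSIX mlock()/munlock(); assumed to be available

// Hypothetical counter so the cost is visible without a profiler.
static std::size_t g_lock_calls = 0;

// Hypothetical allocator that naively locks every block it hands out.
template <typename T>
struct LockingAllocator {
    using value_type = T;

    LockingAllocator() = default;
    template <typename U>
    LockingAllocator(const LockingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        // mlock can fail (for example when RLIMIT_MEMLOCK is exhausted);
        // this sketch ignores the result and only counts the call.
        (void)mlock(p, n * sizeof(T));
        ++g_lock_calls;
        return p;
    }

    void deallocate(T* p, std::size_t n) {
        // Naive: munlock works on whole pages, so this can also unlock
        // other data that happens to share a page with this block.
        (void)munlock(p, n * sizeof(T));
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const LockingAllocator<T>&, const LockingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LockingAllocator<T>&, const LockingAllocator<U>&) { return false; }

// General-purpose byte buffer: no reason to lock this.
using ByteVector = std::vector<unsigned char>;
// Buffer reserved for key material: the only place locking is wanted.
using SecretVector = std::vector<unsigned char, LockingAllocator<unsigned char>>;

// Build `count` 32-byte buffers of type Vec and return how many
// lock calls that cost.
template <typename Vec>
std::size_t lock_calls_for(std::size_t count) {
    const std::size_t before = g_lock_calls;
    std::vector<Vec> buffers;
    buffers.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        buffers.emplace_back(std::size_t{32});
    }
    assert(buffers.size() == count);
    return g_lock_calls - before;
}
```

Calling lock_calls_for<SecretVector>(1000) performs one lock call per buffer, while lock_calls_for<ByteVector>(1000) performs none. If the locking allocator is attached to the common type instead of a dedicated one, every buffer in the program pays that per-allocation cost whether or not it holds anything private.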
This was somewhat hard to track down: it doesn't show up in the oprofile cycle counter (it would have shown up in TLB flushes, but that's only obvious in retrospect) and didn't show up in some other performance tools like valgrind/callgrind (which is too simplistic an emulation to know mlock is slow). I eventually caught it in ltrace output.
There is still some ongoing discussion about the best path to fix this, but the performance improvement, at least on some nodes, is quite significant. YMMV: if you're not on a super fast SSD this improvement will be much smaller (at least until the excessive fsync usage is resolved), and my AMD Phenom(tm) II X6 1055T with kernel 2.6.35.14-106.fc14.x86_64 might mlock slower than some others.
This should also speed up the client generally, though it's especially obvious for the chain syncup.
Enough talk, how about some pictures:
The blue and red lines are variants of this fix, green is the stock client.
(note the log scale)
There is more discussion on the pull request (https://github.com/bitcoin/bitcoin/pull/740).