Bitcoin Forum
December 08, 2016, 12:13:36 PM *
News: Latest stable version of Bitcoin Core: 0.13.1  [Torrent].
 
Pages: [1] 2 »  All
Author Topic: Synchronizing with Blockchain I/O bound  (Read 3848 times)
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 18, 2012, 08:31:55 PM
 #1

I was wondering why synchronizing with the blockchain is bottlenecked so badly by I/O. There's plenty of free RAM, and the entire blockchain could easily fit in it, yet the client is "only" using 160 MB of RAM at the moment, less than 3% average CPU time, and some minor network usage spikes well below my maximum throughput with a well-connected client (36 active connections at the moment). Disk usage is constantly pegged at 100%, and the hard disk activity is easily heard. With a good internet connection, it should be entirely possible to move the bottleneck to the CPU, resulting in much better synchronizing speeds. Are there any plans to improve the caching mechanics?

Alternative clients aren't very well known yet, and people wanting to try out Bitcoin could easily be put off by the bad synchronizing speeds. In 99% of cases, people giving Bitcoin a try will use the regular client. Those trying it out of curiosity won't like having to wait literally hours before it catches up, left with a barely usable computer in the meantime, since the disk activity makes the machine respond far more sluggishly than one core pegged at 100% CPU usage would.

I really think this is an important area to improve the client in. Thoughts/comments?

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
"The nature of Bitcoin is such that once version 0.1 was released, the core design was set in stone for the rest of its lifetime." -- Satoshi
antares
Hero Member
*****
Offline Offline

Activity: 518


View Profile
March 18, 2012, 08:36:12 PM
 #2

When I need a new blockchain, I usually create a RAM disk of 3 GB or so and let bitcoind download the blockchain there. This way I get the entire chain in something under an hour, and once it's there, I simply shut down bitcoind and move the blockchain out. Seems the best way to do it, until the devs here figure out that the initial blockchain download is a serious obstacle keeping new people from getting into Bitcoin.
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 18, 2012, 08:37:53 PM
 #3

when I need a new BC, I usually create a ram disk of 3 or so gigabyte, and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shutdown bitcoind, and move the blockchain out. seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

Yes, but this proves that it's entirely possible to speed up the process by leaps and bounds. It should automatically be cached to RAM, instead of having to do it manually with a RAM disk.

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
notme
Legendary
*
Offline Offline

Activity: 1526


View Profile
March 18, 2012, 08:39:32 PM
 #4

when I need a new BC, I usually create a ram disk of 3 or so gigabyte, and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shutdown bitcoind, and move the blockchain out. seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

+1

But, it needs to be handled by the client.  We can't expect everyone to be able to set up a ramdisk.

I still think there should be downloads that include the blockchain (up to the latest lock-in block) available alongside the client-only downloads.

https://www.bitcoin.org/bitcoin.pdf
While no idea is perfect, some ideas are useful.
12jh3odyAAaR2XedPKZNCR4X4sebuotQzN
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 18, 2012, 08:50:29 PM
 #5

Can't the initial download be handled in a torrent-like way? All we would have to do is hardcode the data that's usually included in a .torrent file into the client. Instead of downloading the blocks from other clients and manually verifying every transaction, why not hash the blocks and check those hashes against the hashes hardcoded in the client? If they match, there's no need to manually verify every single transaction again. Only blocks created after the last block whose hash was included in the client would have to be checked the regular way. We could even create new ".torrent"-like files, only a few kB in size, which include newer blocks. That way, if the client hasn't been updated in a while, we can still easily and quickly catch up with the chain by downloading that small file and opening it with the client. Thoughts/comments?
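For illustration, here is roughly what such a checkpoint check could look like in Python. The table and function names are mine, not anything from the client; the single entry shown is the well-known hash of block 100,000, and a real client would ship many such hashes:

```python
import hashlib

# Illustrative checkpoint table: block height -> expected block hash (hex).
# A real client would hardcode many of these.
CHECKPOINTS = {
    100000: "000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506",
}

def block_hash(raw_header: bytes) -> str:
    """Bitcoin's block hash: double SHA-256 of the 80-byte header,
    conventionally displayed byte-reversed."""
    digest = hashlib.sha256(hashlib.sha256(raw_header).digest()).digest()
    return digest[::-1].hex()

def passes_checkpoint(height: int, raw_header: bytes) -> bool:
    """Blocks at checkpointed heights must match the hardcoded hash;
    anything past the last checkpoint still needs full verification."""
    expected = CHECKPOINTS.get(height)
    if expected is None:
        return True  # no checkpoint here; fall back to normal validation
    return block_hash(raw_header) == expected
```

Blocks covered by a matching checkpoint could then skip per-transaction signature checks during the initial download.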

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
gmaxwell
Moderator
Legendary
*
qt
Offline Offline

Activity: 2030



View Profile
March 18, 2012, 11:56:22 PM
 #6

Can't the initial download be handled in a torrent-like way? What we have to do is hardcode data that's usually included in a .torrent file in the client. Instead of downloading the blocks from the clients and manually verifying every transaction, why not hash the blocks and check those hashes against the hashes hardcoded in the client?

Because this violates the design of Bitcoin in an extreme way.   Bitcoin is, for the most part, a zero trust system. You don't trust the developers to tell you about the right transactions, you trust only that software on your system (that you, or your agents, can audit) has independently validated that the rules have all been followed.

The fact that you and a great many other independent people running full nodes are doing this independent validation is also what enables things like SPV nodes (which don't do this checking) to also be fairly trustworthy.

This is all pretty important, because if Bitcoin is to achieve its goal of removing trust from money, then it's not okay to replace state trust with a gaggle of developers and big Bitcoin sites (e.g. Deepbit, Mt. Gox). I don't say this to insult them (they are trustworthy folks), but why would you trust a tiny cabal when you won't trust democratically elected states and regulated, free-market-chosen banks?

In any case, this validation doesn't have to get in the way of using the software: Bitcoin could start up as an SPV node and become a full node at its leisure (and lapse back to SPV mode if it falls behind). It's just that the software for this hasn't been written yet. The fact that the validation will happen 'soon' confers almost all of the decentralization benefits, while providing all of the performance benefits.

Of course, none of this has anything to do with the OP's point, which was that the synchronization is currently needlessly slow. He's absolutely right. If you run Bitcoin in tmpfs on a fast machine you can do a full blockchain sync in only half an hour. There is no fundamental reason it couldn't be just as fast while writing to disk, at least on systems with reasonable amounts of RAM. This must be fixed, can be fixed, and the Bitcoin-using community shouldn't allow the current brokenness to be used as an excuse to degrade the trust model of Bitcoin. Unfortunately, fixing it doesn't appear to be trivial, and so far everything that has been tried has not been successful (though improvements have been made).
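To make the SPV idea concrete: the cheap check an SPV node relies on is just proof-of-work on the 80-byte header, with no transaction validation at all. A minimal sketch in Python (the function name is mine, and compact-target edge cases like exponents below 3 are glossed over):

```python
import hashlib

def header_pow_valid(raw_header: bytes) -> bool:
    """SPV-style check: the double SHA-256 of the 80-byte header, read as a
    little-endian integer, must not exceed the target encoded in the nBits
    field (bytes 72-76). Compact encodings with exponent < 3 are not handled."""
    assert len(raw_header) == 80
    nbits = int.from_bytes(raw_header[72:76], "little")
    exponent, mantissa = nbits >> 24, nbits & 0x007FFFFF
    target = mantissa << (8 * (exponent - 3))
    digest = hashlib.sha256(hashlib.sha256(raw_header).digest()).digest()
    return int.from_bytes(digest, "little") <= target
```

Because forging a chain of such headers requires redoing the proof-of-work, even this weak check gives a node something meaningful to trust while full validation catches up in the background.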



Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 19, 2012, 12:08:47 AM
 #7

Very good post, and you're right! Back to the initial point then. Smiley Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of it's I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428


Core Armory Developer


View Profile WWW
March 19, 2012, 03:46:13 PM
 #8

Very good post, and you're right! Back to the initial point then. Smiley Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of it's I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.


Btw, just for reference, I started writing Armory about 9 months ago, when the blockchain was a few hundred MB.  I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory.  It has been remarkably successful for those who have enough RAM, but it's going to become unusable very soon.  The blockchain has more than doubled in size since I started, and the pace is increasing.   I'm scrambling to get something in there so that systems with less than 4 GB of RAM can use it...

Instead, I'm switching to an mmap-based solution, which seems to give the best of both worlds.  It treats disk space like memory, and a memory access retrieves the data from disk if it's not in the cache.  The nice thing about this is that if you have a system with 8 GB+ of RAM, it will just cache the whole blockchain and you get the benefits of the original implementation.  But if you have less RAM, it will cache as much as it can, and supposedly intelligently.  The caching is OS-dependent but fairly optimized, as it's actually implemented at the kernel layer.  The only consideration is that if you are going to use some kind of structured access pattern on the file, you can "advise" the mmap'd memory about it and it will optimize itself accordingly (e.g., if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be.  I don't know exactly what the block index holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat.  Then thousands of users who've been using the program for months suddenly can't load the client anymore.  Even with blockchain pruning, there are no guarantees.

So, my lesson from Armory was that you should never count on anything being held entirely in RAM.  And I like gmaxwell's solution of running as an SPV node until synchronization completes, then switching.  I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
notme
Legendary
*
Offline Offline

Activity: 1526


View Profile
March 19, 2012, 04:32:22 PM
 #9

Very good post, and you're right! Back to the initial point then. Smiley Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of it's I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.


Btw, just for reference, I started writing Armory about 9 months ago when the blockchain was a few hundred MB.  I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory.  It has been remarkably successful for those that have enough RAM, but it's going to become unusable very soon.  The blockchain has more than doubled in size since I started, and it's increasing in speed.   I'm scrambling to get something in there so that systems with less than 4GB of RAM can use it...

Instead, I'm switching to an mmap-based solution which seems to give the best of both worlds.  It's treating disk space like memory, and a memory access retrieves the data from disk if it's not in the cache.  The nice thing about this is, if you have a system with 8GB+ RAM, it will just cache the whole blockchain and you get the benefits of the original implementation.  But if you have less RAM, it will cache as much as it can, and supposedly intelligently.  The caching is OS-dependent, but fairly optimized, as it's something that's actually implemented at the kernel layer.  The only consideration there is that if you are going to some kind of structured access pattern of the file, then you can "advise" the mmap'd memory about it and it will optimize itself for it (i.e. - if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be.  I don't know exactly what the blockindex holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat.  Then, thousands of users who've been using the program for months, suddenly can't load the client anymore.  Even with blockchain pruning, there's no guarantees.

So, my lessons from Armory were that you should never count on anything being held entirely in RAM.  And I like gmaxwell's solution of having a SPV-node until synchronization completes, then switching.  I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.



mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the Satoshi client looking at this?  I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the Bitcoin client code or libdb (which may need to be altered if it doesn't already provide mmap-ability for databases).

https://www.bitcoin.org/bitcoin.pdf
While no idea is perfect, some ideas are useful.
12jh3odyAAaR2XedPKZNCR4X4sebuotQzN
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 20, 2012, 10:21:04 PM
 #10

Very good post, and you're right! Back to the initial point then. Smiley Except for some very basic stuff, I'm no programmer. But how hard do you think it is to implement a caching feature? I was checking the Bitcoin-qt process, and it looks like most of it's I/O activity was happening to the blkindex.dat file, which could quite easily fit in most people's RAM. Do you think it's feasible to cache that entire file into RAM? Of course, a smarter caching algorithm would be much better, but would also be quite a bit harder to implement. And we have to make sure sudden loss of power won't result in corrupted blockchains.


Btw, just for reference, I started writing Armory about 9 months ago when the blockchain was a few hundred MB.  I asked the same question, and even built an experimental, speed-optimized blockchain scanner that holds the entire blockchain in memory.  It has been remarkably successful for those that have enough RAM, but it's going to become unusable very soon.  The blockchain has more than doubled in size since I started, and it's increasing in speed.   I'm scrambling to get something in there so that systems with less than 4GB of RAM can use it...

Instead, I'm switching to an mmap-based solution which seems to give the best of both worlds.  It's treating disk space like memory, and a memory access retrieves the data from disk if it's not in the cache.  The nice thing about this is, if you have a system with 8GB+ RAM, it will just cache the whole blockchain and you get the benefits of the original implementation.  But if you have less RAM, it will cache as much as it can, and supposedly intelligently.  The caching is OS-dependent, but fairly optimized, as it's something that's actually implemented at the kernel layer.  The only consideration there is that if you are going to some kind of structured access pattern of the file, then you can "advise" the mmap'd memory about it and it will optimize itself for it (i.e. - if you are going to access the whole file sequentially, it will start caching sector i+1 as soon as you read sector i).

The problem with "why not hold everything in RAM?" questions is that with Bitcoin, there is no limit on what "everything" will be.  I don't know exactly what the blockindex holds, but there's no guarantee it won't get wildly out of hand -- maybe someone figures out how to spam the blockchain with certain types of bloat.  Then, thousands of users who've been using the program for months, suddenly can't load the client anymore.  Even with blockchain pruning, there's no guarantees.

So, my lessons from Armory were that you should never count on anything being held entirely in RAM.  And I like gmaxwell's solution of having a SPV-node until synchronization completes, then switching.  I've been pondering this a lot recently, but haven't come up with a good, robust (and user-understandable) way to implement it yet.



mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the satoshi client looking at this?  I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the bitcoin client code or libdb (which may need altered if it doesn't already provide mmapability for databases).

That implementation sounds fantastic! Exactly what we need. It would be great if this could be implemented; I really think it's quite a high priority. Getting started with Bitcoin should be as painless, fast, and easy as possible for new users. Good luck with this, notme Smiley

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428


Core Armory Developer


View Profile WWW
March 20, 2012, 11:33:35 PM
 #11

mmap appears to be the correct solution here (and possibly gmaxwell's solution as well)

Any developers of the satoshi client looking at this?  I'd be willing to try my hand at a patch if someone can point me in the right direction, but I'm not familiar with the bitcoin client code or libdb (which may need altered if it doesn't already provide mmapability for databases).

That implementation sounds fantastic! Exactly what we need. Would be great if this could be implemented, I really think this is quite a high priority. Getting started with bitcoin should be as painless, fast and easy as possible for new users. Good luck with this Notme Smiley

I'm pretty sure the Satoshi client already uses something along these lines.  In fact, I know I saw mmap in the source code for opening wallets...

Perhaps the difference is that Armory does a full rescan on every load (but does not do full verification, that would take a while), whereas the Satoshi client (I think) only re-scans the last 2500 blocks.  This gives me an opportunity to get the whole blockchain into cache, whereas the Satoshi client won't get it into cache until the first time a scan is done.  If I'm right and it does use mmap for the blockchain, then a second scan (for instance, on an address import) will go much faster if your computer has 4-8 GB of RAM.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
finway
Hero Member
*****
Offline Offline

Activity: 714


View Profile
March 21, 2012, 02:16:02 AM
 #12

when I need a new BC, I usually create a ram disk of 3 or so gigabyte, and let bitcoind download the blockchain there. This way I get the entire chain in something below an hour, and once it's there, I simply shutdown bitcoind, and move the blockchain out. seems the best way to do it, until the devs here figure that initial blockchain download is a serious reason hindering new people from getting into bitcoin.

I should try this. 

randomproof
Member
**
Offline Offline

Activity: 61


View Profile
March 22, 2012, 03:27:33 AM
 #13

What about changing how the database is stored on disk?  It seems to me that using the Berkeley DB library might be causing the problem, but I don't know enough about how it works to be sure.  Maybe using some other database library (like SQLite) would cause less disk I/O.

Donations to me:   19599Y3PTRF1mNdzVjQzePr67ttMiBG5LS
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428


Core Armory Developer


View Profile WWW
March 22, 2012, 03:29:48 AM
 #14

What about changing how the database is stored on disk?  It seems to me that using the Berkeley DB library might be causing the problem, but I don't know enough on how that works to be sure.  Maybe using some other database library (like SQLite) would have less disk IO.

The blockchain is actually stored in a flat binary file.  It's just one raw block after another, serialized into blk0001.dat.  It's the wallet file that is stored using Berkeley DB.
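For reference, that flat-file layout can be walked with a few lines of Python: each record is a network-magic marker, a little-endian length, and then the raw serialized block (the parser name is mine):

```python
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # the 4 magic bytes that precede each block on disk

def iter_blocks(raw: bytes):
    """Yield the serialized blocks from a blk0001.dat-style flat file:
    each record is <magic:4><length:4, little-endian><block:length>."""
    offset = 0
    while offset + 8 <= len(raw):
        magic, length = struct.unpack_from("<II", raw, offset)
        if magic != MAINNET_MAGIC:
            break  # zero padding or corruption; stop scanning
        yield raw[offset + 8 : offset + 8 + length]
        offset += 8 + length
```

Nothing in this file is indexed; that is exactly why the client keeps a separate index database alongside it.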

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1036


View Profile WWW
March 22, 2012, 04:21:53 AM
 #15

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.

aka sipa, core dev team

Tips and donations: 1KwDYMJMS4xq3ZEWYfdBRwYG2fHwhZsipa
Gavin Andresen
Legendary
*
qt
Offline Offline

Activity: 1652


Chief Scientist


View Profile WWW
March 22, 2012, 02:54:19 PM
 #16

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.
I pulled #964 for 0.6 this morning.

I had played with database settings several months ago and saw no speedup because there was another bug causing a bottleneck.  That bug was fixed a while ago, but nobody thought to try tweaking the db settings again until a few days ago.

Pieter and Greg did all the hard work of doing a lot of benchmarking to figure out which settings actually matter.

PS: the database settings are run-time configurable for any version of bitcoin; berkeley db reads a file called 'DB_CONFIG' (if it exists) in the "database environment" directory (aka -datadir).
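As a concrete example of such a file (these particular values are illustrative, not the tuned settings from the pull request), a DB_CONFIG dropped into the -datadir might look like:

```
# Example DB_CONFIG (values illustrative): 64 MiB cache, 10 MiB log files
set_cachesize 0 67108864 1
set_lg_max 10485760
```

Berkeley DB reads this file when the environment is opened, so no recompile is needed to experiment with it.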

How often do you get the chance to work on a potentially world-changing project?
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 22, 2012, 05:28:06 PM
 #17

By tweaking some caching settings, a rather spectacular speed increase for loading a block chain was obtained. This will probably end up in 0.6 still.
I pulled #964 for 0.6 this morning.

I had played with database settings several months ago and saw no speedup because there was another bug causing a bottleneck.  That bug was fixed a while ago, but nobody thought to try tweaking the db settings again until a few days ago.

Pieter and Greg did all the hard work of doing a lot of benchmarking to figure out which settings actually matter.

PS: the database settings are run-time configurable for any version of bitcoin; berkeley db reads a file called 'DB_CONFIG' (if it exists) in the "database environment" directory (aka -datadir).


This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry Smiley

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1036


View Profile WWW
March 22, 2012, 05:30:13 PM
 #18


Quote
This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry Smiley

Note that that does not include downloading the blocks, only processing them - I imported them from a local file. I assume that under normal circumstances it will still take an hour or two (and if you have bad luck, a lot more) to download.

aka sipa, core dev team

Tips and donations: 1KwDYMJMS4xq3ZEWYfdBRwYG2fHwhZsipa
Mushoz
Hero Member
*****
Offline Offline

Activity: 686


Bitbuy


View Profile WWW
March 22, 2012, 05:33:43 PM
 #19


Quote
This is awesome news! A full blockchain download in 33 minutes on a laptop is excellent! Thank you all who made this possible, you've just majorly lowered the bar to entry Smiley

Note that that is not including the downloading, only processing them - I imported it from a local file. I assume in normal circumstances it will still take an hour or two (and if you have bad luck, a lot more) to download.


Validating creates quite a bit of CPU load, so it will most likely be bottlenecked by the CPU. A fast connection should easily be able to download the entire blockchain within 30 minutes, as long as the client is well connected with a working UPnP setup. We'll have to wait and see, I guess Smiley Does anyone know how I can check whether this pull is included in 0.6RC1? I would like to give this a shot, but I'm not capable of compiling Bitcoin myself. Thanks Smiley

www.bitbuy.nl - Koop eenvoudig, snel en goedkoop bitcoins bij Bitbuy!
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1036


View Profile WWW
March 22, 2012, 05:40:28 PM
 #20

Validating creates quite a bit of load on the CPU, so it will most likely be bottlenecked by the CPU. A fast connection should easily be able to download the entire blockchain within 30 minutes, as long as the client is well connected with a working uPnP setup. We'll have to wait and see I guess Smiley Does anyone know how I can check whether this Pull is included in 0.6RC1? I would like to give this a shot, but I'm not capable of compiling Bitcoin myself. Thanks Smiley

It was only merged today. 0.6.0rc1 is 1.5 months old; 0.6.0rc4 is 6 days old. It will be included in 0.6.0rc5.

You shouldn't run outdated release candidates by the way - there's a reason a newer rc was created: the old one had (too many) bugs.

aka sipa, core dev team

Tips and donations: 1KwDYMJMS4xq3ZEWYfdBRwYG2fHwhZsipa