Title: Data directory over NFS Post by: myfirst on May 19, 2015, 04:39:56 PM
Is it safe to keep the blockchain data and the wallet.dat file on an NFS-mounted share?
What happens if the NFS server temporarily becomes unavailable? Will the databases get damaged?

Title: Re: Data directory over NFS Post by: 2112 on May 19, 2015, 10:42:48 PM
Most likely not safe. LevelDB isn't really 100% ACID; the chainstate directory in particular is prone to corruption if the node crashes.
Edit: on the other hand, if you have a tight backup schedule or the underlying file system supports snapshots, then it is OK to run them over NFS, because neither the blocks nor the chainstate LevelDB databases are truly critical; they can always be rebuilt after corruption. It all depends on what exactly your tolerance profile for outages is.

Title: Re: Data directory over NFS Post by: myfirst on May 21, 2015, 01:01:34 AM
Quote: Most likely not safe. The LevelDB isn't really 100% ACID, especially the chainstate directory is prone to corruptions if the node crashes.
I have 3 servers that host nodes for several different blockchains (altcoins) and 1 NFS server used to store the blockchain and wallet data files.
Quote: Edit: on the other hand if you have a tight backup schedule or the underlying file system supports snapshots then it is OK to run them over NFS because neither the blocks nor chainstate LevelDB databases are really critical and can be always rebuilt after the corruption. It all depends on what exactly is your tolerance profile for outages.
I've certainly started running into issues with the altcoins based on the 0.8.x and 0.9.x codebases. So far Bitcoin 0.10.1 seems to do okay. The NFS server kernel panicked while the nodes were still running, and it seems the nodes continue to 'write' to the databases even though the mount is offline. The operating system has 'paused' all operations to that mount, yet the running node does not seem to honor that. The end result is devastating at times and falls into these 3 categories:
In all cases you don't know something is wrong until after you restart the node, which makes matters worse.

wallet.dat
I run wallet.dat backups every hour, thinking I would be safe. However, when wallet.dat gets damaged, the backup is also damaged. This is strange, and the worst case. Thankfully I've only seen it once and happened to have an older backup that was not damaged. With both the original wallet.dat and the backup version you will see something like this in debug.log, indicating it's hosed.
Code: 2015-05-21 00:39:12 block index 5175ms
and in db.log
Code: Page 1229: unreferenced page
The question is, why is the backup corrupted?

Large valid fork found
This error I've only seen with the 0.9.x versions; it's not the worst thing in the world as long as you catch it. As with all cases, the error does not appear until after the node is restarted.
Code: 2015-05-21 00:18:20 CheckForkWarningConditions: Warning: Large valid fork found
Code: Warning: The network does not appear to fully agree! Some miners appear to be experiencing issues.
The node continues to run in this state until you do a reindex. Very scary.

Error loading blkindex.dat
This has been a very common issue, especially with older versions (0.8.x and whatever Peercoin is based off of). In some versions I can delete blkindex.dat and it will reindex on its own; in other versions I have to run with the 'reindex' command. Worst case here is I need to re-download the chain. debug.log will look like this; the node will attempt to start, then shut down:
Code: 05/21/15 00:10:28 Verifying last 2500 blocks at level 1
Here is another example where I had to run reindex manually:
Code: 2015-05-20 23:56:10 Opened LevelDB successfully
With one coin there was no blkindex.dat file present (even with a valid data directory), yet it complains it can't load it. For this one I had to re-download the chain. Amazingly (and thankfully) the Bitcoin blockchain has not suffered corruption as of yet.
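One way to catch damage like the "Page 1229: unreferenced page" error earlier than a restart is to run BerkeleyDB's own page-structure checker periodically. A minimal sketch, assuming db_verify from the db-util package is installed and using a placeholder wallet path:

```shell
#!/bin/sh
# Sketch: proactively detect BerkeleyDB damage in a wallet file.
# Assumptions: db_verify (db-util package) is installed; WALLET is a
# placeholder path, not any particular coin's real default.
WALLET="${WALLET:-$HOME/.bitcoin/wallet.dat}"

check_wallet() {
    # db_verify walks the BerkeleyDB page structure and exits nonzero on
    # problems such as unreferenced pages. Verifying a file the daemon is
    # actively writing can give false alarms, so treat a failure as
    # "investigate", not "discard".
    db_verify "$1" >/dev/null 2>&1
}

if command -v db_verify >/dev/null 2>&1 && [ -f "$WALLET" ]; then
    if check_wallet "$WALLET"; then
        echo "wallet OK: $WALLET"
    else
        echo "wallet FAILED verification: $WALLET" >&2
    fi
fi
```

Run from cron, this at least turns "silent until restart" into an alert within the hour.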
The worst part about all this is that not even your backups are safe. Besides NOT using NFS (not an option for the time being), is there a way to minimize the damage? Would taking hourly volume snapshots on the NFS server be of any value? It runs ZFS.

Title: Re: Data directory over NFS Post by: achow101_alt on May 21, 2015, 01:10:27 AM
Since you are running multiple nodes, the nodes will write and overwrite things in the data directory. This causes corruption because the nodes don't know what the other nodes wrote; the sudden changes of data, along with the simultaneous reading and writing done by the nodes, cause all of these issues. The data is then constantly changed and corrupted, forcing you to constantly reindex or redownload the blockchain. Since all of the nodes are also accessing the same wallet.dat file, they corrupt it just like the blockchain database files. The only weird thing is that the backups were also corrupted. You should keep the backups on a separate machine just in case.
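On the ZFS snapshot question asked above: an hourly snapshot rotation is cheap and gives point-in-time copies to roll back to, though a snapshot of a live LevelDB/BerkeleyDB is crash-consistent at best (restoring one looks to the node like a crash at that moment, so a reindex may still be needed). A minimal cron sketch; the dataset name and the 24-snapshot retention are assumptions:

```shell
#!/bin/sh
# Hourly ZFS snapshot rotation sketch. DATASET is a placeholder -- point
# it at whatever dataset holds the blockchain directories.
DATASET="${DATASET:-tank/blockchains}"
SNAP="hourly-$(date +%Y%m%d-%H00)"

if command -v zfs >/dev/null 2>&1; then
    # A ZFS snapshot is atomic at the filesystem level, but the databases
    # inside may still need recovery/reindex if restored mid-write.
    zfs snapshot "${DATASET}@${SNAP}"
    # Keep only the 24 newest hourly snapshots; destroy the rest.
    # Note: "head -n -24" (all but the last 24 lines) is GNU coreutils
    # syntax and may differ on Solaris/BSD.
    zfs list -H -t snapshot -o name -s creation \
        | grep "^${DATASET}@hourly-" \
        | head -n -24 \
        | xargs -r -n1 zfs destroy
fi
```

The value over plain file copies is that a snapshot can never be half-written by the backup process itself; the remaining risk is only what the node had in flight.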
Title: Re: Data directory over NFS Post by: myfirst on May 21, 2015, 01:16:54 AM
Quote: Since you are running multiple nodes, the nodes will write and overwrite things in the data directory.
The nodes each write to their own data directory on the same NFS server, but independently of each other. The nodes are for various different altcoins.
Quote: Since all of the nodes are also accessing the same wallet.dat file, they cause it to be corrupted like the blockchain database files. The only weird thing is that the backups were also corrupted. You should keep the backups on a separate machine just in case.
It's not the same wallet.dat file; each node has its own wallet.dat and blockchain. The wallet.dat is backed up to the local filesystem, not the NFS mount.

Title: Re: Data directory over NFS Post by: achow101_alt on May 21, 2015, 01:39:50 AM
I think that there may be open issues with some of the altcoins. You should check the git repos and see if there are any issues similar to yours. Also, it would be helpful if we knew what coins you have on this NFS. Regarding your corrupted wallet.dat, perhaps your backup had accidentally backed up an already-corrupted version of the wallet before you noticed?
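Given that the newest backup can itself be a copy of an already-damaged wallet, the hourly backup job should keep dated generations rather than overwrite one file. A sketch, where the CLI name, destination directory, and naming scheme are all assumptions to adapt per coin:

```shell
#!/bin/sh
# Hourly wallet backup sketch. backupwallet copies the wallet through the
# running daemon, so if the daemon's view of the file is already damaged
# (e.g. after an NFS outage), the backup is damaged too. Keeping dated
# generations means one bad hour does not overwrite the last good copy.
BACKUP_DIR="${BACKUP_DIR:-$HOME/wallet-backups}"
DEST="$BACKUP_DIR/wallet.$(date +%Y%m%d-%H).dat"
mkdir -p "$BACKUP_DIR"

# Placeholder CLI name; substitute the matching altcoin daemon/CLI here.
if command -v bitcoin-cli >/dev/null 2>&1; then
    bitcoin-cli backupwallet "$DEST"
fi
```

Pair this with periodic verification of the copies, and prune old generations only after the newer ones have been checked.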
Title: Re: Data directory over NFS Post by: myfirst on May 21, 2015, 02:12:51 AM
Quote: I think that there may be open issues with some of the altcoins. You should check the git repos and see if there are any issues similar to yours. Also, it would be helpful if we knew what coins you have on this NFS.
The problematic ones have been:
Emerald and Peercoin have been the most troublesome. I have several others on NFS (probably more than 20 total at the moment), like Unobtanium, Litecoin, Dash, Earthcoin, etc. None of those have had problems.
Quote: Regarding your corrupted wallet.dat, perhaps your backup had accidentally backed up a corrupted version of the wallet before you noticed?
That is most likely the problem. The wallet.dat backups run automatically every hour via the RPC call 'backupwallet'. When the NFS server reboots and the node has not been restarted, the 'backupwallet' command will produce a damaged wallet. I think what I will end up doing is setting up a script that triggers the nodes to restart whenever NFS goes offline and comes back online. One thing I have not tried is keeping NFS offline indefinitely; I am wondering how long it takes before the node crashes. Might try that on a test server.

Title: Re: Data directory over NFS Post by: 2112 on May 21, 2015, 04:24:08 AM
I think you are doing things wrong. I understand your problems to actually be with BerkeleyDB.
1) Always hard-mount the NFS directories, with the initial mount done in the background if the server is not available (NFS mount options "hard,bg").
2) Do the backups from the client side, not from the server. This way the backup program sees the most recent data; backups done on the server are only consistent when the clients are idle and have flushed their caches.
3) Change the initialization options of BerkeleyDB to make the memory-mapped files visible in the file system (I think the files have names like "_db.000"). I don't remember if that can be done by simply adding flags to "DB_CONFIG" or if it requires recompilation. With those files visible you can run the db_* utilities on the client while the coin clients are running, make fully consistent backups using db_hotbackup, and do live consistency checks with db_verify. Just remember to add "set_lg_dir database" to the DB_CONFIG so the BerkeleyDB utilities can find the live logs.
With those caveats I've successfully run BerkeleyDB-based programs over NFS for many years with a central storage & build server, longer than Bitcoin has been in existence.

Title: Re: Data directory over NFS Post by: myfirst on May 22, 2015, 12:06:55 AM
Quote: 1) Always hard-mount the NFS directories, with the initial mount done in the background if the server is not available (NFS mount options "hard,bg").
I'll experiment with those options. I think 'hard' is the default on my OS.
Quote: 2) Do the backups from the client side, not from the server. This way the backup program sees the most recent data.
I think that is what I am doing when I use the RPC function?
Code: 2015-05-21 23:55:02 Using data directory /mnt/storage01/blockchains/bitcoin
Quote: 3) Change the initialization options of BerkeleyDB to make the memory-mapped files visible in the file system (I think the files have names like "_db.000"). I don't remember if that can be done by simply adding flags to "DB_CONFIG" or if it requires recompilation. With those files visible you can run the db_* utilities on the client while the coin clients are running, make fully consistent backups using db_hotbackup, and do live consistency checks with db_verify. Just remember to add "set_lg_dir database" to the DB_CONFIG so the BerkeleyDB utilities can find the live logs.
This is interesting; I think Peercoin does that already, since I've seen those files. Anyway, where can I get more information about trying this suggestion?

Title: Re: Data directory over NFS Post by: myfirst on May 22, 2015, 12:19:28 AM
Bummer, just had another wallet go bad:
Code: 05/21/15 22:20:02 KolschCoin version v1.0.0.0 ()

Title: Re: Data directory over NFS Post by: 2112 on May 25, 2015, 08:30:32 PM
Quote: This is interesting; I think Peercoin does that already, since I've seen those files. Anyway, where can I get more information about trying this suggestion?
Actually, after checking, there is a command-line/*coin.conf option for this in the recent code: -privdb=0. But no matter what you do, you will have to read the BerkeleyDB documentation anyway. While I didn't run the most recent Bitcoin clients over NFS, I did quite a bit of load testing around version 0.6.3 (and thereabouts) and did not have anything like your problems. My professional opinion is that you are suffering from some sort of hardware problem or misconfiguration. NFS is quite fragile and tends to stress the infrastructure, especially if run over UDP. Check your port mapper RPC error statistics, verify the network switches for correct configuration of flow control, verify the correct operation of checksum offload in the drivers, etc. I didn't do anything serious then; the coin daemons were simply the second thing that popped into my mind when I had to think of a load test for our revamped/reinstalled NAS & SAN hardware running under Solaris 9 and 10. The first tests were actually "cp -rpv" of a terabytes-sized collection of Linux ISOs followed by "md5sum". There were some errors, but in particular all my wallet.dat files survived without any damage.
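The RPC-error check suggested above can be done from the client with nfsstat (part of nfs-utils/nfs-common) or, on Linux, straight from /proc; a retransmission count that climbs relative to total calls points at a dropping network path or an overloaded server. A small sketch, assuming Linux paths:

```shell
#!/bin/sh
# Print the NFS client's RPC call/retransmission counters -- the numbers
# worth watching per the advice above. This reads the same data that the
# summary line of `nfsstat -rc` reports.
nfs_retrans() {
    # Linux exposes the counters in /proc/net/rpc/nfs on a line starting
    # with "rpc": total calls, retransmissions, auth refreshes.
    [ -r /proc/net/rpc/nfs ] || return 1
    awk '/^rpc/ { print "calls=" $2, "retrans=" $3 }' /proc/net/rpc/nfs
}

nfs_retrans || echo "no NFS client statistics on this host"
```

A nonzero but stable retrans figure is normal noise; steady growth during node corruption incidents would support the hardware/misconfiguration theory.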