Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: mustyoshi on July 02, 2013, 03:55:58 PM



Title: Blockchain corruption during power loss?
Post by: mustyoshi on July 02, 2013, 03:55:58 PM
I lost power the other day while my bitcoind was running, and I went to start it again and it was unable to read the blocks. I'm currently in the process of redownloading the blockchain.

Has anybody else had this issue?


Title: Re: Blockchain corruption during power loss?
Post by: Trader Steve on July 02, 2013, 04:03:53 PM
How ironic, I just had the same problem. I had to remove the program and the associated files and then reinstall the entire program. Fortunately, I keep several backups of the wallet.dat file. Now, my computer has been running non-stop for the last 7 hours updating the blockchain - only 16,000 blocks to go!



Title: Re: Blockchain corruption during power loss?
Post by: mustyoshi on July 02, 2013, 04:06:47 PM
Quote from: Trader Steve
How ironic, I just had the same problem. I had to remove the program and the associated files and then reinstall the entire program. Fortunately, I keep several backups of the wallet.dat file. Now, my computer has been running non-stop for the last 7 hours updating the blockchain - only 16,000 blocks to go!

Oh good, I was afraid it was because of the modifications I made to my bitcoind, which scan transactions for specific addresses and keep metrics on them. But since you had the same issue, I have ruled that out, thanks.

I should probably start backing up my blockchain at regular intervals to prevent this from happening again.


Title: Re: Blockchain corruption during power loss?
Post by: elasticband on July 02, 2013, 04:08:24 PM
I had this too during a power outage... it only took an hour or so to reindex the blocks though


Title: Re: Blockchain corruption during power loss?
Post by: mustyoshi on July 02, 2013, 04:09:30 PM
Quote from: elasticband
I had this too during a power outage... it only took an hour or so to reindex the blocks though

wait what

you mean I didn't have to delete everything?


Title: Re: Blockchain corruption during power loss?
Post by: elasticband on July 02, 2013, 04:12:16 PM
Nope... I just restarted the client, then it had an error and said it needed to reindex the blocks (I imagine the blockchain stored on my computer). About an hour later it was done, probably not even an hour... and I had set it to low priority.


Title: Re: Blockchain corruption during power loss?
Post by: mustyoshi on July 02, 2013, 04:25:00 PM
Quote from: elasticband
Nope... I just restarted the client, then it had an error and said it needed to reindex the blocks (I imagine the blockchain stored on my computer). About an hour later it was done, probably not even an hour... and I had set it to low priority.

Hmm, well I'm almost certain it didn't say anything about reindexing the blocks; it was more like "could not read the block". So at first I tried deleting all the lock files and it still didn't work.


Title: Re: Blockchain corruption during power loss?
Post by: kjj on July 03, 2013, 10:57:23 AM
Buy a UPS.

Clever programming simply cannot prevent corruption during a power loss.  The filesystem guys have been working on the problem for like 50 years, and they all agree that you need to buy a UPS.


Title: Re: Blockchain corruption during power loss?
Post by: TierNolan on July 03, 2013, 11:49:42 AM
Quote from: kjj
Clever programming simply cannot prevent corruption during a power loss.

Atomic file operations would ensure that the disk is always in a valid state.  However, it doesn't seem to be a high priority for OS designers.


Title: Re: Blockchain corruption during power loss?
Post by: davout on July 03, 2013, 11:52:29 AM
Are you guys running OSX?


Title: Re: Blockchain corruption during power loss?
Post by: Mysticsam_3579 on July 06, 2013, 02:30:49 PM
Quote from: kjj
Clever programming simply cannot prevent corruption during a power loss.

Quote from: TierNolan
Atomic file operations would ensure that the disk is always in a valid state.  However, it doesn't seem to be a high priority for OS designers.

It is solved. The answer is: ZFS
Here is a link: http://en.wikipedia.org/wiki/ZFS


Title: Re: Blockchain corruption during power loss?
Post by: piotr_n on July 07, 2013, 01:47:21 AM
Just make a backup of your chain once in a while.
Or download the chain from some torrents, if you haven't.
Otherwise it takes ages to recover from such a loss :)
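
A minimal sketch of such a periodic backup, in Python. It assumes the node is stopped while copying, and that the block data lives in `blocks` and `chainstate` subfolders of the data directory; the function name `backup_chain`, the `chain-<timestamp>` naming, and the retention count are all made up for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_chain(data_dir, backup_root, keep=3, stamp=None):
    """Copy block data into a timestamped folder and prune older copies.

    Assumes the node is NOT running, so files do not change mid-copy.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    data_dir, backup_root = Path(data_dir), Path(backup_root)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"chain-{stamp}"
    # Copy under a ".partial" name first and rename at the end, so an
    # interrupted copy never looks like a finished backup.
    partial = backup_root / f"chain-{stamp}.partial"
    partial.mkdir(parents=True)
    for sub in ("blocks", "chainstate"):  # assumed directory layout
        src = data_dir / sub
        if src.is_dir():
            shutil.copytree(src, partial / sub)
    partial.rename(target)
    # Timestamped names sort chronologically; drop all but the newest `keep`.
    finished = sorted(
        p for p in backup_root.glob("chain-*")
        if p.is_dir() and p.suffix != ".partial"
    )
    for old in finished[:-keep]:
        shutil.rmtree(old)
    return target
```

Restoring is then a matter of stopping the node and copying the newest `chain-*` folder back over the data directory; the node only has to catch up on the blocks mined since that backup.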


Title: Re: Blockchain corruption during power loss?
Post by: gmaxwell on July 07, 2013, 04:13:32 PM
Can you please disclose what OS, OS version, and Bitcoin version you're running?

I've tried to reproduce unclean shutdown corruption and in hundreds of shutdowns in Linux been unable to do so.

Contrary to what KJJ claims, it is actually not supposed to do this, and at least on some systems it does not appear to (or at least does so with only negligible probability).  I suspect that leveldb has some bugs on some systems/environments which degrade its durability, but with basically nothing to go on it's hard to determine why.

We absolutely _must_ get this fixed— or at least reduced to negligible probability for all users— before we can support pruning.


Title: Re: Blockchain corruption during power loss?
Post by: malevolent on July 07, 2013, 06:58:59 PM
Quote from: gmaxwell
Can you please disclose what OS, OS version, and Bitcoin version you're running?

This also happened to me once on Win7 64, but that was 2 years ago with 0.3-something.


Title: Re: Blockchain corruption during power loss?
Post by: piotr_n on July 07, 2013, 07:51:04 PM
I once gave you a snapshot of a naturally corrupt testnet3 DB that the official client wasn't able to continue with... and you didn't even bother to download it, did you?
And now you suddenly care... :)


Title: Re: Blockchain corruption during power loss?
Post by: davout on July 07, 2013, 10:24:31 PM
Quote from: gmaxwell
Can you please disclose what OS, OS version, and Bitcoin version you're running?

Discussed this with you a few weeks ago.
Happens to me every single time OSX ML fails to come back up from suspend.
Pretty much the only reason why I stopped running a node.


Title: Re: Blockchain corruption during power loss?
Post by: kjj on July 08, 2013, 03:01:07 AM
Quote from: gmaxwell
Contrary to what KJJ claims, it is actually not supposed to do this

Heh.  I never said that it was supposed to happen.  I was just pointing out the workaround used all around the world by people with unreliable power.

Maybe we should make a sticky at the top for bug reporting best practices.


Title: Re: Blockchain corruption during power loss?
Post by: jgarzik on July 08, 2013, 05:57:48 AM
Quote from: gmaxwell
Can you please disclose what OS, OS version, and Bitcoin version you're running?

Quote from: davout
Discussed this with you a few weeks ago.
Happens to me every single time OSX ML fails to come back up from suspend.
Pretty much the only reason why I stopped running a node.

What is an OSX ML?



Title: Re: Blockchain corruption during power loss?
Post by: davout on July 08, 2013, 07:05:48 AM
Quote from: jgarzik
What is an OSX ML?

This. (http://bit.ly/14Xhe1T)


Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 10:32:57 AM
I also encountered a corrupt LevelDB and it also appeared to be a suspend-related issue. My guess - power management on modern Macs is buggy and is likely to cause the file system to lose its integrity in some way. The fact that Macs do sometimes just die and refuse to unsuspend strongly suggests the presence of fatal errors in their implementation. Recent OS X versions are sloppy in other ways - when the laptop lid is opened and the unsuspend process begins, the first thing it does is display a screenshot of the password entry screen! Of course it's not actually usable for many seconds, so any keypresses you make get thrown away. This kind of duplicitous nonsense is classic Apple. Now think - if your power management engineering team is the kind that'd make such a decision, do you trust them to get the details 100% right? I wouldn't.

All that said, running nodes on systems that come and go all the time is hardly helping the network, and most users will get tired of it sucking up battery and other resources. I can't see running full nodes on laptops being popular in the long run. So fixing this doesn't seem very important to me; certainly it shouldn't be seen as blocking pruning. If it's robust on Linux servers, that's the most important thing.



Title: Re: Blockchain corruption during power loss?
Post by: davout on July 08, 2013, 10:50:21 AM
Quote from: Mike Hearn
I also encountered a corrupt LevelDB and it also appeared to be a suspend-related issue. My guess - power management on modern Macs is buggy and is likely to cause the file system to lose its integrity in some way. The fact that Macs do sometimes just die and refuse to unsuspend strongly suggests the presence of fatal errors in their implementation. Recent OS X versions are sloppy in other ways - when the laptop lid is opened and the unsuspend process begins, the first thing it does is display a screenshot of the password entry screen! Of course it's not actually usable for many seconds, so any keypresses you make get thrown away. This kind of duplicitous nonsense is classic Apple. Now think - if your power management engineering team is the kind that'd make such a decision, do you trust them to get the details 100% right? I wouldn't.

All that said, running nodes on systems that come and go all the time is hardly helping the network, and most users will get tired of it sucking up battery and other resources. I can't see running full nodes on laptops being popular in the long run. So fixing this doesn't seem very important to me; certainly it shouldn't be seen as blocking pruning. If it's robust on Linux servers, that's the most important thing.

It's not a laptop, it's a Mac Mini that simply goes idle from time to time, for example at night when unused.
And bitcoind should probably be a little more resilient than "oh really, you let your computer go idle, let's just re-download the whole chain".

If you think bitcoind should only be resilient on Debian stable in a well-connected datacenter, you're going to keep seeing the general decline in nodes that is being experienced.

If you reason this way, why would you want to implement pruning at all? After all, if bitcoind runs fine on a server with an i7, a 1 TB disk, 32 GB of RAM and a 1 Gbps connection, that's the most important thing, right?


Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 11:27:03 AM
Well, your last paragraph explains why I'm not myself working on pruning right now ;)   (also sipa said he'd do it).

Is the Mac Mini actually entering a sleep state of some kind? You said it happens when the machine comes back from suspend. Now your machine is just "idling". So which is it? If the computer is just running normally then that'd imply spontaneous random destruction of the db, which I've not seen myself.


Title: Re: Blockchain corruption during power loss?
Post by: davout on July 08, 2013, 11:35:21 AM
Quote from: Mike Hearn
Is the Mac Mini actually entering a sleep state of some kind? You said it happens when the machine comes back from suspend. Now your machine is just "idling". So which is it? If the computer is just running normally then that'd imply spontaneous random destruction of the db, which I've not seen myself.

The steps that I used to reproduce it before giving up on bitcoind on the Mac Mini:

 - have a good evening coding around, listening to some nice music and stuff
 - go to bed
 - come back to computer with cup of coffee
 - fail to bring computer back up from idle/sleep/suspend or whatever a Mac does when you leave it alone for a while
 - hard reboot it
 - rage at bitcoind casually telling me that the blockchain is corrupted and that I need to reindex

Conclusion: in the presence of coffee, leveldb spontaneously self-destructs.


Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 11:42:15 AM
Yeah so the Mac Mini went into a sleep state just like a laptop did. I think the Mini vs laptop distinction is a distraction here, the root cause is failed unsuspends which leads me to wonder if it's even solvable by LevelDB or us. If the machine just fails to wake up then it's obviously hosed internally in some bad way.


Title: Re: Blockchain corruption during power loss?
Post by: davout on July 08, 2013, 11:48:30 AM
Quote from: Mike Hearn
Yeah so the Mac Mini went into a sleep state just like a laptop did. I think the Mini vs laptop distinction is a distraction here, the root cause is failed unsuspends which leads me to wonder if it's even solvable by LevelDB or us. If the machine just fails to wake up then it's obviously hosed internally in some bad way.

Well, the root cause is obviously the Mac, right.
The bug on bitcoind's side though, IMO, is that it forces you to go through hours and hours of reindexing or redownloading when this kind of stuff happens, instead of handling it more gracefully.




Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 12:14:11 PM
You shouldn't have to download, reindexing is enough. Whether it's possible to handle more gracefully really depends on the exact details of what goes wrong.

What we could potentially do is make rolling backups of verified-consistent databases, and then just roll back to the last database then replay from that point onwards. So it'd reduce the amount of reindexing time.

But the simpler fix is to just not run Bitcoin-Qt on Macs that might sleep a lot.
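
A rough sketch of the rolling-backup idea from this post, in Python. Everything here is a toy stand-in: the "database" is a plain dict of balances, a "block" is a dict of balance changes, and the names `RollingSnapshots`, `apply_block`, and `recover` are invented for illustration. It only shows the shape of "keep the last few known-good snapshots, roll back to the newest one, replay from there"; it says nothing about how bitcoind or LevelDB actually store data.

```python
import copy
from collections import deque


class RollingSnapshots:
    """Keeps only the most recent `keep` verified snapshots."""

    def __init__(self, keep=2):
        self._snaps = deque(maxlen=keep)  # entries are (height, state copy)

    def save(self, height, state):
        # Call this only after the state has been checked for consistency.
        self._snaps.append((height, copy.deepcopy(state)))

    def latest(self):
        if not self._snaps:
            return 0, {}  # nothing saved yet: start from an empty state
        height, state = self._snaps[-1]
        return height, copy.deepcopy(state)


def apply_block(state, block):
    """Toy block application: add each balance change to the state."""
    for key, delta in block.items():
        state[key] = state.get(key, 0) + delta


def recover(snaps, blocks):
    """Roll back to the newest snapshot, then replay the later blocks."""
    height, state = snaps.latest()
    for block in blocks[height:]:
        apply_block(state, block)
        height += 1
    return height, state
```

With a snapshot taken at height N, recovery replays only the blocks after N rather than the whole chain, which is where the reduction in reindexing time would come from.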


Title: Re: Blockchain corruption during power loss?
Post by: jgarzik on July 08, 2013, 03:07:08 PM
Quote from: Mike Hearn
But the simpler fix is to just not run Bitcoin-Qt on Macs that might sleep a lot.

If leveldb cannot handle suspend/resume with full data integrity, then we may need to revisit it.



Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 03:09:42 PM
The issue is not suspend/resume when it works, which LevelDB can survive just fine, it's when the OS or hardware itself screws up the resume process.

ACPI suspend/resume is unbelievably complicated. At least the way it used to work, the BIOS had to provide an actual program, written in a special kind of assembly language and run by an ACPI interpreter, as part of bringing the system down/up. If anything goes wrong in that process, all bets are off - pretty much anything could happen to the data on disk.

LevelDB is designed to survive certain kinds of file system corruption by ensuring that all commits are atomic .... on the assumption that file system renames and write() calls are atomic. If that underlying OS assumption is violated by bugs in the OS or hardware, LevelDB can corrupt itself as would any other database.
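
To illustrate the kind of rename-based commit described here, a generic Python sketch (not LevelDB's actual code; the helper name `atomic_write` is made up). The new contents go to a temporary file in the same directory, are flushed to disk, and are then renamed over the target, so a reader sees either the old file or the new one, provided the file system really does make the rename atomic.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same file system),
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes out before the rename
        os.replace(tmp, path)     # the commit point
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself survives a crash
    # (only where the platform supports opening a directory this way).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The whole scheme stands or falls on the OS and hardware honouring the rename and the sync; if a failed resume breaks those guarantees, code like this cannot protect the data either.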


Title: Re: Blockchain corruption during power loss?
Post by: jgarzik on July 08, 2013, 03:41:20 PM
Quote from: Mike Hearn
LevelDB is designed to survive certain kinds of file system corruption by ensuring that all commits are atomic .... on the assumption that file system renames and write() calls are atomic. If that underlying OS assumption is violated by bugs in the OS or hardware, LevelDB can corrupt itself as would any other database.

Write calls have never been atomic in any Unix-ish OS...  They may be reordered by the OS between fsync/fdatasync calls, and may be reordered again at the hardware (disk) level, unless the OS sends a hardware flush command (FLUSH CACHE / SYNCHRONIZE CACHE).
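
Since writes can land partially or out of order, a common defence is to checksum each record and sync after appending, then stop at the first record that fails its check when reading the log back. A small Python sketch of that idea follows; the record layout and the names `append_record` and `read_valid_records` are invented for illustration and are not any real database's format. Note that `os.fsync` only asks the OS to flush; whether the drive's own cache is flushed as well depends on the OS and hardware.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of the payload


def append_record(path, payload):
    """Append one checksummed record and sync it before returning."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # barrier: earlier writes go out before we continue


def read_valid_records(path):
    """Return the records up to the first torn or corrupted one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or damaged tail: ignore it and what follows
        records.append(payload)
        pos = start + length
    return records
```

A power cut in the middle of an append then costs at most the record being written, since the damaged tail is detected and skipped on the next start.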



Title: Re: Blockchain corruption during power loss?
Post by: Mike Hearn on July 08, 2013, 04:20:58 PM
Yeah, it does do at least fdatasync in some cases. OK, actually, I don't know the exact requirements LevelDB makes of the OS, but I really doubt it's anything un-POSIXish.