Bitcoin Forum
May 03, 2024, 03:14:35 PM *
Topic: Blockchain corruption during power loss? (Read 2333 times)
davout
July 08, 2013, 10:50:21 AM
 #21

Quote
> I also encountered a corrupt LevelDB and it also appeared to be a suspend-related issue. My guess: power management on modern Macs is buggy and is likely to cause the file system to lose its integrity in some way. The fact that Macs do sometimes just die and refuse to unsuspend strongly suggests the presence of fatal errors in their implementation. Recent OS X versions are sloppy in other ways: when the laptop lid is opened and the unsuspend process begins, the first thing it does is display a screenshot of the password entry screen! Of course it's not actually usable for many seconds, so any keypresses you make get thrown away. This kind of duplicitous nonsense is classic Apple. Now think: if your power management engineering team is the kind that'd make such a decision, do you trust them to get the details 100% right? I wouldn't.
>
> All that said, running nodes on systems that come and go all the time is hardly helping the network, and most users will get tired of it sucking up battery and other resources. I can't see running full nodes on laptops being popular in the long run. So fixing this doesn't seem to be very important to me; certainly it shouldn't be seen as blocking pruning. If it's robust on Linux servers, that's the most important thing.

It's not a laptop, it's a Mac Mini that simply goes idle from time to time, for example at night when it's not in use.
And bitcoind should probably be a little more resilient than "oh really, you let your computer go idle, let's just re-download the whole chain".

If you think bitcoind should only be resilient on Debian stable in a well-connected datacenter, you're going to keep seeing the general decline in nodes that is being experienced.

If you reason this way, why would you want to implement pruning at all? After all, if bitcoind runs fine on a server with an i7, a 1 TB disk, 32 GB of RAM and a 1 Gbps connection, that's the most important thing, right?

Mike Hearn
July 08, 2013, 11:27:03 AM
 #22

Well, your last paragraph explains why I'm not working on pruning myself right now ;) (also, sipa said he'd do it).

Is the Mac Mini actually entering a sleep state of some kind? You said it happens when the machine comes back from suspend. Now your machine is just "idling". So which is it? If the computer is just running normally then that'd imply spontaneous random destruction of the db, which I've not seen myself.
davout
July 08, 2013, 11:35:21 AM
 #23

Quote from: Mike Hearn on July 08, 2013, 11:27:03 AM
> Is the Mac Mini actually entering a sleep state of some kind? You said it happens when the machine comes back from suspend. Now your machine is just "idling". So which is it? If the computer is just running normally then that'd imply spontaneous random destruction of the db, which I've not seen myself.

The steps I used to reproduce it before giving up on running bitcoind on the Mac mini:

 - Have a good evening coding around, listening to some nice music and stuff,
 - go to bed
 - come back to computer with cup of coffee
 - fail to bring computer back up from idle/sleep/suspend or whatever a mac does when you leave it alone for a while
 - hard reboot it
 - rage at bitcoind casually telling me that the blockchain is corrupted and that I need to reindex

Conclusion: in the presence of coffee, leveldb spontaneously self-destructs.

Mike Hearn
July 08, 2013, 11:42:15 AM
 #24

Yeah so the Mac Mini went into a sleep state just like a laptop did. I think the Mini vs laptop distinction is a distraction here, the root cause is failed unsuspends which leads me to wonder if it's even solvable by LevelDB or us. If the machine just fails to wake up then it's obviously hosed internally in some bad way.
davout
July 08, 2013, 11:48:30 AM
 #25

Quote from: Mike Hearn on July 08, 2013, 11:42:15 AM
> Yeah so the Mac Mini went into a sleep state just like a laptop did. I think the Mini vs laptop distinction is a distraction here, the root cause is failed unsuspends which leads me to wonder if it's even solvable by LevelDB or us. If the machine just fails to wake up then it's obviously hosed internally in some bad way.

Well, the root cause is obviously the Mac, right.
The bug on bitcoind's side, though, IMO, is that it forces you through hours and hours of reindexing/redownloading when this kind of stuff happens, instead of handling it more gracefully.



Mike Hearn
July 08, 2013, 12:14:11 PM
 #26

You shouldn't have to download, reindexing is enough. Whether it's possible to handle more gracefully really depends on the exact details of what goes wrong.

What we could potentially do is make rolling backups of verified-consistent databases, then just roll back to the most recent one and replay from that point onwards. That would reduce the amount of reindexing time.

But the simpler fix is to just not run Bitcoin-Qt on Macs that might sleep a lot.
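A minimal Python sketch of the rolling-backup idea described above, under loud assumptions: the class name `RollingSnapshots`, the `is_consistent` verification callback, the `height-NNNNNNNNN` directory naming and the copy-the-whole-directory approach are all hypothetical and are not how bitcoind or LevelDB actually work. It keeps the last few copies of a database directory that passed a consistency check and, after corruption, restores the newest one and reports the height from which blocks would have to be replayed.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


class RollingSnapshots:
    """Hypothetical helper: keep the newest `keep` copies of a database
    directory, taken only when the database passed a consistency check."""

    def __init__(self, db_dir: Path, backup_root: Path,
                 is_consistent: Callable[[Path], bool], keep: int = 3):
        self.db_dir = db_dir
        self.backup_root = backup_root
        self.is_consistent = is_consistent  # assumed verification routine
        self.keep = keep
        backup_root.mkdir(parents=True, exist_ok=True)

    def snapshots(self) -> List[Path]:
        # Zero-padded heights sort correctly by name; skip unfinished copies.
        return sorted(p for p in self.backup_root.glob("height-*")
                      if not p.name.endswith(".tmp"))

    def take(self, height: int) -> Optional[Path]:
        """Snapshot the live database if (and only if) it verifies."""
        if not self.is_consistent(self.db_dir):
            return None
        final = self.backup_root / f"height-{height:09d}"
        tmp = self.backup_root / (final.name + ".tmp")
        shutil.rmtree(tmp, ignore_errors=True)
        shutil.copytree(self.db_dir, tmp)
        shutil.rmtree(final, ignore_errors=True)
        tmp.rename(final)  # a snapshot gets its final name only once complete
        snaps = self.snapshots()
        for old in snaps[:max(len(snaps) - self.keep, 0)]:
            shutil.rmtree(old)  # drop the oldest copies beyond `keep`
        return final

    def restore_latest(self) -> Optional[int]:
        """Replace a corrupt live database with the newest snapshot and return
        its height; the caller would replay blocks above that height instead
        of reindexing everything."""
        snaps = self.snapshots()
        if not snaps:
            return None
        latest = snaps[-1]
        shutil.rmtree(self.db_dir, ignore_errors=True)
        shutil.copytree(latest, self.db_dir)
        return int(latest.name.split("-", 1)[1])
```

Copying a database directory is only safe while nothing is writing to it, so a real version would have to pause the database while `take()` runs; the sketch ignores that.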
jgarzik
July 08, 2013, 03:07:08 PM
 #27

Quote from: Mike Hearn on July 08, 2013, 12:14:11 PM
> But the simpler fix is to just not run Bitcoin-Qt on Macs that might sleep a lot.

If LevelDB cannot handle suspend/resume with full data integrity, then we may need to revisit it.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
Mike Hearn
July 08, 2013, 03:09:42 PM
 #28

The issue is not suspend/resume when it works, which LevelDB can survive just fine, it's when the OS or hardware itself screws up the resume process.

ACPI suspend/resume is unbelievably complicated. At least the way it used to work, it required the BIOS to provide an actual program, written in a special kind of assembly language and run by an ACPI interpreter, as part of bringing the system down and up. If anything goes wrong in that process, all bets are off: pretty much anything could happen to the data on disk.

LevelDB is designed to survive certain kinds of file system corruption by ensuring that all commits are atomic... on the assumption that file system renames and write() calls are atomic. If that underlying OS assumption is violated by bugs in the OS or hardware, LevelDB can corrupt itself, as would any other database.
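To make the atomic-commit idea concrete, here is a simplified Python sketch of a checksummed append-only log. LevelDB's write-ahead log works on a similar principle (each record carries a checksum, so a half-written record can be detected and discarded on recovery), but the record layout below, the use of `zlib.crc32` instead of CRC32C, and the function names `append_record`/`recover` are assumptions for illustration, not LevelDB's on-disk format.

```python
import os
import struct
import zlib
from typing import BinaryIO, List

# Hypothetical record header: CRC32 of the payload + payload length, both
# little-endian uint32. (LevelDB's real log uses CRC32C and 32 KiB blocks.)
HEADER = struct.Struct("<II")


def append_record(f: BinaryIO, payload: bytes) -> None:
    """Append one record and ask the OS to push it to stable storage."""
    f.write(HEADER.pack(zlib.crc32(payload), len(payload)) + payload)
    f.flush()              # Python's buffer -> kernel
    os.fsync(f.fileno())   # kernel -> disk (subject to what the disk reports)


def recover(path: str) -> List[bytes]:
    """Return every intact record, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        crc, length = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write or garbage: treat the rest as uncommitted
        records.append(payload)
        pos = start + length
    return records
```

The checksum only catches a damaged tail. If the OS or disk acknowledges an fsync for data that never reached stable storage, a record that recovery considers committed can still be lost or mangled, which is the failure mode described in the post above.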
jgarzik
July 08, 2013, 03:41:20 PM
 #29

Quote from: Mike Hearn on July 08, 2013, 03:09:42 PM
> LevelDB is designed to survive certain kinds of file system corruption by ensuring that all commits are atomic... on the assumption that file system renames and write() calls are atomic. If that underlying OS assumption is violated by bugs in the OS or hardware, LevelDB can corrupt itself as would any other database.

Write calls have never been atomic in any Unix-ish OS. They may be reordered by the OS between fsync/fdatasync calls, and may be reordered again at the hardware (disk) level unless the OS sends a hardware flush command (FLUSH CACHE / SYNCHRONIZE CACHE).
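Because of the reordering described above, the usual crash-safe file replacement flushes explicitly at each step. A minimal Python sketch, assuming a POSIX file system (the `atomic_write` helper is hypothetical, not part of bitcoind or LevelDB): write to a temporary file in the same directory, fsync it, rename it over the target, then fsync the directory so the rename itself is durable. On macOS, `fsync()` is documented as not forcing the drive to flush its own write cache; the `F_FULLFSYNC` fcntl requests that, so the sketch issues it when the platform exposes it.

```python
import os
import tempfile

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None


def _flush_to_disk(fd: int) -> None:
    """fsync the file; where F_FULLFSYNC exists (macOS), also ask the
    drive to flush its write cache."""
    os.fsync(fd)
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        fcntl.fcntl(fd, full_fsync)


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that after a crash it holds the old or the new bytes,
    provided the file system's rename is itself atomic."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()                   # Python's buffer -> kernel
            _flush_to_disk(f.fileno())  # data on disk BEFORE the rename
        os.replace(tmp, path)           # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the directory so the new directory entry (the rename) persists.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```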


Mike Hearn
July 08, 2013, 04:20:58 PM
 #30

Yeah, it does do at least fdatasync in some cases. OK, actually, I don't know the exact requirements LevelDB makes of the OS, but I really doubt it's anything un-POSIXish.