Bitcoin Forum
Author Topic: [FIXED] MacOS X LevelDB Corruption Bounty (10.00 BTC + 200.2 LTC)  (Read 83869 times)
wtogami (OP)
Sr. Member
****
Activity: 263
Merit: 250
November 18, 2013, 03:26:40 AM
Last edit: December 29, 2013, 04:47:55 AM by wtogami
 #1

FIXED.

Can you fix the MacOS X Bitcoin LevelDB data corruption issue?

https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821
TEST THESE BUILDS NOW!


Bounty Funding: 10.00 BTC + 200.2 LTC
Gavin Andresen has pledged 5 BTC.  BitcoinTalk pledged 4 BTC.  Public donations have contributed 1 BTC.  Litecoin Dev Team pledges 200 LTC.  The public is encouraged to contribute to these addresses to increase the incentive to fix this sooner.


Conditions
The bounty may be awarded under the following conditions.

  1. Document how anyone can consistently reproduce the data corruption.
  2. Explain why it happens.
  3. Write a code fix that is acceptable to the Bitcoin core developers and merged into Bitcoin.

The Bitcoin developers have ultimate deciding power over how to apportion the bounty award(s), based upon the merit of the contributions.  This may encourage collaboration that leads to a fix rather than hoarding of information.  Non-developers may be able to figure out #1.

These terms may be changed at any time for any or no reason.

Background
https://github.com/bitcoin/bitcoin/issues/2770
Since Bitcoin 0.8.x and the introduction of LevelDB, MacOS X users have been experiencing periodic LevelDB data corruption.  For some Mac users it has never happened, while for others it happens frequently.

https://github.com/bitcoin/bitcoin/pull/2916
https://github.com/bitcoin/bitcoin/pull/3000
https://github.com/bitcoin/bitcoin/pull/2933
Bitcoin master now contains two Mac-specific fsync patches and an upgrade to LevelDB 1.13.  Bitcoin 0.8.5 OMG3 and Litecoin 0.8.5.2-rc5 contain these same patches.  It is possible that a different Mac corruption issue was solved by these earlier patches, but users of these branches have reported continued corruption.  Curiously, corruption seems to happen after a clean shutdown and restart of the client.  All corruption reports seem to be from MacOS X 10.8.x and 10.9 users.  It is unclear if earlier versions of MacOS X are affected.  It is unknown if particular hardware or software configurations are involved.

https://github.com/bitcoin/bitcoin/issues/2785
Corruption with the same error message is apparently capable of fixing itself.  It is not clear if this is true of the recent master branch.

GPG Signed message
Code:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

https://bitcointalk.org/index.php?topic=337294
These addresses contain public donations to be added to the Bitcoin MacOS X corruption fix bounty.
BTC: 1FZ1mSJXj8aJqdpwUcpigLBqJLwtTu46fA
LTC: LS1Rb3bb29TA9PEVGR64bV2cLxC7RdQi8A
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)

iQQcBAEBCAAGBQJSiaeFAAoJELEXnrc0fcENmRsf/3c/w53R2EHX62L+QimS96Rj
J+GPSpVQQRFOFr19OM+efjC1ydoZ3N/suYI1FynQ9nX4RzmCW5ZwbxMtl6wnEw7h
oIqv+ufnD0XEpkFr+g32JdoRNN2KprrMH4Cr2oLI0w+Oqv32jLveoRIqSzIArCId
U9ZVPcvFvKa9hWJrnM9KJQW6NgsGsKW3WBk5n/Wcbp4PYUn9ZC0taRMq2NbakSwk
RaNf6yFSC1wWb2dD6eE+1UiXBCidyK0cVUMkjCRoA0eRqZqy2cJwELmOrJ1RHlgP
6K9Y6MuelTPxhXNa/NNq/sVAbhOmtAeyJ5ApuTuvjd1gpKpS14bFEHY7yFf/dv7A
t0Z43xqQ8FVJ9HnYKY0T6d5W30L31bz5EZvhTQsa+IzfrQeBXGu1ecXM6dSlkcpf
KkJQdyLZ2W72roq+RjF5eOsLmlW9+Xyk7pMSn403oMlMY5EpJByAO8znomq0XEkq
UWPqfzjF2ptXGt6JqPdXx2La3w/jd+GNpHFsA65xZlcgYls/LXyq6483jDz3qPUS
L6WZJZh5BrE4yfmIcTh8LUdiVj7fzlZs3r7CKmD8pv3mtsLpqAZGNiFdK8uMuerp
h+2rPreMxGN7AqN28xdo5WOhqCAersoJQuwz3yQcGtXqnqcVTCBUCoaDpFxExlIK
BHKuGW6awyd1akgKz46aWjlDnWuJ94ZY90tkKPXtSe2XhMZHtq5gYzxpv6qEEFo4
ikDpxyaoDMK7GOdUW0FGY9ZSELWjuPSIwjip/5KN5Z51/TaUeiOQmhxQJLIHKNY3
SMj+wNJLb+FTdlOPBEqYAu3WPPG9ye73ADudt1N36ELLqFcvjsB1RzqntpogEHXR
T+I2VOTtbMvCPqbKdy5FijOERfjRIfrfXirovboLb/iP8ouhbuH7JHcj2niFshaL
i6MBAB2eTTh9LlNx3B1w/ESQuYJlR4NsHDiGmWQGHAEHw6LaCVT7MDh2fmag+1Jx
vDF2LdcCnRCgP5mSv+ZeJv7MvpeJ84UL3SlkB6iKZyD1+EJMyTB7f7xLbyWZSp+v
To7lqJBxk1PbqcRl9rYX7jdW4b4ztsr8FNxOvw5jxcPGZ0Mc9eb9ln6Nl+hx4PBv
jg4j4emg9uAPqRZn8KgJ1OL+wYE5Lw74mu3CP63pBmRVSl894janSUhKc4Z3ToF2
9kf81jVWudmRrVzQhiYA8vlrbC1Bc3nhlrX0KlF8VdREvptfV9PMbOAZdW96u4Mt
1lbqv2ZNWqxOon7Q3HKOcOo3uNvhv0sYItXSygZx5Z/chmBBRQrrJDCdHUw+WhR8
UGNsSL+Rz2vFeAc/W6jrlw3dId/wK+H36vDW8X4bSY6rVi+HhxZNoAPihUNNFy4=
=o/b5
-----END PGP SIGNATURE-----

If you appreciate my work please consider making a small donation.
BTC:  1LkYiL3RaouKXTUhGcE84XLece31JjnLc3      LTC:  LYtrtYZsVSn5ymhPepcJMo4HnBeeXXVKW9
GPG: AEC1884398647C47413C1C3FB1179EB7347DC10D
Diapolo
Hero Member
*****
Activity: 772
Merit: 500
November 18, 2013, 11:17:47 AM
 #2

I created a pull request (not specific to this problem) which uses std::fstream instead of fopen() and such for reading/writing block/undo files.
Perhaps it helps simply because it works a little differently from the current code, dunno. I also added somewhat clearer exception error messages.

https://github.com/bitcoin/bitcoin/pull/3277

It's not intended for getting merged into the master branch yet, perhaps it never will, but you can give it a try.
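
For readers who want a feel for the approach, here is a minimal sketch of record I/O on std::fstream classes with explicit error messages. It is not the code from the pull request; the helper names AppendRecord and ReadRecord and the message wording are assumptions made up for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; not taken from the pull request.

// Append `data` to the file at `path` and return the offset it was written at.
std::streamoff AppendRecord(const std::string& path, const std::vector<char>& data) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) throw std::runtime_error("AppendRecord: cannot open " + path);
    out.seekp(0, std::ios::end);  // make tellp() report the current end of file
    const std::streamoff pos = out.tellp();
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    if (!out) throw std::runtime_error("AppendRecord: write failed for " + path);
    return pos;
}

// Read exactly `size` bytes starting at `offset`, or throw with a message
// that says which step failed.
std::vector<char> ReadRecord(const std::string& path, std::streamoff offset, std::size_t size) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("ReadRecord: cannot open " + path);
    in.seekg(offset, std::ios::beg);
    if (!in) {
        throw std::runtime_error("ReadRecord: seek to offset " + std::to_string(offset) +
                                 " failed in " + path);
    }
    std::vector<char> buf(size);
    in.read(buf.data(), static_cast<std::streamsize>(size));
    if (in.gcount() != static_cast<std::streamsize>(size)) {
        throw std::runtime_error("ReadRecord: short read in " + path + " (wanted " +
                                 std::to_string(size) + " bytes, got " +
                                 std::to_string(in.gcount()) + ")");
    }
    return buf;
}
```

A short read is reported as its own error rather than returning partial data, so a truncated file shows up at the point of the read instead of later as a parse failure.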

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Remember remember the 5th of November
Legendary
*
Activity: 1862
Merit: 1011

Reverse engineer from time to time
November 18, 2013, 12:07:10 PM
 #3

Is it not possible that LevelDB or something else related to the data files is failing silently?

BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
Diapolo
Hero Member
*****
Activity: 772
Merit: 500
November 18, 2013, 12:10:36 PM
 #4

Is it not possible that LevelDB or something else related to the data files is failing silently?

I would say that's at least not impossible...

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Valerian77
Sr. Member
****
Activity: 437
Merit: 255
November 18, 2013, 01:00:12 PM
 #5

  1. Document how anyone can consistently reproduce the data corruption.
  2. Explain why it happens.
  3. Write a code fix that is acceptable to the Bitcoin core developers and merged into Bitcoin git master.

Please refer to my posting: https://bitcointalk.org/index.php?topic=337575.msg3622968#msg3622968

Since I use Windows, not MacOS X, the situation may differ slightly. But at least it may be a hint.

If you want to donate me:  1METhkrvz2r9d3zkFPQrHnpFC1BjCs64Zf
donal
Newbie
*
Activity: 23
Merit: 0
November 18, 2013, 10:29:11 PM
 #6

The Litecoin wallet was crashing for me, reporting DB corruption. If I open Terminal and enter

cd /Applications/Litecoin-Qt.app/Contents/MacOS

./Litecoin-Qt -reindex

it works.

These messages are then displayed in terminal,

2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.

2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.

2013-11-18 19:57:37.657 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
behindtext
Full Member
***
Activity: 121
Merit: 103
November 19, 2013, 02:26:39 AM
 #7

it is funny to see this considering that marco just penned a blog entry

https://blog.conformal.com/deslugging-in-go-with-pprof-btcd/

about how bitcoind uses leveldb vs what we do in btcd. to quote

"Dealing with corrupt journals/flat-file/database is not only complex it has the potential of a very negative user experience. If corruption of any sort is detected then the database components must be validated, this is inherent to the its size a very long operation."

apparently when using flat file storage for blocks and referencing by offset versus storing the entire block in leveldb, there are lots of unsavory ways for leveldb to fail.

leveldb is a harsh mistress.

Bismarck
Newbie
*
Activity: 14
Merit: 0
November 19, 2013, 02:27:23 AM
 #8

I'd like to point everyone's attention to this thread on the LiteCoin forums --

https://forum.litecoin.net/index.php/topic,7147.msg55666.html#msg55666

I have an LTC wallet that doesn't play well with others.  I have no problems being someone's guinea pig as I'd really like to get it working again on my laptop.  

For the new post; I DO have TimeMachine enabled.  

Just for consistency;

Here is the error that Litecoin-Qt keeps throwing;

Code:
Last login: Mon Nov 18 18:27:48 on ttys000
Bismarcks-MacBook-Pro-2:~ Bismarcks$ /Applications/Litecoin-Qt.app/Contents/MacOS/Litecoin-Qt ; exit;
2013-11-18 18:32:21.744 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.
2013-11-18 18:32:21.745 Litecoin-Qt[12289:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.
2013-11-18 18:32:21.748 Litecoin-Qt[12289:507] *** WARNING: Method userSpaceScaleFactor in class NSView is deprecated on 10.7 and later. It should not be used in new applications. Use convertRectToBacking: instead.
2013-11-18 18:32:27.518 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
Assertion failed: (pindexFirst), function GetNextWorkRequired, file ../litecoin/src/main.cpp, line 1149.
Abort trap: 6
logout

[Process completed]

donal
Newbie
*
Activity: 23
Merit: 0
November 19, 2013, 05:16:36 PM
 #9

wtogami,

I can confirm that it has nothing to do with time machine, I do not have time machine.
wtogami (OP)
Sr. Member
****
Activity: 263
Merit: 250
November 19, 2013, 09:40:53 PM
 #10

wtogami,

I can confirm that it has nothing to do with time machine, I do not have time machine.

What version exactly are you running?  There have been multiple fixes.  Please verify specifically with Bitcoin 0.8.5 OMG3.

If you appreciate my work please consider making a small donation.
BTC:  1LkYiL3RaouKXTUhGcE84XLece31JjnLc3      LTC:  LYtrtYZsVSn5ymhPepcJMo4HnBeeXXVKW9
GPG: AEC1884398647C47413C1C3FB1179EB7347DC10D
whault
Newbie
*
Activity: 16
Merit: 0
November 20, 2013, 12:40:38 AM
 #11

Some observations. My setup uses two drives, one with the OS and a lower speed one for general storage. I don't use time machine like the poster above, and there's nothing else non-standard about my software.

  • only the blockchain stored on the internal SSD boot disk gets corrupted, a blockchain stored on the second SATA HDD is never corrupted
  • corruption seems to happen most often after a system sleep (deep or not), though not always
  • corruption can happen during the initial sync if it is stopped and then restarted
  • corruption can happen with FileVault 2 turned on and off
  • has happened less often since updating to 10.9: only twice so far instead of every few days, though it could just be chance

That's it really. No other behaviour is specific to corruptions for me. Sometimes they happen twice in a day, sometimes not for weeks.
moderate
Member
**
Activity: 98
Merit: 10

nearly dead
November 20, 2013, 02:42:39 AM
 #12

If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

I'm just mocking here, obviously. Good luck finding and fixing the issues.
wtogami (OP)
Sr. Member
****
Activity: 263
Merit: 250
November 20, 2013, 03:15:40 AM
 #13

If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

I'm just mocking here, obviously. Good luck finding and fixing the issues.

It's working quite well on Linux and Windows.  Also the old BDB corrupted on all platforms, although less often than Mac users experience this current issue.

If you appreciate my work please consider making a small donation.
BTC:  1LkYiL3RaouKXTUhGcE84XLece31JjnLc3      LTC:  LYtrtYZsVSn5ymhPepcJMo4HnBeeXXVKW9
GPG: AEC1884398647C47413C1C3FB1179EB7347DC10D
behindtext
Full Member
***
Activity: 121
Merit: 103
November 20, 2013, 11:04:03 AM
 #14

If anything, this should serve as a warning for picking up cool new shiny things.

I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?

the motivation for using leveldb vs other dbs is due to the fact that with large numbers of records, e.g. over roughly 10 mln records, most "normal" dbs start to get really sluggish on inserts and selects. you can see the behavior for yourself by stuffing a ton of records in sqlite, mysql, psql, etc.

leveldb is not so much a db as a key-value store, which means that insert speed can be maintained even when there are a massive number of records, e.g. 250 mln. this is where the "level" in leveldb comes from - new writes go to a log and an in-memory table, and are later merged into a series of levels of sorted files on disk. the only price you pay for this is episodic compaction by leveldb. however, when doing selects/lookups on data that is already in leveldb, you may need several seeks, similar to more common databases.

the likely reason leveldb was chosen is that there aren't a ton of great choices for key-value stores. many of the key-value stores besides leveldb have only a few devs and may not be actively maintained. there are also many key-value stores that have questionable data integrity. using a dependency that goes unmaintained means having to change that dep out later, a giant PITA.

the reason the issue that is cited in this thread is so nasty is that not only does bitcoind use leveldb, it uses it in conjunction with flat file storage for the blocks. the act of storing data in flat files and referencing them in the db substantially increases the number and severity of error and failure paths in the combined structure (leveldb + flat file storage). as we can now see, hunting these bugs is very difficult.
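
To make the extra failure paths concrete, here is a minimal sketch of following an index entry into a flat file and classifying what went wrong. It is an illustration under stated assumptions, not how bitcoind lays out its index: the BlockPos fields, the ReadResult values, and the FNV-1a stand-in checksum are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical index record: what a key-value store might hold for one block.
struct BlockPos {
    std::uint64_t offset;    // where the block starts in the flat file
    std::uint32_t size;      // block length in bytes
    std::uint32_t checksum;  // checksum of the block bytes
};

// FNV-1a, used here only as a stand-in checksum (an assumption for the sketch).
std::uint32_t Checksum(const std::vector<char>& data) {
    std::uint32_t h = 2166136261u;
    for (char c : data) {
        h ^= static_cast<unsigned char>(c);
        h *= 16777619u;
    }
    return h;
}

// Each value is a distinct way the index and the flat file can disagree.
enum class ReadResult { Ok, FileMissing, Truncated, ChecksumMismatch };

// Follow an index entry into the flat file and report how it failed, if it did.
ReadResult ReadBlock(const std::string& path, const BlockPos& pos, std::vector<char>& out) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    if (!in) return ReadResult::FileMissing;
    in.seekg(static_cast<std::streamoff>(pos.offset), std::ios::beg);
    out.assign(pos.size, '\0');
    in.read(out.data(), static_cast<std::streamsize>(pos.size));
    if (!in || in.gcount() != static_cast<std::streamsize>(pos.size)) {
        return ReadResult::Truncated;  // index points past the data actually on disk
    }
    if (Checksum(out) != pos.checksum) {
        return ReadResult::ChecksumMismatch;  // bytes are there, but not the ones indexed
    }
    return ReadResult::Ok;
}
```

With the whole block stored inside the key-value store, only the store itself can be wrong. With an offset into a separate file, the index can be ahead of the file, behind it, or pointing at different bytes, and each case needs its own handling.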

perhaps something can be inferred from the way in which leveldb + blocks are corrupted. this would require a dev looking at the db and blocks after they have been hosed.

Mike Hearn
Legendary
*
expert
Activity: 1526
Merit: 1134
November 20, 2013, 01:15:57 PM
 #15

You can blame me for LevelDB. We switched to it because it was a large (>2x) speedup over BDB and performance is critical for Bitcoin, for obvious reasons. Also BDB sucks in lots of different ways and LevelDB is very well written.

We already know Apple have made some .... questionable ... decisions in their kernel, with regard to fsync (hint: fsync doesn't). That was at least one source of corruptions, which we already fixed.
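
For context on the fsync remark: Apple's documentation says fsync() moves data from the host to the drive, but the drive may hold it in its own cache for some time, and that fcntl() with F_FULLFSYNC is the way to request a flush to permanent storage. A minimal sketch of a wrapper follows. The name FullSync is hypothetical, and this is not the patch that went into Bitcoin or LevelDB.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper for illustration; not taken from the Bitcoin or LevelDB source.
// Returns true if the sync call reported success.
bool FullSync(int fd) {
    assert(fd >= 0);
#if defined(__APPLE__) && defined(F_FULLFSYNC)
    // On MacOS X, ask the drive itself to flush its cache to permanent storage.
    if (fcntl(fd, F_FULLFSYNC) == 0) {
        return true;
    }
    // Not every filesystem supports F_FULLFSYNC; fall back to plain fsync().
#endif
    return fsync(fd) == 0;
}
```

On other platforms the sketch compiles down to a plain fsync() call, so it can be used unconditionally after a write that must survive a crash or power loss.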

Given that rather astonishing approach to data integrity there may well be other equally questionable decisions lurking under the covers. The fact that this only happens on MacOS and not any other platform is strongly indicative that Apple have done more than one bad thing.

I am wondering if there is something going wrong with mmap.

https://code.google.com/p/leveldb/issues/detail?id=196

The behaviour of mmap seems like it can sometimes be broken by kernel developers in subtle ways. I got a bug report for the Android app a few months ago which strongly implies that mmap on Motorola devices is broken in ways that can cause data corruption. I wonder if POSIX specifies its behaviour tightly enough.
gmaxwell
Moderator
Legendary
*
expert
Activity: 4270
Merit: 8805
November 20, 2013, 11:40:41 PM
 #16

Can we get a couple of useful bits of data for someone to work on this:

* Earliest confirmed version of 10.8 with the problem
* A sample of a corrupted DB
* console logs from *during time of corruption* including dmesg and system.log
* Information on how bitcoin was built/installed: clang? gcc42? macports/brew for deps?
* if the people experiencing the problem have filevault (FDE) turned on or not, whether it was turned on during the install or after, and if it's ever been cycled on/off
* also whether people who have hit this are using stock fs settings or if they have case-sensitivity/etc turned on
italoarmstrong
Newbie
*
Activity: 59
Merit: 0
November 21, 2013, 01:19:00 AM
 #17

+1 on that... give me some kind of log to start.

I have a possible repro (and potentially a solution) on OS X for a db corruption... not sure if it's the same issue, however.
Diapolo
Hero Member
*****
Activity: 772
Merit: 500
November 21, 2013, 04:08:59 PM
 #18

What filesystems are in use on Mac? And did anyone try my std::fstream branch ;)?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
moderate
Member
**
Activity: 98
Merit: 10

nearly dead
November 21, 2013, 10:21:56 PM
 #19

This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.

Does not corrupt or you cannot get it corrupted ? If the latter, then you solved the first step in this bounty but didn't announce it ?
wtogami (OP)
Sr. Member
****
Activity: 263
Merit: 250
November 21, 2013, 10:23:45 PM
 #20

This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.

Does not corrupt or you cannot get it corrupted ? If the latter, then you solved the first step in this bounty but didn't announce it ?

MacOS X 10.6.8 does not seem to corrupt with native Bitcoin-Qt as far as I can tell, so this test doesn't tell us anything.  I am only pointing out that it is possible.

If you appreciate my work please consider making a small donation.
BTC:  1LkYiL3RaouKXTUhGcE84XLece31JjnLc3      LTC:  LYtrtYZsVSn5ymhPepcJMo4HnBeeXXVKW9
GPG: AEC1884398647C47413C1C3FB1179EB7347DC10D