wtogami (OP)
|
|
November 18, 2013, 03:26:40 AM Last edit: December 29, 2013, 04:47:55 AM by wtogami |
|
FIXED.
Can you fix the MacOS X Bitcoin LevelDB data corruption issue?
https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 TEST THESE BUILDS NOW!
Bounty Funding: 10.00 BTC + 200.2 LTC
Gavin Andresen has pledged 5 BTC. BitcoinTalk pledged 4 BTC. Public donations have contributed 1 BTC. Litecoin Dev Team pledges 200 LTC. The public is encouraged to contribute to these addresses to increase the incentive to fix this sooner.
Conditions
The bounty may be awarded under the following conditions.
- Document how anyone can consistently reproduce the data corruption.
- Explain why it happens.
- Write a code fix that is acceptable to the Bitcoin core developers and merged into Bitcoin.
The Bitcoin developers have ultimate deciding power over how to apportion the bounty award(s) based upon the merit of the contributions. This may encourage collaboration that may lead to a fix rather than hoarding of information. Non-developers may be able to figure out #1. These terms may be changed at any time for any or no reason.
Background
https://github.com/bitcoin/bitcoin/issues/2770
Since Bitcoin 0.8.x and the introduction of LevelDB, MacOS X users have been experiencing periodic LevelDB data corruption. For some Mac users it has never happened, while for others it happens frequently.
https://github.com/bitcoin/bitcoin/pull/2916
https://github.com/bitcoin/bitcoin/pull/3000
https://github.com/bitcoin/bitcoin/pull/2933
Bitcoin master now contains two Mac-specific fsync patches and an upgrade to LevelDB 1.13. Bitcoin 0.8.5 OMG3 and Litecoin 0.8.5.2-rc5 contain these same patches. It is possible that a different Mac corruption issue was solved by these earlier patches, but users of these branches have reported continued corruption. Curiously, corruption seems to happen after a clean shutdown and restart of the client. All corruption reports seem to be from MacOS X 10.8.x and 10.9 users. It is unclear if earlier versions of MacOS X are affected. It is unknown if particular hardware or software configurations are involved.
https://github.com/bitcoin/bitcoin/issues/2785
Corruption with the same error message is apparently capable of fixing itself. It is not clear if this is true of the recent master branch.
GPG Signed message
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
https://bitcointalk.org/index.php?topic=337294 These addresses contain public donations to be added to the Bitcoin MacOS X corruption fix bounty. BTC: 1FZ1mSJXj8aJqdpwUcpigLBqJLwtTu46fA LTC: LS1Rb3bb29TA9PEVGR64bV2cLxC7RdQi8A -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux)
iQQcBAEBCAAGBQJSiaeFAAoJELEXnrc0fcENmRsf/3c/w53R2EHX62L+QimS96Rj J+GPSpVQQRFOFr19OM+efjC1ydoZ3N/suYI1FynQ9nX4RzmCW5ZwbxMtl6wnEw7h oIqv+ufnD0XEpkFr+g32JdoRNN2KprrMH4Cr2oLI0w+Oqv32jLveoRIqSzIArCId U9ZVPcvFvKa9hWJrnM9KJQW6NgsGsKW3WBk5n/Wcbp4PYUn9ZC0taRMq2NbakSwk RaNf6yFSC1wWb2dD6eE+1UiXBCidyK0cVUMkjCRoA0eRqZqy2cJwELmOrJ1RHlgP 6K9Y6MuelTPxhXNa/NNq/sVAbhOmtAeyJ5ApuTuvjd1gpKpS14bFEHY7yFf/dv7A t0Z43xqQ8FVJ9HnYKY0T6d5W30L31bz5EZvhTQsa+IzfrQeBXGu1ecXM6dSlkcpf KkJQdyLZ2W72roq+RjF5eOsLmlW9+Xyk7pMSn403oMlMY5EpJByAO8znomq0XEkq UWPqfzjF2ptXGt6JqPdXx2La3w/jd+GNpHFsA65xZlcgYls/LXyq6483jDz3qPUS L6WZJZh5BrE4yfmIcTh8LUdiVj7fzlZs3r7CKmD8pv3mtsLpqAZGNiFdK8uMuerp h+2rPreMxGN7AqN28xdo5WOhqCAersoJQuwz3yQcGtXqnqcVTCBUCoaDpFxExlIK BHKuGW6awyd1akgKz46aWjlDnWuJ94ZY90tkKPXtSe2XhMZHtq5gYzxpv6qEEFo4 ikDpxyaoDMK7GOdUW0FGY9ZSELWjuPSIwjip/5KN5Z51/TaUeiOQmhxQJLIHKNY3 SMj+wNJLb+FTdlOPBEqYAu3WPPG9ye73ADudt1N36ELLqFcvjsB1RzqntpogEHXR T+I2VOTtbMvCPqbKdy5FijOERfjRIfrfXirovboLb/iP8ouhbuH7JHcj2niFshaL i6MBAB2eTTh9LlNx3B1w/ESQuYJlR4NsHDiGmWQGHAEHw6LaCVT7MDh2fmag+1Jx vDF2LdcCnRCgP5mSv+ZeJv7MvpeJ84UL3SlkB6iKZyD1+EJMyTB7f7xLbyWZSp+v To7lqJBxk1PbqcRl9rYX7jdW4b4ztsr8FNxOvw5jxcPGZ0Mc9eb9ln6Nl+hx4PBv jg4j4emg9uAPqRZn8KgJ1OL+wYE5Lw74mu3CP63pBmRVSl894janSUhKc4Z3ToF2 9kf81jVWudmRrVzQhiYA8vlrbC1Bc3nhlrX0KlF8VdREvptfV9PMbOAZdW96u4Mt 1lbqv2ZNWqxOon7Q3HKOcOo3uNvhv0sYItXSygZx5Z/chmBBRQrrJDCdHUw+WhR8 UGNsSL+Rz2vFeAc/W6jrlw3dId/wK+H36vDW8X4bSY6rVi+HhxZNoAPihUNNFy4= =o/b5 -----END PGP SIGNATURE-----
|
If you appreciate my work please consider making a small donation. BTC: 1LkYiL3RaouKXTUhGcE84XLece31JjnLc3 LTC: LYtrtYZsVSn5ymhPepcJMo4HnBeeXXVKW9 GPG: AEC1884398647C47413C1C3FB1179EB7347DC10D
|
|
|
Diapolo
|
|
November 18, 2013, 11:17:47 AM |
|
I created a pull (not specific to this problem), which uses std::fstream instead of fopen() and such for reading/writing block/undo files. Perhaps this can help, in that it works a little differently than the current code, dunno. I also added somewhat clearer exception error messages.
https://github.com/bitcoin/bitcoin/pull/3277
It's not intended for getting merged into the master branch yet, perhaps it never will be, but you can give it a try.
Dia
|
|
|
|
Remember remember the 5th of November
Legendary
Reverse engineer from time to time
|
|
November 18, 2013, 12:07:10 PM |
|
Is it not possible that LevelDB or something else related to the data files is failing silently?
|
BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
|
|
|
Diapolo
|
|
November 18, 2013, 12:10:36 PM |
|
Quote from Remember remember the 5th of November:
> Is it not possible that LevelDB or something else related to the data files is failing silently?
I would say that's at least not impossible...
Dia
|
|
|
|
Valerian77
|
|
November 18, 2013, 01:00:12 PM |
|
Quote from wtogami:
> - Document how anyone can consistently reproduce the data corruption.
> - Explain why it happens.
> - Write a code fix that is acceptable to the Bitcoin core developers and merged into Bitcoin git master.
Please refer to my posting: https://bitcointalk.org/index.php?topic=337575.msg3622968#msg3622968
Since I use Windows, not Mac OS X, the situation may differ slightly. But at least it may be a hint.
If you want to donate to me: 1METhkrvz2r9d3zkFPQrHnpFC1BjCs64Zf
|
|
|
|
donal
Newbie
|
|
November 18, 2013, 10:29:11 PM |
|
Litecoin wallet was crashing for me, saying DB corruption, if I open terminal and enter
cd /Applications/Litecoin-Qt.app/Contents/MacOS
./Litecoin-Qt -reindex
It works..
These messages are then displayed in terminal,
2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.
2013-11-18 19:57:36.821 Litecoin-Qt[991:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.
2013-11-18 19:57:37.657 Litecoin-Qt[991:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
|
|
|
|
behindtext
|
|
November 19, 2013, 02:26:39 AM |
|
it is funny to see this considering that marco just penned a blog entry https://blog.conformal.com/deslugging-in-go-with-pprof-btcd/ about how bitcoind uses leveldb vs what we do in btcd. to quote:
"Dealing with corrupt journals/flat-file/database is not only complex it has the potential of a very negative user experience. If corruption of any sort is detected then the database components must be validated, this is inherent to the its size a very long operation."
apparently when using flat file storage for blocks and referencing by offset versus storing the entire block in leveldb, there are lots of unsavory ways for leveldb to fail. leveldb is a harsh mistress.
|
|
|
|
Bismarck
Newbie
|
|
November 19, 2013, 02:27:23 AM |
|
I'd like to point everyone's attention to this thread on the Litecoin forums: https://forum.litecoin.net/index.php/topic,7147.msg55666.html#msg55666
I have an LTC wallet that doesn't play well with others. I have no problems being someone's guinea pig as I'd really like to get it working again on my laptop.
For the new post: I DO have Time Machine enabled.
Just for consistency, here is the error that Litecoin-Qt keeps throwing:
Last login: Mon Nov 18 18:27:48 on ttys000
Bismarcks-MacBook-Pro-2:~ Bismarcks$ /Applications/Litecoin-Qt.app/Contents/MacOS/Litecoin-Qt ; exit;
2013-11-18 18:32:21.744 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.
2013-11-18 18:32:21.745 Litecoin-Qt[12289:507] CoreText performance note: Set a breakpoint on CTFontLogSuboptimalRequest to debug.
2013-11-18 18:32:21.748 Litecoin-Qt[12289:507] *** WARNING: Method userSpaceScaleFactor in class NSView is deprecated on 10.7 and later. It should not be used in new applications. Use convertRectToBacking: instead.
2013-11-18 18:32:27.518 Litecoin-Qt[12289:507] CoreText performance note: Client called CTFontCreateWithName() using name "Courier New" and got font with PostScript name "CourierNewPSMT". For best performance, only use PostScript names when calling this API.
Assertion failed: (pindexFirst), function GetNextWorkRequired, file ../litecoin/src/main.cpp, line 1149.
Abort trap: 6
logout
[Process completed]
|
|
|
|
donal
Newbie
|
|
November 19, 2013, 05:16:36 PM |
|
wtogami,
I can confirm that it has nothing to do with time machine, I do not have time machine.
|
|
|
|
wtogami (OP)
|
|
November 19, 2013, 09:40:53 PM |
|
Quote from donal:
> wtogami,
> I can confirm that it has nothing to do with time machine, I do not have time machine.
What version exactly are you running? There have been multiple fixes. Please verify specifically with Bitcoin 0.8.5 OMG3.
|
|
|
|
whault
Newbie
|
|
November 20, 2013, 12:40:38 AM |
|
Some observations. My setup uses two drives, one with the OS and a lower speed one for general storage. Like the poster above, I don't use Time Machine, and there's nothing else non-standard about my software.
- only the blockchain stored on the internal SSD boot disk gets corrupted; a blockchain stored on the second SATA HDD is never corrupted
- corruption seems to happen most often after a system sleep (deep or not), though not always
- corruption can happen during the initial sync if it is stopped and then restarted
- corruption can happen with FileVault 2 turned on or off
- has happened less often since updating to 10.9: only twice so far instead of every few days, though it could just be chance
That's it really. No other behaviour is specific to corruptions for me. Sometimes they happen twice in a day, sometimes not for weeks.
|
|
|
|
moderate
Member
nearly dead
|
|
November 20, 2013, 02:42:39 AM |
|
If anything, this should serve as a warning for picking up cool new shiny things.
I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?
I'm just mocking here, obviously. Good luck finding and fixing the issues.
|
|
|
|
wtogami (OP)
|
|
November 20, 2013, 03:15:40 AM |
|
Quote from moderate:
> If anything, this should serve as a warning for picking up cool new shiny things.
> I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?
> I'm just mocking here, obviously. Good luck finding and fixing the issues.
It's working quite well on Linux and Windows. Also the old BDB corrupted on all platforms, although less often than Mac users experience this current issue.
|
|
|
|
behindtext
|
|
November 20, 2013, 11:04:03 AM |
|
Quote from moderate:
> If anything, this should serve as a warning for picking up cool new shiny things.
> I take it there was some discussion about why picking LevelDB was the right choice; surely it wasn't considered only because it performs faster than BDB and is developed at Google? After that, surely there was some good testing on various systems, since this is a very new low-level storage, yes?
the motivation for using leveldb vs other dbs is that with large numbers of records, e.g. over roughly 10 mln records, most "normal" dbs start to get really sluggish on inserts and selects. you can see the behavior for yourself by stuffing a ton of records in sqlite, mysql, psql, etc.
leveldb is not so much a db as a key-value store, which means that insert speed can be maintained even when there are a massive number of records, e.g. 250 mln. this is where the "level" in leveldb comes from - new writes are buffered and later merged into sorted files organized in levels. the only price you pay for this is episodic compaction by leveldb. however, when doing selects/lookups on data that is already in leveldb, you must do several seeks, similar to more common databases.
the likely reason leveldb was chosen is that there aren't a ton of great choices for key-value stores. many of the key-value stores besides leveldb have only a few devs and may not be actively maintained. there are also many key-value stores that have questionable data integrity. using a dependency that goes unmaintained means having to change that dep out later, a giant PITA.
the reason the issue that is cited in this thread is so nasty is that not only does bitcoind use leveldb, it uses it in conjunction with flat file storage for the blocks. the act of storing data in flat files and referencing them in the db substantially increases the number and severity of error and failure paths in the combined structure (leveldb + flat file storage). as we can now see, hunting these bugs is very difficult.
perhaps something can be inferred from the way in which leveldb + blocks are corrupted. this would require a dev looking at the db and blocks after they have been hosed.
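To make the "flat files referenced from the db" failure path concrete, here is a rough Python sketch. It is not bitcoind's code: the class name `FlatFileStore`, the record layout (offset, length, checksum) and the plain dict standing in for leveldb are all assumptions made for illustration. It only shows how an index entry can outlive the bytes it points at, and one way to notice when that happens.

```python
import hashlib
import os
import tempfile


class FlatFileStore:
    """Toy block store: payloads in a flat file, locations in a key-value index.

    The dict stands in for the key-value store (leveldb in bitcoind); the
    record layout used here is an assumption made for illustration.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length, sha256 digest)
        open(path, "ab").close()  # make sure the flat file exists

    def put(self, key, payload):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask for the data to be on disk before indexing it
        self.index[key] = (offset, len(payload), hashlib.sha256(payload).digest())

    def get(self, key):
        offset, length, digest = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            payload = f.read(length)
        # A short read or a checksum mismatch means index and flat file disagree.
        if len(payload) != length or hashlib.sha256(payload).digest() != digest:
            raise IOError("index entry %r points at missing or damaged data" % (key,))
        return payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = FlatFileStore(os.path.join(tmp, "blk00000.dat"))
        store.put("a", b"first block")
        store.put("b", b"second block")
        # Simulate the flat file losing its tail while the index survives.
        os.truncate(store.path, len(b"first block") + 3)
        print(store.get("a"))
        try:
            store.get("b")
        except IOError as err:
            print("detected:", err)
```

The point of the sketch is only that two separately synced structures give you two places where a lost write can leave the pair inconsistent; the checksum in the index is one cheap way to detect it, not a claim about what bitcoind stores.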
|
|
|
|
Mike Hearn
Legendary
|
|
November 20, 2013, 01:15:57 PM |
|
You can blame me for LevelDB. We switched to it because it was a large (>2x) speedup over BDB and performance is critical for Bitcoin, for obvious reasons. Also BDB sucks in lots of different ways and LevelDB is very well written.
We already know Apple have made some .... questionable ... decisions in their kernel, with regard to fsync (hint: fsync doesn't). That was at least one source of corruptions, which we already fixed. Given that rather astonishing approach to data integrity there may well be other equally questionable decisions lurking under the covers. The fact that this only happens on MacOS and not any other platform is strongly indicative that Apple have done more than one bad thing.
I am wondering if there is something going wrong with mmap.
https://code.google.com/p/leveldb/issues/detail?id=196
The behaviour of mmap seems like it can sometimes be broken by kernel developers in subtle ways. I got a bug report for the Android app a few months ago which strongly implies mmap on Motorola devices is broken in ways that can cause data corruption. I wonder if POSIX specifies its behaviour tightly enough.
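For anyone wondering what "fsync doesn't" means in practice: on OS X, fsync() hands the data to the drive but does not ask the drive to flush its own cache, and the fcntl() command F_FULLFSYNC exists for that. Below is a minimal Python sketch of the idea, used here only for brevity. The helper names `full_sync` and `durable_write` are hypothetical, and this is not the code from the Bitcoin or LevelDB patches.

```python
import fcntl
import os
import tempfile

# F_FULLFSYNC is only defined on macOS builds of Python; elsewhere this is None.
F_FULLFSYNC = getattr(fcntl, "F_FULLFSYNC", None)


def full_sync(fd):
    """Flush fd as durably as the platform allows; return the method used."""
    if F_FULLFSYNC is not None:
        try:
            fcntl.fcntl(fd, F_FULLFSYNC)
            return "F_FULLFSYNC"
        except OSError:
            pass  # not every filesystem accepts F_FULLFSYNC; fall back to fsync
    os.fsync(fd)
    return "fsync"


def durable_write(path, data):
    """Write data to path and sync before returning (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[written:]
        return full_sync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "blk.dat")
        print(durable_write(target, b"example block bytes"))
```

On a Mac this should report F_FULLFSYNC, on Linux plain fsync. It says nothing about mmap, which is a separate suspect.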
|
|
|
|
gmaxwell
Moderator
Legendary
|
|
November 20, 2013, 11:40:41 PM |
|
Can we get a couple of useful bits of data for someone to work on this:
* Earliest confirmed version of 10.8 with the problem
* A sample of a corrupted DB
* Console logs from *during time of corruption*, including dmesg and system.log
* Information on how bitcoin was built/installed: clang? gcc42? macports/brew for deps?
* Whether the people experiencing the problem have FileVault (FDE) turned on or not, whether it was turned on during the install or after, and if it's ever been cycled on/off
* Also whether people who have hit this are using stock fs settings or have case-sensitivity/etc turned on
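A couple of these items can be collected mechanically. The Python sketch below is only an assumption about how one might do it: the function names `is_case_sensitive` and `collect_report` are made up for illustration, and it covers just the OS version and the case-sensitivity of the filesystem holding the data directory. FileVault state, build details and logs still have to be gathered by hand.

```python
import os
import platform
import tempfile


def is_case_sensitive(directory):
    """Return True if the filesystem holding `directory` distinguishes case."""
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        folded = os.path.join(os.path.dirname(path), os.path.basename(path).lower())
        # The probe name contains capitals; if its lowercase form also resolves,
        # the filesystem is case-insensitive.
        return not os.path.exists(folded)
    finally:
        os.remove(path)


def collect_report(datadir):
    """Gather a few of the requested facts into a dict."""
    return {
        "platform": platform.platform(),
        "mac_ver": platform.mac_ver()[0] or "n/a (not OS X)",
        "datadir": os.path.abspath(datadir),
        "case_sensitive_fs": is_case_sensitive(datadir),
    }


if __name__ == "__main__":
    # Default Bitcoin-Qt data directory on OS X; adjust if yours differs.
    default = os.path.expanduser("~/Library/Application Support/Bitcoin")
    target = default if os.path.isdir(default) else tempfile.gettempdir()
    for key, value in collect_report(target).items():
        print("%s: %s" % (key, value))
```

The probe file is created and removed inside the directory being checked, so run it while the client is shut down if you point it at a live data directory.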
|
|
|
|
italoarmstrong
Newbie
|
|
November 21, 2013, 01:19:00 AM |
|
+1 on that... give me some kind of log to start.
I have a possible repro (and potentially a solution) on OS X for a db corruption... not sure if it's the same issue however.
|
|
|
|
Diapolo
|
|
November 21, 2013, 04:08:59 PM |
|
What filesystems are in use on Mac? And did anyone try my std::fstream branch?
Dia
|
|
|
|
moderate
Member
nearly dead
|
|
November 21, 2013, 10:21:56 PM |
|
Quote:
> This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.
Does not corrupt, or you cannot get it corrupted? If the latter, then you solved the first step in this bounty but didn't announce it?
|
|
|
|
wtogami (OP)
|
|
November 21, 2013, 10:23:45 PM |
|
Quote:
> This is on Litecoin 0.8.5.2-rc5 (same as Bitcoin 0.8.5 OMG3) running on MacOS 10.6.8 where it does not corrupt itself.
Quote from moderate:
> Does not corrupt, or you cannot get it corrupted? If the latter, then you solved the first step in this bounty but didn't announce it?
MacOS X 10.6.8 does not seem to corrupt with native Bitcoin-Qt as far as I can tell, so this test doesn't tell us anything. I am only pointing out that it is possible.
|
|
|
|
|