mndrix (OP)
Michael Hendricks
VIP
Sr. Member
Activity: 447
Merit: 258
March 24, 2011, 09:01:25 PM
While operating CoinPal, I've had the bitcoin daemon hang several times. The behavior has been the same each time: RPC calls time out without a response. I restart the daemon, it catches up with the blockchain, and it works correctly again for several hours or days before it happens again. Here's the information I have. When I notice the daemon has hung, the tail of debug.log always looks roughly like this. I can watch the log indefinitely and see only similar messages streaming by as normal:
IRC got join
IRC got join
AddAddress()
IRC got new address
IRC got join
IRC got join
If I look backwards in the debug log to the last activity not related to addresses and IRC, I usually get something similar to this:
IRC got join
received: inv (37 bytes)
got inventory: tx 1d95d66a217e5fbe49bd new
askfor tx 1d95d66a217e5fbe49bd 0
sending getdata: tx 1d95d66a217e5fbe49bd
sending: getdata (37 bytes)
received: inv (37 bytes)
got inventory: tx 1d95d66a217e5fbe49bd new
askfor tx 1d95d66a217e5fbe49bd 1300914187000000
received: inv (37 bytes)
got inventory: tx 1d95d66a217e5fbe49bd new
askfor tx 1d95d66a217e5fbe49bd 1300914307000000
received: tx (617 bytes)
ThreadRPCServer method=sendfrom
IRC got join
That last "sendfrom" request never sends a response. When the daemon hangs again, what information should I collect so that developers can diagnose the problem?
jgarzik
Legendary
Activity: 1596
Merit: 1099
March 24, 2011, 09:09:17 PM
Have you played around with -rpctimeout?
Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own. Visit bloq.com / metronome.io Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
ArtForz
March 25, 2011, 08:42:49 AM
I think we got a deadlock in there...
rpc: sendfrom
  CRITICAL_BLOCK(cs_mapWallet)
    SendMoneyToBitcoinAddress(strAddress, nAmount, wtx)
      SendMoney(scriptPubKey, nValue, wtxNew, fAskFee)
        CRITICAL_BLOCK(cs_main)
          ...

processmessages:
  CRITICAL_BLOCK(cs_main)
    ProcessMessage(pfrom, strCommand, vMsg)
      AddToWalletIfMine()
        AddToWallet(wtx)
          CRITICAL_BLOCK(cs_mapWallet)
bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
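ArtForz's two call chains take the same pair of locks in opposite orders: the RPC thread holds cs_mapWallet and then asks for cs_main inside SendMoney(), while the message-processing thread holds cs_main and then asks for cs_mapWallet inside AddToWallet(). If each thread gets its first lock before the other reaches its second, both wait forever, which would match a sendfrom call that never returns. The C++17 sketch below checks that reasoning mechanically. It treats each chain as a list of nested locks, records an edge "held -> acquired" for every pair, and looks for a cycle. LockOrderGraph is a hypothetical illustration, not code from bitcoind; only the lock names come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of bitcoind. An edge A -> B means
// "some code path acquires lock B while already holding lock A".
class LockOrderGraph {
public:
    // Record one nested call chain as the locks it takes, outermost first.
    void AddChain(const std::vector<std::string>& locks) {
        for (std::size_t i = 0; i < locks.size(); ++i)
            for (std::size_t j = i + 1; j < locks.size(); ++j)
                edges_[locks[i]].insert(locks[j]);  // locks[i] is still held
    }

    // A cycle means two or more paths disagree about lock order,
    // so some interleaving of threads can deadlock.
    bool HasCycle() const {
        std::map<std::string, int> state;  // 0 = unvisited, 1 = on DFS stack, 2 = finished
        for (const auto& entry : edges_)
            if (state[entry.first] == 0 && Visit(entry.first, state))
                return true;
        return false;
    }

private:
    // Depth-first search; reaching a node that is still on the stack closes a cycle.
    bool Visit(const std::string& node, std::map<std::string, int>& state) const {
        state[node] = 1;
        auto it = edges_.find(node);
        if (it != edges_.end()) {
            for (const std::string& next : it->second) {
                if (state[next] == 1) return true;  // back edge
                if (state[next] == 0 && Visit(next, state)) return true;
            }
        }
        state[node] = 2;
        return false;
    }

    std::map<std::string, std::set<std::string>> edges_;
};
```

For example, adding the chains {"cs_mapWallet", "cs_main"} (sendfrom) and {"cs_main", "cs_mapWallet"} (processmessages) makes HasCycle() return true; replacing the sendfrom chain with {"cs_main", "cs_mapWallet"} removes the cycle.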
Mike Hearn
Legendary
Activity: 1526
Merit: 1134
March 25, 2011, 11:35:48 AM
Oops. Should RPCs be run with the BFL held?
Gavin Andresen
Legendary
Activity: 1652
Merit: 2300
Chief Scientist
March 25, 2011, 12:35:19 PM
Mike Hearn wrote:
> Oops. Should RPCs be run with the BFL held?
D'oh! sendfrom should definitely CRITICAL_BLOCK(cs_main). Nice catch, ArtForz.
How often do you get the chance to work on a potentially world-changing project?
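Gavin's fix is to have sendfrom take cs_main up front, before cs_mapWallet, so that the RPC thread and the message-processing thread acquire the two locks in the same global order; neither thread can then hold the second lock while waiting for the first. A minimal sketch, assuming std::mutex as a stand-in for bitcoind's CRITICAL_BLOCK sections. ProcessIncomingTx(), RpcSendFrom(), RunConcurrently() and walletTxCount are hypothetical simplifications, not the real rpc.cpp or main.cpp code, and because std::mutex is not recursive each lock is taken exactly once.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two bitcoind critical sections.
std::mutex cs_main;
std::mutex cs_mapWallet;
int walletTxCount = 0;  // stand-in for mapWallet contents

// Message thread: cs_main first, then cs_mapWallet (as in AddToWallet).
void ProcessIncomingTx() {
    std::lock_guard<std::mutex> lockMain(cs_main);
    std::lock_guard<std::mutex> lockWallet(cs_mapWallet);
    ++walletTxCount;
}

// Fixed RPC handler: take cs_main *before* cs_mapWallet, matching the order
// above. The buggy version locked cs_mapWallet first and only reached
// cs_main later, inside SendMoney().
void RpcSendFrom() {
    std::lock_guard<std::mutex> lockMain(cs_main);
    std::lock_guard<std::mutex> lockWallet(cs_mapWallet);
    ++walletTxCount;
}

// Run both paths concurrently; with one consistent lock order this always finishes.
int RunConcurrently(int iterations) {
    std::thread net([=] { for (int i = 0; i < iterations; ++i) ProcessIncomingTx(); });
    std::thread rpc([=] { for (int i = 0; i < iterations; ++i) RpcSendFrom(); });
    net.join();
    rpc.join();
    return walletTxCount;
}
```

RunConcurrently(10000) returns 20000. Swapping the two lock_guard lines in RpcSendFrom() recreates the inversion, and the program may then hang depending on thread timing.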
mikegogulski
March 25, 2011, 12:46:11 PM
ArtForz, you've got BTC 5.00 incoming from me for spotting this. Very well done.
Gavin Andresen
March 25, 2011, 01:01:00 PM
Does anybody have experience with valgrind --tool=helgrind or other automated tools for finding potential deadlocks?
Running it on bitcoind, I'm getting a huge number of false positives...
Should we just document every method that holds one or more locks? I'm worried there are other possible deadlocks lurking.
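One way to hunt for the kind of bug ArtForz found is to name every lock and check lock order at runtime in debug builds: whenever a thread acquires lock B while holding lock A, record the pair (A, B), and report if (B, A) was ever recorded on any thread. This flags an inversion on any run that exercises both code paths, even if the threads never actually collide; it cannot see orders that no test run exercises. The sketch assumes C++17 and std::mutex; the lockorder namespace and the CheckedLock wrapper are hypothetical illustrations, not a drop-in replacement for CRITICAL_BLOCK.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical debug-build lock-order checker; not part of bitcoind.
namespace lockorder {

using LockPair = std::pair<std::string, std::string>;  // (held, then acquired)

std::mutex registryMutex;       // guards the two sets below
std::set<LockPair> seenOrders;  // every (held, acquired) pair observed
std::set<LockPair> violations;  // pairs whose reverse was seen earlier
thread_local std::vector<std::string> heldByThisThread;  // this thread's lock stack

// Call just before blocking on lock `name`, so an inversion is recorded
// even if this very acquisition is the one that deadlocks.
void BeforeLock(const std::string& name) {
    std::lock_guard<std::mutex> guard(registryMutex);
    for (const std::string& held : heldByThisThread) {
        if (seenOrders.count({name, held}) != 0)
            violations.insert({held, name});
        seenOrders.insert({held, name});
    }
    heldByThisThread.push_back(name);
}

// Call just after releasing lock `name`.
void AfterUnlock(const std::string& name) {
    auto it = std::find(heldByThisThread.rbegin(), heldByThisThread.rend(), name);
    if (it != heldByThisThread.rend())
        heldByThisThread.erase(std::next(it).base());  // erase that exact element
}

}  // namespace lockorder

// RAII wrapper: a named mutex lock that reports to the checker.
class CheckedLock {
public:
    CheckedLock(std::mutex& mutex, std::string name) : mutex_(mutex), name_(std::move(name)) {
        lockorder::BeforeLock(name_);
        mutex_.lock();
    }
    ~CheckedLock() {
        mutex_.unlock();
        lockorder::AfterUnlock(name_);
    }
    CheckedLock(const CheckedLock&) = delete;
    CheckedLock& operator=(const CheckedLock&) = delete;

private:
    std::mutex& mutex_;
    std::string name_;
};
```

Locking cs_main then cs_mapWallet in one scope, and cs_mapWallet then cs_main in a later scope, even on a single thread with no actual contention, leaves exactly one entry, (cs_mapWallet, cs_main), in lockorder::violations.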
ShadowOfHarbringer
Legendary
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
March 25, 2011, 01:35:26 PM
Gavin Andresen wrote:
> Mike Hearn wrote:
>> Oops. Should RPCs be run with the BFL held?
> D'oh! sendfrom should definitely CRITICAL_BLOCK(cs_main). Nice catch, ArtForz.
Which version will the patch be scheduled for?
ArtForz
March 25, 2011, 01:52:10 PM
Well, a quick manual check suggests that, for cs_main + cs_mapWallet, only sendfrom and sendmany in rpc.cpp are doing the wrong thing.
mikegogulski
March 25, 2011, 01:55:38 PM
@Gavin: Document? Always a good thing. This is tricky stuff, as ArtForz has shown. My own experience goes like this:
1: If you don't really have to lock, push into a serial action queue.
2: When you really do have to lock, prepare everything beforehand, then lock, alter, and unlock as swiftly as possible.
3: Er, yeah, document, at least so that you can recall what the heck you were up to when you decided you needed that lock.
Obviously, this becomes really hard when we're dealing with what are essentially library primitives for manipulating the dataset.
If I were sober at the moment, I'd produce a preprocessor macro that would flag potential nested locks in the control flow. Fortunately, I'm not sober.
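Point 1 can be sketched as a single worker thread draining a queue of closures: code that would otherwise lock shared state posts a task instead, and because only the worker ever runs tasks, they never contend with one another for that state. The queue's own mutex is held only for the push or pop, never while a task runs, which is point 2 in miniature. A minimal C++17 sketch; SerialQueue is a hypothetical name, not an existing bitcoind or library class.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch of "push into a serial action queue": callers post
// closures, and one worker thread runs them strictly one at a time.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { Run(); }) {}

    // Runs every task still queued, then stops the worker.
    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        worker_.join();
    }

    SerialQueue(const SerialQueue&) = delete;
    SerialQueue& operator=(const SerialQueue&) = delete;

    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // held only for the push
            tasks_.push_back(std::move(task));
        }
        wakeup_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping, and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // runs with no lock held, so a task may Post() more work
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts only after the members above exist
};
```

Posting 100 tasks that each add i (1 through 100) to a counter and then letting the queue go out of scope leaves the counter at 5050, since the destructor drains the queue before joining the worker.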
ArtForz
March 25, 2011, 02:26:12 PM
Another one:
setaccount
  CRITICAL_BLOCK(cs_mapAddressBook)
    GetAccountAddress(strOldAccount)
      CRITICAL_BLOCK(cs_mapWallet)

processmessages:
  CRITICAL_BLOCK(cs_main)
    ProcessMessage(pfrom, strCommand, vMsg)
      AddToWalletIfMine()
        AddToWallet(wtx)
          CRITICAL_BLOCK(cs_mapWallet)
            walletdb.WriteName(PubKeyToAddress(vchDefaultKey), "")
              CRITICAL_BLOCK(cs_mapAddressBook)
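In this second case the cycle is between cs_mapAddressBook and cs_mapWallet: setaccount holds cs_mapAddressBook and then needs cs_mapWallet, while the message thread holds cs_mapWallet and then needs cs_mapAddressBook. When a function can name every lock it needs at its entry point, C++17's std::scoped_lock acquires them all with a deadlock-avoidance algorithm, so two call sites that list the same mutexes in different orders cannot deadlock against each other. This is a swapped-in technique: std::scoped_lock did not exist for the 2011 codebase, and it does not help when the second lock is taken several calls deep, as in the chains above; there, the fix is to hoist the locking to the entry point in one agreed order. The functions and globals below are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins for the wallet and address-book critical sections.
std::mutex cs_mapWallet;
std::mutex cs_mapAddressBook;
std::map<std::string, std::string> mapAddressBook;  // address -> account label
int walletWrites = 0;                               // stand-in for wallet updates

// setaccount-like path: needs the address book and the wallet.
void SetAccount(const std::string& address, const std::string& account) {
    // std::scoped_lock (C++17) locks every listed mutex using a
    // deadlock-avoidance algorithm, so the order written here is irrelevant.
    std::scoped_lock lock(cs_mapAddressBook, cs_mapWallet);
    mapAddressBook[address] = account;
    ++walletWrites;
}

// AddToWallet-like path: names the same two mutexes in the opposite order.
void AddToWalletAndLabel(const std::string& address) {
    std::scoped_lock lock(cs_mapWallet, cs_mapAddressBook);
    ++walletWrites;
    mapAddressBook.emplace(address, "");  // add a blank label only if none exists
}

// Hammer both paths from two threads; this finishes instead of deadlocking.
int RunBoth(int iterations) {
    std::thread rpc([=] { for (int i = 0; i < iterations; ++i) SetAccount("addr", "acct"); });
    std::thread net([=] { for (int i = 0; i < iterations; ++i) AddToWalletAndLabel("addr"); });
    rpc.join();
    net.join();
    return walletWrites;
}
```

RunBoth(10000) returns 20000, and the address book ends with the single entry "addr" -> "acct".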
mndrix (OP)
March 25, 2011, 02:28:24 PM
Well done. Let me know when a patch makes it into a beta/nightly build and I'll run it in production to test.
slush
Legendary
Activity: 1386
Merit: 1097
April 02, 2011, 06:47:29 PM
Today I had a problem similar to mndrix's: bitcoind froze during payouts. It was the second time in the pool's history, but the first time with the sendmany command.
mndrix, did you successfully test jgarzik's patch?
mndrix (OP)
April 11, 2011, 03:15:15 PM
slush wrote:
> mndrix, did you successfully test jgarzik's patch?
I haven't tested the patch. My feeble attempts to compile Bitcoin from source have failed (which speaks to my ignorance, not to a problem with Bitcoin). Does anyone know whether the patch is available in a release candidate build for Linux yet?
slush
April 11, 2011, 09:17:42 PM
This patch is already in Bitcoin upstream; it looks like more people were watching it. I'll try using it in the pool tomorrow...
mndrix (OP)
April 15, 2011, 05:16:49 PM
Coin{Pal,Card} are now running a nightly build including the deadlock changes. I'll report here if bitcoind hangs again.
Stephen Gornick
Legendary
Activity: 2506
Merit: 1010
April 19, 2011, 06:21:32 AM
mndrix wrote:
> Coin{Pal,Card} are now running a nightly build including the deadlock changes. I'll report here if bitcoind hangs again.
Was CoinPal's April 18th service issue related to this? Your post mentioned "I've restarted some server components and the site appears to be working fine now".
mndrix (OP)
April 22, 2011, 02:28:52 PM
Stephen Gornick wrote:
> mndrix wrote:
>> Coin{Pal,Card} are now running a nightly build including the deadlock changes. I'll report here if bitcoind hangs again.
> Was CoinPal's April 18th service issue related to this? Your post mentioned "I've restarted some server components and the site appears to be working fine now".
I'm glad you brought this up, sgornick. I should have mentioned it here. That particular outage was caused by a bug in my code that leaked open file handles; it wasn't related to bitcoind. Since upgrading to a nightly build on April 15th, I haven't had any problems with bitcoind hanging. I almost certainly would have seen one by now if the problem were still present. Thanks, all, for your help diagnosing and fixing the bug.