Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 09:12:45 PM |
|
Well, I tried a fresh copy of .23 and applied the update and diff4, still the same problem as with somebadger's repository. The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.
How can I troubleshoot this?
|
If you're searching these lines for a point, you've probably missed it. There was never anything there in the first place.
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 16, 2011, 09:15:14 PM |
|
Quote from: Inaba
Well, I tried a fresh copy of .23 and applied the update and diff4, still the same problem as with somebadger's repository. The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.

What is this update?
|
I am an employee of Ripple. Follow me on Twitter @JoelKatz 1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 09:19:10 PM |
|
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 16, 2011, 09:46:14 PM |
|
Ahh, okay. I think there's a locking bug in those updates. I'm working on two things:
1) A new 4-diff based on 0.3.24 that includes all the updates that are believed to be safe.
2) A new diff based on 0.3.24 that includes even the getwork pre-compute update that is suspected to be responsible for deadlocks, but hopefully with the deadlock issue fixed.
1 should be ready soon. I still haven't done final auditing of the diff and testing to make sure it works. I just put up a preview here: http://davids.webmaster.com/~davids/preview.diff
2 may take a bit longer.
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 10:10:09 PM |
|
Cool, I will try it out and see what happens. You might consider removing the upnp junk in the diff as well, since it's an extra dependency that is a) a pain in the ass and b) completely useless in this application. I remove it from the makefile before compiling.
Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?
When you say ready soon do you mean today or within a couple days? I don't want to start digging around in the code and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 16, 2011, 10:12:13 PM |
|
Quote from: Inaba
When you say ready soon do you mean today or within a couple days? I don't want to start digging around in the code and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.

1 should be ready today. All that's left is a final audit and testing to make sure I can't break it. There are unlikely to be any significant changes and there's a good chance there will be no changes at all. Testing and reports are very helpful, so don't worry about making a nuisance of yourself. "It worked for me" is extremely helpful because it helps me get closer to the confidence level needed for release. "It didn't work for me" is extremely helpful because I hate to find issues after I release.
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 10:19:01 PM |
|
Ok, well, with your preview.diff, bitcoind won't start. Here is the debug.log output:
Bitcoin version 0.3.24-beta
Default data directory /home/xxx/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/home/xxx/.bitcoin/database strErrorFile=/home/xxx/.bitcoin/db.log
Loaded 356873 addresses
addresses 1032ms
Loading block index...
LoadBlockIndex(): hashBestChain=00000000000002b99ddf height=136616
block index 1914ms
Loading wallet...
nFileVersion = 32400
fGenerateBitcoins = 0
nTransactionFee = 0
addrIncoming = 255.255.255.255:8333
fMinimizeToTray = 0
fMinimizeOnClose = 0
fUseProxy = 0
addrProxy = 127.0.0.1:9050
wallet 118ms
Done loading
mapBlockIndex.size() = 136636
nBestHeight = 136616
mapKeys.size() = 172
setKeyPool.size() = 101
mapPubKeys.size() = 172
mapWallet.size() = 190
mapAddressBook.size() = 2
Loading addresses from DNS seeds (could take a while)
AddAddress(84.49.174.161:8333)
48 addresses found from DNS seeds
sending: version (85 bytes)
ThreadRPCServer started
ipv4 eth0: x.x.x.x
addrLocalHost = x.x.x.x:8333
ThreadSocketHandler started
ThreadIRCSeed started
ThreadOpenConnections started
ThreadMessageHandler started
trying connection 67.172.181.225:8333 lastseen=-0.1hrs lasttry=-364126.2hrs
IRC :irc.lechat.ir NOTICE AUTH :*** Looking up your hostname...
connected 67.172.181.225:8333
sending: version (85 bytes)
bitcoind exits at this point.
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 10:21:41 PM |
|
I am going to pull a fresh copy of .24 and reapply just to be sure I didn't do something odd - but the patch applied just fine.
EDIT - whoops... I guess you are patching against the tar and not the repo? The above report is off the tar and it applied fine. Just tried against the repo and it didn't like that.
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 10:31:45 PM |
|
Ok, confirmed that a fresh copy of the .24 tar file with your preview gives the above result. Here's another debug.log output from the fresh copy:
Bitcoin version 0.3.24-beta
Default data directory /home/xxx/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/home/xxx/.bitcoin/database strErrorFile=/home/xxx/.bitcoin/db.log
Loaded 357138 addresses
addresses 942ms
Loading block index...
LoadBlockIndex(): hashBestChain=0000000000000a1fe384 height=136619
block index 1925ms
Loading wallet...
nFileVersion = 32400
fGenerateBitcoins = 0
nTransactionFee = 0
addrIncoming = 255.255.255.255:8333
fMinimizeToTray = 0
fMinimizeOnClose = 0
fUseProxy = 0
addrProxy = 127.0.0.1:9050
wallet 175ms
Done loading
mapBlockIndex.size() = 136639
nBestHeight = 136619
mapKeys.size() = 172
setKeyPool.size() = 101
mapPubKeys.size() = 172
mapWallet.size() = 190
mapAddressBook.size() = 2
Loading addresses from DNS seeds (could take a while)
AddAddress(62.155.236.249:8333)
AddAddress(109.75.176.193:8333)
AddAddress(174.120.185.74:8333)
AddAddress(69.163.132.101:8333)
AddAddress(178.79.147.99:8333)
AddAddress(91.85.220.84:8333)
48 addresses found from DNS seeds
sending: version (85 bytes)
ThreadRPCServer started
ipv4 eth0: x.x.x.x
addrLocalHost = x.x.x.x:8333
ThreadSocketHandler started
ThreadIRCSeed started
ThreadOpenConnections started
ThreadMessageHandler started
IRC :pelican.heliacal.net NOTICE AUTH :*** Looking up your hostname...
IRC :pelican.heliacal.net NOTICE AUTH :*** Couldn't look up your hostname
IRC SENDING: NICK x265309750^M
IRC SENDING: USER x265309750 8 * : x265309750^M
trying connection 213.111.82.119:8333 lastseen=-0.2hrs lasttry=-364126.5hrs
IRC :pelican.heliacal.net 001 x265309750 :Welcome to the LFNet Internet Relay Chat Network x265309750
IRC :pelican.heliacal.net 002 x265309750 :Your host is pelican.heliacal.net[173.246.103.92/6667], running version hybrid-7.2.3
IRC :pelican.heliacal.net 003 x265309750 :This server was created Jun 28 2011 at 14:26:11
IRC :pelican.heliacal.net 004 x265309750 pelican.heliacal.net hybrid-7.2.3 CDGabcdfgiklnorsuwxyz biklmnopstveI bkloveI
connected 213.111.82.119:8333
sending: version (85 bytes)
Bitcoind exits at this point.
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 16, 2011, 10:31:49 PM Last edit: July 16, 2011, 10:46:17 PM by JoelKatz |
|
It is against the 0.3.24 release. I just made a small, but very critical, fix. So redownload it. (That's likely the problem in your post above.)
I've finished my own testing and auditing. So if the latest works for you, I'm ready to release it.
Performance is about 3,000 getwork's per second on my test machine (a Core 2 Quad Q9550 running 64-bit Linux).
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
July 16, 2011, 10:44:40 PM |
|
That fixed it!
Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 16, 2011, 10:46:38 PM |
|
Quote from: Inaba
Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?

This does not contain the optimization that I believe is responsible for the lockup issue. That optimization was not contained in 3diff or 4diff but was an additional optimization (introduced in the 'update' patch) specifically to reduce the 'hiccup' in 'getwork' responses when a new transaction hits the network.
|
|
|
|
backburn
Member
Offline
Activity: 111
Merit: 10
★Trash&Burn [TBC/TXB]★
|
|
July 17, 2011, 06:01:08 AM |
|
The patch at http://davids.webmaster.com/~davids/preview.diff compiles perfectly against the 0.3.24 release on Ubuntu x64 10.04 and 10.10. It was working properly under load on the test net, and I have pushed it to live with about 100 GHash. About an hour in, so far so good. Thank you so much for all your hard work, Joel!
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
19263 pool  20   0 4262m 139m 9396 S    2  2.5 2:24.12 bitcoind-24prev
Yummy!
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 17, 2011, 06:17:53 AM |
|
Quote from: backburn
Patch at http://davids.webmaster.com/~davids/preview.diff compiles perfectly on the 0.3.24 release Ubuntu x64 10.04 + 10.10. Was working properly with load on test net and I have pushed to live with about 100 GHash. About an hour in, so far so good. Thank you so much for all your hard work Joel!

Thanks for the success report. The very first preview had a serious bug that would make it crash immediately. The final release had, other than that fix, only cosmetic changes (like fixing extra spaces and such). So you can stick with the build you've got, since it obviously doesn't have the 'makes it crash immediately' bug.
Do you know offhand how that CPU usage compares with what you were seeing before?
|
|
|
|
backburn
Member
Offline
Activity: 111
Merit: 10
★Trash&Burn [TBC/TXB]★
|
|
July 17, 2011, 07:18:12 AM |
|
Quote from: JoelKatz
Do you know offhand how that CPU usage compares with what you were seeing before?

For a basis, this server is using an Intel Q8400. Looks to be around half the usage at "idle". During an LP, it hits about 40-50% usage. Bunches better than stock bitcoind, and about 10% less than the diff-4 v.95.
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
|
July 17, 2011, 07:24:44 AM |
|
Quote from: backburn
Quote from: JoelKatz
Do you know offhand how that CPU usage compares with what you were seeing before?
For a basis, this server is using an Intel Q8400. Looks to be around half the usage at "idle". During a LP; it hits about 40-50% usage. Bunches better than stock bitcoind, and about 10% less than the diff-4 v.95.

Great to hear. Thanks.
|
|
|
|
jollyjim
Newbie
Offline
Activity: 21
Merit: 0
|
|
July 17, 2011, 07:46:34 AM |
|
I've been testing and debugging the patches against the 0.3.23 version. I think I've got most of the problems figured out, but there is still a critical problem that has cost me 3 blocks so far.
First, the fast getwork parsing only works if the client sends the id without quotes. This change allows the id to be sent with or without quotes:
ThreadRPCServer3 in rpc.cpp:
    if ((strRequest.find("\"getwork\"")!=std::string::npos) && strRequest.find("[]")!=std::string::npos)
        std::string id;
        size_t p = strRequest.find("\"id\":");
        if (p != std::string::npos)
        {
            size_t ep = strRequest.find(" ", p+5), e;
            if((e=strRequest.find(",", p+5))<ep) ep = e;
            if((e=strRequest.find("}", p+5))<ep) ep = e;
            id = strRequest.substr(p+5, ep-p-5);
        }
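The id extraction described above can be sketched as a standalone function. This is only an illustration, not the code from the patch: `ExtractRawId` is a hypothetical name, and skipping spaces after the colon is an added assumption. It returns the id exactly as the client wrote it (quotes included), presumably so that a fast path could echo it back unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of bitcoind or the patch): returns the raw
// text of the JSON-RPC "id" field, with or without surrounding quotes, or an
// empty string if the request has no "id" key.
std::string ExtractRawId(const std::string& strRequest)
{
    const std::string key = "\"id\":";
    std::size_t p = strRequest.find(key);
    if (p == std::string::npos)
        return "";

    // Assumption: tolerate spaces between the colon and the value.
    std::size_t start = p + key.size();
    while (start < strRequest.size() && strRequest[start] == ' ')
        ++start;

    // The value ends at the first space, comma or closing brace,
    // or at the end of the request if none of those follow.
    std::size_t end = strRequest.find_first_of(" ,}", start);
    if (end == std::string::npos)
        end = strRequest.size();

    return strRequest.substr(start, end - start);
}
```

Like the snippet in the post, this is plain substring scanning rather than JSON parsing, so a quoted id that itself contains a space, comma or brace would be cut short.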
Second, the locking problems seen are mostly due to the mutex being non-recursive. Changing mGetWork to recursive_mutex instead of mutex, along with the iterators for it, should fix the problem.
Third, the CheckWork done in getwork assigns a value that should be returned, but to a variable in the wrong scope. This has been addressed earlier.
Some of the locking seems to be off. An example would be:
CommitTransaction in main.cpp:
         // Broadcast
         if (!wtxNew.AcceptToMemoryPool())
         {
             // This must not fail. The transaction has already been signed and recorded.
             printf("CommitTransaction() : Error: Transaction not valid");
             return false;
         }
         wtxNew.RelayWalletTransaction();
+        SyncGetWork(4);
     }
-    SyncGetWork(4);
The comment for SyncGetWork says cs_main must be held before it is called, but that's not the case with the patch before the above change. Another one would be:
AddToBlockIndex in main.cpp:
     if (pindexNew == pindexBest)
     {
         // Notify UI to display prev block's coinbase if it was ours
         static uint256 hashPrevBestCoinBase;
-        CRITICAL_BLOCK(cs_mapWallet)
+        CRITICAL_BLOCK(cs_mapWallet) {
             vWalletUpdated.push_back(hashPrevBestCoinBase);
+            hashPrevBestCoinBase = vtx[0].GetHash();
+        }
-        hashPrevBestCoinBase = vtx[0].GetHash();
     }
This one may or may not matter, depending on how many threads call it (I believe only one thread runs this function, but I'm not 100% sure). However, it is related to crashes I've been seeing right after writing out the block and before sending out the proof of work. The call to vtx[0].GetHash causes a crash in the Serialize function due to the assertion that nSize >= 0. I believe it's due to vtx[0] being corrupted, but the event is rare at such high difficulties. I don't see the problem on the test network, but it happens on the live one.
One thought I had is that the live network has more transactions and appending enough of them would trigger the crash. If this is the case, one cause might be the changes JoelKatz has made that would allow vtx to get updated elsewhere (from another client calling getwork?). Another may be that vtx is a vector and, whenever it resizes, the structures related to each object inside vtx aren't correctly copied over (I didn't see any copy constructors). I was able to get a block through when I commented the above out, so that might be a sign that I'm on the right track, but I'd like to see it get a few more blocks before I can rule out that it was just a random event (initially, crashes on the test network were random and I've yet to see any on it since the fixes mentioned earlier).
These high difficulty levels are making it really troublesome to debug locking/race conditions. It'd be nice if one of the big pools tried out various things to see if they also experience the crashes (and possibly end up not getting credit for the block) and tried out various fixes to see if they fix the problem. I haven't seen anyone mention these crashes, so for all I know it's just some compiler/machine issue. But then again, it doesn't seem like anyone's tried it with the fixes that don't lock up.
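The second point in this post, that a non-recursive mutex locks up when the thread already holding it tries to take it again, can be illustrated with a minimal sketch. Everything here apart from the name mGetWork is an assumption for illustration: the sketch uses the standard library's std::recursive_mutex, whereas the patch may well use a Boost type, and `RefreshWork` and `HandleGetWork` are hypothetical functions, not ones from the patch.

```cpp
#include <mutex>

// Stand-in for the patch's getwork lock (assumption: the real mGetWork may be
// a different mutex type from another library).
static std::recursive_mutex mGetWork;
static int nWorkUpdates = 0;

// Hypothetical inner helper that takes the lock itself.
void RefreshWork()
{
    std::lock_guard<std::recursive_mutex> lock(mGetWork);
    ++nWorkUpdates;
}

// Hypothetical outer function that already holds the lock when it calls the
// helper. With a plain std::mutex this nested lock would be undefined
// behaviour and typically hangs; a recursive mutex lets the owning thread
// re-enter and releases the lock once every guard has been destroyed.
int HandleGetWork()
{
    std::lock_guard<std::recursive_mutex> lock(mGetWork);
    RefreshWork();
    RefreshWork();
    return nWorkUpdates;
}
```

A recursive mutex only removes this self-deadlock. If two threads take two different locks (say cs_main and mGetWork) in opposite orders, they can still deadlock, so the lock-ordering concerns raised in the post would still apply.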
|
|
|
|
backburn
Member
Offline
Activity: 111
Merit: 10
★Trash&Burn [TBC/TXB]★
|
|
July 17, 2011, 09:37:43 AM |
|
We had both bitcoin-4diff.txt (.95) + updates.txt + 0.3.23 and just 4diff (.95) + 0.3.23 in production for about a week @ 100+ GH/s each. Both produced blocks and we had no invalids. Neither crashed, and both ran uninterrupted under load for the week-long test. On a side note, it seemed we generated more blocks using just the 4diff and no updates. But that really doesn't mean anything.
|
|
|
|
backburn
Member
Offline
Activity: 111
Merit: 10
★Trash&Burn [TBC/TXB]★
|
|
July 17, 2011, 09:47:19 AM Last edit: July 17, 2011, 09:58:26 AM by backburn |
|
Oh I forgot to add. We generated our first live block with the 4-diff preview code. So all seems to be working.
There are some awesome crash test pilots miners in our irc channel that are eager to test new features on our public test pool. So if there is anything we can try to break for you, please ask!
|
|
|
|
Furyan
|
|
July 17, 2011, 02:19:07 PM |
|
I've applied the patch to 0.3.24 and rebuilt, running now apparently without problems.
I'm doing a heavier load test today so I will let you know how that goes.
Can you explain the "hub" mode a little better?
|
|
|
|
|