Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind
Inaba
Legendary
*
Offline Offline

Activity: 1260
Merit: 1000



View Profile WWW
July 16, 2011, 09:12:45 PM
 #161

Well, I tried a fresh copy of .23 and applied the update and diff4, still the same problem as with somebadger's repository.  The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.

How can I troubleshoot this?

If you're searching these lines for a point, you've probably missed it.  There was never anything there in the first place.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
July 16, 2011, 09:15:14 PM
 #162

Quote from: Inaba
Well, I tried a fresh copy of .23 and applied the update and diff4, still the same problem as with somebadger's repository.  The system basically stops responding after ~12 hours; once bitcoind is restarted, everything picks right back up.

What is this update?

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Inaba
July 16, 2011, 09:19:10 PM
 #163

Your updates from post http://forum.bitcoin.org/index.php?topic=22585.msg334074#msg334074

JoelKatz
July 16, 2011, 09:46:14 PM
 #164

Ahh, okay. I think there's a locking bug in those updates. I'm working on two things:

1) A new 4-diff based on 0.3.24 that includes all the updates that are believed to be safe.

2) A new diff based on 0.3.24 that includes even the getwork pre-compute update that is suspected to be responsible for deadlocks, but hopefully with the deadlock issue fixed.

1 should be ready soon. I still haven't done final auditing of the diff and testing to make sure it works.
I just put up a preview here:
http://davids.webmaster.com/~davids/preview.diff

2 may take a bit longer.

Inaba
July 16, 2011, 10:10:09 PM
 #165

Cool, I will try it out and see what happens.  You might consider removing the upnp junk in the diff as well, since it's an extra dependency that is a) a pain in the ass and b) completely useless in this application.  I remove it from the makefile before compiling.

Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?

When you say ready soon do you mean today or within a couple days?  I don't want to start digging around in the code  and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.

JoelKatz
July 16, 2011, 10:12:13 PM
 #166

Quote from: Inaba
When you say ready soon do you mean today or within a couple days?  I don't want to start digging around in the code and making a nuisance of myself if there's going to be some changes within the next few hours as far as that goes.

1 should be ready today. All that's left is a final audit and testing to make sure I can't break it. There are unlikely to be any significant changes and there's a good chance there will be no changes at all. Testing and reports are very helpful, so don't worry about making a nuisance of yourself. "It worked for me" is extremely helpful because it helps me get closer to the confidence level needed for release. "It didn't work for me" is extremely helpful because I hate to find issues after I release.

Inaba
July 16, 2011, 10:19:01 PM
 #167

Ok, well, with your preview.diff, bitcoind won't start.  Here is the debug.log output:

Bitcoin version 0.3.24-beta
Default data directory /home/xxx/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/home/xxx/.bitcoin/database strErrorFile=/home/xxx/.bitcoin/db.log
Loaded 356873 addresses
 addresses              1032ms
Loading block index...
LoadBlockIndex(): hashBestChain=00000000000002b99ddf  height=136616
 block index            1914ms
Loading wallet...
nFileVersion = 32400
fGenerateBitcoins = 0
nTransactionFee = 0
addrIncoming = 255.255.255.255:8333
fMinimizeToTray = 0
fMinimizeOnClose = 0
fUseProxy = 0
addrProxy = 127.0.0.1:9050
 wallet                  118ms
Done loading
mapBlockIndex.size() = 136636
nBestHeight = 136616
mapKeys.size() = 172
setKeyPool.size() = 101
mapPubKeys.size() = 172
mapWallet.size() = 190
mapAddressBook.size() = 2
Loading addresses from DNS seeds (could take a while)
AddAddress(84.49.174.161:8333)
48 addresses found from DNS seeds
sending: version (85 bytes)
ThreadRPCServer started
ipv4 eth0: x.x.x.x
addrLocalHost = x.x.x.x:8333
ThreadSocketHandler started
ThreadIRCSeed started
ThreadOpenConnections started
ThreadMessageHandler started
trying connection 67.172.181.225:8333 lastseen=-0.1hrs lasttry=-364126.2hrs
IRC :irc.lechat.ir NOTICE AUTH :*** Looking up your hostname...
connected 67.172.181.225:8333
sending: version (85 bytes)



bitcoind exits at this point

Inaba
July 16, 2011, 10:21:41 PM
 #168

I am going to pull a fresh copy of .24 and reapply just to be sure I didn't do something odd - but the patch applied just fine.


EDIT - whoops... I guess you are patching against the tar and not the repo?  The above report is off the tar and it applied fine.  Just tried against the repo and it didn't like that.


Inaba
July 16, 2011, 10:31:45 PM
 #169

Ok, confirmed that a fresh copy of the .24 tar file with your preview gives the above result.  Here's another debug.log output from the fresh copy:

Bitcoin version 0.3.24-beta
Default data directory /home/xxx/.bitcoin
Bound to port 8333
Loading addresses...
dbenv.open strLogDir=/home/xxx/.bitcoin/database strErrorFile=/home/xxx/.bitcoin/db.log
Loaded 357138 addresses
 addresses               942ms
Loading block index...
LoadBlockIndex(): hashBestChain=0000000000000a1fe384  height=136619
 block index            1925ms
Loading wallet...
nFileVersion = 32400
fGenerateBitcoins = 0
nTransactionFee = 0
addrIncoming = 255.255.255.255:8333
fMinimizeToTray = 0
fMinimizeOnClose = 0
fUseProxy = 0
addrProxy = 127.0.0.1:9050
 wallet                  175ms
Done loading
mapBlockIndex.size() = 136639
nBestHeight = 136619
mapKeys.size() = 172
setKeyPool.size() = 101
mapPubKeys.size() = 172
mapWallet.size() = 190
mapAddressBook.size() = 2
Loading addresses from DNS seeds (could take a while)
AddAddress(62.155.236.249:8333)
AddAddress(109.75.176.193:8333)
AddAddress(174.120.185.74:8333)
AddAddress(69.163.132.101:8333)
AddAddress(178.79.147.99:8333)
AddAddress(91.85.220.84:8333)
48 addresses found from DNS seeds
sending: version (85 bytes)
ThreadRPCServer started
ipv4 eth0: x.x.x.x
addrLocalHost = x.x.x.x:8333
ThreadSocketHandler started
ThreadIRCSeed started
ThreadOpenConnections started
ThreadMessageHandler started
IRC :pelican.heliacal.net NOTICE AUTH :*** Looking up your hostname...
IRC :pelican.heliacal.net NOTICE AUTH :*** Couldn't look up your hostname
IRC SENDING: NICK x265309750^M
IRC SENDING: USER x265309750 8 * : x265309750^M
trying connection 213.111.82.119:8333 lastseen=-0.2hrs lasttry=-364126.5hrs
IRC :pelican.heliacal.net 001 x265309750 :Welcome to the LFNet Internet Relay Chat Network x265309750
IRC :pelican.heliacal.net 002 x265309750 :Your host is pelican.heliacal.net[173.246.103.92/6667], running version hybrid-7.2.3
IRC :pelican.heliacal.net 003 x265309750 :This server was created Jun 28 2011 at 14:26:11
IRC :pelican.heliacal.net 004 x265309750 pelican.heliacal.net hybrid-7.2.3 CDGabcdfgiklnorsuwxyz biklmnopstveI bkloveI
connected 213.111.82.119:8333
sending: version (85 bytes)


Bitcoind exits at this point.

JoelKatz
July 16, 2011, 10:31:49 PM
Last edit: July 16, 2011, 10:46:17 PM by JoelKatz
 #170

It is against the 0.3.24 release. I just made a small, but very critical, fix. So redownload it. (That's likely the problem in your post above.)

I've finished my own testing and auditing. So if the latest works for you, I'm ready to release it.

Performance is about 3,000 getwork requests per second on my test machine (a Core 2 Quad Q9550 running 64-bit Linux).

Inaba
July 16, 2011, 10:44:40 PM
 #171

That fixed it!

Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?


JoelKatz
July 16, 2011, 10:46:38 PM
 #172

Quote from: Inaba
Just to clarify, is this preview a potential fix for the lockup issue I'm experiencing or is that part of number 2?

This does not contain the optimization that I believe is responsible for the lockup issue. That optimization was not contained in 3diff or 4diff but was an additional optimization (introduced in the 'update' patch) specifically to reduce the 'hiccup' in 'getwork' responses when a new transaction hits the network.

backburn
Member
**
Offline Offline

Activity: 111
Merit: 10


★Trash&Burn [TBC/TXB]★


View Profile
July 17, 2011, 06:01:08 AM
 #173

Patch at http://davids.webmaster.com/~davids/preview.diff compiles perfectly on the 0.3.24 release Ubuntu x64 10.04 + 10.10. Was working properly with load on test net and I have pushed to live with about 100 GHash. About an hour in, so far so good. Thank you so much for all your hard work Joel!

Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19263 pool      20   0 4262m 139m 9396 S    2  2.5   2:24.12 bitcoind-24prev

Yummy!
JoelKatz
July 17, 2011, 06:17:53 AM
 #174

Quote from: backburn
Patch at http://davids.webmaster.com/~davids/preview.diff compiles perfectly on the 0.3.24 release Ubuntu x64 10.04 + 10.10. Was working properly with load on test net and I have pushed to live with about 100 GHash. About an hour in, so far so good. Thank you so much for all your hard work Joel!

Thanks for the success report. The very first preview had a serious bug that would make it crash immediately. The final release had, other than that fix, only cosmetic changes (like fixing extra spaces and such). So you can stick with the build you've got, since it obviously doesn't have the 'makes it crash immediately' bug.

Do you know offhand how that CPU usage compares with what you were seeing before?

backburn
July 17, 2011, 07:18:12 AM
 #175

Quote from: JoelKatz
Do you know offhand how that CPU usage compares with what you were seeing before?

For a basis, this server is using an Intel Q8400. Looks to be around half the usage at "idle". During an LP (long poll), it hits about 40-50% usage. Bunches better than stock bitcoind, and about 10% less than the diff-4 v.95.
JoelKatz
July 17, 2011, 07:24:44 AM
 #176

Quote from: backburn
Quote from: JoelKatz
Do you know offhand how that CPU usage compares with what you were seeing before?
For a basis, this server is using an Intel Q8400. Looks to be around half the usage at "idle". During an LP (long poll), it hits about 40-50% usage. Bunches better than stock bitcoind, and about 10% less than the diff-4 v.95.

Great to hear. Thanks.

jollyjim
Newbie
*
Offline Offline

Activity: 21
Merit: 0


View Profile
July 17, 2011, 07:46:34 AM
 #177

I've been testing and debugging the patches against the 0.3.23 version.  I think I've got most of the problems figured out but there is still a critical problem that's made me lose out on 3 blocks so far.

First, the fast getwork parsing only works if the client sends the id without quotes.  This change allows the client to send the id with or without quotes:

ThreadRPCServer3 in rpc.cpp:
Code:
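        // Fast path: a getwork request with an empty params array
        if ((strRequest.find("\"getwork\"")!=std::string::npos) && strRequest.find("[]")!=std::string::npos)
        {
            std::string id;
            size_t p = strRequest.find("\"id\":");
            if (p != std::string::npos)
            {
                // The id ends at the first space, comma, or closing brace after "id":
                size_t ep = strRequest.find(" ", p+5), e;
                if((e=strRequest.find(",", p+5))<ep) ep = e;
                if((e=strRequest.find("}", p+5))<ep) ep = e;
                // Copy the raw token, keeping any quotes the client sent
                id = strRequest.substr(p+5, ep-p-5);
            }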
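
As a rough illustration of the id handling described above (not code from the patch): the sketch below pulls the raw id token out of a JSON-RPC request string so it can be echoed back unchanged, whether the client sent it quoted or not. The helper name extract_raw_id is hypothetical, and unlike the posted snippet it also skips whitespace after the colon, which is an added assumption. It is plain string scanning, not a JSON parser, so a quoted id containing a space, comma, or brace would be cut short.

```cpp
#include <string>

// Hypothetical helper, not part of the patch: returns the raw "id" token
// from a JSON-RPC request, including surrounding quotes if the client sent
// them. Returns an empty string when no "id": key is found.
std::string extract_raw_id(const std::string& request) {
    const std::string key = "\"id\":";
    std::string::size_type p = request.find(key);
    if (p == std::string::npos) return "";

    // Assumption: tolerate whitespace between the colon and the value.
    std::string::size_type start = request.find_first_not_of(" \t", p + key.size());
    if (start == std::string::npos) return "";

    // The token ends at the first space, comma, or closing brace.
    std::string::size_type end = request.find_first_of(" ,}", start);
    if (end == std::string::npos) end = request.size();

    return request.substr(start, end - start);
}
```

For a request ending in "id":1} this returns 1, and for "id":"abc" it returns "abc" with the quotes kept, so the reply can reuse the token as-is.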

Second, the locking problems seen are mostly due to the mutex being non-recursive.  Changing mGetWork to recursive_mutex instead of mutex, along with the iterators for it, should fix the problem.
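
To illustrate the non-recursive mutex point (a sketch only, using the C++11 standard library rather than whatever mutex type the patch actually uses): if a function that holds the lock calls a helper that takes the same lock, a plain mutex cannot be re-acquired by the same thread, while a recursive mutex allows it. All names below are hypothetical stand-ins; only mGetWork comes from the thread.

```cpp
#include <mutex>

// Hypothetical stand-ins for the shared getwork state and its lock.
std::recursive_mutex g_work_mutex;
int g_work_items = 0;

// Takes the lock itself, so it is safe to call on its own.
void add_work_item() {
    std::lock_guard<std::recursive_mutex> lock(g_work_mutex);
    ++g_work_items;
}

// Holds the lock and then calls a helper that locks again on the same thread.
// With std::mutex this second acquisition would be undefined behaviour
// (in practice usually a hang); std::recursive_mutex permits it.
int refresh_work() {
    std::lock_guard<std::recursive_mutex> lock(g_work_mutex);
    add_work_item();
    return g_work_items;
}
```

A recursive mutex hides the nested locking rather than removing it, so it is a quick fix; restructuring so that each code path takes the lock exactly once is the stricter alternative.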

Third, the CheckWork call in getwork assigns the value that should be returned to a variable in the wrong scope.  This has been addressed earlier.

Some of the locking seems to be off.  An example would be:

CommitTransaction in main.cpp:
Code:
        // Broadcast
        if (!wtxNew.AcceptToMemoryPool())
        {
            // This must not fail. The transaction has already been signed and recorded.
            printf("CommitTransaction() : Error: Transaction not valid");
            return false;
        }
        wtxNew.RelayWalletTransaction();
+        SyncGetWork(4);
    }
-    SyncGetWork(4);

The comment for SyncGetWork says cs_main must be held before it is called, but that wasn't the case in the patch before the above change.
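
A "caller must hold the lock" rule like this can be checked at run time instead of living only in a comment. The following is a minimal sketch under stated assumptions: OwnerTrackingMutex, g_main_lock, and sync_work are hypothetical stand-ins for cs_main and SyncGetWork, and nothing here is taken from the patch.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical wrapper: a mutex that remembers which thread owns it.
class OwnerTrackingMutex {
public:
    void lock() {
        m_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock() {
        owner_.store(std::thread::id());  // default id means "no owner"
        m_.unlock();
    }
    bool held_by_current_thread() const {
        return owner_.load() == std::this_thread::get_id();
    }
private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

OwnerTrackingMutex g_main_lock;  // stand-in for cs_main

// Stand-in for a function documented as "caller must hold the lock".
// It refuses to run when the precondition is violated.
bool sync_work() {
    if (!g_main_lock.held_by_current_thread()) return false;
    // ... work that relies on the lock being held would go here ...
    return true;
}

// Correct: the call sits inside the locked scope.
bool call_inside_lock() {
    std::lock_guard<OwnerTrackingMutex> guard(g_main_lock);
    return sync_work();
}

// Incorrect: the lock is released before the call is made.
bool call_outside_lock() {
    { std::lock_guard<OwnerTrackingMutex> guard(g_main_lock); }
    return sync_work();
}
```

Here call_inside_lock succeeds and call_outside_lock is rejected, which mirrors moving the SyncGetWork call inside the closing brace in the diff above.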

Another one would be:

AddToBlockIndex in main.cpp:
Code:
    if (pindexNew == pindexBest)
    {
        // Notify UI to display prev block's coinbase if it was ours
        static uint256 hashPrevBestCoinBase;
-        CRITICAL_BLOCK(cs_mapWallet)
+        CRITICAL_BLOCK(cs_mapWallet) {
            vWalletUpdated.push_back(hashPrevBestCoinBase);
+            hashPrevBestCoinBase = vtx[0].GetHash();
+        }
-         hashPrevBestCoinBase = vtx[0].GetHash();
    }

This one may or may not matter depending on how many threads call it (I believe only one thread runs this function, but I'm not 100% sure).  However, it is related to crashes I've been seeing right after the block is written out and before the proof of work is sent.  The call to vtx[0].GetHash causes a crash in the Serialize function due to the assertion that nSize >= 0.  I believe it's due to vtx[0] being corrupted, but the event is rare at such high difficulties.  I don't see the problem on the test network, but it happens on the live one.
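
On the brace change in AddToBlockIndex above: a scoped-lock macro built on a for loop only covers the single statement that follows it unless braces are added. The sketch below shows that effect with a hypothetical LOCKED_BLOCK macro; the real definition of CRITICAL_BLOCK is not shown in this thread, so treat this as an assumption about how such macros are commonly written.

```cpp
#include <mutex>
#include <vector>

// Hypothetical mutex that exposes whether it is currently held.
struct TrackedMutex {
    std::mutex m;
    bool held = false;
};

// RAII guard used by the macro below; 'once' makes the for loop run one time.
struct ScopedTrack {
    TrackedMutex& tm;
    bool once = true;
    explicit ScopedTrack(TrackedMutex& t) : tm(t) { tm.m.lock(); tm.held = true; }
    ~ScopedTrack() { tm.held = false; tm.m.unlock(); }
};

// Hypothetical scoped-lock macro: the lock lives for one loop iteration.
#define LOCKED_BLOCK(tm) \
    for (ScopedTrack guard_(tm); guard_.once; guard_.once = false)

// Records whether the lock was held for each of two statements.
std::vector<bool> without_braces() {
    TrackedMutex tm;
    std::vector<bool> seen;
    LOCKED_BLOCK(tm)
        seen.push_back(tm.held);  // covered by the lock
    seen.push_back(tm.held);      // not covered: the loop body was one statement
    return seen;
}

std::vector<bool> with_braces() {
    TrackedMutex tm;
    std::vector<bool> seen;
    LOCKED_BLOCK(tm) {
        seen.push_back(tm.held);  // covered
        seen.push_back(tm.held);  // covered
    }
    return seen;
}
```

Without braces the second statement runs after the guard has been destroyed, which is the situation the added braces in the diff avoid.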

One thought I had is that the live network has more transactions and appending enough of them would trigger the crash.  If this is the case, one cause might be the changes JoelKatz has made that would allow vtx to get updated elsewhere (from another client calling getwork?).  Another may be that vtx is a vector and, whenever it resizes, the structures related to each object inside vtx aren't correctly copied over (I didn't see any copy constructors).  I was able to get a block through when I commented the above out, so that might be a sign that I'm on the right track, but I'd like to see it get a few more blocks before I can rule out that it was just a random event (initially, crashes on the test network were random and I've yet to see any on it since the fixes mentioned earlier).
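
One general C++ hazard in this area, shown as a sketch and not as a diagnosis of the crash above: when a std::vector grows past its capacity it moves its elements to new storage, so any pointer or reference to an element taken earlier no longer refers to a live object. The Tx type and the function names below are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical element type standing in for a transaction.
struct Tx { int value; };

// Reports whether the vector's storage address changed after many appends.
// Addresses are compared as integers so no stale pointer is ever used.
bool storage_moves_on_growth() {
    std::vector<Tx> vtx;
    vtx.push_back(Tx{42});
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(vtx.data());
    for (int i = 0; i < 1000; ++i) vtx.push_back(Tx{i});
    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(vtx.data());
    return before != after;
}

// Looking the element up by index after the growth is always valid,
// because the elements themselves are copied or moved to the new storage.
int first_value_after_growth() {
    std::vector<Tx> vtx;
    vtx.push_back(Tx{42});
    for (int i = 0; i < 1000; ++i) vtx.push_back(Tx{i});
    return vtx[0].value;
}
```

If another thread can append to the vector while vtx[0] is being read, the same invalidation can happen mid-read, which is one reason such access is normally kept under a lock.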

These high difficulty levels make it really troublesome to debug locking/race conditions.  It'd be nice if one of the big pools tried out various things to see if they also experience the crashes (and possibly end up not getting credit for the block) and tried various fixes to see if they fix the problem.  I haven't seen anyone mention these crashes, so for all I know it's just some compiler/machine issue.  But then again, it doesn't seem like anyone's tried it with the fixes that don't lock up.
backburn
July 17, 2011, 09:37:43 AM
 #178

We had both bitcoin-4diff.txt (.95)+ updates.txt +  0.3.23 and just 4diff (.95) + 0.3.23 in production for about a week @ 100+ GH/s each. Both produced blocks and we had no invalids. Neither crashed and ran un-interrupted under load for the week long test. On a side note, it seemed just using the 4diff and no updates we generated more blocks. But that really doesn't mean anything.
backburn
July 17, 2011, 09:47:19 AM
Last edit: July 17, 2011, 09:58:26 AM by backburn
 #179

Oh I forgot to add. We generated our first live block with the 4-diff preview code. So all seems to be working.

There are some awesome crash-test-pilot miners in our IRC channel who are eager to test new features on our public test pool. So if there is anything we can try to break for you, please ask!
Furyan
Full Member
***
Offline Offline

Activity: 175
Merit: 102



View Profile
July 17, 2011, 02:19:07 PM
 #180

I've applied the patch to 0.3.24 and rebuilt, running now apparently without problems.

I'm doing a heavier load test today so I will let you know how that goes.

Can you explain the "hub" mode a little better?