Bitcoin Forum
Author Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind  (Read 31453 times)
m0mchil
Full Member
***
Offline Offline

Activity: 171
Merit: 127


View Profile
July 12, 2011, 08:38:55 PM
 #141

Even if too late in this thread, I'd like to make some comments. Partly because I feel some guilt over the notorious 'getwork' RPC call.

It was made with only one purpose - to experiment with mining outside of Satoshi's code. I never imagined it would feed heavily loaded servers the way it does now. Sadly, it made pooled mining possible and at the same time allowed the de-democratization of the mining process. I believe this can be fixed.

Right now, classic 'getwork' re-processes all transactions whenever there are new ones and 60 seconds have passed, or when there is a new block (1). Worse, because each worker needs its own hash space, the Merkle tree is recalculated entirely with each request (2). When the transaction pool (unconfirmed transactions) gets really large, this becomes infeasible. There were episodes with significant numbers of spam transactions which proved this.
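
To make the refresh rule in (1) concrete, here is a minimal sketch of the decision as described in the paragraph above. It is not bitcoind's actual code: CachedTemplate, needsNewTemplate and the counters are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of what the cached block template was built from.
struct CachedTemplate {
    std::uint64_t tipId;          // identifies the chain tip the template extends
    std::uint64_t txPoolVersion;  // counter bumped whenever the unconfirmed set changes
    std::int64_t  builtAtSeconds; // wall-clock time the template was built
};

// Returns true when the template should be rebuilt:
//   - always when a new block arrived, or
//   - when there are new transactions AND more than maxAgeSeconds have passed.
bool needsNewTemplate(const CachedTemplate& cached,
                      std::uint64_t currentTipId,
                      std::uint64_t currentTxPoolVersion,
                      std::int64_t nowSeconds,
                      std::int64_t maxAgeSeconds = 60) {
    assert(nowSeconds >= cached.builtAtSeconds);
    if (currentTipId != cached.tipId) return true;
    const bool newTransactions = currentTxPoolVersion != cached.txPoolVersion;
    const bool oldEnough = nowSeconds - cached.builtAtSeconds > maxAgeSeconds;
    return newTransactions && oldEnough;
}
```

The point of the sketch is only that the expensive rebuild is gated by both conditions; everything between rebuilds can be served from the cached template.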

Some months ago I made https://github.com/m0mchil/bitcoin/tree/poolmode

About (1): transaction processing was moved out of the RPC thread (to main.cpp, ProcessTransactions) so that 'getwork' always returns as fast as possible. For (2), UpdateMerkleTree was introduced so that only a specific branch is rebuilt (specifically the first one). Bitcoind was creating a new thread handle for each accepted connection (I guess it was needed as a connection timeout guard); this was removed (see rpc.cpp, around 'boost::thread api_caller') because with pools the server is always used locally, by trusted process(es). Even though it is still single-threaded, 'getwork' performance improved drastically.
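
The idea behind rebuilding only the first branch can be sketched as follows. This is an illustration under stated assumptions, not the code from the poolmode branch: toyHash is a 64-bit FNV-1a stand-in for Bitcoin's double SHA-256, and all function names are hypothetical. Since only the coinbase (leaf 0) changes between workers, the siblings along its path can be cached once and the root recomputed in a number of steps proportional to the tree height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Hash = std::uint64_t;

// Toy stand-in for double SHA-256 (FNV-1a, 64-bit). Illustration only.
Hash toyHash(const std::string& data) {
    Hash h = 14695981039346656037ULL;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

Hash hashPair(Hash left, Hash right) {
    return toyHash(std::to_string(left) + ":" + std::to_string(right));
}

// Collapses one level into its parent level, duplicating the last node
// when the level has an odd number of entries (as Bitcoin does).
std::vector<Hash> parentLevel(std::vector<Hash> level) {
    if (level.size() % 2 == 1) level.push_back(level.back());
    std::vector<Hash> parents;
    for (std::size_t i = 0; i < level.size(); i += 2)
        parents.push_back(hashPair(level[i], level[i + 1]));
    return parents;
}

// Full rebuild: cost grows with the number of transactions.
Hash merkleRoot(std::vector<Hash> level) {
    assert(!level.empty());
    while (level.size() > 1) level = parentLevel(std::move(level));
    return level[0];
}

// Siblings along the path from leaf 0 (the coinbase) up to the root.
std::vector<Hash> firstLeafBranch(std::vector<Hash> level) {
    assert(!level.empty());
    std::vector<Hash> branch;
    while (level.size() > 1) {
        branch.push_back(level[1]);  // sibling of the leftmost node
        level = parentLevel(std::move(level));
    }
    return branch;
}

// Cheap update: fold the new coinbase hash with the cached siblings.
Hash rootFromBranch(Hash firstLeaf, const std::vector<Hash>& branch) {
    Hash h = firstLeaf;
    for (Hash sibling : branch) h = hashPair(h, sibling);
    return h;
}
```

With the branch cached per template, each 'getwork' request only needs rootFromBranch with that worker's coinbase hash, regardless of how many transactions are in the block.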

But it is time for a new scheme. I see Gavin's monitorX patch as a good candidate. We need something like 'monitorTransactionPool' to push a notification whenever there is a change in the set of transactions currently ready to be included in a block.

Also, pools should be changed to allow miners to just prove they included pool's coin base in the block they solve. This is possible by sending the transaction with pool's address in it and the next Merkle branch. Then miners will have complete control over which transactions to include and which block chain to build on.
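
A rough sketch of what the pool-side check could look like, assuming the miner sends the coinbase transaction, the Merkle branch, and the Merkle root from the header it hashed. Everything here is hypothetical: ShareProof and verifyShare are invented names, the coinbase is a toy string rather than a real serialized transaction, and toyHash stands in for double SHA-256. A real pool would also have to check that the header hash meets the share target, which is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Hash = std::uint64_t;

// Toy stand-in for double SHA-256 (FNV-1a, 64-bit). Illustration only.
Hash toyHash(const std::string& data) {
    Hash h = 14695981039346656037ULL;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

Hash hashPair(Hash left, Hash right) {
    return toyHash(std::to_string(left) + ":" + std::to_string(right));
}

// What a miner would submit with a share (hypothetical layout).
struct ShareProof {
    std::string coinbaseTx;   // toy serialization of the coinbase transaction
    std::vector<Hash> branch; // siblings from leaf 0 up to the root
    Hash headerMerkleRoot;    // Merkle root in the header that was hashed
};

bool verifyShare(const ShareProof& proof, const std::string& poolAddress) {
    // 1. The coinbase must pay the pool (toy check: address appears in the tx).
    if (proof.coinbaseTx.find(poolAddress) == std::string::npos) return false;
    // 2. Folding the coinbase hash with the branch must give the header's root,
    //    which ties this coinbase to the work that was actually done.
    Hash h = toyHash(proof.coinbaseTx);
    for (Hash sibling : proof.branch) h = hashPair(h, sibling);
    return h == proof.headerMerkleRoot;
}
```

The pool never sees the other transactions, only their combined hashes in the branch, which is what leaves the choice of transactions with the miner.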

I intend to have this implemented soon.

JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
July 12, 2011, 09:19:47 PM
 #142

Quote
Also, pools should be changed to allow miners to just prove they included pool's coin base in the block they solve. This is possible by sending the transaction with pool's address in it and the next Merkle branch. Then miners will have complete control over which transactions to include and which block chain to build on.

This is a great idea, however, it requires changes to *everything*. Bitcoin will need changes, pool managers will need changes, and mining programs will need changes.

I'm not sure how the proof of work flow would go though. If the pool only gets the address and the branch, it cannot submit the block to the public chain. Are you expecting the miner to do that alone? Or are you suggesting different paths for solved block versus share found?

This will force pool managers to rely on the miners to work on useful branches, and it will mean that if a miner has a poor connection, he may be submitting useless shares for which he will get full credit.

I think this needs some subtle changes to address those issues. But I think the idea of letting the miners decide which transactions to include is a great one. And avoiding recomputing the entire merkle tree on each 'getwork' call will eliminate the biggest CPU sucker left in 'getwork' after my changes.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
m0mchil
July 12, 2011, 09:41:10 PM
 #143


Quote
I'm not sure how the proof of work flow would go though. If the pool only gets the address and the branch, it cannot submit the block to the public chain. Are you expecting the miner to do that alone? Or are you suggesting different paths for solved block versus share found?

This will force pool managers to rely on the miners to work on useful branches, and it will mean that if a miner has a poor connection, he may be submitting useless shares for which he will get full credit.

This is true even now - miners can fail to submit the precious one true solution for many reasons. Even on purpose. And yes, I expect the miner to submit his blocks.

Quote
... And avoiding recomputing the entire merkle tree on each 'getwork' call will eliminate the biggest CPU sucker left in 'getwork' after my changes.

Feel free to pull this into your code. My patch is tested against pushpoold and seems to work fine.

JoelKatz
July 12, 2011, 09:58:11 PM
Last edit: July 12, 2011, 10:19:36 PM by JoelKatz
 #144

Something else occurred to me: If we make these changes, we can allow the miner to decide how much he wants to pool mine and how much he wants to solo mine. Instead of defining each share submitted as 1, we can define each share submitted as the fraction of the revenue for the block sent to the pool's account. So a miner could split the money 50/50 and get half shares from the pool but make 25 BTC if he mines the block himself, or anywhere in between.
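
The arithmetic can be sketched like this, assuming a block revenue of 50 BTC (which is what the 50/50 example above implies) and ignoring fees. Split, splitRevenue and soloRemainder are hypothetical names for illustration.

```cpp
#include <cassert>
#include <vector>

struct Split {
    double poolShareCredit; // credit the pool records for each submitted share
    double soloPayoutBtc;   // what the miner keeps if he solves the block
};

// poolFraction is the part of the block revenue the coinbase pays to the pool.
Split splitRevenue(double blockRevenueBtc, double poolFraction) {
    assert(poolFraction >= 0.0 && poolFraction <= 1.0);
    return Split{poolFraction, blockRevenueBtc * (1.0 - poolFraction)};
}

// Mining for several pools at once: each pool gets its own fraction,
// and the miner keeps whatever is left over.
double soloRemainder(double blockRevenueBtc, const std::vector<double>& poolFractions) {
    double total = 0.0;
    for (double f : poolFractions) {
        assert(f >= 0.0);
        total += f;
    }
    assert(total <= 1.0);
    return blockRevenueBtc * (1.0 - total);
}
```

For example, splitRevenue(50.0, 0.5) credits half a share per submission and leaves 25 BTC for the miner on a solved block.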

In fact, he could even mine for more than one pool at a time, accumulating fractional shares from each pool.

Inaba
Legendary
*
Offline Offline

Activity: 1260
Merit: 1000



View Profile WWW
July 14, 2011, 11:27:37 AM
 #145

Is anyone else having problems with somebadger's git repository of this?

I had a bunch of 'socket recv error 104' messages in debug.log, and bitcoind was not serving any work this morning.

If you're searching these lines for a point, you've probably missed it.  There was never anything there in the first place.
Inaba
July 14, 2011, 11:53:55 PM
 #146

I'm also seeing this in the console log now and again:

[2011-07-14 23:53:5.926549] JSON-RPC call failed: (null)
[2011-07-14 23:53:5.926566] submit_work json_rpc_call failed

Inaba
July 15, 2011, 02:42:20 AM
 #147

Locked up again...

Somebadger: Your git is definitely not stable, sadly.  I'm trying now without any -hub command line, we'll see what happens there.


phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 15, 2011, 03:33:14 AM
 #148

I think it's because somebadger is patching against 0.3.24 and JoelKatz started with 0.3.23.  Correct me if I'm wrong.
Inaba
July 15, 2011, 04:20:09 AM
 #149

It's my understanding Somebadger backed off to .23 in the git repository.  Is that not correct?

btcmonkey
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
July 15, 2011, 05:49:41 PM
 #150

Hello,

I grabbed bitcoin v0.3.23 and applied bitcoin-4diff.txt and updates.diff.txt.

This is on 64bit CentOS 5.5.

bitcoind is now dumping core every hour on the hour.  I am fairly new to troubleshooting this type of thing, but a backtrace shows:

Code:
Core was generated by `/home/bitcoin/bitcoind -testnet -conf=/etc/bitcoin.conf -daemon -pollpidfile=/v'.
Program terminated with signal 6, Aborted.
#0  0x00000039f9e30265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000039f9e30265 in raise () from /lib64/libc.so.6
#1  0x00000039f9e31d10 in abort () from /lib64/libc.so.6
#2  0x00000039fd2bed14 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x00000039fd2bce16 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x00000039fd2bce43 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x00000039fd2bcec5 in __cxa_rethrow () from /usr/lib64/libstdc++.so.6
#6  0x000000000040a40a in PrintException (pex=<value optimized out>, pszThread=<value optimized out>) at util.cpp:659
#7  0x00000000004adbee in ThreadRPCServer (parg=0x0) at rpc.cpp:1897
#8  0x00000039faa0673d in start_thread () from /lib64/libpthread.so.0
#9  0x00000039f9ed44bd in clone () from /lib64/libc.so.6

my bitcoind command line is:

/usr/local/sbin/bitcoind -conf=/etc/bitcoin.conf -daemon -pollpidfile=/var/run/pushpoold/pushpoold.pid

with bitcoin.conf only containing rpc user and pass.

Does anyone have any suggestions as to what I can do to resolve this issue?

Thanks much,
   btcmonkey

JoelKatz
July 15, 2011, 06:04:12 PM
 #151

Can you paste the code from your 'rpc.cpp' file around line 1897 (say five lines before and five after). And please identify exactly which line is 1897.

Inaba
July 15, 2011, 09:14:46 PM
 #152

Anyone got any ideas what's going on on my end?

Can't really run a backtrace since it's not crashing exactly.

btcmonkey
July 16, 2011, 12:20:20 AM
 #153

Quote
Can you paste the code from your 'rpc.cpp' file around line 1897 (say five lines before and five after). And please identify exactly which line is 1897.


Here is a snip from rpc.cpp:
Code:

void ThreadRPCServer(void* parg)
{
    IMPLEMENT_RANDOMIZE_STACK(ThreadRPCServer(parg));
    try
    {
        vnThreadsRunning[4]++;
        ThreadRPCServer2(parg);
        vnThreadsRunning[4]--;
    }
    catch (std::exception& e) {
        vnThreadsRunning[4]--;
        PrintException(&e, "ThreadRPCServer()");
    } catch (...) {
        vnThreadsRunning[4]--;
        PrintException(NULL, "ThreadRPCServer()");
    }
    printf("ThreadRPCServer exiting\n");
}

Line 1897 is: PrintException(&e, "ThreadRPCServer()");



Just to verify that I am doing things right, to create this file I:

Code:
git clone http://github.com/bitcoin/bitcoin/ davids
cd davids
git checkout v0.3.23
cd src
patch  < ~/src/bitcoin/bitcoin-4diff.txt
patch  < ~/src/bitcoin/updates.diff.txt

Then I built bitcoind normally.
JoelKatz
July 16, 2011, 02:45:01 AM
 #154

This is one of the annoying things about C++ exception handling. The exception was caught, and that obscured the information needed to find the code that generated it.

All we can tell from that is that the RPC code threw an exception. This could be for reasons that really aren't the code's fault, such as running out of memory at a critical point, or (more likely) they could be due to bugs in the code.

One thing you can try -- use the 'up' command until you get to level 7, the 'ThreadRPCServer' call. And type 'print e'. If for some reason that doesn't work, you can try level 6, 'PrintException' and the command 'print pszMessage'.
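
To illustrate why the backtrace is of limited use here: by the time a catch handler runs, the stack frames of the throw site have already been unwound, so what remains is the exception object itself. A minimal sketch of reading it, where rpcWorkerBody and runAndDescribe are hypothetical stand-ins and not functions from bitcoind:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical worker body that fails the way an RPC handler might.
void rpcWorkerBody() {
    throw std::runtime_error("simulated RPC failure");
}

// Runs the body and describes any exception instead of rethrowing it.
std::string runAndDescribe(void (*body)()) {
    assert(body != nullptr);
    try {
        body();
        return "ok";
    } catch (const std::exception& e) {
        // The throw site is gone from the stack, but the dynamic type
        // and the what() message are still available on the object.
        return std::string(typeid(e).name()) + ": " + e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

This is the same information 'print e' is meant to recover from the core file. When the problem can be reproduced under gdb, the 'catch throw' command stops at the point where the exception is thrown, before any unwinding happens.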

Inaba
July 16, 2011, 04:26:08 AM
 #155

I put in some debug code, why is this failing?

[2011-07-16 04:22:57.656676] JSON-RPC call failed: (null)
[2011-07-16 04:22:57.656694] submit_work json_rpc_call failed
[2011-07-16 04:22:57.656701] curl: P*, srv.rpc_url: http://127.0.0.1:8332/, srv.rpc_userpass: xxx:xxx, s: {"method": "getwork", "params": [ "000000017389ec1cfb159619a8182312807a8d7a77041d1d835f6377000008f00000000020988cf 8191cab5a828a8bd44b73c7310ab418f1c469f9d836d66d8dd6cf139c4e2112071a0abbcf6a64c9 c400000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000" ], "id":1}

The above submit_work call is what's failing with JoelKatz's patch and update applied.  It works fine without the patch.  This comes from the submit_work() function in msg.c


JoelKatz
July 16, 2011, 04:48:54 AM
 #156

Quote
I put in some debug code, why is this failing?

[2011-07-16 04:22:57.656676] JSON-RPC call failed: (null)
[2011-07-16 04:22:57.656694] submit_work json_rpc_call failed
[2011-07-16 04:22:57.656701] curl: P*, srv.rpc_url: http://127.0.0.1:8332/, srv.rpc_userpass: xxx:xxx, s: {"method": "getwork", "params": [ "000000017389ec1cfb159619a8182312807a8d7a77041d1d835f6377000008f00000000020988cf 8191cab5a828a8bd44b73c7310ab418f1c469f9d836d66d8dd6cf139c4e2112071a0abbcf6a64c9 c400000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000" ], "id":1}

The above submit_work call is what's failing with JoelKatz patch and update applied.  it works fine without the patch.  This comes from the submit_work() function in msg.c

Nice catch!!!! This is a bug in the code that processes a found block. Fortunately, we are correctly processing the block, so you are getting credit for it. However, we bungle getting the information back to the caller and return nothing. Here's the fix:

--- rpc.cpp~    2011-07-10 04:37:16.000000000 -0700
+++ rpc.cpp     2011-07-15 21:47:58.097050116 -0700
@@ -1461,7 +1461,7 @@ Value getwork(const Array& params, bool
             npblock->vtx[0].vin[0].scriptSig = CScript() << pblock->nBits << CBigNum(nExtraNonce);
             npblock->hashMerkleRoot = npblock->BuildMerkleTree();
 
-            Value ret = CheckWork(npblock, reservekey);
+            ret = CheckWork(npblock, reservekey);
         }
         return ret;
     }
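
The one-line fix above removes a variable shadowing bug: declaring 'Value ret' inside the inner block created a second 'ret', so the outer one that is returned to the caller was never assigned. A self-contained sketch of the same pattern, using std::string in place of the JSON Value type and hypothetical function names:

```cpp
#include <cassert>
#include <string>

// Buggy shape: the inner declaration introduces a new variable that
// hides the outer `result`, so the function returns the empty default.
std::string submitBuggy(bool haveSolution) {
    std::string result;
    if (haveSolution) {
        std::string result = "accepted"; // shadows the outer `result`
        (void)result;                    // silence the unused-variable warning
    }
    return result;
}

// Fixed shape: assign to the existing variable instead of redeclaring it.
std::string submitFixed(bool haveSolution) {
    std::string result;
    if (haveSolution) {
        result = "accepted";
    }
    return result;
}
```

GCC and Clang can flag this kind of mistake when building with -Wshadow.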


Inaba
July 16, 2011, 04:59:15 AM
 #157

Made the changes, I will see how it goes.

I get this when doing a kill on bitcoind BTW:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  mutex: Invalid argument

It looks like bitcoind is not exiting cleanly when it gets a SIG to die.


JoelKatz
July 16, 2011, 05:09:07 AM
 #158

Quote
Made the changes, I will see how it goes.

I get this when doing a kill on bitcoind BTW:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  mutex: Invalid argument

It looks like bitcoind is not exiting cleanly when it gets a SIG to die.

The most likely issue is in this code:

 { // CAUTION: Raising the delay will slow connection accept
     boost::posix_time::time_duration wait_duration = boost::posix_time::millisec(250);
     boost::unique_lock<boost::mutex> lock(mWorkNotification);
     if(!fWorkFound)
         cvWorkNotification.timed_wait(lock, wait_duration); // ** HERE **
 }

I wonder whether I have to enclose that in a try/catch block, or whether the mutex is not released if the timed wait is interrupted. I'll try to track it down.

You can safely replace that entire block of code with 'Sleep(250);' if you want. Network latency (the time between when you get a block or transaction and the time you can pass it to neighboring nodes) will be slightly higher -- but still better than without the hub patches. But if nothing else, it would tell me if that block is causing the issue.
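
On the question of whether the mutex is released: as far as I know, boost::unique_lock releases the mutex in its destructor, including when an exception unwinds through the block, so a try/catch should not be needed just for unlocking. The sketch below shows the same wait pattern using std::condition_variable from C++11 rather than Boost (a substitution made only to keep the example self-contained); WorkSignal, waitForWork and notifyWork are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical bundle of the flag and the primitives that guard it.
struct WorkSignal {
    std::mutex m;
    std::condition_variable cv;
    bool workFound = false;
};

// Waits up to `timeout` for work to be flagged; returns true if it was.
bool waitForWork(WorkSignal& s, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(s.m);
    // The predicate form copes with spurious wakeups, and the lock's
    // destructor releases the mutex on every exit path from this scope.
    return s.cv.wait_for(lock, timeout, [&s] { return s.workFound; });
}

// Sets the flag under the mutex, then wakes any waiters.
void notifyWork(WorkSignal& s) {
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.workFound = true;
    }
    s.cv.notify_all();
}
```

This does not by itself explain the 'mutex: Invalid argument' message at shutdown; it only shows that the wait itself can be written so the lock cannot be leaked.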

btcmonkey
July 16, 2011, 05:14:49 AM
 #159

Quote
This is one of the annoying things about C++ exception handling. The exception was caught, and that obscured the information needed to find the code that generated it.

All we can tell from that is that the RPC code threw an exception. This could be for reasons that really aren't the code's fault, such as running out of memory at a critical point, or (more likely) they could be due to bugs in the code.

One thing you can try -- use the 'up' command until you get to level 7, the 'ThreadRPCServer' call. And type 'print e'. If for some reason that doesn't work, you can try level 6, 'PrintException' and the command 'print pszMessage'.


So it turns out I made a stupid mistake.  As part of a test some days ago, I had created a cron job that stopped and restarted bitcoind on an hourly basis.  It appears that I wasn't waiting long enough for the bitcoind stop command to close out bitcoind.  Trying to start bitcoind while it was still shutting down seems to have caused the core dumps.

I'm sorry for wasting your time, but really appreciate what you have taught me about debugging/troubleshooting these issues.
JoelKatz
July 16, 2011, 05:42:15 AM
 #160

Quote
I'm sorry for wasting your time, but really appreciate what you have taught me about debugging/troubleshooting these issues.

No problem. That's par for the course. I've made more stupid mistakes that have wasted other people's time than I can count.
