Bitcoin Forum
Author Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind  (Read 31401 times)
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 25, 2011, 10:46:05 PM
Last edit: June 26, 2011, 12:23:34 AM by Jine
 #1

Hi!

As a pool operator, I'm looking for a multithreaded keep-alive solution for bitcoind.
A brief description of the issue we're having (and I know BTCGuild is having it too) is here:
https://github.com/bitcoin/bitcoin/issues/344

Async I/O has already been implemented here, which may give some hints, but it is not fully functional:
https://github.com/bitcoin/bitcoin/pull/214


20 BTC reward if we can fix this issue now:
A keep-alive (only a few connections between pushpoold and bitcoind) multi-threaded solution for bitcoind.

/ Jim

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 12:33:40 AM
 #2

Someone on a Swedish forum found this link, which should be worth taking a look at:
http://codingplayground.blogspot.com/2008/07/boostasio-and-keep-alive.html

I'm no C++ dev tho.. Sad

/ Jim

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
sakkaku
Member
**
Offline Offline

Activity: 70
Merit: 10



View Profile WWW
June 26, 2011, 12:54:15 AM
 #3

If you run bitcoind and pushpoold on the same machine (or on the same LAN), keep-alive is a minimal improvement, because opening TCP connections locally has next to no overhead compared to remote connections.

13NiQcetcioQj3YwHL1ZWvgQg8eAjkzUdt
Blog/Projects: zxlu.com | syn-multiminer
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 01:16:39 AM
 #4

But we're seeing 20-30k TIME_WAIT connections due to this.
It's not the overhead, it's the amount of new sockets/connections.

This is a known problem for both us, eligius and btcguild (This is why they got multiple nodes with maximum ~500GH each)

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
padrino
Legendary
*
Offline Offline

Activity: 1428
Merit: 1000


https://www.bitworks.io


View Profile WWW
June 26, 2011, 12:28:37 PM
 #5

But we're seeing 20-30k TIME_WAIT connections due to this.
It's not the overhead, it's the amount of new sockets/connections.

This is a known problem for both us, eligius and btcguild (This is why they got multiple nodes with maximum ~500GH each)

Did you end up sorting it out?

1CPi7VRihoF396gyYYcs2AdTEF8KQG2BCR
https://www.bitworks.io
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 04:04:42 PM
 #6

No solution yet, no.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
padrino
Legendary
*
Offline Offline

Activity: 1428
Merit: 1000


https://www.bitworks.io


View Profile WWW
June 26, 2011, 04:11:01 PM
 #7

No solution yet, no.

I have a history with socket programming and a couple of hours to blow right now, so I'll give it a look. Offhand, even without code fixes there may be some system-level tweaks you can make to get things under control a bit more, but I'll look at the code now.

1CPi7VRihoF396gyYYcs2AdTEF8KQG2BCR
https://www.bitworks.io
Bloody Bell
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
June 26, 2011, 06:02:19 PM
 #8

I have a history with socket programming

I just took a look. The Boost asio library is used, so to avoid replacing everything (and introducing a different style of coding), one should probably stick with that. The whole thing runs on a single thread, and as soon as a request is answered, it closes the connection. That doesn't seem too efficient, indeed. I tried starting a new thread for each request, but there is some glitch somewhere. Unfortunately I am not too familiar with Boost Sad

JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 26, 2011, 06:19:52 PM
 #9

The problem description doesn't explain what the actual problem is. It says there are a large number of connections in TIME_WAIT state, but doesn't explain why that's a problem. I'd hate for someone to add keep alive and have it not actually solve the problem.

Since a DDoS attack can create arbitrarily large numbers of connections in the TIME_WAIT state, it's a much better solution to fix whatever is causing the large number of connections to be a problem. What is going wrong because of this? Fix that, and you'll get improved resistance to DDoS attacks for free.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Bloody Bell
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
June 26, 2011, 06:41:07 PM
 #10

The problem description doesn't explain what the actual problem is.
OP said that the connections are between the mining pool server and bitcoind. I suppose the rpc port of bitcoind isn't exposed to the internet, therefore can't be flooded. Also, most of the time it makes no sense to leave it open for anyone.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 26, 2011, 06:54:54 PM
 #11

The problem description doesn't explain what the actual problem is.
OP said that the connections are between the mining pool server and bitcoind. I suppose the rpc port of bitcoind isn't exposed to the internet, therefore can't be flooded. Also, most of the time it makes no sense to leave it open for anyone.
Ahh, okay. So forget the DDoS resistance stuff. But we still don't know why this is a problem.

If all the pool server needs is the 'getwork' output, it seems like a much better solution is to avoid polling altogether and have 'bitcoind' put this information in a file or write it continuously to a queue or something. (Assuming I'm understanding the problem correctly. I'm not 100% sure that I am.)
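A rough illustration of the push idea above, as a minimal Python sketch. It is not bitcoind or pushpoold code: `producer`, `consumer` and `run_demo` are hypothetical names, the work items are made up, and an in-process `queue.Queue` stands in for whatever file or inter-process queue would actually be used. The point is only that the consumer blocks until work arrives instead of polling over a new connection.

```python
import queue
import threading

def producer(work_queue, block_heights):
    """Stand-in for the daemon: publish new work whenever it changes."""
    for height in block_heights:
        work_queue.put({"height": height, "data": f"work-for-block-{height}"})
    work_queue.put(None)  # sentinel: no more work will follow

def consumer(work_queue):
    """Stand-in for the pool server: block until work arrives, no polling."""
    received = []
    while True:
        item = work_queue.get()  # sleeps until the producer puts something
        if item is None:
            return received
        received.append(item["height"])

def run_demo(block_heights):
    """Run one producer thread and consume everything it publishes."""
    work_queue = queue.Queue()
    thread = threading.Thread(target=producer, args=(work_queue, block_heights))
    thread.start()
    result = consumer(work_queue)
    thread.join()
    return result

if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```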

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 07:56:02 PM
 #12

Let's get things straight.

It's an issue on localhost, for each and every getwork request pushpoold gets - it opens up a NEW socket against bitcoind, and issues getwork over that socket.
When it's finished and pushpoold got his work, the socket is left untouched and still opened.

We (and I know BTC Guild have) tweaked Linux a bit to reuse TIME_WAIT connections (using tcp_tw_recycle in /proc/...) which has helped - but it's not scalable.
As the load increases, the amount of new connections is overwhelming.

This was taken yesterday by me:
Quote
jine@bitcoins:~$ netstat -an | awk '/^tcp/ {A[$(NF)]++} END {for (I in A) {printf "%5d %s\n", A[I], I}}'
   22 FIN_WAIT2
   10 LISTEN
   58 SYN_RECV
  229 CLOSE_WAIT
21330 TIME_WAIT
 3899 ESTABLISHED
  282 LAST_ACK
   12 FIN_WAIT1

At that moment, we had 21,000+ TIME_WAIT sockets waiting to be reused or closed on timeout.
When that figure reaches 25k+, the server struggles to keep up. We're also hitting the limit on open files (nofiles in ulimit), which we increased from 1024 to 128,000+.
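For anyone who wants the same tally outside the shell, here is a minimal Python sketch of what the awk one-liner does: count the last column (the TCP state) of every line that starts with "tcp". The function name `count_tcp_states` and the sample text are made up for illustration; real `netstat -an` output would be piped in instead.

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Tally the last field (the TCP state) of every line starting with 'tcp'."""
    states = Counter()
    for line in netstat_output.splitlines():
        if line.startswith("tcp"):
            fields = line.split()
            states[fields[-1]] += 1
    return states

# Hypothetical sample in the shape of `netstat -an` output on Linux.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8332            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8332          127.0.0.1:50112         TIME_WAIT
tcp        0      0 127.0.0.1:8332          127.0.0.1:50113         TIME_WAIT
tcp        0      0 127.0.0.1:8332          127.0.0.1:50114         ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{count:5d} {state}")
```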

We're looking for a solution to get rid of "all" those TIME_WAIT connections (99.99999% of which come from 127.0.0.1:8332 (bitcoind) talking to pushpoold at 127.0.0.1).
The best way I've come up with is implementing keep-alive support in bitcoind - hence this thread.

This will mean that bitcoind and pushpoold are only using a couple (multi threaded) keep-alive sockets for getwork and sending responses.
This will drastically lower the number of open sockets (nofiles dropping to the bottom), free up ports (there is a limit of 65535 ports in TCP/IP afaik) and also make it more scalable.

Is everyone following now?
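To illustrate the difference keep-alive makes to the socket count, here is a minimal Python sketch. It is not bitcoind or pushpoold code: `CountingHandler`, `request_work` and `connections_used` are hypothetical names, and the server is a throwaway stand-in that answers any POST with a fixed JSON body. It counts how many TCP connections the server sees for the same number of requests, with and without connection reuse.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class CountingHandler(BaseHTTPRequestHandler):
    """Stand-in RPC server that records every distinct client socket it sees."""
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    seen_peers = set()             # (host, port) of each connection served
    lock = threading.Lock()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)    # consume the request body
        with self.lock:
            self.seen_peers.add(self.client_address)
        body = json.dumps({"result": "work", "error": None}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def request_work(host, port, count, reuse):
    """Send `count` requests over one connection (reuse) or one per request."""
    payload = json.dumps({"method": "getwork", "params": [], "id": 1})
    headers = {"Content-Type": "application/json"}
    conn = None
    for _ in range(count):
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("POST", "/", body=payload, headers=headers)
        conn.getresponse().read()  # drain the reply before reusing the socket
        if not reuse:
            conn.close()
            conn = None
    if conn is not None:
        conn.close()

def connections_used(count, reuse):
    """Run a throwaway local server and report how many connections it saw."""
    CountingHandler.seen_peers = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request_work("127.0.0.1", server.server_address[1], count, reuse)
    finally:
        server.shutdown()
        server.server_close()
    return len(CountingHandler.seen_peers)

if __name__ == "__main__":
    print("with reuse:   ", connections_used(5, reuse=True))
    print("without reuse:", connections_used(5, reuse=False))
```

With five requests, the reused connection should show up as a single peer on the server and the non-reused case as five. Each closed connection leaves one socket in TIME_WAIT on the side that closed first, which is where counts like the one above come from.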

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
Bloody Bell
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
June 26, 2011, 07:59:35 PM
 #13

Is everyone following now?

I am, and this is what I read out from the first post too Smiley

On the other hand let's hope that libcurl can properly reuse the connections kept open, otherwise the new problem will be the overhead of creating/destroying the threads for each connection.

Not that my bitcoind patch worked Cheesy
sakkaku
Member
**
Offline Offline

Activity: 70
Merit: 10



View Profile WWW
June 26, 2011, 08:05:15 PM
 #14

Let's get things straight.

It's an issue on localhost, for each and every getwork request pushpoold gets - it opens up a NEW socket against bitcoind, and issues getwork over that socket.
When it's finished and pushpoold got his work, the socket is left untouched and still opened.


Has this been confirmed as a bitcoind problem?  Maybe it is pushpool that is misbehaving?  What about explicitly terminating the connection?  Adding keepalive isn't really a magical fix for that because pushpoold could decide it wants to open a new connection anyway.

13NiQcetcioQj3YwHL1ZWvgQg8eAjkzUdt
Blog/Projects: zxlu.com | syn-multiminer
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 08:31:13 PM
 #15

pushpoold does NOT explicitly create a new socket for each connection, nor does it close the previous one.
I got a PM with a "temporary" fix for this - make it explicitly close & open new connections.

We'll see how that goes, I'll try it out later on.

I don't know if this problem is confirmed to be bitcoind-only, but all larger pools based upon bitcoind+pushpoold are suffering from it AFAIK.
I'm gladly accepting comments, suggestions or other tips on this matter.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
Bloody Bell
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
June 26, 2011, 08:40:27 PM
 #16

Has this been confirmed as a bitcoind problem?  Maybe it is pushpool that is misbehaving?  What about explicitly terminating the connection?  Adding keepalive isn't really a magical fix for that because pushpoold could decide it wants to open a new connection anyway.

You can try it by connecting to port 8332 with telnet: after one request has been served, bitcoind will close the connection. Even worse, as long as that connection is open it won't accept another one. Everything is done on a single thread; you can see it in rpc.cpp, at ThreadRPCServer2().
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 26, 2011, 09:19:05 PM
Last edit: June 26, 2011, 09:48:21 PM by JoelKatz
 #17

I modified bitcoind both to implement keepalive properly and to fully multithread its RPC so that it can handle multiple RPC connections.

You can download a modified rpc.cpp here:
http://davids.webmaster.com/~davids/rpc.cpp

RPC performance is much snappier and it seems to work. But I am by no means a boost expert, so this is at your own risk. This is definitely 'quick and dirty, should clean up later' type code. Appreciation may be shown at this address: 1H3STBxuzEHZQQD4hkjVE22TWTazcZzeBw
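For readers who want the general shape of "keep-alive plus one thread per connection" without reading rpc.cpp, here is a minimal Python sketch. It is not the modified rpc.cpp and makes no claim about how that file works. The names `handle_connection`, `serve` and `start_server` are hypothetical, and a newline-delimited echo protocol is assumed in place of HTTP/JSON-RPC to keep it short.

```python
import socket
import threading

def handle_connection(conn: socket.socket) -> None:
    """Serve requests on one connection until the client closes it (keep-alive)."""
    with conn, conn.makefile("rb") as reader:
        for line in reader:               # assumed framing: one request per line
            conn.sendall(b"ok " + line)   # hypothetical reply: echo the request

def serve(listener: socket.socket) -> None:
    """Accept loop: hand each connection to its own thread, then keep accepting."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:                   # accept failed, e.g. listener closed
            return
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()

def start_server() -> socket.socket:
    """Bind to an ephemeral localhost port and serve in a background thread."""
    listener = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener

if __name__ == "__main__":
    listener = start_server()
    port = listener.getsockname()[1]
    idle = socket.create_connection(("127.0.0.1", port), timeout=5)   # stays open
    busy = socket.create_connection(("127.0.0.1", port), timeout=5)
    replies = busy.makefile("rb")
    for i in range(3):                    # several requests over the same socket
        busy.sendall(b"getwork %d\n" % i)
        print(replies.readline())
    idle.close()
    busy.close()
    listener.close()
```

Because the accept loop never waits on a client, an idle open connection does not stop a second one from being served, which is the single-thread limitation described earlier in the thread.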

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
spiccioli
Legendary
*
Offline Offline

Activity: 1378
Merit: 1003

nec sine labore


View Profile
June 26, 2011, 09:44:09 PM
 #18

But we're seeing 20-30k TIME_WAIT connections due to this.
It's not the overhead, it's the amount of new sockets/connections.

This is a known problem for both us, eligius and btcguild (This is why they got multiple nodes with maximum ~500GH each)

Jine,

maybe I'm saying something that is known to everybody here... but, I'll say anyway since I had a similar problem long ago (not bitcoin related, though).

Every socket spends some time in TIME_WAIT to correctly handle the closing of itself.

You can change the time it spends in TIME_WAIT which is set too long for a localhost only socket.

Per the RFC, the side that closes a connection waits 2*MSL (maximum segment lifetime) before finally releasing the socket. If you lower this time (the default should be at least 120 seconds) to, for example, 10 seconds, you should see far fewer sockets waiting, which should help you stay under this 25K waiting-socket limit.
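A back-of-the-envelope way to see the effect, as a small Python sketch: in steady state, the number of sockets in TIME_WAIT is roughly the rate at which connections are closed multiplied by how long each one waits. The function names are hypothetical, and the 60-second wait is an assumption (the interval Linux kernels commonly use), not a figure from this thread; the 21330 count is the one from the netstat output posted earlier.

```python
def steady_state_time_wait(closes_per_second: float, wait_seconds: float) -> float:
    """Sockets waiting = rate of closed connections x time each one waits."""
    return closes_per_second * wait_seconds

def implied_close_rate(time_wait_sockets: int, wait_seconds: float) -> float:
    """Invert the relation: estimate the close rate from an observed count."""
    return time_wait_sockets / wait_seconds

if __name__ == "__main__":
    ASSUMED_WAIT = 60.0  # assumption: seconds a socket stays in TIME_WAIT
    rate = implied_close_rate(21330, ASSUMED_WAIT)
    print(f"implied rate: {rate:.1f} connections/s")
    for wait in (ASSUMED_WAIT, 10.0):
        waiting = steady_state_time_wait(rate, wait)
        print(f"{wait:4.0f} s wait -> about {waiting:.0f} sockets in TIME_WAIT")
```

Under that assumption the observed count corresponds to a few hundred new connections per second. Shortening the wait shrinks the backlog proportionally, but the connection rate itself stays the same.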

I don't know how to change MSL in Linux, though Smiley

Best regards.

spiccioli
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 09:57:20 PM
 #19

I modified bitcoind both to implement keepalive properly and to fully multithread its RPC so that it can handle multiple RPC connections.

You can download a modified rpc.cpp here:
http://davids.webmaster.com/~davids/rpc.cpp

RPC performance is much snappier and it seems to work. But I am by no means a boost expert, so this is at your own risk. This is definitely 'quick and dirty, should clean up later' type code. Appreciation may be shown at this address: 1H3STBxuzEHZQQD4hkjVE22TWTazcZzeBw

At first sight i was like this:  Shocked Shocked Shocked Shocked Shocked  Cheesy Cheesy Cheesy Cheesy
A bit later...  Cry

Seg-faulted after a few seconds under the load. Not sure why, can't find any logs about it... :/

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 09:59:19 PM
 #20

I don't know how to change MSL in Linux, though Smiley


I think it's possible by patching the kernel.
This will, however, not fix the issue completely.

I'm using tcp_max_tw_buckets (or something similar) atm, which limits TW sockets to 10k max.
It works, but it does not solve the problem.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.