Bitcoin Forum
Author Topic: [20 BTC] Multithreaded Keep-alive Implementation in Bitcoind  (Read 31401 times)
spiccioli
Legendary
*
Offline Offline

Activity: 1378
Merit: 1003

nec sine labore


View Profile
June 26, 2011, 10:20:49 PM
 #21

I don't know how to change MSL in Linux, though :)


I think it's possible by patching the kernel.
This will, however, not fix the issue completely.

I'm using tcp_max_tw_bucket (or something similar) atm - that limits TW-sockets to 10k max.
It works, but it does not solve the problem


Jine,

I've googled around a little: tcp_max_tw_buckets puts a limit (default 180000) on the maximum number of sockets in TIME_WAIT. If this limit is exceeded, sockets entering the TIME_WAIT state are closed without waiting and a warning is issued... but first you have to reach the limit.

tcp_fin_timeout, or net.ipv4.tcp_fin_timeout, on the other hand, seems to be what you need.

By lowering it (it could be 60 seconds right now), you should be able to keep the number of TIME_WAIT sockets down without, or before, reaching the bucket limit.

I mean, if you set it to 3 seconds, you'll end up having as many TIME_WAIT sockets as your incoming connections in a 3-second period, so your 25K sockets should go down to 1500-2000.

spiccioli.
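
The estimate above is the usual "rate times residency" arithmetic (Little's law): the number of sockets sitting in a state is roughly the connection rate multiplied by how long each socket stays in that state. A minimal sketch, assuming the 25K sockets were observed with a 60-second hold time (the post only says it "could be 60 seconds") and assuming the timeout being lowered really governs TIME_WAIT. On Linux, tcp_fin_timeout is documented as the FIN_WAIT_2 timeout, so that second assumption may not hold.

```python
def steady_state_sockets(conn_per_sec: float, hold_seconds: float) -> float:
    """Little's law: sockets in a state = arrival rate x time spent in that state."""
    return conn_per_sec * hold_seconds


# Assumption: 25,000 TIME_WAIT sockets were observed with a 60 s hold time.
observed_sockets = 25_000
assumed_hold_seconds = 60.0

# Implied connection rate, roughly 417 connections per second.
rate = observed_sockets / assumed_hold_seconds

# Expected socket count if the hold time were 3 s instead.
estimate = steady_state_sockets(rate, 3.0)

print(f"{rate:.0f} conn/s, about {estimate:.0f} sockets at a 3 s hold time")
```

Under these assumptions the figure comes out near 1250, the same order of magnitude as the 1500-2000 guessed in the post.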
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 26, 2011, 10:27:05 PM
Last edit: June 26, 2011, 10:38:54 PM by JoelKatz
 #22

At first sight i was like this: :o :o :o :o :o :D :D :D :D
A bit later... :'(

Seg-faulted after a few seconds under the load. Not sure why, can't find any logs about it... :/
Sorry to hear that. I'll audit my changes for the kinds of things that typically cause seg faults.

Update: Found the problem. Fixing it now.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 10:30:38 PM
 #23

I mean, if you set it to 3 seconds, you'll end up having as many TIME_WAIT sockets as your incoming connections in a 3-second period, so your 25K sockets should go down to 1500-2000.

tcp_fin_timeout is already set to 3s, forgot to mention that in my previous post.
These settings have been in use for a week now, and we still have the problems described before.


Quote
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
echo 3 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
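
To check which of these values are actually in effect, the same files can be read back. A minimal sketch, assuming a Linux host. The helper name `read_settings` and the `base` parameter are made up for illustration. A missing file is reported as `None` rather than raising, since not every kernel exposes every setting (tcp_tw_recycle, for instance, was removed in Linux 4.12).

```python
from pathlib import Path

# Setting names taken from the quoted commands above.
SETTINGS = [
    "tcp_tw_recycle",
    "tcp_tw_reuse",
    "tcp_timestamps",
    "tcp_fin_timeout",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
    "icmp_echo_ignore_all",
    "icmp_echo_ignore_broadcasts",
    "tcp_syncookies",
]


def read_settings(names, base="/proc/sys/net/ipv4"):
    """Return {name: value-as-string, or None if the file cannot be read}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Not present on this kernel, or not readable by this user.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_settings(SETTINGS).items():
        print(f"{name} = {value}")
```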

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 26, 2011, 10:31:21 PM
 #24

Sorry to hear that. I'll audit my changes for the kinds of things that typically cause seg faults.

Thanks, please do!
If we solve this permanently, I'm sending 20 BTC your way...

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 26, 2011, 10:41:44 PM
Last edit: June 26, 2011, 11:29:22 PM by JoelKatz
 #25

Okay, new build is up. It passes a pretty aggressive stress test, but I'll keep stressing it, just in case. Thanks for trying it.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
June 26, 2011, 11:34:14 PM
 #26

You guys might find the comments at the bottom of this pushpool commit interesting as well: https://github.com/shanew/pushpool/commit/ef4f7261a839e628bfa24a988055a23fe442e5d2  He mentions changing from TCP sockets to unix domain sockets for the communications between pushpoold and bitcoind.
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 27, 2011, 12:03:50 AM
 #27

Okay, new build is up. It passes a pretty aggressive stress test, but I'll keep stressing it, just in case. Thanks for trying it.

Same URL? Will try it out in a moment.

You guys might find the comments at the bottom of this pushpool commit interesting as well: https://github.com/shanew/pushpool/commit/ef4f7261a839e628bfa24a988055a23fe442e5d2  He mentions changing from TCP sockets to unix domain sockets for the communications between pushpoold and bitcoind.

That's actually better, BUT it would require modifications to both bitcoind and pushpoold.
This was the previous main target, but I couldn't find anyone able to implement it.

It would be really. really. awesome. Offering another 10 BTC for such an implementation.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 27, 2011, 12:15:53 AM
 #28

Same URL? Will try it out in a moment.
Yes, same URL.

Quote
That's actually better, BUT it would require modifications to both bitcoind and pushpoold.
This was the previous main target, but I couldn't find anyone able to implement it.
It's better in some ways and worse in others. Alone, it won't help the fact that bitcoind can't respond while it's waiting for another connection to respond. But it would eliminate the TIME_WAIT states that pile up because UNIX domain sockets don't have them. You'd need a multi-listening patch like mine to listen on a UNIX-domain socket and a TCP socket at the same time (or you could call select, but that's ugly). I haven't seen the code on the other side yet, so I don't know what's involved in making it use a UNIX domain socket.

Of course, if you have multi-threaded listening, keep alives, and UNIX domain sockets, that should *really* solve the problem.
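
For readers unfamiliar with the "listen on both" idea, here is a minimal sketch of the select-based variant mentioned above (not the multi-threaded patch itself, whose code is not shown in this thread). It assumes a POSIX system with UNIX-domain socket support. The function names, the socket file name, and the greeting text are all made up for illustration.

```python
import os
import selectors
import socket
import tempfile
import threading


def make_listeners(unix_path):
    """Create one TCP listener on loopback and one UNIX-domain listener."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    tcp.listen()
    uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    uds.bind(unix_path)
    uds.listen()
    return tcp, uds


def serve(listeners, max_clients):
    """Accept from whichever listener is ready, until max_clients are served."""
    sel = selectors.DefaultSelector()
    for listener in listeners:
        sel.register(listener, selectors.EVENT_READ)
    served = 0
    while served < max_clients:
        for key, _events in sel.select():
            conn, _addr = key.fileobj.accept()
            kind = "unix" if key.fileobj.family == socket.AF_UNIX else "tcp"
            with conn:
                conn.sendall(f"hello via {kind}\n".encode())
            served += 1
    sel.close()


def ask(family, address):
    """Connect, read one line, and return it without the trailing newline."""
    with socket.socket(family, socket.SOCK_STREAM) as client:
        client.connect(address)
        with client.makefile("r") as stream:
            return stream.readline().strip()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "rpc.sock")
    tcp, uds = make_listeners(path)
    server = threading.Thread(target=serve, args=([tcp, uds], 2), daemon=True)
    server.start()
    replies = (ask(socket.AF_INET, tcp.getsockname()), ask(socket.AF_UNIX, path))
    server.join()
    tcp.close()
    uds.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

The UNIX-domain connection never enters TIME_WAIT when it is closed, which is the property discussed above. The TCP one still does.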

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
hamdi
Hero Member
*****
Offline Offline

Activity: 826
Merit: 500



View Profile
June 27, 2011, 12:17:44 AM
 #29

best would be to remove the network way from bitcoind to pushpoold.

on one hand shitty to force both running on one machine, but the network bottleneck would be gone by sharing that data via ram instead of network
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 27, 2011, 12:18:27 AM
 #30

best would be to remove the network way from bitcoind to pushpoold.

on one hand shitty to force both running on one machine, but the network bottleneck would be gone by sharing that data via ram instead of network
It is shared by RAM. Connections between two processes on the same machine don't actually flow over any network. It just emulates a network, and with that comes both good things and bad things.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 27, 2011, 01:01:47 AM
 #31

Same URL? Will try it out in a moment.
Yes, same URL.

Quote
That's actually better, BUT it would require modifications to both bitcoind and pushpoold.
This was the previous main target, but I couldn't find anyone able to implement it.
It's better in some ways and worse in others. Alone, it won't help the fact that bitcoind can't respond while it's waiting for another connection to respond. But it would eliminate the TIME_WAIT states that pile up because UNIX domain sockets don't have them. You'd need a multi-listening patch like mine to listen on a UNIX-domain socket and a TCP socket at the same time (or you could call select, but that's ugly). I haven't seen the code on the other side yet, so I don't know what's involved in making it use a UNIX domain socket.

Of course, if you have multi-threaded listening, keep alives, and UNIX domain sockets, that should *really* solve the problem.


Unfortunately, I could not get this version to work in our production environment either.
The strange thing is that this DOES work if I run it separately... I downloaded a nightly build of the blockchain, tried it out on port 1337, and it started; I could issue getwork requests to it, and it seems to have reused the socket too.

On the other hand, when I replace our bitcoind with this patched version, it starts, runs for 4 seconds and then silently dies. I was able to issue one getinfo request from its client, but then it died.

Straaaaange... but i really think we're onto something here.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 27, 2011, 03:13:11 AM
Last edit: June 27, 2011, 04:30:29 AM by JoelKatz
 #32

It must be a problem that only appears under load or under some particular combination of requests. I'll try to do some more troubleshooting. I don't have much time left today, but I should have a few hours tomorrow that I can dedicate.

Update1: Do you compile bitcoind with any non-standard settings? And do you start it with any command line flags other than '-daemon'? Do you enable RPC over SSL?

Update2: I made a few cleanups and fixed a few very minor issues. But I can't replicate your issue, which means either I solved it or it requires something unique to your setup to replicate. If you get a chance to try my latest (same place) please do. If you can compile it with '-g' (I believe that's provided by default), make sure not to strip the executable, do a 'ulimit -c unlimited' before you execute it, and if it crashes, run 'gdb' on the core file like this: 'gdb /path/to/bitcoind /path/to/core.filename' and then message me the output of a 'where' command. (You may have to hit 'enter' a few times to get the full output. It'll be the last few lines that will be the most helpful.)

As soon as you have the core file, you can restart the original bitcoind. You don't need to have any downtime. Just make sure to pass 'gdb' the path to the bitcoind executable that generated the core file.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 27, 2011, 08:40:34 AM
Last edit: June 28, 2011, 02:01:34 AM by JoelKatz
 #33

There's another new version up. I realized the JSON code wasn't re-entrant, which creates a problem when you try to use it in more than one thread. Unfortunately, the 'getblock' code isn't re-entrant, and the simplest way to deal with that is to wrap all the RPC handlers in a big mutex. It doesn't seem to have any effect on performance though, so I'll leave it that way because it's much safer.

This version makes the JSON code re-entrant, but single-threads the calls that do the actual RPC. This is slightly less than optimal, but in all of my tests it made no difference. Multi-threading the actual RPC calls would carry a significant risk that some part of that code would break, for no significant benefit, and unless the 'getblock' code was pessimized for the most common case, it wouldn't benefit anyway. (Plus, there would have to be invasive changes to the code that handles when you successfully find a block, and that scares me because it's so critical and so hard to test.)
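
The pattern described here (parse and serialise JSON in parallel, but run every handler under one big mutex) can be sketched roughly as follows. This is an illustration in Python, not the actual patch. The handler, the dispatch table, and the `state` dictionary are hypothetical stand-ins for bitcoind's non-re-entrant code.

```python
import json
import threading

rpc_lock = threading.Lock()  # the "big mutex" wrapped around every handler
state = {"counter": 0}       # hypothetical shared, non-re-entrant state


def handle_getinfo(params):
    # Only safe because the caller holds rpc_lock while this runs.
    state["counter"] += 1
    return {"requests_served": state["counter"]}


HANDLERS = {"getinfo": handle_getinfo}  # hypothetical dispatch table


def process_request(raw):
    # Parsing touches only local data, so threads can do it concurrently.
    request = json.loads(raw)
    handler = HANDLERS.get(request.get("method"))
    if handler is None:
        reply = {"result": None, "error": "method not found", "id": request.get("id")}
    else:
        # 'with' releases the lock on every exit path, including exceptions.
        with rpc_lock:
            result = handler(request.get("params", []))
        reply = {"result": result, "error": None, "id": request.get("id")}
    # Serialising the reply also happens outside the lock.
    return json.dumps(reply)


def run_workers(n_threads, per_thread):
    """Fire requests from several threads and return the final counter value."""
    def worker():
        for i in range(per_thread):
            process_request(json.dumps({"method": "getinfo", "params": [], "id": i}))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


if __name__ == "__main__":
    print(run_workers(8, 200))
```

Holding the lock only around the handler keeps the critical section small, and acquiring it through a context manager avoids code paths that return without releasing it.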

Please test this version. It should solve the problem.

I also have a version with UNIX domain sockets available if anyone's interested (it's not up at the moment, but PM me if you want it). It's very ugly right now because I haven't had time to polish it, but it does work. It supports a '-unixsocket=<filename>' option. The protocol is a single line query and a single line response, no headers, no authentication (so put the socket in a directory only the authorized user can access). There is also code to issue RPC calls over the UNIX-domain socket, so you can see how to do it and see that it works. The biggest issue with it right now is that if you make any errors, it just closes the socket. You can issue multiple requests over a single connection though, and of course there are no stale socket issues.
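
A rough sketch of what such a protocol can look like from both ends: one line in, one line out, several requests per connection, and the connection simply closed on any error. The post does not show the wire format, so the JSON payload on each line, the handler class, the socket file name, and the placeholder `getinfo` result are all assumptions. A POSIX system is assumed too.

```python
import json
import os
import socket
import socketserver
import tempfile
import threading


class LineRPCHandler(socketserver.StreamRequestHandler):
    """One request per line, one reply per line, no headers, no authentication."""

    def handle(self):
        while True:
            line = self.rfile.readline()
            if not line:
                return  # client closed the connection
            try:
                request = json.loads(line)
                method = request["method"]
            except (ValueError, KeyError, TypeError):
                return  # malformed request: just close the socket
            if method != "getinfo":
                return  # unknown method: also just close the socket
            # Placeholder result, not real bitcoind output.
            reply = {"result": {"version": 0}, "error": None, "id": request.get("id")}
            self.wfile.write(json.dumps(reply).encode() + b"\n")


def demo():
    # mkdtemp creates a directory only the current user can access, which
    # stands in for the missing authentication.
    path = os.path.join(tempfile.mkdtemp(), "bitcoind.sock")
    server = socketserver.ThreadingUnixStreamServer(path, LineRPCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        with client.makefile("rw") as stream:
            for i in range(3):  # several requests over one connection
                stream.write(json.dumps({"method": "getinfo", "id": i}) + "\n")
                stream.flush()
                replies.append(json.loads(stream.readline()))
    server.shutdown()
    server.server_close()
    return replies


if __name__ == "__main__":
    for reply in demo():
        print(reply)
```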

The biggest ugliness is that I couldn't figure out how to bind a basic_istream to a local::stream_protocol::socket. So I had to use a 'receive' call instead of 'getline'. If anyone knows how to do that, I'd appreciate a PM. (I always meant to learn boost.)

In truth, none of these are the right solution. I have some ideas for the 'right' solution (bitcoind should push changes to the mining controller so it doesn't have to poll), and I'll try to get them thought out and proposed as modifications to the official source. (Think of it as extending long polling back one more link in the chain.)

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
NANO
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
June 28, 2011, 01:40:23 AM
 #34

JoelKatz seems to be a Hero...
ius
Newbie
*
Offline Offline

Activity: 56
Merit: 0


View Profile
June 28, 2011, 02:17:25 AM
 #35

In truth, none of these are the right solution. I have some ideas for the 'right' solution (bitcoind should push changes to the mining controller so it doesn't have to poll), and I'll try to get them thought out and proposed as modifications to the official source. (Think of it as extending long polling back one more link in the chain.)

Even so it would make much more sense to do it properly (so we end up having a useful pull request against bitcoin); the asio route appears to be the way to go (patch is there, yet bugged) instead of spawning multiple threads.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 28, 2011, 02:23:34 AM
 #36

Even so it would make much more sense to do it properly (so we end up having a useful pull request against bitcoin); the asio route appears to be the way to go (patch is there, yet bugged) instead of spawning multiple threads.
Oh, do you know where the patch is? There's a good chance I could debug it. I've been meaning to learn about boost anyway. (My day job involves high-performance, multi-threaded TCP server code.)

Asio won't actually gain you much over my patch. The main advantage of asio is when you have large numbers of connections, most of which aren't very active or when large numbers of them become active at the same time. You would want it in a mining controller. Think about the large number of connections, the long periods of inactivity, and the sudden burst when a new block comes out. Without asio, you have to have a context switch for each connection. With asio, you do not.
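
To make the "many idle connections, then a sudden burst" case concrete, here is a small sketch using Python's asyncio as a stand-in for Boost.Asio (a different library with the same event-driven idea). One thread holds all the connections open and then notifies every client at once, with no thread per connection. The connection count and the "new block" message are made up for illustration.

```python
import asyncio


async def demo(n_clients=50):
    server_side = []  # writers for every accepted, mostly idle, connection

    async def on_connect(reader, writer):
        # Keep the connection open; nothing runs per connection while it idles.
        server_side.append(writer)

    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    clients = [await asyncio.open_connection("127.0.0.1", port) for _ in range(n_clients)]
    while len(server_side) < n_clients:
        await asyncio.sleep(0.01)  # wait until every connection is accepted

    # The burst: a new block arrives and every client is told at once.
    for writer in server_side:
        writer.write(b"new block\n")
    await asyncio.gather(*(writer.drain() for writer in server_side))

    lines = [await reader.readline() for reader, _writer in clients]

    # Tidy up both ends of every connection, then the listener.
    for _reader, writer in clients:
        writer.close()
    for writer in server_side:
        writer.close()
    await asyncio.gather(*(writer.wait_closed() for writer in server_side))
    server.close()
    await server.wait_closed()
    return lines


if __name__ == "__main__":
    received = asyncio.run(demo())
    print(len(received), "clients notified from a single thread")
```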

That said, there's basically no downside, and it's also the right thing to do.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 28, 2011, 02:25:13 AM
 #37

Let's try this out right now. Compiling the code as we speak.
I'll give you an update and/or a coredump in a while :)

The "right way" seems pretty awesome to me... :)

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 28, 2011, 02:32:08 AM
Last edit: June 28, 2011, 02:52:46 AM by JoelKatz
 #38

The "right way" seems pretty awesome to me... Smiley
Okay. If I get the existing asio work for bitcoind, I'll work on that. I've grabbed the source code to the mining daemon and am looking at how it implements long polling.

Update: It looks like pushpoold already has a way to do this, with blkmond. So the only issue is the connection buildup, which I think I've already fixed.

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
Jine (OP)
Sr. Member
****
Offline Offline

Activity: 403
Merit: 250


View Profile
June 28, 2011, 03:24:28 AM
 #39

Something goes wrong when running your current patched bitcoind in our prod. environment.
Now the process just hung; I couldn't get a coredump or anything similar out of it :(

Suggestions? I'll try this one in my personal dev environment and see if I can replicate the issue.

Previous founder of Bit LC Inc. | I've always loved the idea of bitcoin.
JoelKatz
Legendary
*
Offline Offline

Activity: 1596
Merit: 1012


Democracy is vulnerable to a 51% attack.


View Profile WWW
June 28, 2011, 03:42:08 AM
Last edit: June 28, 2011, 04:09:41 AM by JoelKatz
 #40

Suggestions? I'll try this one in my personal dev environment and see if I can replicate the issue.
I'm guessing that there's some path that doesn't release the RPC mutex. I put up a new build that might fix it, but it's hard to be sure since I can't replicate the problem.

Update: If that fails, I can just pull the multi-threading stuff and only fix keep-alives. That's a no-brainer (two lines change in trivial ways) and is very unlikely to cause any problems.
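
For reference, this is what keep-alive buys on the wire: several requests travel over one TCP connection instead of opening (and later leaving in TIME_WAIT) one connection per request. A minimal sketch using Python's standard HTTP server and client rather than bitcoind. The handler and the `peer_port` reply field are made up so the reuse is visible.

```python
import http.client
import http.server
import json
import threading


class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        # Report the client's source port so connection reuse can be observed.
        body = json.dumps({"peer_port": self.client_address[1]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo(n_requests=3):
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    for i in range(n_requests):
        payload = json.dumps({"method": "getwork", "params": [], "id": i})
        conn.request("POST", "/", body=payload,
                     headers={"Content-Type": "application/json"})
        ports.append(json.loads(conn.getresponse().read())["peer_port"])
    conn.close()
    server.shutdown()
    server.server_close()
    return ports


if __name__ == "__main__":
    print(demo())  # the same source port repeated means one connection was reused
```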

I am an employee of Ripple. Follow me on Twitter @JoelKatz
1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN