DavinciJ15
|
|
September 22, 2011, 05:10:45 AM |
|
Does your daemon have the 4diff patch? I assumed the latest version of namecoind has those patches but I'm not sure. I don't know for sure, but I'd be surprised if it's included in the stock namecoind. Perhaps best to check with the devs. I haven't got far enough with native merged mining to need a patched namecoind yet, so I haven't checked...
a quick grep search shows the patch is not applied. I will apply it after I modify 4diff.txt for the namecoind source code changes, unless you have a copy that Google is hiding.
|
|
|
|
shads (OP)
|
|
September 22, 2011, 09:39:35 AM |
|
a quick grep search shows the patch is not applied. I will apply it after I modify 4diff.txt for the namecoind source code changes, unless you have a copy that Google is hiding.
Well, the good news is that after reviewing your logs I'm pretty sure 4diff will solve your problem. If you look at the consolidated log I sent you, you can see that towards the end of the period the namecoin daemon suddenly starts sending masses of duplicate works, at one point reaching a duplicate rate of nearly 100%. This is a known bug that the 4diff patch fixes.

The problem here is that poolserverj checks work incoming from the daemon for duplicates and discards any it finds. If this bug is happening, you're likely only getting one unique work per second from the daemon. Because it can't fill its cache, psj keeps issuing getwork requests to the daemon and keeps discarding the responses when they come back as duplicates. If poolserverj weren't behaving like this, you would be issuing the same work to all your miners; you and they would think everything is fine, but eventually you'd notice that your pool is not finding anywhere near as many blocks as it should. If you've got 10 miners all working on the same work, you're effectively only getting one miner's worth of work done. IMHO it's better for the server to crash in this case; then at least you'll know something is wrong.

Pushpool does not check incoming work for duplicates, which is why you saw it wasn't working nearly as hard. It was feeding the duplicates to your miners and then going back to sleep, whereas psj was caning your daemon trying to get valid work, and eventually everything went down in flames... You will see massive improvements with a 4diff-patched daemon.
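The dedup check described above can be sketched roughly like this (a minimal illustration with invented class and method names, not poolserverj's actual code):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of duplicate-work filtering in the spirit of what poolserverj
// does: a work unit already seen is dropped instead of being handed to a
// second miner. Names are invented; the real server keys on the getwork
// "data" field rather than an arbitrary string.
public class DuplicateWorkFilter {
    private final Set<String> seen = new HashSet<>();
    private int duplicates = 0;

    /** Returns true if this work is fresh; false if the daemon sent a repeat. */
    public boolean accept(String workData) {
        if (!seen.add(workData)) {
            duplicates++;
            return false; // discard: identical to work already in the cache
        }
        return true;
    }

    public int duplicateCount() { return duplicates; }
}
```

When the daemon bug hits, nearly every `accept` call returns false, so the cache never fills and the server keeps re-polling the daemon, which is exactly the runaway load described above.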
|
|
|
|
flower1024
Legendary
Offline
Activity: 1428
Merit: 1000
|
|
September 22, 2011, 09:45:17 AM |
|
is it feasible to include the mining proxy (for merged mining) directly in PoolServerJ?
if the namecoin daemon dies, the bitcoin daemon could still deliver getworks (or vice versa)
|
|
|
|
shads (OP)
|
|
September 22, 2011, 09:47:46 AM |
|
is it feasible to include the mining proxy (for merged mining) directly in PoolServerJ?
if the namecoin daemon dies, the bitcoin daemon could still deliver getworks (or vice versa)
I am working on that right now... You can thank Davinci for tempting me with a fat bounty or I probably wouldn't have. I'm waiting on some detail from one of the namecoin devs before I can start implementing...
|
|
|
|
flower1024
Legendary
Offline
Activity: 1428
Merit: 1000
|
|
September 22, 2011, 09:50:06 AM |
|
is it feasible to include the mining proxy (for merged mining) directly in PoolServerJ?
if the namecoin daemon dies, the bitcoin daemon could still deliver getworks (or vice versa)
I am working on that right now... You can thank Davinci for tempting me with a fat bounty or I probably wouldn't have. I'm waiting on some detail from one of the namecoin devs before I can start implementing...
+1 to Davinci and you ;)
|
|
|
|
DavinciJ15
|
|
September 22, 2011, 12:20:48 PM |
|
a quick grep search shows the patch is not applied. I will apply it after I modify 4diff.txt for the namecoind source code changes, unless you have a copy that Google is hiding.
Well, the good news is that after reviewing your logs I'm pretty sure 4diff will solve your problem. If you look at the consolidated log I sent you, you can see that towards the end of the period the namecoin daemon suddenly starts sending masses of duplicate works, at one point reaching a duplicate rate of nearly 100%. This is a known bug that the 4diff patch fixes. The problem here is that poolserverj checks work incoming from the daemon for duplicates and discards any it finds. If this bug is happening, you're likely only getting one unique work per second from the daemon. Because it can't fill its cache, psj keeps issuing getwork requests to the daemon and keeps discarding the responses when they come back as duplicates. If poolserverj weren't behaving like this, you would be issuing the same work to all your miners; you and they would think everything is fine, but eventually you'd notice that your pool is not finding anywhere near as many blocks as it should. If you've got 10 miners all working on the same work, you're effectively only getting one miner's worth of work done. IMHO it's better for the server to crash in this case; then at least you'll know something is wrong. Pushpool does not check incoming work for duplicates, which is why you saw it wasn't working nearly as hard. It was feeding the duplicates to your miners and then going back to sleep, whereas psj was caning your daemon trying to get valid work, and eventually everything went down in flames... You will see massive improvements with a 4diff-patched daemon.
Thanks, I will apply the patch and provide the public with a documented, edited version of 4diff, since it does not patch the namecoind code correctly.
|
|
|
|
|
DavinciJ15
|
|
September 22, 2011, 01:07:09 PM |
|
It does not compile namecoind:

root@bitcoinpool:/home/bitcoinpool/ArtForz-namecoin-127deb4/src# make -f makefile.unix namecoind
g++ -c -O2 -Wno-invalid-offsetof -Wformat -g -D__WXDEBUG__ -DNOPCH -DFOURWAYSSE2 -DUSE_SSL -o obj/nogui/namecoin.o namecoin.cpp
namecoin.cpp: In function 'json_spirit::Value name_history(const json_spirit::Array&, bool)':
namecoin.cpp:616:30: error: 'foreach' was not declared in this scope
namecoin.cpp:617:9: error: expected ';' before '{' token
namecoin.cpp:1789:84: error: expected '}' at end of input
namecoin.cpp:1789:84: error: expected '}' at end of input
make: *** [obj/nogui/namecoin.o] Error 1
|
|
|
|
DavinciJ15
|
|
September 22, 2011, 06:56:30 PM |
|
It does not compile namecoind:

root@bitcoinpool:/home/bitcoinpool/ArtForz-namecoin-127deb4/src# make -f makefile.unix namecoind
g++ -c -O2 -Wno-invalid-offsetof -Wformat -g -D__WXDEBUG__ -DNOPCH -DFOURWAYSSE2 -DUSE_SSL -o obj/nogui/namecoin.o namecoin.cpp
namecoin.cpp: In function 'json_spirit::Value name_history(const json_spirit::Array&, bool)':
namecoin.cpp:616:30: error: 'foreach' was not declared in this scope
namecoin.cpp:617:9: error: expected ';' before '{' token
namecoin.cpp:1789:84: error: expected '}' at end of input
namecoin.cpp:1789:84: error: expected '}' at end of input
make: *** [obj/nogui/namecoin.o] Error 1
Commenting out the code in the name_history function (it's not needed for mining) does the trick.
|
|
|
|
DavinciJ15
|
|
September 22, 2011, 11:15:06 PM Last edit: September 22, 2011, 11:30:03 PM by DavinciJ15 |
|
So PoolServerJ is working perfectly with my patched namecoind. cgminer is unable to take down the pool with scantime set to 1.
Thanks, you have been a big help; in fact, you have helped me the most.
|
|
|
|
shads (OP)
|
|
September 26, 2011, 11:32:32 AM |
|
Changelog:
[0.3.0.FINAL]
- partial implementation of worker cache preloading. This is not active yet.
- fix: stop checking if continuation state is initial. It can be if a previous Jetty filter has suspended/resumed the request; in that case it immediately sends an empty LP response. This might be the cause of a bug where cgminer immediately sends another LP, which turns into a spam loop. This only seems to be triggered under heavy load and only seems to happen with cgminer clients connected.
- added commented-out condition to stop manual block checks if native LP is enabled and verification is off.
- remove warning for native LP when a manual block check is fired. We want this to occur in most circumstances.
- extra trace targets for longpoll empty and expired responses.
- fix: handle clients sending longpoll requests without a trailing slash. This can result in the LP request being routed through the main handler and returning immediately, setting up a request-spamming loop. This patch checks for the LP url in the main handler and redirects to the LP handler if it's found.
- add threadDump method to mgmt interface.
- add timeout to the notify-lp-clients-executor thread in case dispatch threads do not report back correctly and counters aren't updated. Solves a problem where a counter mismatch can prevent the thread from ever finishing, hogging the executor and preventing future long poll cycles.
- add shutdown check to lp dispatch timeout.
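The trailing-slash fix in the changelog amounts to routing logic along these lines (a toy sketch; the path constant and handler names are invented, and the real code lives in a Jetty handler rather than a string switch):

```java
// Sketch of routing a longpoll request that arrived at the main handler
// because the client omitted the trailing slash (invented names). Without
// the redirect, the LP request would get an immediate response from the
// main handler and the miner would instantly re-send, creating a spam loop.
public class RequestRouter {
    static final String LP_PATH = "/LP"; // assumed longpoll URL

    /** Decide which handler should service a request path. */
    public static String route(String path) {
        if (path.equals(LP_PATH) || path.equals(LP_PATH + "/")) {
            return "longpoll"; // hand off to the LP handler either way
        }
        return "main";         // normal getwork traffic
    }
}
```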
|
|
|
|
DavinciJ15
|
|
September 26, 2011, 01:23:26 PM |
|
Changelog:
[0.3.0.FINAL]
- partial implementation of worker cache preloading. This is not active yet.
- fix: stop checking if continuation state is initial. It can be if a previous Jetty filter has suspended/resumed the request; in that case it immediately sends an empty LP response. This might be the cause of a bug where cgminer immediately sends another LP, which turns into a spam loop. This only seems to be triggered under heavy load and only seems to happen with cgminer clients connected.
- added commented-out condition to stop manual block checks if native LP is enabled and verification is off.
- remove warning for native LP when a manual block check is fired. We want this to occur in most circumstances.
- extra trace targets for longpoll empty and expired responses.
- fix: handle clients sending longpoll requests without a trailing slash. This can result in the LP request being routed through the main handler and returning immediately, setting up a request-spamming loop. This patch checks for the LP url in the main handler and redirects to the LP handler if it's found.
- add threadDump method to mgmt interface.
- add timeout to the notify-lp-clients-executor thread in case dispatch threads do not report back correctly and counters aren't updated. Solves a problem where a counter mismatch can prevent the thread from ever finishing, hogging the executor and preventing future long poll cycles.
- add shutdown check to lp dispatch timeout.
What is "worker cache preloading"?
|
|
|
|
shads (OP)
|
|
September 26, 2011, 02:08:19 PM Last edit: September 26, 2011, 03:29:16 PM by shads |
|
What is "worker cache preloading"?
Well, as it says, it's not activated... you'd have to make some code mods and rebuild from source to use it atm... But in a nutshell: when a busy pool comes up, its worker cache is empty. It suddenly gets hit by a ton of requests, which translates into a ton of single selects to the db. Preloading dumps the worker ids from the cache to a file on shutdown. Then on startup it grabs the worker ids and does a single bulk select to fill the worker cache. Much more efficient, but probably not an issue until you get to the terahash range.
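The preloading idea can be sketched like so (invented class name, file format, and table/column names; not the actual poolserverj implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of worker cache preloading: persist cached worker ids on shutdown,
// then refill the cache with one bulk select on startup instead of one
// point lookup per connecting worker.
public class WorkerCachePreloader {

    /** On shutdown: dump the ids currently held in the worker cache, one per line. */
    public static void dump(List<Long> cachedIds, Path file) {
        List<String> lines = new ArrayList<>();
        for (long id : cachedIds) lines.add(Long.toString(id));
        try {
            Files.write(file, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On startup: read the ids back so a single bulk select can refill the cache. */
    public static List<Long> load(Path file) {
        List<Long> ids = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(file)) ids.add(Long.parseLong(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    /** The one bulk query the loaded ids would feed (table/column names invented). */
    public static String bulkSelect(List<Long> ids) {
        StringBuilder sql = new StringBuilder(
                "SELECT id, username, password FROM pool_worker WHERE id IN (");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sql.append(',');
            sql.append(ids.get(i));
        }
        return sql.append(')').toString();
    }
}
```

The saving is simply N point selects collapsed into one `IN (...)` query at the moment the pool is busiest.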
|
|
|
|
DavinciJ15
|
|
September 26, 2011, 04:22:53 PM |
|
What is "worker cache preloading"?
Well, as it says, it's not activated... you'd have to make some code mods and rebuild from source to use it atm... But in a nutshell: when a busy pool comes up, its worker cache is empty. It suddenly gets hit by a ton of requests, which translates into a ton of single selects to the db. Preloading dumps the worker ids from the cache to a file on shutdown. Then on startup it grabs the worker ids and does a single bulk select to fill the worker cache. Much more efficient, but probably not an issue until you get to the terahash range.
This is where memcache can help. A pool would have a memcache server holding all the usernames and passwords; when you reboot one of the servers, the cache will still have all the users, assuming they were connecting to another server beforehand. This also allows for clustering, so all servers share the same cached usernames and passwords. What's your opinion of this design?
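The design being proposed looks roughly like this (all names invented; the shared cache is modelled as a plain interface here so the sketch runs without a real memcached client):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a two-tier worker lookup: in-process cache first, then a
// cluster-wide shared cache (memcached in the proposal above), and only
// on a miss in both does a server hit the database. The dbHits counter
// is exposed so the saving is visible.
public class WorkerLookup {

    public interface SharedCache {          // stand-in for a memcached client
        String get(String key);
        void set(String key, String value);
    }

    public interface WorkerDb {             // stand-in for the SQL backend
        String fetchPassword(String username);
    }

    private final Map<String, String> local = new HashMap<>();
    private final SharedCache shared;
    private final WorkerDb db;
    public int dbHits = 0;

    public WorkerLookup(SharedCache shared, WorkerDb db) {
        this.shared = shared;
        this.db = db;
    }

    public String password(String username) {
        String pw = local.get(username);
        if (pw != null) return pw;          // fastest path: in-process
        pw = shared.get(username);
        if (pw == null) {                   // miss everywhere: one DB select
            pw = db.fetchPassword(username);
            dbHits++;
            shared.set(username, pw);       // now the whole cluster has it
        }
        local.put(username, pw);
        return pw;
    }
}
```

The win is that after any one server has resolved a worker, every other server (and a rebooted server) answers that worker from the shared cache instead of the database.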
|
|
|
|
eleuthria
Legendary
Offline
Activity: 1750
Merit: 1007
|
|
September 26, 2011, 06:00:21 PM |
|
After hammering out the last few bugs we found at BTC Guild with PoolServerJ, I'm almost ready to completely remove pushpool from my servers.
While the CPU load of PoolServerJ is higher than pushpool's, I would not call it inefficient, since PoolServerJ is doing far more work than pushpool: full difficulty checks internally, prioritizing getwork responses to known good clients [QoS filtering], organizing work from multiple bitcoind nodes [faster LP delivery], and running a cache of work so miner requests are answered from the server rather than proxied through to bitcoind [faster getwork delivery]. In the end, as long as the servers have enough extra RAM, the performance is outstanding.
PoolServerJ's work caching means that if bitcoind stutters for a second, you have work ready and available to send to your miners, whereas pushpool becomes useless until bitcoind can respond, causing miners to complain.
The tradeoff is simple: pushpool will run on minimal specs, but becomes slow to respond after a certain level of load, even though CPU and RAM are sitting idle. PoolServerJ will use significantly more RAM, but as long as it's there, it won't choke until either your bitcoind can't provide work as fast as it's being requested, or your CPU is at full utilization. A small pool would probably stick with pushpool, due to its low footprint and reasonable performance. Any pool that is starting to have growing pains (they start around 250 GH/sec, but they aren't "a problem" until ~450 GH/sec) would benefit greatly from taking a look at PoolServerJ.
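The work-caching behaviour described here can be sketched as follows (invented names; not PoolServerJ's actual classes, and the daemon is modelled as a simple supplier):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Sketch of why a work cache rides out daemon stalls: getwork responses
// are served from a pre-filled buffer, and the daemon is only polled to
// refill it. If bitcoind stutters, miners keep draining the buffer
// instead of waiting on a proxied RPC call.
public class WorkBuffer {
    private final Deque<String> cache = new ArrayDeque<>();
    private final Supplier<String> daemon;   // stand-in for a getwork RPC call

    public WorkBuffer(Supplier<String> daemon) { this.daemon = daemon; }

    /** Refill loop body: poll the daemon until the cache holds `target` works. */
    public void refill(int target) {
        while (cache.size() < target) {
            String work = daemon.get();
            if (work == null) break;         // daemon stalled: serve what we have
            cache.addLast(work);
        }
    }

    /** Miner-facing: answer instantly from cache; null only when fully drained. */
    public String getWork() {
        return cache.pollFirst();
    }

    public int cached() { return cache.size(); }
}
```

Pushpool's proxy model corresponds to calling `daemon.get()` inline for every miner request, which is exactly why a one-second bitcoind stall turns into a one-second stall for every miner.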
|
RIP BTC Guild, April 2011 - June 2015
|
|
|
eleuthria
Legendary
Offline
Activity: 1750
Merit: 1007
|
|
September 29, 2011, 02:09:04 PM |
|
An update: BTC Guild is running PoolServerJ for the entire pool. We were able to push out 10 pushpool/10 bitcoind nodes with load balancing and replace them with a single PoolServerJ and 2 bitcoind nodes.
|
RIP BTC Guild, April 2011 - June 2015
|
|
|
shads (OP)
|
|
September 29, 2011, 02:26:03 PM |
|
An update: BTC Guild is running PoolServerJ for the entire pool. We were able to push out 10 pushpool/10 bitcoind nodes with load balancing and replace them with a single PoolServerJ and 2 bitcoind nodes.
you forgot to mention the whopping 16% cpu load... but I'm glad you forgot to mention the memory usage
|
|
|
|
DavinciJ15
|
|
September 29, 2011, 07:19:44 PM |
|
Quick question:
Is there a way to include the USER_ID found in the worker table when inserting into the shares table?
|
|
|
|
BurningToad
|
|
October 06, 2011, 05:19:42 PM |
|
An update: BTC Guild is running PoolServerJ for the entire pool. We were able to push out 10 pushpool/10 bitcoind nodes with load balancing and replace them with a single PoolServerJ and 2 bitcoind nodes.
you forgot to mention the whopping 16% cpu load... but I'm glad you forgot to mention the memory usage
Hmm, could you post some configuration, perhaps? I've been getting a very high CPU load with PoolServerJ and really need a way to reduce it.
|
|
|
|
|
|