DrHaribo (OP)
Legendary
Offline
Activity: 2730
Merit: 1034
Needs more jiggawatts
|
|
December 18, 2012, 12:30:07 PM |
|
Turbor, your explanation is as good as any I can come up with.
Looks like all I/O got slower and slower. After a while some getwork requests took 5 minutes. The database connections terminated with "end of file" and it was impossible to establish new ones. syslog tells me the web server got stuck for over 120 seconds waiting for a page from disk, but there is nothing wrong with the disks. The pool server had several hundred thousand closed connections that the kernel somehow stopped clearing away. The load was 25-30 with almost no CPU usage, so apparently all these processes were waiting on I/O. I could still send a request to bitcoind or namecoind and it would register in their logs, but no response would come out. After restarting each process everything is fine. It's as if the kernel just wouldn't honor I/O requests from the "old" processes. I've never seen anything like it.
So yes, maybe it was the ghost in the machine.
I'll be working on robustness and automatic recovery, although I hope not to see this type of behavior again.
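A load of 25-30 with almost no CPU usage is what Linux reports when many processes sit in uninterruptible sleep (state D), since those count toward the load average. Below is a minimal sketch, assuming a Linux /proc filesystem, of how one might list such processes as a first step toward automatic recovery. The function names are made up for illustration, and this is not what the pool actually runs.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, command name, state) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    name = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, name, state


def blocked_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, name) of processes currently in uninterruptible sleep."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, name, state = parse_stat_state(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were looking
        if state == "D":
            blocked.append((pid, name))
    return blocked
```

A watchdog could call blocked_processes() every minute or so and raise an alert when the same PIDs stay in state D across several checks. Whether restarting them automatically is safe is a separate judgment call.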
|
|
|
|
lenny_
Legendary
Offline
Activity: 1036
Merit: 1000
DARKNETMARKETS.COM
|
|
December 18, 2012, 06:48:29 PM |
|
DrHaribo, what do you think about implementing a Hall of Fame (users who found the most blocks), like in Slush's pool? http://mining.bitcoin.cz/stats/hall-of-fame/
|
|
|
|
loshia
Legendary
Offline
Activity: 1610
Merit: 1000
|
|
December 18, 2012, 11:18:35 PM |
|
Turbor, your explanation is as good as any I can come up with.
Was there a recent kernel upgrade on the pool server? If so, just go back to the old one.
|
|
|
|
organofcorti
Donator
Legendary
Offline
Activity: 2058
Merit: 1007
Poor impulse control.
|
|
December 18, 2012, 11:40:19 PM |
|
.... although I hope not to see this type of behavior again.
Everyone says this about their child at some point
|
|
|
|
LazyOtto
|
|
December 19, 2012, 03:47:13 AM |
|
So you did something to resume 'normal' / getwork processing?
I had eight shifts with zero or near zero scores.
|
|
|
|
DrHaribo (OP)
|
|
December 19, 2012, 02:05:23 PM |
|
Hall of fame is on my list.
I had eight shifts with zero or near zero scores.
Could be because your miner stayed at a backup pool or that it crashed when it was disconnected by the server. Some miner programs have been having those issues lately.
|
|
|
|
MrTeal
Legendary
Offline
Activity: 1274
Merit: 1004
|
|
December 19, 2012, 03:07:11 PM |
|
.... although I hope not to see this type of behavior again.
Everyone says this about their child at some point
I think I say this every day for my 2-year-old.
|
|
|
|
DrHaribo (OP)
|
|
December 19, 2012, 04:32:28 PM |
|
I need to restart bitcoind and namecoind. They take a while to get up and running again, sorry for the delay.
|
|
|
|
lenny_
|
|
December 19, 2012, 04:41:51 PM |
|
Stratum port 5050 seems to be down. Port 3333 working OK.
|
|
|
|
DrHaribo (OP)
|
|
December 20, 2012, 11:53:54 AM |
|
Stratum port 5050 seems to be down. Port 3333 working OK.
It's working for me. It's just a redirect, so 5050 and 3333 should be up/down at the same time.
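For anyone who wants to check both ports from their own machine, here is a minimal sketch of a TCP connect test. It only shows whether something accepts connections, not whether Stratum is healthy behind it. The host name in the usage note is a placeholder, not the pool's real address.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_ports(host: str, ports=(3333, 5050)) -> dict:
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_ports("pool.example.com") returns a dict keyed by port. If 5050 really is just a redirect to 3333, the two values should normally agree; a mismatch would point at something between the miner and the server.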
|
|
|
|
juhakall
|
|
December 20, 2012, 06:28:27 PM |
|
I noticed an odd rejected share, using cgminer 2.10.2 & stratum:
Rejected 3fd895f5 Diff 4/4 GPU 1 pool 0 (Work below difficulty)
I only barely understand how share difficulties work, but I was told in the #cgminer IRC channel that the target hash for diff4 should be 0x000000003fffc000. My rejected share hash was 0x000000003fd895f5, which is smaller than the target, so shouldn't it be accepted?
I'm also aware of the discrepancy between "real" and actually used diff1 shares, which is explained here by kanoi. Could this be a similar problem, meaning that there's a disagreement between cgminer and BitMinter as to what actually is the target for a diff4 share?
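The arithmetic behind that question can be sketched as follows. This assumes the share target is simply the diff1 target divided by the difficulty, and it checks both diff1 conventions: the truncated one bitcoind uses and the all-ones one many miners and pools use. Only the top 64 bits of the rejected hash are known from the post, so the rest is padded with ones as the worst case. The helper names are made up for illustration.

```python
# bitcoind's difficulty-1 target: 0x00000000FFFF0000...0000 (256 bits).
BITCOIND_DIFF1_TARGET = 0xFFFF << 208
# The "pool" difficulty-1 target often used instead: 0x00000000FFFFFFFF...FFFF.
POOL_DIFF1_TARGET = (1 << 224) - 1


def target_for_difficulty(difficulty: int, diff1: int = BITCOIND_DIFF1_TARGET) -> int:
    """Share target for an integer difficulty, as diff1 target / difficulty."""
    return diff1 // difficulty


def from_top64(prefix: int, fill_ones: bool = False) -> int:
    """Expand the top 64 bits of a hash to 256 bits, padding the low 192 bits."""
    low_bits = (1 << 192) - 1 if fill_ones else 0
    return (prefix << 192) | low_bits


def meets_target(hash_value: int, target: int) -> bool:
    """A share is valid when its hash, read as an integer, is <= the target."""
    return hash_value <= target


# Worst case for the rejected share: unknown low bits all set to 1.
share_hash = from_top64(0x000000003FD895F5, fill_ones=True)

for label, diff1 in (("bitcoind", BITCOIND_DIFF1_TARGET), ("pool", POOL_DIFF1_TARGET)):
    target = target_for_difficulty(4, diff1)
    print(label, hex(target >> 192), meets_target(share_hash, target))
```

With either diff1 convention the top 64 bits of the diff 4 target (0x3fffc000 or 0x3fffffff) are above 0x3fd895f5, so on these assumptions the diff1 discrepancy alone would not explain the rejection.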
|
|
|
|
miter_myles
|
|
December 20, 2012, 11:00:01 PM |
|
down?
|
BTC - 1D7g5395bs7idApTx1KTXrfDW7JUgzx6Z5 LTC - LVFukQnCWUimBxZuXKqTVKy1L2Jb8kZasL
|
|
|
juhakall
|
|
December 20, 2012, 11:20:17 PM |
|
Yeah, seems to be down.
|
|
|
|
Equilux
|
|
December 20, 2012, 11:23:20 PM |
|
Website itself is up, but my client can't connect ...
|
|
|
|
DrHaribo (OP)
|
|
December 20, 2012, 11:58:08 PM |
|
bitcoind and namecoind both slowed down at the same time. There's some sort of issue there. Pool recovered after a couple of minutes though.
juhakall, a 4/4 difficulty proof being rejected sounds strange.. I'll have a look at it. This was with Stratum?
|
|
|
|
juhakall
|
|
December 21, 2012, 03:47:08 AM |
|
juhakall, a 4/4 difficulty proof being rejected sounds strange.. I'll have a look at it. This was with Stratum?
Yes, that's with stratum.
|
|
|
|
DrHaribo (OP)
|
|
December 21, 2012, 09:49:50 AM |
|
juhakall, a 4/4 difficulty proof being rejected sounds strange.. I'll have a look at it. This was with Stratum?
Yes, that's with stratum.
The target is exactly the same as the one you mentioned previously, so the share should be accepted. Do you have the time when this happened? If so, I'll check the logs for errors. Maybe the data is mangled somehow, so that when the server hashes it, the share doesn't even meet diff 1.
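To illustrate the "mangled data" theory: the server rebuilds the 80-byte block header from what the miner submits and hashes it itself, so a single wrong byte gives a completely unrelated hash. The sketch below uses the well-known Bitcoin genesis block header as test data and flips one bit of the nonce. It is an illustration of double SHA-256 checking in general, not BitMinter's actual server code, and the names are made up.

```python
import hashlib

# bitcoind's difficulty-1 target: 0x00000000FFFF0000...0000 (256 bits).
DIFF1_TARGET = 0xFFFF << 208


def header_hash_int(header: bytes) -> int:
    """Double SHA-256 of an 80-byte header, read as a little-endian integer."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")


# Genesis block header: version, previous hash, merkle root, time, bits, nonce
# (all fields in their serialized little-endian form).
GENESIS_HEADER = bytes.fromhex(
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)

# Simulate corruption in transit: flip one bit in the nonce (bytes 76..79).
mangled = bytearray(GENESIS_HEADER)
mangled[76] ^= 0x01

print(format(header_hash_int(GENESIS_HEADER), "064x"))
print(header_hash_int(GENESIS_HEADER) <= DIFF1_TARGET)
print(header_hash_int(bytes(mangled)) <= DIFF1_TARGET)
```

The intact header hashes below the diff 1 target, while the one with a flipped bit is effectively a random 256-bit number and has only about a 1 in 2^32 chance of doing so. That is why a share that looked fine on the miner's side could be reported as "Work below difficulty" by the server.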
|
|
|
|
juhakall
|
|
December 21, 2012, 12:22:58 PM |
|
juhakall, a 4/4 difficulty proof being rejected sounds strange.. I'll have a look at it. This was with Stratum?
Yes, that's with stratum.
The target is the exact same you mentioned previously, so the share should be accepted. Do you have the time when this happened? If so I'll have a look at the logs if there are any errors. Maybe the data is mangled somehow, so when the server hashes it the share doesn't even meet diff 1.
Date was 2012-12-20 17:29:06 UTC.
|
|
|
|
DrHaribo (OP)
|
|
December 21, 2012, 12:50:05 PM |
|
Date was 2012-12-20 17:29:06 UTC.
Nothing logged at that time. I don't really have any data to go on. Have you seen this multiple times or just that once? If it happens repeatedly I could add more logging to try and figure out what's going on.
|
|
|
|
juhakall
|
|
December 21, 2012, 03:39:02 PM |
|
Nope, I just happened to catch that one as it happened. I'll try using cgminer's sharelog function, but if that rejection only occurs with shares very close to the target, it might be rare and take a while to see again.
|
|
|
|
|