shad
January 12, 2012, 07:49:31 PM
Would one of you guys be willing to run a version of the code with a lot of debug statements added? If so, I'll prepare something that might help us get to the bottom of these bugs. Thanks!

I'd be happy to. I seem to be getting this issue at least once a day (I have several boards!), so I should be able to report back quickly.

Wow, is it really that bad?! I'll get something ready for you and either PM it to you or create a branch on GitHub. I think we will need to log these extra messages to a file, as shad suggested, so I'll be adding file logging in addition to the logging to the terminal. (You don't want to pipe the regular output to a file, because I think you'd end up with that status update line in there many times.)

In my case it happens only 1-2 times a week. I made a little update to the miner script:
350.09 MH/s | 0: 44/0/0 0.0% | 1: 43/0/0 0.0% | 17m44s | AH00WIX5
I'll send the info if it runs for an hour.
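A minimal sketch of the "log to a file as well as the terminal" idea, using Python's standard logging module. This is not the miner's actual logger: the function name make_logger, the log path, and the DEBUG-to-file / INFO-to-terminal split are assumptions for illustration. The status line is deliberately kept out of the logger (printed straight to the terminal), which keeps the repeated status updates out of the file.

```python
import logging
import sys


def make_logger(path, name="miner"):
    """Send DEBUG and above to a file, INFO and above to the terminal.

    Hypothetical setup for illustration; not the miner's real logger.
    """
    logger = logging.getLogger(name)
    if logger.handlers:          # already configured: don't add duplicate handlers
        return logger
    logger.setLevel(logging.DEBUG)
    logger.propagate = False     # keep messages from also reaching the root logger
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    to_file = logging.FileHandler(path)
    to_file.setLevel(logging.DEBUG)      # the file gets every debug statement
    to_file.setFormatter(fmt)

    to_term = logging.StreamHandler(sys.stdout)
    to_term.setLevel(logging.INFO)       # the terminal only shows INFO and above
    to_term.setFormatter(fmt)

    logger.addHandler(to_file)
    logger.addHandler(to_term)
    return logger
```

Usage would be something like log = make_logger("miner-debug.log") followed by log.debug(...) for the extra messages, while the status line keeps being printed directly rather than through the logger.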
15dUzJEUkxgjrtcvDSdsEDkXu7E7RCbNN3
shad
January 12, 2012, 09:19:29 PM
339.42 MH/s | 0: 164/0/0 0.0% | 1: 165/0/0 0.0% | 1h10m | AH00WIX5
Funny thing: fizzisist had the same idea and pushed a new version to GitHub a few hours ago.
I am testing his new version and it looks good!
li_gangyi
January 12, 2012, 09:56:34 PM
I've noticed the Mhash display in the software sometimes goes crazy and shows an insanely high value, freezes there for a moment, and then drops back down. The timer keeps working the whole time, though, so I'm not sure what's going on there.
coblee
Donator
Legendary
Offline
Activity: 1654
Merit: 1286
Creator of Litecoin. Cryptocurrency enthusiast.
January 12, 2012, 10:01:37 PM
It might be the hub. I can get 290-300 MH/s on my iMac, but switching to the hub dropped it to 260-270.
I think I found a reason for slowed-down USB communication when running with many boards. I pushed an update to GitHub, and in my own testing it seems to have helped. Please let me know how it works for you.
That worked! My hashrate went from 270 to 313.
shad
January 12, 2012, 10:43:03 PM Last edit: January 12, 2012, 11:21:11 PM by shad
I've noticed the Mhash display in software sometimes goes crazy and displays an insanely high value, only to freeze there for a moment before dropping back down. The timer still works all the time though, not sure what's going on there.
Do you have one board or more? Can somebody confirm the drop in reject rate I see with the new version? 0.3%
freshzive
January 13, 2012, 01:53:57 AM
Testing now on a couple of mine. What's the command to update using git? "clone" will only let me use empty directories, which is super annoying.
coblee
January 13, 2012, 02:08:47 AM
Testing now on a couple of mine. What's the command to update using git? "clone" will only let me use empty directories, which is super annoying.
git pull
freshzive
January 13, 2012, 02:39:42 AM
fpga1
before: Total hashrate for device: 287.48 MH/s / 290.07 MH/s
after: Total hashrate for device: 325.48 MH/s / 322.11 MH/s / 317.05 MH/s
fpga2
before: Total hashrate for device: 277.52 MH/s / 279.72 MH/s
after: Total hashrate for device: 309.55 MH/s / 309.55 MH/s / 301.81 MH/s
Seems like there is an increase of 30 MH/s or more with the new version; thanks for the great update. My rejects are still in the 1-2% range, though. Now to test whether I get these great speeds using my hub.
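A quick check of the before/after arithmetic, averaging freshzive's readings per board. The READINGS layout and the average_gain helper are just for illustration; statistics.mean is from the standard library. Averaged this way, the gain comes out to roughly 33 MH/s for fpga1 and 28 MH/s for fpga2.

```python
from statistics import mean

# Hashrate readings (MH/s) copied from the post above.
READINGS = {
    "fpga1": {"before": [287.48, 290.07], "after": [325.48, 322.11, 317.05]},
    "fpga2": {"before": [277.52, 279.72], "after": [309.55, 309.55, 301.81]},
}


def average_gain(before, after):
    """Difference between the mean 'after' and mean 'before' readings."""
    return mean(after) - mean(before)


for board, r in READINGS.items():
    print("%s: %+.2f MH/s" % (board, average_gain(r["before"], r["after"])))
```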
freshzive
January 13, 2012, 03:38:52 AM
Faster using my hub now with the updates, but still ~20 MH/s slower than connecting the boards directly.
fizzisist (OP)
January 13, 2012, 08:20:41 AM
I've noticed the Mhash display in software sometimes goes crazy and displays an insanely high value, only to freeze there for a moment before dropping back down. The timer still works all the time though, not sure what's going on there.
I think I know exactly why this is: it has to do with the somewhat stupid way the hashrate is calculated based on the last 3 hours. This was done in a real "quick and dirty" way, not a very clever one. I'll play around with it a bit more to see if I can come up with something more intelligent.
Well, I might as well put the question to you guys: how would you keep a rolling counter of the number of valid nonces received in the past N seconds? I wouldn't be surprised if shad comes up with the exact same idea as me.
Btw, shad, I see about the same number of rejects as before (1-2%). What pool are you using? Are you still getting such low reject rates?
freshzive, what kind of hub are you using? Is it slow when a single board is connected to the hub, meaning that the hub itself is to blame? I'm currently using two seven-port hubs, one connected to the other, and I see no slowdown.
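One possible answer to the rolling-counter question, as a minimal sketch: keep the arrival time of each valid nonce in a deque and drop entries older than the window. NonceWindow and its methods are hypothetical names, not part of mine.py, and the MH/s conversion assumes each valid nonce is a difficulty-1 share worth about 2^32 hashes.

```python
import time
from collections import deque


class NonceWindow:
    """Sliding-window counter of valid nonces seen in the last `window` seconds.

    Hypothetical helper, not part of mine.py.
    """

    HASHES_PER_NONCE = 2**32  # assumption: each valid nonce is a difficulty-1 share

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = float(window)
        self.clock = clock
        self.stamps = deque()  # arrival times, oldest on the left

    def _expire(self, now):
        # Drop every timestamp that has fallen out of the window.
        cutoff = now - self.window
        while self.stamps and self.stamps[0] <= cutoff:
            self.stamps.popleft()

    def add(self, t=None):
        """Record one valid nonce at time t (defaults to the clock)."""
        t = self.clock() if t is None else t
        self.stamps.append(t)
        self._expire(t)

    def count(self, now=None):
        """Number of valid nonces received in the last `window` seconds."""
        now = self.clock() if now is None else now
        self._expire(now)
        return len(self.stamps)

    def mhash_per_sec(self, now=None):
        """Estimated hashrate over the window, in MH/s."""
        return self.count(now) * self.HASHES_PER_NONCE / self.window / 1e6
```

The window length trades responsiveness for smoothness. At around 300 MH/s a board finds only about 4 to 5 valid nonces per minute, so a window of several minutes is probably needed before the reading stops jumping around.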
coblee
January 13, 2012, 08:52:15 AM
I just had the 0 MH/s bug happen to me. Stopping the miner and restarting it didn't help; I had to reprogram it, and now things are back to normal.
Hawkix
January 13, 2012, 12:14:14 PM
Well, I might as well put the question to you guys. How would you keep a rolling counter of the number of valid nonces received in the past N seconds? I wouldn't be surprised if shad comes up with the exact same idea as me.
You have two possibilities. Either keep the data in some cyclic buffer and recalculate upon each new insertion into the buffer, or, less precisely but easier to code, use the exponential decay formula:
AvgValue = decay*OldValue + (1-decay)*NewValue
where decay is something like 0.99.
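The exponential decay option needs only one stored number. Below is a minimal sketch of Hawkix's formula in Python; the DecayingAverage class and the choice to seed it with the first sample are assumptions, and each update is meant to receive one sample per fixed interval (for example, the MH/s estimate for the last second).

```python
def ema_update(old_value, new_value, decay=0.99):
    """Hawkix's formula: AvgValue = decay*OldValue + (1-decay)*NewValue."""
    return decay * old_value + (1.0 - decay) * new_value


class DecayingAverage:
    """Exponentially decaying average of periodic hashrate samples (sketch)."""

    def __init__(self, decay=0.99):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.value = None

    def update(self, sample):
        if self.value is None:
            # Seed with the first sample so the average does not crawl up from 0.
            self.value = float(sample)
        else:
            self.value = ema_update(self.value, sample, self.decay)
        return self.value
```

With decay = 0.99 and one sample per second, the average effectively reflects roughly the last 100 seconds (1/(1 - decay) samples), with older samples fading out gradually instead of dropping off a cliff.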
freshzive
January 13, 2012, 02:37:51 PM
freshzive, what kind of hub are you using? Is it slow when a single board is connected to the hub, meaning that the hub itself is to blame? I'm currently using two seven-port hubs, one connected to the other, and I see no slowdown.
I'm using this one: http://www.amazon.com/Plugable-USB-Port-Power-Adapter/dp/B00483WRZ6/ref=sr_1_2?ie=UTF8&qid=1326465373&sr=8-2 . And yes, it happens even when a single FPGA is connected, so maybe the hub just blows. Weird that it sped up a little with the new software, though.
On a more positive note, my boards are at <1% rejects this morning, as shad had mentioned. This is on slush's pool. Still chugging away at a high hashrate too, yay!
329.55 MH/s | 0: 1500/7/7 0.5%/0.5% | 1: 1473/2/10 0.1%/0.7% | 10h55m | AH00WOVL
314.09 MH/s | 0: 1431/6/9 0.4%/0.6% | 1: 1519/7/6 0.5%/0.4% | 10h56m | AH00WOWI
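Judging from the status code shad posts in this thread, each "N: a/r/i x%/y%" field is chain N's accepted/rejected/invalid counts followed by the reject % and invalid %. A minimal parsing sketch for comparing boards follows; parse_status and reject_pct are hypothetical helpers. The invalid % can't be recomputed from the line alone, because it divides by the total nonce count, which isn't printed.

```python
import re

# "<chain>: <accepted>/<rejected>/<invalid>" fields of one status line
CHAIN_FIELDS = re.compile(r"(\d+): (\d+)/(\d+)/(\d+)")


def parse_status(line):
    """Return {chain: (accepted, rejected, invalid)} for one status line."""
    return {int(c): (int(a), int(r), int(i))
            for c, a, r, i in CHAIN_FIELDS.findall(line)}


def reject_pct(accepted, rejected):
    """Reject rate as the miner prints it: 100 * rej / (acc + rej)."""
    total = accepted + rejected
    return 100.0 * rejected / total if total else 0.0
```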
allinvain
Legendary
Offline
Activity: 3080
Merit: 1080
January 13, 2012, 03:31:53 PM
Quick tip (maybe you know this already): find the pool with the lowest latency to you (ping test it). It will help improve your hashrate.
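A minimal sketch for comparing pools. ICMP ping usually needs raw sockets or elevated privileges, so this swaps in a different technique: it times a TCP connection to each pool's own port, which is a rough proxy for latency. The function names are hypothetical, and you would pass your pools' real host names and ports, e.g. fastest_pool([("pool-a.example.com", 8332), ("pool-b.example.com", 8332)]) with placeholder hosts replaced.

```python
import socket
import time


def tcp_connect_latency(host, port, timeout=3.0):
    """Milliseconds needed to open (and close) a TCP connection to host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def fastest_pool(pools, timeout=3.0):
    """Return (host, port, ms) for the reachable pool with the lowest latency, or None."""
    results = []
    for host, port in pools:
        try:
            results.append((host, port, tcp_connect_latency(host, port, timeout)))
        except OSError:
            continue  # unreachable or refused: skip this pool
    return min(results, key=lambda r: r[2]) if results else None
```

Run it a few times and compare the numbers; a single measurement can be skewed by DNS lookups or momentary congestion.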
shad
January 13, 2012, 03:56:33 PM Last edit: January 13, 2012, 04:29:22 PM by shad
1. I am also at slush's. I believe this guy never sleeps; if there is a pool problem, it is fixed very fast.
2. I don't like this "maybe" MH/s figure. I am testing some optimizations and wanted a stable overall measurement, so I made some changes:
    accsum = 0
    for chain in self.chain_list:
        acc = self.accepted_count[chain]
        accsum = accsum + acc
        rej = self.rejected_count[chain]
        tot = self.nonce_count[chain]
        inv = self.invalid_count[chain]
        try:
            rej_pct = 100.*rej/(acc+rej)
        except ZeroDivisionError:
            rej_pct = 0
        try:
            inv_pct = 100.*inv/tot
        except ZeroDivisionError:
            inv_pct = 0
        status += ' | %d: %d/%d/%d %.1f%%/%.1f%%' % (chain, acc, rej, inv, rej_pct, inv_pct)
    status += ' | ' + formatTime(time()-self.start_time)
    status += ' | %.2f Gn/min' % (accsum/((time()-self.start_time+1)/60))
I removed the ID because of space; even with the original mine.py the line is becoming too long for a Windows console. I am thinking about a two-line status, but that's not the priority at the moment. I get 4.7 Gn/min with the GitHub code and 4.8 Gn/min with my code, so the figure now uses %.2f. I have to test this over a longer time.
I guess the "hanging up" problem has to do with the Python lock() thing. Your implementation of lock() is clean AFAIK, but I found some forum posts on the internet from people having problems with two threads obtaining the lock at the same time. Whether it's really a Python bug... we are running on an old version, so it's hard to say. I am doing some testing with a main thread and only 1 mining thread for 2 chains, because the only part of the code that takes time is the communication part; everything else I measured at 0.0000 seconds.
3. @thirdlight: what OS do you have? And what Python version? I guess 2.6.6 or 2.6.7?
4. @fizzisist: forget the last PM I sent you yesterday. I now realize that you need to run mine.py twice if you have 2 cards; I only have one.
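The Gn/min figure from the status code above can be turned back into MH/s, assuming each accepted nonce is a difficulty-1 share worth about 2^32 hashes ("Gn" is read here as good, i.e. accepted, nonces). Under that assumption 4.7 Gn/min works out to roughly 336 MH/s, in line with the ~339 MH/s status line shad posted earlier. Both helper names are hypothetical.

```python
HASHES_PER_SHARE = 2**32  # assumption: approx. expected hashes per difficulty-1 nonce


def gn_per_min(accepted, elapsed_seconds):
    """Same formula as the status code: accepted / ((elapsed + 1) / 60)."""
    return accepted / ((elapsed_seconds + 1) / 60.0)


def gn_per_min_to_mhs(rate):
    """Convert accepted nonces per minute to an average hashrate in MH/s."""
    return rate * HASHES_PER_SHARE / 60.0 / 1e6
```

A difference of 0.1 Gn/min is about 7 MH/s, which is why a long run is needed before the 4.7 vs 4.8 comparison means much.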
shad
January 13, 2012, 05:17:35 PM
I don't know if it's really needed if you have a trusted network. Changelog 2.6.6 to 2.6.7:
What's New in Python 2.6.7 rc 2?
================================
*Release date: 2011-05-20*
*NOTE: Python 2.6 is in security-fix-only mode. No non-security bug fixes are allowed. Python 2.6.7 and beyond will be source only releases.*
Library
-------
- Issue #11662: Make urllib and urllib2 ignore redirections if the scheme is not HTTP, HTTPS or FTP (CVE-2011-1521).
- Issue #11442: Add a charset parameter to the Content-type in SimpleHTTPServer to avoid XSS attacks.
What's New in Python 2.6.7 rc 1?
================================
*Release date: 2011-05-06*
Library
-------
- Issue #9129: smtpd.py is vulnerable to DoS attacks deriving from missing error handling when accepting a new connection.
fizzisist (OP)
January 13, 2012, 10:48:00 PM
Sorry, things have been really busy with my day job this week, so I only have a minute to address the lock-up bug a few people have been having. Shad, thanks for finding the info about the possible Python bug where threads could obtain the lock at the same time. To test this, could someone having the bug uncomment line 166 in rpcClient.py? It also needs a couple of "self." prefixes added to it. It should read:
self.logger.reportDebug("(FPGA%d) jobqueue loaded (%d)" % (chain, self.jobqueue[chain].qsize()))
This would tell us whether the miner threads are locked up while the rpcClient is still running. It only adds one line before each "Job data loaded" output, so it shouldn't overwhelm the log output.
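To make the meaning of that debug line concrete, here is a minimal sketch of the producer side, written for Python 3 (mine.py itself targets Python 2.6, where the module is named Queue). load_job and the dict of per-chain queues are hypothetical stand-ins for rpcClient's internals, and the stdlib logging module replaces the miner's own reportDebug().

```python
import logging
import queue

logger = logging.getLogger("miner")


def load_job(jobqueues, chain, job):
    """Queue a job for one FPGA chain and log the queue depth afterwards.

    Hypothetical stand-in for the rpcClient debug line, not its real code.
    """
    jobqueues[chain].put(job)
    depth = jobqueues[chain].qsize()  # approximate while other threads use the queue
    logger.debug("(FPGA%d) jobqueue loaded (%d)", chain, depth)
    return depth
```

If a chain's miner thread keeps taking jobs, the reported depth should stay small. If it keeps climbing (assuming an unbounded queue) while the RPC side is still loading work, that points to the miner thread having stopped consuming jobs.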
thirdlight
January 14, 2012, 09:24:31 AM
could someone having the bug... Running on one board now, seeing "jobqueue loaded (1)". I'll swap some other boards over to this version a little later...