Soros Shorts
Donator
Legendary
Offline
Activity: 1617
Merit: 1012
|
|
August 01, 2013, 11:37:11 AM |
|
I'm having an issue (sort of) in Win8x64 with CGMiner 3.3.1. CGMiner stops updating its stats, and acts like it's frozen. Key commands don't do anything, and if I didn't know otherwise, I'd think it had frozen. It's still using CPU time, using network activity, and my pools keep getting shares, so I know it's working. It happened again today. I checked my miner tonight (~1am on the 1st), and it had frozen ~10:15am on the 31st. So it sat for almost 15 hours acting like it had frozen, but it was running all day. I minimized and opened it again, and now it's working just fine, and even showing the shares that should have been going by when it was "frozen". I'm assuming you're going to say this is a curses issue? I really have no idea; I'm not a programmer. Other than that issue, it's working great on my BFL equipment. Thanks!
I have also seen this happen with 3.3.1 on low-powered CPUs (Atom class) on both Windows 7 and 8.
|
|
|
|
lastbit
|
|
August 01, 2013, 11:49:56 AM |
|
I have a batch 3 Avalon with 4 modules, firmware 20130723. After a few hours of hashing, some of the modules stop (once all stopped, once three of them, once two of them). The other(s) continue hashing. Temperatures are normal, 21,65,50. It happened at 300 and at 282MHz, although the unit can hash at 350MHz, at least for some time. I have measured power consumption at the wall and it's within normal PSU parameters. When module(s) stop, there's this error in the System Log:
usb 1-1: clear tt 1 (0030) error -71
Any ideas?
|
|
|
|
ebereon
|
|
August 01, 2013, 11:58:25 AM |
|
Temperatures are normal, 21,65,50.
Are you sure? 65 is chip max I think and also standard cutoff in cgminer?
|
|
|
|
lastbit
|
|
August 01, 2013, 12:42:52 PM |
|
Temperatures are normal, 21,65,50.
Are you sure? 65 is chip max I think and also standard cutoff in cgminer?
Default target was 70, cutoff 90. I have put them at 65 and 85.
|
|
|
|
PSL
Member
Offline
Activity: 166
Merit: 10
|
|
August 01, 2013, 01:24:29 PM |
|
My GPU was SICK for several days and I missed that because monitoring script reading API port 4028 reported OK. Is there a way to detect SICK card through API? cgminer 3.2.2 reported:
[2013-07-28 11:29:54] Stratum from pool 1 detected new block
[2013-07-28 11:29:55] Pool 1 stale share detected, discarding
[2013-07-28 11:29:56] Accepted 876fa488 Diff 4/2 GPU 0 pool 1
[2013-07-28 11:31:28] Stratum connection to pool 1 interrupted
[2013-07-28 11:31:28] Lost 517 shares due to stratum disconnect on pool 1
[2013-07-28 11:31:30] Pool 1 stratum share submission failure
[2013-07-28 11:32:00] Pool 1 communication resumed, submitting work
[2013-07-28 11:32:00] Rejected acc5c400 Diff 3/2 GPU 0 pool 1
[2013-07-28 11:32:32] GPU0: Idle for more than 60 seconds, declaring SICK!
[2013-07-28 11:32:32] GPU0: Attempting to restart
[2013-07-28 11:32:32] Thread 0 still exists, killing it off
[2013-07-28 11:32:32] Thread 0 restarted
"devs" report for SICK card:
echo '{"command" : "devs"}' | nc localhost 4028 | tr -d '\0' | python -mjson.tool
{
    "DEVS": [
        {
            "Accepted": 694192,
            "Diff1 Work": 2380127,
            "Difficulty Accepted": 1360131.0,
            "Difficulty Rejected": 436120.0,
            "Enabled": "Y",
            "Fan Percent": 56,
            "Fan Speed": -1,
            "GPU": 0,
            "GPU Activity": 0,
            "GPU Clock": 157,
            "GPU Voltage": 1.1,
            "Hardware Errors": 0,
            "Intensity": "18",
            "Last Share Difficulty": 2.0,
            "Last Share Pool": 1,
            "Last Share Time": 1375003796,
            "Last Valid Work": 1375003890,
            "MHS 5s": 0.0,
            "MHS av": 0.17,
            "Memory Clock": 300,
            "Powertune": 0,
            "Rejected": 222954,
            "Status": "Alive",
            "Temperature": 40.0,
            "Total MH": 156195.3567,
            "Utility": 44.85
        }
    ],
    "STATUS": [
        {
            "Code": 9,
            "Description": "cgminer 3.2.2",
            "Msg": "1 GPU(s) - ",
            "STATUS": "S",
            "When": 1375361725
        }
    ],
    "id": 1
}
I am not sure, but this could be a bug. I can try to detect the SICK state from several parameters (MHS 5s, GPU Activity, Temperature), but is that the correct way? If it is, which parameter should be used for detection? BTW, the reported "MHS av" is wrong; the real rate was 0.00, because the card was sick for several days...
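For a monitoring script, the nc pipeline above can be replaced by a few lines of Python. This is only a minimal sketch: it assumes the API is enabled on localhost:4028 as in the post, and that the server closes the connection after sending its NUL-terminated JSON reply (which is what the nc/tr pipeline implies). The function names `query_cgminer` and `parse_reply` are made up for illustration and are not part of cgminer.

```python
import json
import socket


def parse_reply(raw: bytes) -> dict:
    """Drop the NUL terminator (the `tr -d` step in the pipeline) and decode the JSON."""
    return json.loads(raw.replace(b"\x00", b"").decode("utf-8"))


def query_cgminer(command: str, host: str = "localhost", port: int = 4028,
                  timeout: float = 5.0) -> dict:
    """Send one API command and return the decoded reply.

    Assumption: the server closes the socket once the reply is sent,
    so reading until recv() returns b"" collects the whole answer.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("utf-8"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_reply(b"".join(chunks))
```

With a miner running, `query_cgminer("devs")["DEVS"]` should give the same list of per-device dictionaries that `python -mjson.tool` printed above.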
|
|
|
|
ebereon
|
|
August 01, 2013, 01:38:34 PM |
|
Temperatures are normal, 21,65,50.
Are you sure? 65 is chip max I think and also standard cutoff in cgminer?
Default target was 70, cutoff 90. I have put them at 65 and 85.
My batch#2 Avalons have problems at 55+. Try 50 as target and you will have no issues.
|
|
|
|
lastbit
|
|
August 01, 2013, 01:54:28 PM |
|
Temperatures are normal, 21,65,50.
Are you sure? 65 is chip max I think and also standard cutoff in cgminer?
Default target was 70, cutoff 90. I have put them at 65 and 85.
My batch#2 Avalons make problems with 55+. Try 50 as target and you will have no issues.
In batch#3 they moved the temperature sensor to the board. 50 @batch#2 is roughly equivalent to 70 @batch#3. So I'm actually 5 degrees lower.
|
|
|
|
crazyates
Legendary
Offline
Activity: 952
Merit: 1000
|
|
August 01, 2013, 03:20:41 PM |
|
I'm having an issue (sort of) in Win8x64 with CGMiner 3.3.1. CGMiner stops updating its stats, and acts like it's frozen. Key commands don't do anything, and if I didn't know otherwise, I'd think it had frozen. It's still using CPU time, using network activity, and my pools keep getting shares, so I know it's working. It happened again today. I checked my miner tonight (~1am on the 1st), and it had frozen ~10:15am on the 31st. So it sat for almost 15 hours acting like it had frozen, but it was running all day. I minimized and opened it again, and now it's working just fine, and even showing the shares that should have been going by when it was "frozen". I'm assuming you're going to say this is a curses issue? I really have no idea; I'm not a programmer. Other than that issue, it's working great on my BFL equipment. Thanks!
I have also seen this happen with 3.3.1 on low-powered CPUs (Atom class) on both Windows 7 and 8.
It's a 990FXA board with an FX-8120 CPU, so it's not underpowered. And yes, I only minimized it; I did not close and reopen.
|
|
|
|
kano
Legendary
Offline
Activity: 4606
Merit: 1851
Linux since 1997 RedHat 4
|
|
August 01, 2013, 07:58:47 PM |
|
My GPU was SICK for several days and I missed that because monitoring script reading API port 4028 reported OK. Is there a way to detect SICK card through API? cgminer 3.2.2 reported:
[2013-07-28 11:29:54] Stratum from pool 1 detected new block
[2013-07-28 11:29:55] Pool 1 stale share detected, discarding
[2013-07-28 11:29:56] Accepted 876fa488 Diff 4/2 GPU 0 pool 1
[2013-07-28 11:31:28] Stratum connection to pool 1 interrupted
[2013-07-28 11:31:28] Lost 517 shares due to stratum disconnect on pool 1
[2013-07-28 11:31:30] Pool 1 stratum share submission failure
[2013-07-28 11:32:00] Pool 1 communication resumed, submitting work
[2013-07-28 11:32:00] Rejected acc5c400 Diff 3/2 GPU 0 pool 1
[2013-07-28 11:32:32] GPU0: Idle for more than 60 seconds, declaring SICK!
[2013-07-28 11:32:32] GPU0: Attempting to restart
[2013-07-28 11:32:32] Thread 0 still exists, killing it off
[2013-07-28 11:32:32] Thread 0 restarted
"devs" report for SICK card:
echo '{"command" : "devs"}' | nc localhost 4028 | tr -d '\0' | python -mjson.tool
{
    "DEVS": [
        {
            "Accepted": 694192,
            "Diff1 Work": 2380127,
            "Difficulty Accepted": 1360131.0,
            "Difficulty Rejected": 436120.0,
            "Enabled": "Y",
            "Fan Percent": 56,
            "Fan Speed": -1,
            "GPU": 0,
            "GPU Activity": 0,
            "GPU Clock": 157,
            "GPU Voltage": 1.1,
            "Hardware Errors": 0,
            "Intensity": "18",
            "Last Share Difficulty": 2.0,
            "Last Share Pool": 1,
            "Last Share Time": 1375003796,
            "Last Valid Work": 1375003890,
            "MHS 5s": 0.0,
            "MHS av": 0.17,
            "Memory Clock": 300,
            "Powertune": 0,
            "Rejected": 222954,
            "Status": "Alive",
            "Temperature": 40.0,
            "Total MH": 156195.3567,
            "Utility": 44.85
        }
    ],
    "STATUS": [
        {
            "Code": 9,
            "Description": "cgminer 3.2.2",
            "Msg": "1 GPU(s) - ",
            "STATUS": "S",
            "When": 1375361725
        }
    ],
    "id": 1
}
I am not sure, but this could be a bug. I can try to detect the SICK state from several parameters (MHS 5s, GPU Activity, Temperature), but is that the correct way? If it is, which parameter should be used for detection? BTW, the reported "MHS av" is wrong; the real rate was 0.00, because the card was sick for several days...
Firstly, "When" - "Last Share Time" was 357929s ago ... so yeah, that explains the rejects - it's spitting out crap. 357929s = 99h 25m 29s. Probably scrypt mining on too high settings?
The best one to use there would be "Last Valid Work" ... that says 357835s ago or 99h 23m 55s ...
The MH av will take a long time to get to 0 since it is the average since it started, and it obviously did 'some' work. It says you did "Total MH": 156195.3567, so it would take a long time for that to average out to less than 10kH/s (more than 180 days).
The device wasn't SICK when you ran the API command; it was "Status": "Alive".
The notify command would say when it last had a problem and also how many times it was SICK.
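The arithmetic in this reply can be reproduced from the posted "devs" output. The sketch below is only illustrative: the helper names are hypothetical, and the 180-day figure assumes that "MHS av" is simply "Total MH" divided by the seconds since cgminer started, which is how the reply describes it.

```python
def fmt_hms(seconds: int) -> str:
    """Format a number of seconds as 'Hh Mm Ss'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


def seconds_since(reply: dict, dev_index: int, field: str) -> int:
    """Age of a device timestamp field, measured against the reply's own "When"."""
    return reply["STATUS"][0]["When"] - reply["DEVS"][dev_index][field]


def days_until_average_below(total_mh: float, target_mhs: float) -> float:
    """Total run time in days before total_mh / elapsed drops below target_mhs,
    assuming the device does no further work (assumed definition of "MHS av")."""
    return total_mh / target_mhs / 86400


# Only the fields needed here, copied from the posted "devs" report.
reply = {
    "STATUS": [{"When": 1375361725}],
    "DEVS": [{"Last Share Time": 1375003796,
              "Last Valid Work": 1375003890,
              "Total MH": 156195.3567}],
}

share_age = seconds_since(reply, 0, "Last Share Time")
work_age = seconds_since(reply, 0, "Last Valid Work")
print(share_age, fmt_hms(share_age))  # 357929 99h 25m 29s
print(work_age, fmt_hms(work_age))    # 357835 99h 23m 55s

# 10 kH/s is 0.01 MH/s; the result is a little over 180 days.
print(days_until_average_below(reply["DEVS"][0]["Total MH"], 0.01))
```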
|
|
|
|
-ck (OP)
Legendary
Offline
Activity: 4284
Merit: 1645
Ruu \o/
|
|
August 01, 2013, 08:34:22 PM |
|
I have an Avalon batch3 4 modules, firmware 20130723. After a few hours of hashing, some of the modules stops (once all stopped, once three of them, once two of them). The other(s) continue hashing. Temperatures are normal, 21,65,50. It happened @300 and @282MHz, although unit can hash @350MHz at least for some time. I have measured power consumption at the wall and it's in normal PSU parameters. When module(s) stop, there's this error in System Log: usb 1-1: clear tt 1 (0030) error -71 Any ideas?
I did not create that firmware... However, that looks like a USB connectivity issue, which happens quite often on Avalon because of the terrible hardware that it's run on (the TL-WR703N router). If you can connect to it via ethernet and fully disable wifi, you will find it more reliable. (Remember I'm providing generic support for firmware I did not create, though.)
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
PSL
|
|
August 01, 2013, 11:16:47 PM |
|
Firstly, "When" - "Last Share Time" was 357929s ago ... so yeah that explains the rejects - it's spitting out crap 357929s = 99h 25m 29s Probably scrypt mining on too high settings?
I'm not complaining about rejects. These are OK; it was mining against p2pool. The problem is that the HASHRATE of this card was 0 for a few days, the API didn't indicate that the card was SICK, and the AVG rate was not updated. Another problem is that cgminer had no data for a few seconds, marked the card SICK and never tried to restart mining. Two or three different issues. I am not sure what triggers this situation; maybe the PC running p2pool was rebooted to apply security updates, so p2pool was offline for a few minutes.
The best one to use there would be "Last Valid Work" ... that says 357835s ago or 99h 23m 55s ...
This can work when mining against a pool but cannot be used for solo mining at higher diff; in that situation you can mine for several days without a block or share, and Last Valid Work cannot be used.
The MH av will take a long time to get to 0 since it is the av since it started and it obviously did 'some' work. It says you did "Total MH": 156195.3567, so it would take a long time for that to average out to less than 10kH/s (more than 180 days)
The device isn't SICK when you did the API command it was "Status": "Alive"
The card was running OK for several days, and at one moment it was marked as SICK, the hashrate changed to 0, and the temperature and GPU engine frequency were set low. The card was sick for several days and the status was ALIVE all the time; otherwise the monitoring script would have highlighted the problem. And this is why I think there is a bug... Is it expected that a SICK card is reported in the API as DEAD?
The notify command would say when it last has a problem and also how many times it was SICK.
I don't have a "notify" report for the sick card; next time...
|
|
|
|
kano
|
|
August 02, 2013, 01:18:25 AM |
|
... The best one to use there would be "Last Valid Work" ... that says 357835s ago or 99h 23m 55s ...
This can work when mining against pool but cannot be used for solo mining at higher diff; in that situation you can mine several days without block or share and Last valid work cannot be used. ...
Nope. I really did say that coz it is correct ... I added "Last Valid Work" to cgminer coz it is the best way to determine something is wrong. It has nothing to do with shares or difficulty. It is the last time a valid hash was returned by the device.
|
|
|
|
PSL
|
|
August 02, 2013, 06:45:56 AM |
|
What about the API and blocks found by cgminer? It looks like no one is really interested in blocks; all effort is focused on reporting shares... Only the API command "summary" reports blocks found. I can manually display blocks found for a pool. Could the API command "pools" be extended to report blocks found for a pool?
|
|
|
|
PSL
|
|
August 02, 2013, 07:36:08 AM |
|
... The best one to use there would be "Last Valid Work" ... that says 357835s ago or 99h 23m 55s ...
This can work when mining against pool but cannot be used for solo mining at higher diff; in that situation you can mine several days without block or share and Last valid work cannot be used. ...
Nope. I really did say that coz it is correct ... I added "Last Valid Work" to cgminer coz it is the best way to determine something is wrong. It has nothing to do with shares or difficulty. It is the last time a valid hash was returned by the device.
Tested and it doesn't work for solo mining. I calculated "timeout" from devs data, like devs.STATUS.When-devs.DEVS(x).Last Valid Work; I see that timeout can be high even for healthy card...
|
|
|
|
kano
|
|
August 02, 2013, 08:05:06 AM |
|
... The best one to use there would be "Last Valid Work" ... that says 357835s ago or 99h 23m 55s ...
This can work when mining against pool but cannot be used for solo mining at higher diff; in that situation you can mine several days without block or share and Last valid work cannot be used. ...
Nope. I really did say that coz it is correct ... I added "Last Valid Work" to cgminer coz it is the best way to determine something is wrong. It has nothing to do with shares or difficulty. It is the last time a valid hash was returned by the device.
Tested and it doesn't work for solo mining. I calculated "timeout" from devs data, like devs.STATUS.When-devs.DEVS(x).Last Valid Work; I see that timeout can be high even for healthy card...
Try again ... it works ... I wrote it
[STATUS] => (
    [STATUS] => S
    [When] => 1375430059
    [ Code] => 9
    [Msg] => 0 GPU(s) - 3 ASC(s) - 6 PGA(s) -
    [Description] => Subaru
)
[ASC0] => (
    [ASC] => 0
    [Name] => BAS
    [ID] => 0
    [Enabled] => Y
    [Status] => Alive
    [Temperature] => 67.00
    [MHS av] => 61520.46
    [MHS 5s] => 61445.47
    [Accepted] => 5040
    [Rejected] => 30
    [Hardware Errors] => 19392
    [Utility] => 3.35
    [Last Share Pool] => 0
    [Last Share Time] => 1375430055
    [Total MH] => 5561359878.0407
    [Diff1 Work] => 1292409
    [Difficulty Accepted] => 1271115.00000000
    [Difficulty Rejected] => 7680.00000000
    [Last Share Difficulty] => 256.00000000
    [No Device] => false
    [Last Valid Work] => 1375430059
)
1375430059-1375430059=0
Yep a BAS (61.5GH/s) will average >14 results a second so will usually be 0
For an AMU as it says 335MH/s
[STATUS] => (
    [STATUS] => S
    [When] => 1375430228
    [ Code] => 9
    [Msg] => 0 ASC(s) - 3 PGA(s) -
    [Description] => Pi
)
[PGA0] => (
    [PGA] => 0
    [Name] => AMU
    [ID] => 0
    [Enabled] => Y
    [Status] => Alive
    [Temperature] => 0.00
    [MHS av] => 335.97
    [MHS 5s] => 335.36
    [Accepted] => 35
    [Rejected] => 0
    [Hardware Errors] => 90
    [Utility] => 0.02
    [Last Share Pool] => 0
    [Last Share Time] => 1375428250
    [Total MH] => 37978409.3286
    [Frequency] => 0.00
    [Diff1 Work] => 8761
    [Difficulty Accepted] => 7430.00000000
    [Difficulty Rejected] => 0.00000000
    [Last Share Difficulty] => 256.00000000
    [No Device] => false
    [Last Valid Work] => 1375430227
)
1375430228-1375430227=1
It will average 12.8 seconds
Variance of up to 8 times is not rare so 102 would not be unexpected
(My script that checks it for my old Icarus that sometimes stops working uses a factor of 21.2 to restart cgminer)
Of course if you get a hardware error, that won't count - so consider that like doubling the time ... then a few HW in a row ... and ...
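The figures quoted here (more than 14 results a second for the BAS, 12.8 seconds for the AMU) are consistent with one difficulty-1 result per 2^32 hashes on average. A rough sketch of a stall check built on that, assuming a SHA256 device whose "Last Valid Work" is updated on every difficulty-1 result; `expected_interval` and `looks_stalled` are hypothetical names, and the default factor of 8 is just the variance figure mentioned above (the Icarus script is said to use 21.2):

```python
DIFF1_HASHES = 2 ** 32  # average hashes per difficulty-1 result (SHA256)


def expected_interval(mhs: float) -> float:
    """Average seconds between difficulty-1 results at `mhs` MH/s."""
    return DIFF1_HASHES / (mhs * 1e6)


def looks_stalled(when: int, last_valid_work: int, mhs: float,
                  factor: float = 8.0) -> bool:
    """True if the device has been silent for more than `factor` times
    the expected interval between results."""
    return (when - last_valid_work) > factor * expected_interval(mhs)


print(round(expected_interval(335.97), 1))         # AMU: 12.8 seconds per result
print(round(1 / expected_interval(61520.46), 1))   # BAS: 14.3 results per second
print(looks_stalled(1375430228, 1375430227, 335.97))  # False: only 1 second old
```

Hardware errors do not update "Last Valid Work", so a device that throws several in a row will look quieter than its hashrate suggests; a larger factor leaves room for that.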
|
|
|
|
Lucko
|
|
August 02, 2013, 08:42:05 AM |
|
I'm sure this was answered but can't find it using search and reading all is out of the question...
Jalapeno
[2013-08-02 09:57:22] USB init, open device failed, err -12, you need to install a WinUSB driver for - BFL device 7:2
I've been googling, and the only solution that worked was to use BFGminer... I'm sure it is possible to use cgminer, but I have no idea how... Where do I get that driver?
I found one on the net but it didn't work... And according to the Jalapeno how-to it should be plug and play...
|
|
|
|
kano
|
|
August 02, 2013, 09:13:30 AM |
|
Read any of README, ASIC-README or even FPGA-README; each tells you how to set up the Windows driver ... FPGA-README also has the link to download it, rather than typing the name Zadig into Google.
|
|
|
|
PSL
|
|
August 02, 2013, 09:20:25 AM |
|
Try again ... it works ... I wrote it
PWC solo & 5750 (about 80 kHash): Last Valid Work is 1751 seconds old just now and still growing. I have already seen this value go above 2000 seconds. On the other hand, a faster card (7950) on a different scrypt coin has lower values, but I see a 244-second timeout just now... These values depend on the coin and the GPU hashrate.
|
|
|
|
Lucko
|
|
August 02, 2013, 09:29:07 AM Last edit: August 02, 2013, 09:48:36 AM by Lucko |
|
You really need to make this README more dumber friendly...
I had no idea that this refers to my problem:
When mining on windows, the driver being used will determine if mining will work.
If the driver doesn't allow mining, you will get a "USB init," error message i.e. one of:
open device failed, err %d, you need to install a Windows USB driver for the device
or
claim interface %d failed, err %d
And why does the message then say WinUSB and not Zadig?
And why does it work with BFGminer?
EDIT: Zadig doesn't detect my device... And please don't send me to another readme... I will say BFGminer is fine in this case...
|
|
|
|
os2sam
Legendary
Offline
Activity: 3586
Merit: 1098
Think for yourself
|
|
August 02, 2013, 10:53:44 AM |
|
You really need to make this README more dumber friendly...
Doesn't seem too difficult to me. Could be worded better, I guess.
-----------------------ASIC Readme-------------------------
WINDOWS: On windows, the direct USB support requires the installation of a WinUSB driver (NOT the ftdi_sio driver), and attach it to the Butterfly labs device. The easiest way to do this is to use the zadig utility which will install the drivers for you and then once you plug in your device you can choose the "list all devices" from the "option" menu and you should be able to see the device as something like: "BitFORCE SHA256 SC". Choose the install or replace driver option and select WinUSB. You can either google for zadig or download it from the cgminer directory in the DOWNLOADS link above.
-------------------------------------------------------------
Here is my interpretation of the above paragraph and the steps I took:
1. Unplug all Erupters.
2. Run Zadig - Install WinUSB (I may have rebooted, can't remember for sure) - I could not install WinUSB while Erupters were plugged in.
3. Plug in one Erupter, verify that it is set to WinUSB. If not, set it to WinUSB.
4. Close Zadig, plug in the rest of the Erupters, run CGMiner.
Done
|
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?
|
|
|
|