MrTeal (OP)
Legendary
Offline
Activity: 1274
Merit: 1004
|
|
October 08, 2014, 10:24:38 PM |
|
[..] With that bad work sequence error, are you running multiple pools with load balance or balance in cgminer? I have seen that before in that specific instance, and it has to do with the HF global work queue. Disabling multipool should fix that error.
Just on the one pool (bitminter). There are backup pools configured, but failover mode. I have now enabled failover-only additionally.
What kind of error do you get on the zombie?
Along those lines:
ERR: Asked to memcpy 0 bytes from usbutils.c _usb_read():3170
HFA : OP_USB_INIT failed! Operation status 20 (Regulator programming error)
HFA : hfa_send_frame: USB Send error, ret 0 amount 0 vs. tx_length 8, retrying
FAIL: USB get_lock not found (1:88)
And from syslog:
Oct 8 19:03:16 lubuntu kernel: [92818.100201] usb 1-4.3: USB disconnect, device number 100
Oct 8 19:03:16 lubuntu ModemManager[4491]: <info> (tty/ttyACM0): released by modem /sys/devices/pci0000:00/0000:00:12.2/usb1/1-4/1-4.3
Oct 8 19:03:16 lubuntu kernel: [92818.571585] usb 1-4.3: new full-speed USB device number 101 using ehci-pci
Oct 8 19:03:17 lubuntu kernel: [92818.683103] usb 1-4.3: New USB device found, idVendor=297c, idProduct=0001
Oct 8 19:03:17 lubuntu kernel: [92818.683115] usb 1-4.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Oct 8 19:03:17 lubuntu kernel: [92818.683123] usb 1-4.3: Product: M1 Module
Oct 8 19:03:17 lubuntu kernel: [92818.683129] usb 1-4.3: Manufacturer: HashFast LLC
Oct 8 19:03:17 lubuntu kernel: [92818.683135] usb 1-4.3: SerialNumber: 2cedd52ddd49635e09d4d299e2a16213
Oct 8 19:03:17 lubuntu kernel: [92818.683856] cdc_acm 1-4.3:1.0: ttyACM0: USB ACM device
Oct 8 19:03:17 lubuntu mtp-probe: checking bus 1, device 101: "/sys/devices/pci0000:00/0000:00:12.2/usb1/1-4/1-4.3"
Oct 8 19:03:17 lubuntu mtp-probe: bus: 1, device: 101 was not an MTP device
Oct 8 19:03:19 lubuntu ModemManager[4491]: <info> (tty/ttyACM0): released by modem /sys/devices/pci0000:00/0000:00:12.2/usb1/1-4/1-4.3
Oct 8 19:03:19 lubuntu ModemManager[4491]: <warn> (Plugin Manager) (Iridium) [ttyACM0] error when checking support: '(tty/ttyACM0) failed to open $
Oct 8 19:03:19 lubuntu ModemManager[4491]: <warn> (Plugin Manager) (Nokia) [ttyACM0] error when checking support: '(Nokia) Missing port probe for $
Oct 8 19:03:19 lubuntu ModemManager[4491]: <warn> (Plugin Manager) (Via CBP7) [ttyACM0] error when checking support: '(Via CBP7) Missing port prob$
Oct 8 19:03:19 lubuntu ModemManager[4491]: <warn> (Plugin Manager) (Generic) [ttyACM0] error when checking support: '(Generic) Missing port probe $
Oct 8 19:03:19 lubuntu ModemManager[4491]: <warn> Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:12.2/usb1/1-4/1-4.3': not s$
Oct 8 19:03:21 lubuntu kernel: [92823.310510] usb 1-4.3: reset full-speed USB device number 101 using ehci-pci
Oct 8 19:03:22 lubuntu kernel: [92824.422789] cdc_acm 1-4.3:1.0: ttyACM0: USB ACM device
Bus numbers don't match as the pastes are from different times.
I understand from the thread that the power required per board and USB is 300mA. The number of boards x 300mA is just a bit above what the hub power supply provides, so I have another hub on the way. The start-up sequence doesn't seem to matter. Keep in mind that these Habaneros have operated fine at the previous owner's place. So unless the DHL guys provided a bumpy ride...
They don't draw 300mA in normal operation; if 12V is present they won't draw anything from the USB port, as the 3.3V rail is preferentially powered from 12V. For the regulator programming error, check for a short between the VDD test points and ground along the edge of the board for the four dies.
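The hub budget discussed above is simple arithmetic. Here is a minimal sketch, assuming the 300mA-per-board figure from the thread as a worst case (per the reply, boards with 12V present should draw far less from USB). The board count and hub rating in the example are hypothetical placeholders, since the post gives neither.

```python
def hub_budget_ma(boards, per_board_ma=300):
    """Worst-case USB current if every board drew the full per-board figure."""
    return boards * per_board_ma


def has_headroom(boards, supply_ma, per_board_ma=300):
    """True if the hub supply covers the worst-case draw of all boards."""
    return hub_budget_ma(boards, per_board_ma) <= supply_ma


if __name__ == "__main__":
    # Hypothetical example: 5 boards on a hub whose supply is rated 1000 mA.
    print(hub_budget_ma(5))
    print(has_headroom(5, 1000))
```

If the worst-case figure exceeds the hub rating, a second hub (as planned in the post) or a larger supply restores the margin on paper; it does not by itself explain a regulator programming error.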
|
|
|
|
MrTeal (OP)
|
|
October 08, 2014, 10:26:52 PM |
|
MrTeal, is there any chance you could release a purely experimental big red warning firmware with the 110 ceiling? I've got a three-die card that isn't staying cool enough and I'd like to raise the ceiling, even if it means burning it up.
+1
Hmmm.. I suppose I could. I have some changes I've added to the code base anyway that should probably be pushed out once I get them tested fully. Might help identify Newar's regulator programming error as well.
|
|
|
|
Newar
Legendary
Offline
Activity: 1358
Merit: 1001
https://gliph.me/hUF
|
|
October 09, 2014, 05:22:20 AM |
|
Thanks for the quick reply. I'll see if the additional hub helps and report back. Let me add to the experimental firmware request too.
|
|
|
|
Newar
|
|
October 09, 2014, 04:36:45 PM |
|
Sort-of-update, nothing new or resolved, but maybe some more detail/accuracy: I have isolated the Zombie to its own cgminer instance and it returns only this on startup:
ERR: Asked to memcpy 0 bytes from usbutils.c _usb_read(): 3170
HFA : OP_USB_INIT failed! Operation status 20 (Regulator programming error)
Repeat. I was not able to detect a short on the board.
For the good boards, even with fail-over only I still get the:
[2014-10-09 16:56:52] HFB hab5: Bad work sequence tail 1633 head 322 devhead 322 devtail 1730 sequence 2048
[2014-10-09 16:56:52] HFB 1 failure, disabling!
[2014-10-09 16:56:52] HFB 3 failure, disabling!
[2014-10-09 16:56:53] HFB hab4: Bad work sequence tail 1409 head 1881 devhead 1881 devtail 1505 sequence 2048
[2014-10-09 16:56:53] HFB hab2: Bad work sequence tail 115 head 726 devhead 726 devtail 239 sequence 2048
[2014-10-09 16:56:53] HFB 0 failure, disabling!
[2014-10-09 16:56:53] HFB 4 failure, disabling!
Whilst at the same time in syslog:
Oct 9 16:56:52 lubuntu kernel: [171751.096002] cdc_acm 1-4.3:1.0: ttyACM0: USB ACM device
Oct 9 16:56:52 lubuntu kernel: [171751.096370] cdc_acm 1-2.3:1.0: ttyACM1: USB ACM device
Oct 9 16:56:53 lubuntu kernel: [171751.618102] cdc_acm 1-4.4:1.0: ttyACM2: USB ACM device
Oct 9 16:56:53 lubuntu kernel: [171751.622705] cdc_acm 1-2.2:1.0: ttyACM3: USB ACM device
This happens roughly once per hour. I'd still be interested to hear what cgminer version fellow miners are running. Not that it will solve the problems above, but maybe get some additional stability.
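For anyone comparing these failures across boards, the "Bad work sequence" lines can be pulled apart mechanically. Below is a minimal sketch that assumes the line layout exactly as pasted in this post; the function name and the regular expression are illustrative, not part of cgminer. The meaning of the individual counters is not asserted here.

```python
import re

# Layout taken from the pasted log lines; an assumption, not a cgminer spec.
PATTERN = re.compile(
    r"\[(?P<time>[^\]]+)\] HFB (?P<dev>\S+): Bad work sequence "
    r"tail (?P<tail>\d+) head (?P<head>\d+) "
    r"devhead (?P<devhead>\d+) devtail (?P<devtail>\d+) "
    r"sequence (?P<sequence>\d+)"
)


def parse_bad_sequence(line):
    """Return the fields of a 'Bad work sequence' line as a dict, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("tail", "head", "devhead", "devtail", "sequence"):
        record[key] = int(record[key])
    return record


SAMPLE = (
    "[2014-10-09 16:56:53] HFB hab2: Bad work sequence "
    "tail 115 head 726 devhead 726 devtail 239 sequence 2048"
)

if __name__ == "__main__":
    print(parse_bad_sequence(SAMPLE))
```

Lines that a terminal has wrapped mid-number (for example a stray space inside a counter) will not match and return None, so they need to be rejoined before parsing.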
|
|
|
|
MrTeal (OP)
|
|
October 09, 2014, 05:13:46 PM |
|
Sort-of-update, nothing new or resolved, but maybe some more detail/accuracy: [...] This happens roughly once per hour. I'd still be interested to hear what cgminer version fellow miners are running. [...]
For the board with the regulator programming error, can you verify that 12V is good to each of the connectors?
For the other error, I've only seen that once in relation to multipool. Does that happen immediately after a block is detected and you get a stratum restart? I've talked with Con a little about this, and his opinion is also that it is a problem with the GWQ. The longterm solution is probably to completely replace the HF driver and allow cgminer to schedule work, but I honestly don't think that will ever happen. Try removing backup pools temporarily to see if the issue goes away.
I don't have 4.6.x on any machines, mostly a mix of 4.5 and 4.3.
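The question of whether the failure follows a block detection can be checked from a saved log rather than by watching the screen. Here is a rough sketch, assuming the log has been captured to text with the bracketed timestamps shown in the pastes above. The marker string for a block or stratum event is an assumption and should be replaced with whatever your own log actually prints; the first and last sample lines are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def stamp(line):
    """Parse a leading '[YYYY-MM-DD HH:MM:SS]' timestamp, or return None."""
    if not line.startswith("[") or "]" not in line:
        return None
    try:
        return datetime.strptime(line[1:line.index("]")], FMT)
    except ValueError:
        return None


def failures_after_marker(lines, marker, failure="Bad work sequence", window_s=10):
    """Return failure lines logged within window_s seconds after a marker line."""
    last_marker = None
    hits = []
    for line in lines:
        when = stamp(line)
        if when is None:
            continue
        if marker in line:
            last_marker = when
        elif failure in line and last_marker is not None:
            if timedelta(0) <= when - last_marker <= timedelta(seconds=window_s):
                hits.append(line)
    return hits


SAMPLE = [
    # Hypothetical marker line; check the wording against your own log.
    "[2014-10-09 16:56:51] Stratum from pool 0 detected new block",
    "[2014-10-09 16:56:53] HFB hab2: Bad work sequence tail 115 head 726 devhead 726 devtail 239 sequence 2048",
    # Hypothetical later failure with no marker nearby.
    "[2014-10-09 17:58:10] HFB hab2: Bad work sequence tail 115 head 726 devhead 726 devtail 239 sequence 2048",
]

if __name__ == "__main__":
    for hit in failures_after_marker(SAMPLE, "detected new block"):
        print(hit)
```

If most failures land inside the window, that supports the block/stratum-restart theory; if they are spread evenly, it points elsewhere.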
|
|
|
|
Taugeran
|
|
October 09, 2014, 06:08:26 PM |
|
Sort-of-update, nothing new or resolved, but maybe some more detail/accuracy: [...] This happens roughly once per hour. I'd still be interested to hear what cgminer version fellow miners are running. [...]
I know it will sound odd, but I've enjoyed a certain amount of success using bfgminer (which uses the other protocol supported by the HF boards). It has very good, off-the-bat detection/disablement of bad hash cores. Though I have seen oddities where the device(s) must be manually added to bfgminer to initialize properly:
bfgminer <other options> -S HFA:noauto --set HFA:clock=650
M + HFA:auto
My two bitcents. It can also dump out the whole HF_Frame using the command-line flags:
-D --device-protocol-dump 2> HF.Logfile.log
|
Bitfury HW & Habañero : 1.625Th/s tips/Donations: 1NoS89H3Mr6U5CmP4VwWzU2318JEMxHL1 Come join Coinbase
|
|
|
xjack
|
|
October 09, 2014, 10:43:02 PM |
|
@ Newar - fwiw, I run JakeTri cgminer 4.4.0 on ubuntu 13.10, and JakeTri 4.4.1 on Debian/Beaglebone. Both are rock solid 24/7. Here are my configs...
BBB - runs speed/voltage set in firmware - hfa-hash-clock 1.
screen -dmS hab /root/cg-hab/cgminer -c /root/cg.conf
root@beaglebone:~# cat cg.conf
{
  "pools" : [
    {
      ...snip...
    }
  ],
  "hfa-hash-clock" : "1",
  "hfa-fan" : "100",
  "hfa-temp-target" : "0",
  "hfa-temp-overheat" : "104",
  "hfa-fail-drop" : "10",
  "api-allow" : "W:127.0.0.1,W:192.168.2.0/24",
  "api-listen" : true,
  "api-port" : "4028",
  "failover-only" : true,
  "widescreen" : true
}
ubuntu - same cgminer.conf, but with hfa-hash-clock removed.
screen -dmS hab /home/cg-hab/cgminer -c /home/cgminer.conf --hfa-options "Chip:950@980,Dabs:950@0@980@980@970:0:0:0:-25"
|
|
|
|
Newar
|
|
October 10, 2014, 05:41:54 AM |
|
I know it will sound odd but I've enjoyed a certain amount of success using bfgminer [...]
Thank you for the input. Which version are you using? On 4.9.0 I get a ton of:
[2014-10-10 07:32:08] hashfast fd=40: SEND (aa0b035203000050) => -1 errno=5(Input/output error)
[2014-10-10 07:32:08] hashfast fd=40: SEND (aa0b035303000046) => -1 errno=5(Input/output error)
[2014-10-10 07:32:08] hashfast fd=40: SEND (aa0b035403000024) => -1 errno=5(Input/output error)
[2014-10-10 07:32:08] hashfast fd=40: SEND (aa0b035503000032) => -1 errno=5(Input/output error)
and others which go past too quick for a copy and paste. After a few minutes it quits with:
Segmentation fault (core dumped)
|
|
|
|
Newar
|
|
October 10, 2014, 05:43:06 AM |
|
MrTeal and xjack, also thanks for the input. I will try those suggestions.
|
|
|
|
Taugeran
|
|
October 10, 2014, 05:44:52 AM |
|
I know it will sound odd but I've enjoyed a certain amount of success using bfgminer [...]
Thank you for the input. Which version are you using? On 4.9.0 I get a ton of: [...] After a few minutes it quits with: Segmentation fault (core dumped)
4.2.0
|
|
|
|
xjack
|
|
October 10, 2014, 09:30:49 PM |
|
Uh oh.... Dabs may have cashed in her chips.
I shut it off to put a kill-a-watt on it, and when I switched back on - nil. Lights blink for just a sec and then the PSU shuts down.
PSU roulette with another habanero eliminates the PSU as a problem.
Can someone point me in the right direction to troubleshoot it?
|
|
|
|
Taugeran
|
|
October 11, 2014, 06:16:15 AM |
|
Uh oh.... Dabs may have cashed in her chips.
I shut it off to put a kill-a-watt on it, and when I switched back on - nil. Lights blink for just a sec and then the PSU shuts down.
PSU roulette with another habanero eliminates the PSU as a problem.
Can someone point me in the right direction to troubleshoot it?
Maybe try each individual ATX connector by itself? See if one in particular shorts. If the PSU shuts off, that sounds like a short.
|
|
|
|
Newar
|
|
October 11, 2014, 03:20:24 PM |
|
For the board with the regulator programming error, can you verify that 12V is good to each of the connectors?
I'll be able to check that on Monday.
For the other error, I've only seen that once in relation to multipool. Does that happen immediately after a block is detected and you get a stratum restart? I've talked with Con a little about this, and his opinion is also that it is a problem with the GWQ. The longterm solution is probably to completely replace the HF driver and allow cgminer to schedule work, but I honestly don't think that will ever happen. Try removing backup pools temporarily to see if the issue goes away. I don't have 4.6.x on any machines, mostly a mix of 4.5 and 4.3.
I have removed the backup pools and still get that error. For finding the block detections and stratum restarts I guess I have to pipe the cgminer output to a log file? Or is there another way other than sitting in front of it and waiting for a block to happen?
JakeTri cgminer 4.4.0 [...] --hfa-options "Chip:950@980,Dabs:950@0@980@980@970:0:0:0:-25"
In the meantime I have read through the HF-Tool thread as well. But no matter how I send the info to my boards, it only picks up the first two values and assigns the second value to the other 3 boards as well. To illustrate:
--hfa-options "hab2:800@890,hab3:875@935,hab4:850@920,hab5:875@935,hab6:850@920"
This results in hab2 at 800 and hab3, hab4, hab5 and hab6 at 875. I'm using 4.4.1. I am going out on a limb to say nobody ever tried that with more than 2 devices?
Uh oh.... Dabs may have cashed in her chips.
I shut it off to put a kill-a-watt on it, and when I switched back on - nil. Lights blink for just a sec and then the PSU shuts down.
PSU roulette with another habanero eliminates the PSU as a problem.
Can someone point me in the right direction to troubleshoot it?
Did you take the Kill-a-Watt out of the loop whilst troubleshooting?
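The --hfa-options string in this post can be split mechanically to see exactly which boards ended up at a clock other than the one requested. This is a minimal sketch, assuming each comma-separated entry is name:clock followed by an optional @-separated remainder, as the strings in this thread suggest; the helper names are hypothetical and this is not cgminer's own parser.

```python
def parse_hfa_options(arg):
    """Split 'name:clock@rest,...' into {name: {"clock": int, "rest": str}}."""
    devices = {}
    for entry in arg.split(","):
        name, _, spec = entry.partition(":")
        clock, _, rest = spec.partition("@")
        # Everything after the first '@' is kept raw; its meaning is not assumed.
        devices[name] = {"clock": int(clock), "rest": rest}
    return devices


def clock_mismatches(requested, observed):
    """Sorted names whose observed clock differs from the requested clock."""
    return sorted(
        name for name, spec in requested.items()
        if observed.get(name) != spec["clock"]
    )


REQUESTED = parse_hfa_options(
    "hab2:800@890,hab3:875@935,hab4:850@920,hab5:875@935,hab6:850@920"
)
# Clocks as reported in the post: hab2 at 800, all others at 875.
OBSERVED = {"hab2": 800, "hab3": 875, "hab4": 875, "hab5": 875, "hab6": 875}

if __name__ == "__main__":
    print(clock_mismatches(REQUESTED, OBSERVED))  # ['hab4', 'hab6']
```

With the values from the post, only hab4 and hab6 deviate from what was requested, since hab3 and hab5 asked for 875 anyway. That is consistent with the second entry being reused for every later board.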
|
|
|
|
xjack
|
|
October 11, 2014, 09:55:26 PM |
|
Uh oh.... Dabs may have cashed in her chips.
I shut it off to put a kill-a-watt on it, and when I switched back on - nil. Lights blink for just a sec and then the PSU shuts down.
PSU roulette with another habanero eliminates the PSU as a problem.
Can someone point me in the right direction to troubleshoot it?
Did you take the Kill-a-Watt out of the loop whilst troubleshooting?
Yes - my "production" mining rack is 240v. Test bed is 120v (my K-A-W is 120v). Tested both PSUs on 120 and 240 and with two different boards. The problem is iso'd to this board.
@Taugeran - Thanks for the suggestion - die 1 input is the culprit. Where do I go from here?
|
|
|
|
MrTeal (OP)
|
|
October 11, 2014, 10:34:43 PM |
|
Uh oh.... Dabs may have cashed in her chips.
I shut it off to put a kill-a-watt on it, and when I switched back on - nil. Lights blink for just a sec and then the PSU shuts down.
PSU roulette with another habanero eliminates the PSU as a problem.
Can someone point me in the right direction to troubleshoot it?
Did you take the Kill-a-Watt out of the loop whilst troubleshooting? Yes - my "production" mining rack is 240v. Test bed is 120v (my K-A-W is 120v). Tested both PSUs on 120 and 240 and with two different boards. The problem is iso'd to this board. @Taugeran - Thanks for the suggestion - die 1 input is the culprit. Where do I go from here?
You can disable that die by using the hftool.
$ ./hftool.py -w 0:VLT@FRQ,1:0@0,2:VLT@FRQ,3:VLT@FRQ
The board itself is causing the PSU to turn off, so it's probably a hardware fault. PM me and I can arrange an RMA.
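The -w argument above follows a regular die:VLT@FRQ pattern, so it can be generated instead of typed by hand. This is a small sketch under that assumption; the helper name is hypothetical and the example values (885 and 675) are borrowed from figures mentioned elsewhere in the thread, with units assumed to be mV and MHz. It only builds the string and does not call hftool.

```python
def hftool_write_arg(settings, disabled=()):
    """Build a 'die:VLT@FRQ,...' string; dies listed in disabled become 'die:0@0'.

    settings maps die index -> (voltage, frequency).
    """
    dies = sorted(set(settings) | set(disabled))
    parts = []
    for die in dies:
        if die in disabled:
            parts.append(f"{die}:0@0")
        else:
            voltage, frequency = settings[die]
            parts.append(f"{die}:{voltage}@{frequency}")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical values for dies 0, 2 and 3; die 1 is switched off.
    arg = hftool_write_arg({0: (885, 675), 2: (885, 675), 3: (885, 675)}, disabled=[1])
    print(arg)  # 0:885@675,1:0@0,2:885@675,3:885@675
```

The resulting string is what would follow -w on the hftool.py command line shown in the post.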
|
|
|
|
Taugeran
|
|
October 11, 2014, 11:52:18 PM |
|
Oh any suggestions for bringing die temps down? I just got a corsair H110 for my hab and am using AS 5 TIM
Dies 1 and 3 are running 15-20C hotter than 0&2.
Tried readjusting torque. Re applying TIM. BUT still 15-20 difference.
And this is limiting me to 675MHz@885mV using HFTool before I run into a die hitting 100C
|
|
|
|
MrTeal (OP)
|
|
October 12, 2014, 01:31:23 AM |
|
Oh any suggestions for bringing die temps down? I just got a corsair H110 for my hab and am using AS 5 TIM
Dies 1 and 3 are running 15-20C hotter than 0&2.
Tried readjusting torque. Re applying TIM. BUT still 15-20 difference.
And this is limiting me to 675MHz@885mV using HFTool before I run into a die hitting 100C
Check the flatness of the waterblock. I've noticed a few of the Asetek waterblocks I've seen have been very convex.
|
|
|
|
SVK
|
|
October 13, 2014, 05:35:22 AM |
|
Oh any suggestions for bringing die temps down? I just got a corsair H110 for my hab and am using AS 5 TIM
Dies 1 and 3 are running 15-20C hotter than 0&2.
Tried readjusting torque. Re applying TIM. BUT still 15-20 difference.
And this is limiting me to 675MHz@885mV using HFTool before I run into a die hitting 100C
I have a similar problem with ASIC 1. It runs far too hot, to the point that I have turned that board off completely. No point in running it.
|
|
|
|
xjack
|
|
October 14, 2014, 06:03:40 PM |
|
Decided not to pursue my problem board - 12v input is definitely shorted. That leaves two working dies on this board. If anyone is interested in an as-is tinker toy board - PM me. Offering one Nepton 280 for sale in the Marketplace - https://bitcointalk.org/index.php?topic=823639.new#new
|
|
|
|
Newar
|
|
October 17, 2014, 02:36:27 PM |
|
This has been asked before in this thread, but it wasn't fully answered: Does the Chain UP/DOWN work out of the box on the Habanero?
|
|
|
|
|