TXSteve
|
|
August 13, 2015, 09:03:28 PM |
|
here's the screenshot, trying your latest commit now

Wait, this is while bfgminer says it's configured successfully?! ... because according to the screenshot it's not configured successfully at all, it's not hashing ... that's why it got hard reset.

yeah, it's an odd one. It is hashing -- that die never displays -- if I turn off all the good dies and let that die run, I get approx. 10 MH/s on that bad die. btw that latest commit fixed it, the soft resets are random as needed -- before, they occurred every 37 secs or so while in that loop

I bet what's going on is that one of the DCDCs for that die is completely shot and the other is barely limping along. The shot one would of course fail the threshold test. Now I have the script doing an AND comparison of the threshold results for both DCDCs in the pair for the die. If both fail the threshold test, then it schedules the hard reset. I think dies that need a hard reset will have DCDCs not putting out any current anyway, so that's good, they would be scheduled for hard reset as desired. Dies such as yours still have a bit of current through at least one DCDC, so we wouldn't want a hard reset on it. For those miners who have completely dead dies, they should be set to OFF in the web GUI so that my script won't constantly try to do hard resets. Also, keep in mind this will still schedule a hard reset in the event that there has been no work from the pool in over 6 mins or so (once it enters the die reset loop). Hopefully this will work as expected for needed hard resets, that's what I'm waiting to test next on my rig haha! Also, eventually I will scale back the log output during that loop, once we confirm stuff works, so the log doesn't get huge.

this thing is running smooth ... no hard resets today at all, but I haven't needed any either, usually just need 1 or 2 every few days
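For readers following along, the hard-reset rule described in this post can be sketched roughly as below. This is only an illustration in Python, not the actual firmware script (which is not shown in the thread); the function name, the amp values, and the threshold are all hypothetical.

```python
# Hypothetical sketch of the hard-reset scheduling rule described above.
# The real script is not shown in the thread; names and values are made up.

NO_WORK_LIMIT_S = 6 * 60  # "over 6 mins or so" without work from the pool


def should_schedule_hard_reset(dcdc0_amps, dcdc1_amps, threshold_amps,
                               seconds_without_work=0.0):
    """Return True when a die in the reset loop should get a hard reset."""
    # AND comparison: both DCDCs of the pair must fail the threshold test.
    both_failed = dcdc0_amps < threshold_amps and dcdc1_amps < threshold_amps
    # Separate trigger: no work from the pool for too long.
    pool_stalled = seconds_without_work > NO_WORK_LIMIT_S
    return both_failed or pool_stalled


# One dead DCDC, the other still limping along: no hard reset.
print(should_schedule_hard_reset(0.0, 5.0, threshold_amps=1.0))  # False
# Both DCDCs near zero current: hard reset is scheduled.
print(should_schedule_hard_reset(0.0, 0.1, threshold_amps=1.0))  # True
```

The point of the AND is that a die with one weak DCDC keeps hashing and is left alone, while a die with no current on either DCDC is power cycled.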
|
|
|
|
vegasguy
Legendary
Offline
Activity: 1610
Merit: 1003
"Yobit pump alert software" Link in my signature!
|
|
August 13, 2015, 09:24:11 PM |
|
I have downloaded the latest image and can confirm it works excellently!
Vegas
|
I want to make sure everyone knows that I just released my software called "Yobit pump alert". This is custom software that uses an algo to detect the start of a pump here on Yobit, the second it starts. You can even filter the coins you see by price. Most pumps start at less than 100 sats, so you can easily filter for the cheap coins so that they are the only ones displayed: https://bitcointalk.org/index.php?topic=1945937.msg20241953#msg20241953
|
|
|
xenostar
Newbie
Offline
Activity: 14
Merit: 1
|
|
August 13, 2015, 09:29:57 PM |
|
If GenTarkin or anyone else could provide input, I would appreciate it.
I have two Titans that were working solidly, one at 330 and the other at 320, with 1 dead die on one and 2 dead dies on the second. After shipment, there are 4 dead dies, 1 half-working die @ 200 MHz, and 2-3 dies that work when reset but slowly gain in amperage from the standard range of around 38-42 (300-325) all the way up to around 52 (and over 100C), at which point they cut off and everything goes to 0.00. I am unsure how this occurred, as everything was working fine prior to shipping. I thought at first it was the different brand PSUs being used, but they are excellent brands/models and should not be causing any issues.
Could the dies slowly gaining amperage be caused by heatsinks getting knocked loose in shipping, or do you suspect something else?
I will be able to test your fw on the units tomorrow and provide results. In the meantime, I would like to brainstorm possible resolutions.
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 13, 2015, 09:37:36 PM |
|
My firmware is not designed to fix anything related to actually damaged dies. You can give it a shot, but please flag any dies that misbehave as OFF, or set them to the speed they actually run at. It sounds like the one that's climbing in temp may need its HSF reseated, assuming, of course, that it's only the dies that are overheating.
|
|
|
|
vegasguy
Legendary
Offline
Activity: 1610
Merit: 1003
"Yobit pump alert software" Link in my signature!
|
|
August 13, 2015, 10:03:08 PM |
|
Hi GenTarkin, can you please list all the changes from the original KNCMINER 2.0 img to your latest image? Please list them here or, preferably, on the GitHub.
Thanks, Vegas
|
|
|
TXSteve
|
|
August 13, 2015, 10:10:34 PM |
|
Sounds like the thermal grease under the heatsink may need to be replaced.
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 13, 2015, 10:11:26 PM |
|
Well, my latest changes, including the smarter soft/hard reset, haven't been made into a new release yet. But if you go through all the releases on my GitHub, each one has notes for all the changes made since the previous release. At some point, when the current test is confirmed working, I'll make a new release and post change notes for it =)
|
|
|
|
BTC_Z1
Newbie
Offline
Activity: 2
Merit: 0
|
|
August 14, 2015, 08:40:00 AM |
|
After suffering a similar issue with a Titan I bought recently, I decided to contact KnC. I told them the symptoms, and they asked me to check the heatsink. There is a rivet at the back of the cube which holds a screw in place; on mine it had been pushed out, leaving the heatsink loose at the back. I just used a slightly longer screw with a washer and nut on the outside of the case, and the dies then worked as they should. It might also be worth refreshing the CPU paste; mine had dried up from the heat build-up with the heatsink separated from the chip.
|
|
|
|
vegasguy
Legendary
Offline
Activity: 1610
Merit: 1003
"Yobit pump alert software" Link in my signature!
|
|
August 14, 2015, 12:56:25 PM |
|
It's your power supply. These units are VERY finicky. If you want NO power supply issues, use 2x EVGA 850 W (for batch 1) and 3x EVGA 850 for batch 2. As a computer repair shop owner, I stock about 15 different kinds of power supplies, so I was fortunate enough to be in a position to test as many power supplies as I wanted. This is the ONLY perfect config. Put the PCI-E cables in an X config when you plug them into your power supply. If you are low on funds, try any other power supply, maybe with just 1 cube by itself. These cubes do not turn off because of heat; they will usually burn themselves up first. The only exception MIGHT be if the heatsink is not even touching the processor in the middle. But I've left the heatsink by itself, without any bracket, just sitting on top of the processor for hours, and it doesn't go above 50C (don't try this, though). I DID have the VRMs covered. It's your PSU. Send your donation to Tarkin, not me.
Vegas
|
|
|
TXSteve
|
|
August 14, 2015, 02:10:11 PM |
|
GenTarkin -- looks like after one soft reset fail it is going straight to hard reset not the loop.
btw how can I delete this log and start over?
[2015-08-14 08:39:44] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439559594,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:40:24] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:45:38] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
KnC: Frequency change FAILED!
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
Soft reset failed, initiatng hard reset
Stopping bfgminer.
Power cycling ASIC# 5
INFO: Attempt to power down dc/dc
INFO: Attempt to power UP dc/dc
Starting bfgminer.
[2015-08-14 08:46:45] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 14, 2015, 02:33:44 PM |
|
That's intentional: if a soft reset issued with the waas -s command fails, it will immediately perform a hard reset. The reason is that, from watching my Titan, in all cases where waas -s failed, the die could never be brought back unless a hard reset was performed. It's working as I designed it =) ... Remember, the waas -s command should work even if there are pool issues; therefore, like I said, a failing waas -s needs a hard reset.

KnC: Frequency change FAILED! { "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
^ means waas -s failed.

I guess, if it really is problematic for your rig, I could have it just skip the waas -s failed/success checking and simply rely on current readings from the DCDCs to measure whether soft resets worked on the die. But that could potentially lose time, because it runs the loop for 5 minutes. Can you confirm whether just spamming this die with soft resets actually brings it back, 100% of the time? If so, I'll upload a mod for ya that disables waas -s success/fail checking.

Also, regarding logs: I just looked at the log directory, and monitordcdc.log is not getting large for me. So you shouldn't have to worry about space consumption (of course, this depends on how badly your dies misbehave in a given timeframe =P). I think the logs get cycled out too, though =). If 'catting' the log file produces too much output for you, just use tail to return n lines from the end of it, like:
tail -n 25 /var/log/monitordcdc.log
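The escalation described in this reply (a failed waas -s goes straight to a hard reset) could be sketched like this. It is a hedged illustration only: the real monitoring script is not shown in the thread, the function names and return labels are hypothetical, and the only detail taken from the posts is the "Frequency change FAILED" marker in the output.

```python
# Hypothetical sketch: decide the next step from the output of a soft reset.
# Only the failure marker string comes from the log quoted in the thread.

FAILURE_MARKER = "Frequency change FAILED"


def soft_reset_failed(command_output: str) -> bool:
    """True if the soft-reset output contains the failure marker."""
    return FAILURE_MARKER in command_output


def next_action(command_output: str) -> str:
    """Escalate to a hard reset as soon as a soft reset reports failure."""
    if soft_reset_failed(command_output):
        return "hard_reset"
    return "continue_monitoring"


failed = 'KnC: Frequency change FAILED! { "asic_6_frequency": { "die2": "150" } }'
ok = "STATUS=S,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE"
print(next_action(failed))  # hard_reset
print(next_action(ok))      # continue_monitoring
```

Skipping this check, as offered in the post, would amount to always returning "continue_monitoring" and letting the DCDC current readings decide instead.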
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 14, 2015, 02:50:01 PM |
|
So, I've got a question for y'all: on the status page, what is the "Temperature" column? On my Titan it's substantially lower than the DCDC temp average. Is it the ASIC temp or some sort of ambient temp? I don't see how it could be the ASIC itself, since mine is so cool and I would expect the chips to run quite hot.
If it's the actual ASIC temp, I *may* be able to code in temperature protection for the chip. But if it's not, I wouldn't be able to.
|
|
|
|
vegasguy
Legendary
Offline
Activity: 1610
Merit: 1003
"Yobit pump alert software" Link in my signature!
|
|
August 14, 2015, 02:53:04 PM |
|
I've got this firmware running on multiple full sets of Titans and it's flawless.
Vegas
|
|
|
TXSteve
|
|
August 14, 2015, 02:59:11 PM |
|
Spamming it with soft resets sometimes does bring it back. That is the default way that KNC handles it, so I have seen them come back, surprisingly.
Also, the soft resets are occurring about 6-7 min apart -- it's like this all the way up the log, everywhere I checked. Seems odd for it to be that regular; isn't 6-7 min about the time it takes to run through the loop??
thx for the tip:
tail -n 25 /var/log/monitordcdc.log
[2015-08-14 08:10:24] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439557835,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:11:05] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:16:22] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439558190,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:17:00] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:22:09] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439558537,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:22:47] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:28:01] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439558890,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:28:41] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:33:52] Die 5-1 requires restart
Attempting softreset of ASIC# 5 DIE# 1
{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }
STATUS=S,When=1439559239,Code=92,Msg=PGA 0 set OK: Die setup Ok; asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|
[2015-08-14 08:34:29] Die 5-1 restarted
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 2
Manually disabled die detected, skipping dead die detection. ASIC# 4, DIE# 4
Moving on with dead die test, no manual disabled die found
[2015-08-14 08:39:44] Die 5-1 requires restart
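To check how regular the resets in the log above really are, the gaps between "requires restart" entries can be computed directly. The following is a small Python sketch, not part of the firmware; the helper name is hypothetical, and the sample contains only the six "requires restart" lines from the posted log.

```python
import re
from datetime import datetime

# Matches entries such as "[2015-08-14 08:10:24] Die 5-1 requires restart".
RESTART_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] Die \d+-\d+ requires restart"
)


def restart_intervals(log_text: str) -> list:
    """Return the seconds between consecutive 'requires restart' entries."""
    stamps = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for match in RESTART_RE.finditer(log_text)
    ]
    return [
        int((later - earlier).total_seconds())
        for earlier, later in zip(stamps, stamps[1:])
    ]


# Only the "requires restart" lines from the log posted above.
SAMPLE_LOG = """\
[2015-08-14 08:10:24] Die 5-1 requires restart
[2015-08-14 08:16:22] Die 5-1 requires restart
[2015-08-14 08:22:09] Die 5-1 requires restart
[2015-08-14 08:28:01] Die 5-1 requires restart
[2015-08-14 08:33:52] Die 5-1 requires restart
[2015-08-14 08:39:44] Die 5-1 requires restart
"""

print(restart_intervals(SAMPLE_LOG))  # [358, 347, 352, 351, 352]
```

For this excerpt the gaps come out at a little under six minutes each, which fits the observation that the resets arrive on a near-fixed schedule rather than at random.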
|
|
|
|
TXSteve
|
|
August 14, 2015, 03:20:22 PM |
|
GenTarkin I sent you .25 btc more because I am working you so hard lol
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 14, 2015, 04:27:57 PM |
|
I just thought of something based on your log... You know what?! .... KNC's main loop that tests DCDC current does a DCDC 0 / 1 OR comparison as well. In your case, KNC's loop is sending your 1 failed DCDC on that die to my loop, so that particular die is ALWAYS gonna get at least a soft reset. If I change KNC's loop to do an AND comparison like my loop, then that die should work just fine & dandy at 150 without a soft reset loop ever being needed, until both DCDCs drop below the threshold.

The question is... what effect would that have on other people's Titans, or on the original intent of KNC's loop? I can't think of any side effects atm. We know that, so far, both DCDCs register near 0 current when the die needs a soft or hard reset. I think in EVERY case, if a die was humming along just fine and one of the DCDCs stopped for w/e reason ... the other DCDC would drop too, because the die wouldn't be hashing at all in that case. So ... I THINK doing an AND comparison of the DCDCs in KNC's loop "should" work. Thoughts?

Thank you SO MUCH!!! for the donation, man! I greatly, greatly appreciate it =)
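The difference between the two comparisons discussed here can be shown with a small truth-table style sketch. This is Python for illustration only, not KNC's loop or the modified script; the function names, the threshold, and the current readings are made-up values.

```python
# Hypothetical comparison of the OR check and the AND check on a DCDC pair.
# Threshold and current readings are invented for illustration.

THRESHOLD_AMPS = 1.0


def flag_with_or(dcdc0_amps: float, dcdc1_amps: float) -> bool:
    """Flag the die if EITHER DCDC is below the threshold (OR check)."""
    return dcdc0_amps < THRESHOLD_AMPS or dcdc1_amps < THRESHOLD_AMPS


def flag_with_and(dcdc0_amps: float, dcdc1_amps: float) -> bool:
    """Flag the die only if BOTH DCDCs are below the threshold (AND check)."""
    return dcdc0_amps < THRESHOLD_AMPS and dcdc1_amps < THRESHOLD_AMPS


cases = {
    "healthy die": (20.0, 21.0),
    "one DCDC dead, die still hashing": (0.0, 8.0),
    "die stalled, both DCDCs near zero": (0.1, 0.0),
}

for name, (dcdc0, dcdc1) in cases.items():
    print(f"{name}: OR={flag_with_or(dcdc0, dcdc1)} AND={flag_with_and(dcdc0, dcdc1)}")
```

Only the middle case differs: the OR check keeps sending the die with one dead DCDC into the reset loop, while the AND check leaves it running until the second DCDC drops as well.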
|
|
|
|
TXSteve
|
|
August 14, 2015, 05:40:01 PM |
|
I think in EVERY case, if a die was humming along just fine and one of the DCDC's stopped for w/e reason ... the other DCDC would drop because the die wouldnt be hashing at all in that case. So ... I THINK doing an AND comparison of the DCDC's in KNC's loop "should" work. Thoughts? Thank you SO MUCH!!! for the donation man! I greatly greatly appreciate it =) I couldn't think of a reason why it would be going thru your loop, however that makes a lot of sense. Let's try it, I have 3 good cubes, a cube with 2 dead dies, and the soft reset cube on this rig, so we should be able to test it pretty good. there's also a hard reset cube on this rig every few days. I am going to make a back up image in case it goes haywire, but I doubt I'll need it
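On the question of how regular the soft resets are: the spacing can be worked out directly from the "requires restart" timestamps in the log quoted above. This is only a small Python sketch for checking the arithmetic; the list copies the seven timestamps from the excerpt by hand, and the helper name gaps_in_seconds is made up for illustration (it is not part of the monitoring script).

```python
from datetime import datetime

# "Die 5-1 requires restart" timestamps, copied by hand from the log excerpt.
RESTART_REQUESTS = [
    "2015-08-14 08:10:24",
    "2015-08-14 08:16:22",
    "2015-08-14 08:22:09",
    "2015-08-14 08:28:01",
    "2015-08-14 08:33:52",
    "2015-08-14 08:39:44",
    "2015-08-14 08:45:38",
]


def gaps_in_seconds(stamps):
    """Return the gap in whole seconds between each consecutive pair of timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_in_seconds(RESTART_REQUESTS):
        print(f"{gap // 60} min {gap % 60:02d} s")
```

For this excerpt every gap lands between 5 min 47 s and 5 min 58 s, so a little under 6 minutes. That would be consistent with a check that runs for about 5 minutes plus the roughly 40 seconds each soft reset takes in the log, though the excerpt alone does not prove that is the cause.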
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 14, 2015, 05:41:41 PM |
|
GenTarkin -- looks like after one soft reset fail it is going straight to hard reset not the loop.
btw how can I delete this log and start over?
Alright, I'll let ya know when to do a git pull =)
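The escalation rule described in the thread (a failed waas -s goes straight to a hard reset, because the "KnC: Frequency change FAILED!" line means the soft reset did not take) can be sketched roughly like this in Python. This is not the actual script: the function names and the "hard_reset" / "keep_monitoring" labels are assumptions for illustration, and the sample strings are just lines copied from the log. It only classifies text that waas has already printed; it does not call waas itself.

```python
# Marker line that, per the thread, means "waas -s" failed.
FAILURE_MARKER = "KnC: Frequency change FAILED!"

# Sample outputs copied from the quoted log (one success, one failure).
OK_SAMPLE = (
    '{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } } '
    "STATUS=S,When=1439559594,Code=92,Msg=PGA 0 set OK: Die setup Ok; "
    "asic 5 die 1 cmd RECONFIGURE,Description=bfgminer 5.2.0|"
)
FAILED_SAMPLE = (
    "KnC: Frequency change FAILED! "
    '{ "asic_6_voltage": { "die2": "-0.0366" }, "asic_6_frequency": { "die2": "150" } }'
)


def soft_reset_failed(waas_output: str) -> bool:
    """True if the captured soft-reset output contains the failure marker."""
    return FAILURE_MARKER in waas_output


def next_step(waas_output: str) -> str:
    """Hypothetical policy: escalate to a hard reset as soon as a soft reset fails."""
    return "hard_reset" if soft_reset_failed(waas_output) else "keep_monitoring"


if __name__ == "__main__":
    print(next_step(OK_SAMPLE))
    print(next_step(FAILED_SAMPLE))
```

The mod offered in the thread (skipping the waas -s success/fail check) would amount to always taking the "keep_monitoring" branch and relying on the DCDC current readings instead.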
|
|
|
|
GenTarkin
|
|
August 14, 2015, 06:28:43 PM |
|
Alright, go ahead and do a git pull and test it out. Basically, if I have it set correctly, I've modified KNC's loop to check whether either DCDC is supplying current; if so, it doesn't kick the die over to my loop (the custom resetting loop).
If all is correct, it will only kick over to my custom reset loop when both DCDCs on a die are registering current below the threshold of 5 amps.
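To make the OR vs AND difference concrete, here is a rough Python sketch of the two checks. It is not the code from the commit: the function names are invented for illustration and the amp readings in the usage part are hypothetical examples, not measurements from anyone's Titan. The only value taken from the post is the 5 amp threshold.

```python
THRESHOLD_AMPS = 5.0  # threshold stated in the post


def below(amps: float, threshold: float = THRESHOLD_AMPS) -> bool:
    """True if a single DCDC reading is under the threshold."""
    return amps < threshold


def kick_to_reset_loop_or(dcdc0: float, dcdc1: float) -> bool:
    """Old behaviour as described: one low DCDC is enough to trigger the reset loop."""
    return below(dcdc0) or below(dcdc1)


def kick_to_reset_loop_and(dcdc0: float, dcdc1: float) -> bool:
    """New behaviour as described: both DCDCs of the die must be low."""
    return below(dcdc0) and below(dcdc1)


if __name__ == "__main__":
    # Hypothetical readings in amps: (dcdc0, dcdc1)
    cases = {
        "healthy die": (14.0, 13.5),
        "one dead DCDC, die still hashing": (0.0, 12.0),
        "stalled die": (0.1, 0.2),
    }
    for name, (a, b) in cases.items():
        print(f"{name}: OR={kick_to_reset_loop_or(a, b)} AND={kick_to_reset_loop_and(a, b)}")
```

With the OR check the die with one dead DCDC gets sent to the reset loop on every pass, which matches the repeating soft resets in the log. With the AND check only the stalled die is sent over.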
|
|
|
|
TXSteve
|
|
August 14, 2015, 06:32:55 PM |
|
ok, I'll check it now
|
|
|
|
|