GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 12, 2015, 03:42:09 AM Last edit: August 12, 2015, 04:21:28 AM by GenTarkin |
|
https://github.com/GenTarkin/Titan/releases/tag/v.93
New release published!!
---details---
Login through SSH & webgui is now: admin/admin (should be anyway, LOL! Hope I updated it correctly =P Test it, people! =)
If a soft die reset fails, a hard reset sequence is initiated (power off cube, restart bfgminer).
Instead of setting dies w/ overheating DCDCs to OFF, it now scales them down 25 MHz each check until the DCDCs are under the temp threshold. If the clock gets down to 100, the die is set to OFF.
Added more temps to the temp threshold setting, including all numbers between 90 & 95.
*Default DCDC temp monitoring settings are: ENABLED / 90C
---details---
Please test, everyone; I'll fix issues as they arise. Please, please donate =) It helps fuel my motivation to keep improving stuff =)
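The scale-down rule in the release notes can be sketched roughly as follows. This is a minimal illustration in Python, not the firmware's actual monitoring code (which is not shown in this thread). The function name `next_clock`, the use of 0 for "OFF", the strict "over threshold" comparison, and switching the die off as soon as the next step would land on 100 are all assumptions.

```python
STEP_MHZ = 25    # scale-down step named in the release notes
FLOOR_MHZ = 100  # clock at which the die is switched off instead
OFF = 0          # assumption: 0 stands for "die OFF"


def next_clock(clock_mhz: int, dcdc_temp_c: float, threshold_c: float = 90.0) -> int:
    """Return the clock to apply after one monitoring check (hypothetical helper)."""
    # Leave the die alone if it is already off or the DCDC is not over threshold.
    if clock_mhz == OFF or dcdc_temp_c <= threshold_c:
        return clock_mhz
    lowered = clock_mhz - STEP_MHZ
    # Assumption: once the step-down would reach the floor, turn the die off.
    if lowered <= FLOOR_MHZ:
        return OFF
    return lowered
```

Called once per check, a die running at 325 with a DCDC above the threshold would step to 300, then 275, and so on until the temperature drops back under the threshold or the floor is reached.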
|
|
|
|
Searing
Copper Member
Legendary
Offline
Activity: 2898
Merit: 1465
Clueless!
|
|
August 12, 2015, 05:28:56 AM |
|
heh, you keep adding stuff I need before I can get to installing it (work has me tied up until the end of the month)... all the suggestions I needed, heh. Anyway, I've gotta buy some coin on Coinbase and shoot you some again once I can get off work long enough to test this, or at least get you some BTC (main hoard is in a paper wallet in a safety deposit box). I'm sure more than a few of us will trickle you some more BTC again, and I'm sure I speak for everyone when I say we appreciate your efforts. (By the by, all my posts till the end of the month will be made at work, away from my miners... no joy playing with toys.)
|
|
|
|
xstr8guy
|
|
August 12, 2015, 08:42:30 AM |
|
I can check it out pretty quickly
btw, it doesn't seem like the max temp is working, I have it set for 90c and a die hit 92c, nothing happened
Testing a rearrangement & rewrite of hard reset detection. Will have to wait till mine actually needs resetting to see if it works. If it does, it should differentiate between soft reset success vs. fail and then apply a hard power reset to the cube if needed. My Titan doesn't experience successful soft resets, so I will need someone to test that part once I verify that soft reset fail then hard reset works.

It doesn't happen instantly. Think it loops every 4 seconds. I'll test it here in a lil while. I don't see why it wouldn't work, but I'll double check =)

Grr, never mind, somehow it stopped writing to the config file. I'll have to look into it later. Not sure how it broke =P Probably a typo somewhere, lol

well, I had 1 hard reset work flawlessly. I'll pledge .5 btc for your efforts if you can just drop the MHz from 325 to 300 instead of turning the die off. Also, can you add another temp cut-off at 93c? I manually turn them down around 92/93 -- it's usually at those temps for only a few hours, and I haven't had any problems [/quote]

Hrm... ok, well, I found an issue w/ the changes, but only as of this morning when I started editing the code again; these erroneous edits did not make it into my latest release. I fixed the issues, but the release u downloaded should still have worked properly. Any chance you can paste the contents of /var/log/monitordcdc.log when the thermal trip doesn't work for you? It would require ssh'ing into the pi and copying the contents of that file to a text file. The test works perfectly on my box when I set the temp threshold to 70 (added as a testing temp =) ). I don't even have to hit refresh on the advanced settings page; I can see the dies that go over threshold get turned to 0's after bfgminer restarts.

Also, yes, when I have more time, now that I see how KNC updates clocks without needing to restart bfgminer, I will implement a soft clock scale-down w/o needing a bfgminer restart. =) I will also put 92/93C in there for ya =) [/quote] [/quote]

Fix the damn broken quotes! It's unreadable.
|
|
|
|
TXSteve
|
|
August 12, 2015, 09:41:23 AM Last edit: August 12, 2015, 11:51:02 AM by TXSteve |
|
Instead of burning the new img file, I tried doing a git pull:
    cd knc-asic
    git stash save --keep-index    (didn't need this line on all rigs)
    git pull
    cd
    ./update-webgui.sh
Seems to work, but did I miss anything?
|
|
|
|
helmax
|
|
August 12, 2015, 11:23:11 AM |
|
Does anyone have a full 1.5 GB SD card image for the Neptune?
|
|
|
|
jelin1984
Legendary
Offline
Activity: 2408
Merit: 1004
|
|
August 12, 2015, 01:13:00 PM |
|
Can you make your Titan firmware work with the Raspberry Pi 2 version? That would be great.
|
|
|
|
GenTarkin
|
|
August 12, 2015, 02:36:16 PM |
|
TXSteve, you'd want to do the git pull from /home/pi (the default dir u land in when you log in via ssh), cuz otherwise the changed webpages won't download. Then yeah, run update-webgui.sh and u should... *should* be set, haha. (Reapply your desired temp threshold settings via the webgui; mine defaults to ON/90.)
|
|
|
|
GenTarkin
|
|
August 12, 2015, 02:36:52 PM |
|
jelin1984, re: Raspberry Pi 2 support: I only have a pi to work on, so no.
|
|
|
|
TXSteve
|
|
August 12, 2015, 02:57:52 PM Last edit: August 12, 2015, 03:11:16 PM by TXSteve |
|
everything is upgraded and running fine so far, just a couple of observations:
-- the temp throttling is sweet, it doesn't even reboot bfgminer, nice!!
-- you might want to implement a delay of a few min or so before triggering a hard reset, because:
a) a soft reset sometimes needs a few tries before it works, and a delay would minimize bfgminer restarts -- when the rig is rented and the customer is using an unstable pool that takes forever for vardiff to adjust & stabilize, frequent restarts are particularly troublesome
b) a delay will be needed to optimize voltages & MHz, and/or to monitor which die is triggering the resets
anyway, I'll send .5 btc to 1Px71mWNQNKW19xuARqrmnbcem1dXqJ3At (sent)
thx again, nice work
|
|
|
|
GenTarkin
|
|
August 12, 2015, 03:31:48 PM Last edit: August 12, 2015, 03:44:38 PM by GenTarkin |
|
AWESOME!!! Thanks a ton!!! I just uploaded another change for the webgui; it now shows the bfgminer version in the status screen. I'll look into the delay for ya =) I'll also get to work on auto-upscaling of cores that were previously downclocked. I have a busy schedule coming up, so it may not be released as quickly, and this is fairly complex =)

Regarding the soft reset, do you know where it actually fails? During the waas -s command, or when bfgminer is told to reconfigure? When u see this behaviour happen, can you post the relevant contents of /var/log/monitordcdc.log? That way I can see exactly what needs to be delayed (or tried a few times).

If I had to guess, it fails at the waas command. That is where I check whether a soft reset failed, and its success / fail decides whether a hard reset needs to be issued. So I could do a timed loop of, say, up to 5 soft resets (on like a couple-second timer) via the waas command. If they all fail, it performs a hard reset; on the first one that passes, it exits the loop and then tells bfgminer to do its die reconfigure.

*NOTE: The waas command has to succeed before bfgminer will show a "die successfully configured" message.

How does that sound?
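The retry loop proposed above could look something like this. It is only a sketch in Python for illustration, not the firmware's actual script. The three callables are hypothetical stand-ins: `soft_reset` for whatever runs the waas soft reset and reports success, `reconfigure` for telling bfgminer to reconfigure the die, and `hard_reset` for the power-off-cube / restart-bfgminer sequence. The 5 attempts and 2-second pause are the ballpark figures from the post.

```python
import time
from typing import Callable


def reset_die(
    soft_reset: Callable[[], bool],    # hypothetical: True if the waas soft reset succeeded
    hard_reset: Callable[[], None],    # hypothetical: power off cube, restart bfgminer
    reconfigure: Callable[[], None],   # hypothetical: ask bfgminer to reconfigure the die
    attempts: int = 5,
    pause_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try up to `attempts` soft resets, falling back to a hard reset."""
    for attempt in range(1, attempts + 1):
        if soft_reset():
            # First success: leave the loop and let bfgminer reconfigure.
            reconfigure()
            return "soft"
        if attempt < attempts:
            sleep(pause_s)  # short pause between soft reset attempts
    # Every soft reset failed, so escalate.
    hard_reset()
    return "hard"
```

Passing the actions in as callables keeps the retry policy separate from how the resets are actually issued, so the loop can be exercised without touching a miner.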
|
|
|
|
TXSteve
|
|
August 12, 2015, 04:28:10 PM |
|
no, I don't know where the soft reset actually fails. I get the standard "die configuration failed" message and it tries again 20 or 30 sec later. If I can catch it, I'll get the log file. I am using Awesome Miner to trigger hashrate alerts so I can then monitor what's happening, but if the hard reset happens too quickly the hashrate alert doesn't trigger -- the rig offline alert triggers instead, and by then it's too late to see what happened.
What you suggested sounds good to me.
|
|
|
|
GenTarkin
|
|
August 12, 2015, 04:51:50 PM |
|
Ok, cool. Yeah, "die configuration failed", if I had to assume, means the waas soft reset has failed. At least, when my rig requires a hard reset, that's the message I get until I do the hard reset. I'll implement the loop sometime either today or tonight =) If other people could test out the firmware, that would be great, and any donations help =) Thanks again for ur generous donation, TXSteve, I really appreciate it =)
|
|
|
|
GenTarkin
|
|
August 12, 2015, 07:45:40 PM Last edit: August 12, 2015, 08:14:15 PM by GenTarkin |
|
Came across a behaviour issue on my Titan: I just noticed that it issues a soft reset via waas and that returns success, yet the die still fails to RECONFIGURE successfully in bfgminer. So a hard reset would be inevitable, and really there's no way at this point to differentiate between a fully successful soft reset and a failure =/ May have to reimplement hard reset no matter what.
EDIT: yeah, what a bummer, it's attempting multiple soft resets w/ no success, yet waas doesn't fail. I don't know if there is a way around that at this point; may have to revert to just hard resets.
EDIT: rethinking it, I may have another way to detect die status as a fallback. Coding that in will be tricky, so it will have to wait till later =) Damn these things for failing in so many different ways, ROFL!
|
|
|
|
GenTarkin
|
|
August 12, 2015, 08:41:48 PM |
|
TXSteve, so the way these things work is: every loop of the monitoring script, it scans how much current is going through the DCDCs, and if under 5 amps is going through one, it basically increments the value in /var/run/dieXX. Once /var/run/dieXX goes over the threshold variable, it takes action to reset the die. So basically, if the ASICs don't have work due to a shitty pool configuration, like when renting, /var/run/dieXX will get incremented, and if the pool is shitty enough that it goes over the threshold, one of the reset actions will be taken no matter what.

I'm not exactly sure what's a good way to recode this so it can differentiate between possible pool comm issues and an actual die needing a reset. I could implement maybe 2 thresholds: a lower one for the pool issues, then another, maybe double the first one, and if that's reached it performs a hard reset. I don't know, what do you think? Or anyone else care to chime in? I'm just kinda doing a ton of trial and error here, LOL! How this all worked originally (the way KNC set it up) was one threshold, and if that threshold was hit it just performed soft resets for an eternity.

In the short term, I was thinking, pool issues aside: now that we have this other stupid mode of failure where the waas command succeeds but doesn't really bring the die back to life, I could do another loop which would attempt soft resets up to, say, 5x. It would sleep after each attempt, run the dcdc function to see if current is over 5A, and each time it's not, of course, increment /var/run/dieXX. Then, inside that same loop, if /var/run/dieXX goes over the threshold, start a hard reset. I think that would solve the issue I ran into just today =)
|
|
|
|
TXSteve
|
|
August 12, 2015, 09:02:23 PM |
|
how about this? An "auto on/auto off" button & a "manual reset" button. On problematic rigs we turn auto reset off & use manual reset as needed. It still beats physically powering down the rig and restarting it... just a thought.
I only have one die with these flaky soft resets, so it may not be a huge overall problem.
|
|
|
|
GenTarkin
|
|
August 12, 2015, 09:13:57 PM |
|
Trying to keep this as automated and as non-invasive for the user as possible =) It's hard for me to code custom stuff for the webgui, since I don't have much experience w/ all the crazy shit they have going on in it. LOL
|
|
|
|
TXSteve
|
|
August 13, 2015, 12:54:01 AM |
|
this v.93 seems to be running pretty well; even the flaky soft-reset die seems to have stabilized after several hard resets. When bfgminer randomly shuts down, I attribute that to hard resets.
|
|
|
|
GenTarkin
|
|
August 13, 2015, 01:01:48 AM |
|
Yep, bfgminer can't be running if you want a proper dcdc power down / up... don't know if it has to do w/ bus traffic or what, but it seems the dcdc power down / up does something, yet bfgminer will continually ignore them.
|
|
|
|
TXSteve
|
|
August 13, 2015, 01:08:45 AM |
|
If those loops are too big a pain in the neck to implement, I wouldn't worry about them.
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
August 13, 2015, 03:34:23 AM |
|
Well, I rewrote the soft / hard reset code =)
Basically, once it detects a die in error via /var/run/dieXX, it calls the reset die function.
It first attempts a soft reset... if that fails right off the bat via waas -s failing, then it performs a hard reset. (waas -s shouldn't fail because of pool comm errors.)
If waas -s succeeds, then it calls bfgminer to perform its internal die reconfiguration update.
Then the script waits 30 seconds and measures the current output of the die in question.
If either of the DCDCs is below the current threshold, then it increments the error count.
It will loop through the soft die resets up to 10 times; if it fails 10x, then it performs a hard reset.
So, it gives the die roughly 5-6 mins to "work"... meaning have current greater than the threshold flowing through it, via soft resets. If that fails, it hard resets.
What ya think? I updated github w/ the changes if you wanna test. I can say it seems the soft reset logic is working. I have yet to see a hard reset take place; I have to actually wait till my unit acts up to confirm hard reset functionality LOL!
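The reset sequence described above can be sketched roughly like this in Python. This is only an illustration of the flow, not the code on github: the function name and all four callables (soft_reset, reconfigure, read_amps, hard_reset) are hypothetical placeholders for whatever the real script does with waas -s, bfgminer and the DCDC readings. The 30 second wait and the 10 attempts are from the post; the 5 A default is the cutoff mentioned earlier in the thread.

```python
import time
from typing import Callable, Sequence


def reset_die(
    die: int,
    soft_reset: Callable[[int], bool],              # hypothetical: True if the soft reset command succeeded
    reconfigure: Callable[[int], None],             # hypothetical: ask bfgminer to reconfigure the die
    read_amps: Callable[[int], Sequence[float]],    # hypothetical: current of each DCDC on the die
    hard_reset: Callable[[int], None],              # hypothetical: power-cycle sequence
    min_amps: float = 5.0,
    max_attempts: int = 10,
    settle_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try soft resets first and fall back to a hard reset."""
    for _ in range(max_attempts):
        if not soft_reset(die):
            # The soft reset command itself failed, so go straight to a hard reset.
            hard_reset(die)
            return "hard_reset"
        reconfigure(die)
        sleep(settle_seconds)
        # The die counts as working only if every DCDC is at or above the cutoff.
        if all(amps >= min_amps for amps in read_amps(die)):
            return "recovered"
    # Every soft reset "succeeded" but the die never drew enough current.
    hard_reset(die)
    return "hard_reset"
```

The timing lines up with the "roughly 5-6 mins" figure: 10 attempts x 30 seconds of waiting is 300 seconds, so 5 minutes plus however long the reset and reconfigure commands take on each pass.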
|
|
|
|
|