Title: KnC die #0 disabled
Post by: Bogart on November 01, 2013, 10:59:14 PM

I thought this deserved its own thread.

I'm learning that many of the later shipments of KnC's miners have an issue where die #0 of the 4 on each chip does not function. You can see this by installing BertMod (http://forum.kncminer.com/forum/main-category/main-forum/6183-bertmod-0-2-unofficial-firmware-mod-feedback-thread) and looking at the output current of the DC-DC converters: affected units show a much lower current on #0 than on the other 3.

I'm afraid I can't contribute much, since I have an early Saturn that is not among the affected units. There is a much better summary of the issue on the KnC forums: http://forum.kncminer.com/forum/main-category/hardware/13049-read-me-first-known-issues-slow-performance-dc-dc-problem-bad-core-map-etc

I thought a thread here would reach a wider audience and might make for a more productive discussion.

Title: Re: KnC die #0 disabled
Post by: Bitcoinorama on November 01, 2013, 11:06:35 PM

You can try this fix.
Note that it's totally unofficial, but it appears to work until we get solid feedback from the test rigs over the next couple of days. Results may ramp up slowly and can take up to 3 hours to reach solid performance, but it has brought several units I tested back to life: www.kncminer.com/userfiles/file/kncminer-0.98.1(beta).bin

Title: Re: KnC die #0 disabled
Post by: shmadz on November 02, 2013, 01:07:15 AM

Does anyone know what exactly this firmware does?
It seems to get mixed results. I am only getting 460 GH/s on my Jupiter, so I am tempted to try this firmware.

*edit* So, I tried the firmware, and nothing changed. Tried .96.1, .96, .95, enable cores, a bunch of different s**t. Then I put the case back on to restrict airflow and re-flashed .98.1. The temps of the back 2 modules are right around 70 °C and the front 2 are around 60 °C, and the Jupiter is now reporting 534 GH/s and climbing. It will take an hour or so for the hashrate at the pool to normalize, but it is slowly rising as well...

Thus far, it appears these things really do like it hot. Originally I hated the cooling design of this device, because I thought it was pretty crappy at cooling... Turns out crappy cooling appears to provide better hashrate, as many others have also reported...

Title: Re: KnC die #0 disabled
Post by: arlekyn13 on November 02, 2013, 07:40:52 AM

It worked for my Jupiter the first time I tried it. It went from 414 GH/s avg on 0.98 to 545 avg and was still climbing, but then I decided to run enablecores, and after that reboot (and many others) it didn't work as well. Right now my Jupiter is running at 454 avg.
My 2 Saturns were climbing very slowly on 0.98.1, to about 200 GH/s avg before the reboot (211-214 avg on 0.98), but after a few reboots they are now staying at 106 and 140. So it seems you need a lucky reboot (?) to get the miner working better?

Title: Re: KnC die #0 disabled
Post by: fubly on November 02, 2013, 12:57:09 PM

http://forum.kncminer.com/forum/main-category/main-forum/13767-firmware-beta-0-98-1-feedback-thread
Where do you have the source code, to look under the hood? With your mod it's possible to enable DC/DC 0; is it a setting somewhere, like a config file, or what's the change? Good job ::)

Title: Re: KnC die #0 disabled
Post by: rograz on November 02, 2013, 01:16:11 PM

Quote:
Turns out crappy cooling appears to provide better hashrate, as many others have also reported...

Do the KnC units vary their fan speeds? It could be a symptom of the board not getting enough airflow > chips get hotter > fan speed increases > general airflow over the board increases.

Title: Re: KnC die #0 disabled
Post by: shmadz on November 02, 2013, 01:41:19 PM

Quote:
Do the KnC units vary their fan speeds? Could be a symptom of the board not getting enough airflow > chips get hotter > fan speed increases > general airflow over board increases.

The fans on the heatsinks have a 4-pin connector, so they definitely have the ability to vary their speed. I have no idea whether they actually do. The CPU coolers seem to be complete overkill, though: I tried removing all the fans and just had one external fan blowing at the system with the case open, and the temps were 40 °C or less, but the hashrate in that configuration was around 450-490 GH/s.

After more than an hour now, the rear 2 modules are staying steady at 70 °C and the front ones are still at 60 °C (+/- 2 degrees). The hashrate reported by the device is now 558 GH/s, and the pool is reporting 549. I'm extremely happy right now ;D

*edit* This is using the .98.1 firmware. I tested the exact same setup with the .98 firmware and initially saw good results, but within a few hours it was back down to 460 or so. It's only been a couple of hours with the new firmware, so the jury is still out on this one. I'll have to wait and see whether it can maintain these speeds, but it's looking solid so far.

Also, as a side note.
If the optimal temperature really is 70 °C, then I would like to push more voltage and clock speed into this thing until it reaches the point where you are struggling to keep it under, say, 75 °C or so... Restricting the airflow to intentionally increase the temperature of computer hardware is really twisting my stomach in knots.

Title: Re: KnC die #0 disabled
Post by: dzindra on November 02, 2013, 04:57:05 PM

Quote:
Does anyone know what exactly this firmware does? It seems to get mixed results. I am only getting 460 on my Jupiter. I am tempted to try this firmware.

tl;dr: this patch lowers a voltage on the controller board from 1.95 V to 1.45 V and tries to restart failed dies at 20-second intervals.

Long story: running diff on the images

Code:
diff -rq kncminer-0.98/ramdisk kncminer-0.98.1\(beta\)/ramdisk 2> /dev/null

Code:
Files kncminer-0.98/ramdisk/etc/init.d/initc.sh and kncminer-0.98.1(beta)/ramdisk/etc/init.d/initc.sh differ

We are actually interested in /etc/init.d/initc.sh and /sbin/monitordcdc, as the other files are just new versions or have changed timestamps.

In initc.sh, a few lines were added at the end of the file. The first four lines set the DCDC1 voltage adjustment in the controller board's voltage controller to 1.450 V (the value for 0.98 on my Mercury is 1.950 V). The rest set the GO flag in the slew rate register in order to apply the voltage change. I am not sure what exactly is powered by this voltage.

Code:
v=56

Monitordcdc has more changes: the interval for checking VRMs that output zero current was decreased from 15 minutes to 20 seconds (previously 15 checks spaced 1 minute apart, now 5 checks spaced 4 seconds apart). When a VRM has more than 3 failures (= zero current output) in this 20-second interval, the die powered by that VRM is restarted (this was not present in 0.98). I am not sure why die 0 is restarted only when other dies have failed too (maybe die 0 is somehow connected to the other dies?).

Code:
# restart die

Dies 1-3 are also restarted at the beginning of the script.

Code:
# Give them a kick!
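The restart rule dzindra describes (5 samples, restart on more than 3 zero readings) can be sketched in a few lines of POSIX sh. This is a minimal sketch, assuming the five current readings for one VRM have already been collected as plain numbers; how 0.98.1 actually reads the converters is not shown in the thread, and the function names below are hypothetical, not taken from monitordcdc.

```shell
#!/bin/sh
# Hypothetical sketch of the 0.98.1 monitordcdc restart rule as described above:
# 5 output-current samples per VRM (4 s apart, about 20 s in total), and the die
# is restarted if MORE than 3 of them read zero.
# Function names are illustrative only; they are not taken from monitordcdc.

# count_zero_samples SAMPLE...: print how many samples are zero.
count_zero_samples() {
    zeros=0
    for sample in "$@"; do
        # awk does the numeric comparison so "0", "0.0" and "0.000" all match
        if awk -v v="$sample" 'BEGIN { exit !(v + 0 == 0) }'; then
            zeros=$((zeros + 1))
        fi
    done
    echo "$zeros"
}

# should_restart SAMPLE...: print "restart" when more than 3 samples are zero,
# otherwise print "ok".
should_restart() {
    if [ "$(count_zero_samples "$@")" -gt 3 ]; then
        echo restart
    else
        echo ok
    fi
}

# Example window: four zero readings out of five, so this die would be restarted.
should_restart 0 0 0.000 0 12.5
```

On a miner, the five arguments would be the converter current readings taken 4 seconds apart; that reading step is deliberately left out here because the thread does not show how monitordcdc performs it.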
Title: Re: KnC die #0 disabled
Post by: jelin1984 on November 02, 2013, 06:15:30 PM

If I have a good miner, is it a good idea to install this firmware or not?

Title: Re: KnC die #0 disabled
Post by: DPoS on November 02, 2013, 10:19:07 PM

Quote:
I am not sure why die 0 is restarted only when other dies have failed too (maybe die 0 is somehow connected to other dies?).
Code:
# restart die

It seems they had a lot of DOA die 0s, so I guess they are targeting that for quicker testing.. but yes, I have had a die 3 out on a board for weeks. I am running .98.1 on it now, but it hasn't brought it back yet. It will be interesting if KnC elaborates on their methods.

Title: Re: KnC die #0 disabled
Post by: fubly on November 07, 2013, 07:20:24 PM

Can anyone with a full-speed machine send me the results of this:
i2cdump -y 1 0x24
i2cdump -y 2 0x20
i2cdump -y 2 0x21
i2cdump -y 2 0x22
i2cdump -y 2 0x23
i2cdump -y 2 0x24
i2cdump -y 2 0x25
i2cdump -y 3 0x20
i2cdump -y 3 0x21
i2cdump -y 3 0x22
i2cdump -y 3 0x23
i2cdump -y 3 0x24
i2cdump -y 3 0x25
i2cdump -y 4 0x20
i2cdump -y 4 0x21
i2cdump -y 4 0x22
i2cdump -y 4 0x23
i2cdump -y 4 0x24
i2cdump -y 4 0x25
i2cdump -y 5 0x20
i2cdump -y 5 0x21
i2cdump -y 5 0x22
i2cdump -y 5 0x23
i2cdump -y 5 0x24
i2cdump -y 5 0x25
i2cdump -y 6 0x20
i2cdump -y 6 0x21
i2cdump -y 6 0x22
i2cdump -y 6 0x23
i2cdump -y 6 0x24
i2cdump -y 6 0x25
i2cdump -y 7 0x20
i2cdump -y 7 0x21
i2cdump -y 7 0x22
i2cdump -y 7 0x23
i2cdump -y 7 0x24
i2cdump -y 7 0x25
i2cdump -y 8 0x20
i2cdump -y 8 0x21
i2cdump -y 8 0x22
i2cdump -y 8 0x23
i2cdump -y 8 0x24
i2cdump -y 8 0x25

Title: Re: KnC die #0 disabled
Post by: Dalkore on November 07, 2013, 07:21:56 PM

FYI: I applied 0.98.1 to a client's Jupiter unit that was at 320 GH/s, and it fixed the issue: it has now been well over 500 GH/s and stable for almost a week.
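fubly's list is regular (bus 1 at 0x24 only, then buses 2-8 at 0x20-0x25), so it can be generated with two loops instead of typed out. A minimal sketch, assuming i2c-tools is installed on the miner; dlasher's actual loop is only partially shown in the thread, so this is not his script. By default it only prints the 43 commands.

```shell
#!/bin/sh
# Sketch: reproduce the i2cdump list above with loops instead of typing it out.
# Bus 1 is only read at address 0x24; buses 2-8 are read at 0x20-0x25.
# By default the commands are only printed. Set RUN=1 to execute them, which
# assumes i2c-tools (i2cdump) is installed on the miner.

list_dump_commands() {
    echo "i2cdump -y 1 0x24"
    for bus in 2 3 4 5 6 7 8; do
        for addr in 0x20 0x21 0x22 0x23 0x24 0x25; do
            echo "i2cdump -y $bus $addr"
        done
    done
}

if [ "${RUN:-0}" = "1" ]; then
    list_dump_commands | while read -r cmd; do
        echo "### $cmd"   # header so each dump can be found in the output
        $cmd              # unquoted on purpose: split into command and arguments
    done
else
    list_dump_commands
fi
```

Saved under a hypothetical name such as dump_all.sh, running RUN=1 sh dump_all.sh > dumps.txt would collect all dumps in one file with a header before each; printing first lets you check the list before touching the bus.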
Title: Re: KnC die #0 disabled
Post by: dlasher on November 07, 2013, 11:22:59 PM

Quote:
Monitordcdc has more changes: Interval for checking VRMs that output zero current in monitordcdc was decreased from 15 minutes to 20 seconds (15 checks in 1 minute vs 5 checks in 4 secs). When a VRM has more than 3 failures (= zero current output) in this 20 sec interval, the die powered by this VRM is restarted (this was not present in 0.98). I am not sure why die 0 is restarted only when other dies have failed too (maybe die 0 is somehow connected to other dies?).

I hacked up the monitordcdc script, tossing in a 'logger' line to write something to syslog each time it tries to restart a die (you have to start /etc/init.d/syslog.busybox as well, and then tail -f /var/log/messages), like this:

Code:
if [ "$failed1" = "1" ] ; then

What I see looks like it's trying to restart individual dies, no matter which one it is.

Quote:
Nov 7 23:10:49 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:12:35 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:12:35 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:14:23 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:16:09 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:17:57 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:17:57 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:19:43 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2

0x20 = module 0
0x22 = module 1
0x24 = module 2
0x27 = module 3 (I think)
0, 1, 2, 3 = dies

So that equates to 3rd module, 3rd core, then first module, 4th core, then repeatedly 3rd module, 3rd core. In my case, 0/4 can be restarted, and it does restart every few minutes when it stops, but 3/3 will never restart, even though the script tries repeatedly.

Title: Re: KnC die #0 disabled
Post by: fubly on November 08, 2013, 12:33:52 AM

No, that's not right,
the PMBus protocol has a redundant master-slave bus system. See here: http://pmbus.org/docs/Using_The_PMBus_20051012.pdf and have a look at page 142: http://i41.tinypic.com/15ekavc.jpg

I need the above dumps as quickly as possible, to verify something.

Title: Re: KnC die #0 disabled
Post by: fubly on November 08, 2013, 01:04:14 AM

Quote:
0x20 = module 0
0x22 = module 1
0x24 = module 2
0x27 = module 3 (I think)
0, 1, 2, 3 = dies
So that equates to 3rd module, 3rd core, then first module, 4th core, then repeatedly 3rd module, 3rd core. In my case, 0/4 can be restarted, and it does every few minutes when it stops, but 3/3 will never restart.. but the script tries repeatedly.

0 to 6 = the channel the ASIC board is attached to
0x20 = ASIC board on port / channel 1
0x21 = ASIC board on port / channel 2
0x22 = ASIC board on port / channel 3
0x23 = ASIC board on port / channel 4
0x24 = ASIC board on port / channel 5
0x25 = ASIC board on port / channel 6

Title: Re: KnC die #0 disabled
Post by: dlasher on November 08, 2013, 02:28:19 AM

Quote:
No thats not right, <snip> I need as quick as possible the above dumps, to verify something.

Fubly: I ran this on a Jupiter presently at 16/16. Wrote a quick script to grab what you wanted:

Code:
for a in 1 2 3 4 5 6 7 8; do

Here's the output: http://pastebin.com/ZhzGZBuf

Title: Re: KnC die #0 disabled
Post by: dlasher on November 08, 2013, 02:32:58 AM

Quote:
I hacked up the monitordcdc script, tossing in a 'logger' line to write something to syslog each time it tries to restart a die.. (you have to start /etc/init.d/syslog.busybox as well, and then tail -f /var/log/messages)

Yikes.. I'm curious whether other people are seeing monitordcdc having to light cores back up as often as I am..
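To compare how often monitordcdc relights dies on different machines, the logged i2cset lines can be tallied per bus, address and die. A minimal sketch, assuming the "dcdc:" logger tag from dlasher's modified script and i2cset's argument order (-y BUS ADDRESS REGISTER VALUE), with VALUE read as the die number as dlasher does. Addresses are printed raw, because the thread disagrees on whether they identify modules or ASIC board channels; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: tally how often monitordcdc restarted each die, from syslog lines like
#   Nov 7 23:10:49 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
# Assumes the "dcdc:" logger tag from dlasher's modified script and the i2cset
# argument order -y BUS ADDRESS REGISTER VALUE, with VALUE taken as the die.
# Addresses are printed raw because the thread disagrees on what they map to.

tally_restarts() {
    # Reads log text on stdin; prints "bus address die count", busiest first.
    awk '
        /dcdc: i2cset -y/ {
            # find the "i2cset" field; bus, address and die follow it
            for (i = 1; i <= NF; i++) if ($i == "i2cset") break
            key = $(i + 2) " " $(i + 3) " " $(i + 5)
            count[key]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort -k4,4nr -k1,3
}
```

With syslog running as described above, tally_restarts < /var/log/messages would print one line per die that was kicked, which makes it easy to post comparable numbers from different units.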
Quote:
Nov 7 23:28:38 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:28:39 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:30:25 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:32:12 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:32:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 0
Nov 7 23:32:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 1
Nov 7 23:32:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:32:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 3
Nov 7 23:33:59 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:35:46 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:35:46 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:37:33 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:39:19 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:39:20 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:41:07 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:42:54 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:44:41 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:44:41 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:46:28 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:48:15 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:48:15 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:50:02 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:51:48 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:51:49 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:53:36 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:55:22 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:55:23 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:57:10 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 7 23:58:57 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 7 23:58:57 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:00:43 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:02:31 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:02:31 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:04:18 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:06:05 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:06:05 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:07:52 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:09:38 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:09:38 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:11:26 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:13:12 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:13:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:15:00 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:16:46 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:16:47 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:18:33 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:20:20 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:20:21 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:22:07 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:23:54 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:23:55 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:25:41 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:27:29 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:27:29 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:29:16 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:31:02 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:31:03 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:32:50 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 0
Nov 8 00:32:50 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 1
Nov 8 00:32:50 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:32:50 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 3
Nov 8 00:34:36 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:34:36 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:36:24 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:38:10 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:39:57 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:39:57 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:41:44 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:43:31 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:43:31 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:45:19 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:47:05 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:47:05 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:48:52 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:50:39 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:50:39 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:52:26 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:54:13 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:54:13 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:56:00 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:57:47 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 00:57:48 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 00:59:34 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:01:21 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:01:21 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:03:09 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:04:55 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:04:55 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:06:43 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:08:29 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:08:30 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:10:16 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:12:03 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:12:03 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:13:50 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:15:37 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:15:38 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:17:24 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:19:10 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:19:11 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:20:58 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:22:44 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:22:45 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:24:32 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:26:18 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:26:18 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:28:05 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:29:52 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:29:52 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:31:39 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:33:26 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:33:26 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:35:13 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:37:00 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:37:00 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:38:46 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:40:33 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:40:33 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:42:20 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:44:07 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:44:07 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:45:54 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:47:41 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:47:41 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:49:28 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:51:15 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:51:15 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:53:01 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:54:49 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:54:49 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:56:35 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 01:58:21 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 01:58:22 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:00:09 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:01:56 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:01:56 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:03:43 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:05:30 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:05:30 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:07:18 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:09:04 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:09:05 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:10:51 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:12:38 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:12:38 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:14:25 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:16:12 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:16:12 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:17:59 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2
Nov 8 02:19:45 knc2 user.notice dcdc: i2cset -y 2 0x20 0xe5 3
Nov 8 02:19:45 knc2 user.notice dcdc: i2cset -y 2 0x24 0xe5 2

Title: Re: KnC die #0 disabled
Post by: fubly on November 08, 2013, 06:05:21 AM

Quote:
No thats not right, <snip> I need as quick as possible the above dumps, to verify something.
Fubly: I ran this on a Jupiter presently at 16/16.
Wrote a quick script to grab what you wanted:
Code:
for BUS in 1 2 3 4 5 6 7 8; do

a = bus, b = channel. PMBus has a redundant bus system, which here means that sending data to bus 2 is enough. Bus 1 is different: there, only channel 24 (address 0x24) returns valid data.

Title: Re: KnC die #0 disabled
Post by: dlasher on November 08, 2013, 09:35:17 AM

Quote:
a = bus, b = channel. PMBus has a redundant bus system, which here means that sending data to bus 2 is enough. Bus 1 is different: there, only channel 24 returns valid data.

So was that what you needed?

Title: Re: KnC die #0 disabled
Post by: fubly on November 08, 2013, 12:32:23 PM

Hi Bitcoinorama,
I hope you have a good connection to KnC; can you send them my post? They can send me about 0.2 BTC for my work: 12feS3BnvYkYAf3wsrrftmeNrw1B5HRSZ1

I figured out what it is. This procedure is repeatable; everyone can check it!

It is a temperature problem!!!!!!

Die 0 and die 1 are on the left side of the board, and die 2 and die 3 are on the right side.

CASE 1
1. Update from 0.98 to 0.98.1.
2. When only die 0 is now working, all is normal at this moment; when you wait a while, die 0 dies and the other ones start = this is normal! Why?
3. The monitordcdc script sends commands to all 4 dies, and at this moment their temperature goes higher (they are doing something: volts, power, amperes, working bits = heat).
4. Now die 3 or die 4 recognized that and switched on! Die 4 said to die 3 "hey, come up", die 3 said to die 2 "hey, come up", but die 2 has dementia and forgot to tell die 0 "hey, come up". Or, the last case, die 0 said "no, I am the master, only the controller can send commands to me". Hmm, but when I was looking through the PMBus documentation I couldn't find anything like that. Why? See the picture; a picture says more than a million words: the talking goes from 0 to 4, not from 4 to 0, or 3 to 2, or 2 to 1; ONLY 1 or 2 or 3 to the next: http://i40.tinypic.com/rihqx4.jpg

CASE 2
1. Same as above.
2. Same as above.
NOW NOW NOW NOW NOW
3. I take a hairdryer and slowly blow warm air at die 0 and die 1 on the left side of the ASIC board!
4. The temperature climbs slowly; near 69 °C I stopped the hairdryer, and some seconds later only die 1 started working!
NOW NOW WOW WOW WOW

When you look around here in the forum, you will notice that many people talk about: HIGHER temp = better performance! That is exactly the problem. In the PMBus docs you will find a command to switch the die on or off.

Solution / Problem here:

1. ########################################
10.8.2. Sending Too Few Bits
PMBus (and SMBus) transactions are carried out one byte at a time.
If while a device is writing to a PMBus device the transmission is interrupted by a START or STOP condition before a complete byte has been sent, this is a data transmission fault. When a PMBus device detects this fault, it shall respond as follows:
• Flush or ignore the received command code and any received data,
• Set the CML bit in the STATUS_BYTE,
• Set bit [1] ("Other" fault) bit in the STATUS_CML register (if supported), and
• Notify the host as described in Section 10.2.2.
Read on from here.

2. ########################################
10.8.7. Device Busy
ME: Before sending commands, we would have to stop the device and then send the command.

3.
WRITE_PROTECT data byte values:
1000 0000 = Disable all writes except to the WRITE_PROTECT command
0100 0000 = Disable all writes except to the WRITE_PROTECT, OPERATION and PAGE commands
0010 0000 = Disable all writes except to the WRITE_PROTECT, OPERATION, PAGE, ON_OFF_CONFIG and VOUT_COMMAND commands
0000 0000 = Enable writes to all commands

11.2. STORE_DEFAULT_ALL
The STORE_DEFAULT_ALL command instructs the PMBus device to copy the entire contents of the Operating Memory to the matching locations in the non-volatile Default Store memory. Any items in Operating Memory that do not have matching locations in the Default Store are ignored. It is permitted to use the STORE_DEFAULT_ALL command while the device is operating. However, the device may be unresponsive during the copy operation with unpredictable, undesirable or even catastrophic results. PMBus device users are urged to contact the PMBus device manufacturer about the consequences of using the STORE_DEFAULT command while the device is operating and providing output power.
This command has no data bytes. This command is write only.

4. 11.3. RESTORE_DEFAULT_ALL
ME: I think we can figure that out!
The RESTORE_DEFAULT_ALL command instructs the PMBus device to copy the entire contents of the non-volatile Default Store memory to the matching locations in the Operating Memory. The values in the Operating Memory are overwritten by the value retrieved from the Default Store. Any items in Default Store that do not have matching locations in the Operating Memory are ignored. It is permitted to use the RESTORE_DEFAULT_ALL command while the device is operating. However, the device may be unresponsive during the copy operation with unpredictable, undesirable or even catastrophic results. PMBus device users are urged to contact the PMBus device manufacturer about the consequences of using the RESTORE_DEFAULT_ALL command while the device is operating and providing output power. This command has no data bytes. This command is write only.

5. All possible commands, starting from page 73: http://i43.tinypic.com/11sjz1u.jpg

6. Potential conflict: http://i41.tinypic.com/2nriomp.jpg http://i43.tinypic.com/5aqatu.jpg

Info here:
http://pmbus.org/docs/PMBus_Revision_1-2_Presentation_20100228.pdf
http://pmbus.org/docs/PMBus_Specification_Part_I_Rev_1-1_20070205.pdf
http://pmbus.org/docs/PMBus_Specification_Part_II_Rev_1-1_20070205.pdf

I am an absolute beginner and have absolutely no know-how in PMBus programming, but one thing I know is that CONTROLLING the temperature inside the die is the problem!

Title: Re: KnC die #0 disabled
Post by: fubly on November 08, 2013, 06:38:48 PM

Here is the evidence:
http://i43.tinypic.com/2jb0cw9.jpg
Hair dryer in action:
http://i42.tinypic.com/105e8pk.jpg
http://i42.tinypic.com/258v12b.jpg
http://i43.tinypic.com/29oln9j.jpg

Come on KnC ORSoC guys!!!!!

Title: Re: KnC die #0 disabled
Post by: edgar on November 11, 2013, 07:41:02 AM

BUMP!!
11 days later!!! BUMP ffs!
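Some of the PMBus mechanisms fubly quotes (the CML bit in STATUS_BYTE, the busy condition, the WRITE_PROTECT data byte table) can be checked by reading registers rather than writing them. A minimal POSIX sh sketch, assuming the DC-DC converters answer standard PMBus reads: STATUS_BYTE is command 0x78 and WRITE_PROTECT is command 0x10 in the PMBus specification, but whether KnC's modules implement them is untested here. Also note that the 0xe5 register written by monitordcdc in the logs earlier in the thread falls in the range PMBus reserves for manufacturer-specific commands (0xD0-0xFD), so its meaning is defined by the converter vendor, not by the spec. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decode two PMBus values from fubly's spec excerpts. Bit layouts follow
# the PMBus Part II specification; whether KnC's DC-DC converters implement them
# is an assumption. Nothing here writes to the bus.

# decode_status_byte VALUE: list the STATUS_BYTE (command 0x78) flags that are set.
# Bit 7 BUSY, 6 OFF, 5 VOUT_OV_FAULT, 4 IOUT_OC_FAULT, 3 VIN_UV_FAULT,
# 2 TEMPERATURE, 1 CML, 0 NONE_OF_THE_ABOVE.
decode_status_byte() {
    value=$(( $1 ))   # accepts decimal or 0x-prefixed hex, e.g. i2cget output
    flags=""
    bit=7
    for name in BUSY OFF VOUT_OV_FAULT IOUT_OC_FAULT VIN_UV_FAULT \
                TEMPERATURE CML NONE_OF_THE_ABOVE; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            flags="$flags $name"
        fi
        bit=$((bit - 1))
    done
    echo "${flags# }"   # drop the leading space
}

# decode_write_protect VALUE: explain a WRITE_PROTECT (command 0x10) data byte,
# using the four values from the spec table quoted above.
decode_write_protect() {
    case $(( $1 )) in
        128) echo "only WRITE_PROTECT is writable" ;;
        64)  echo "only WRITE_PROTECT, OPERATION and PAGE are writable" ;;
        32)  echo "only WRITE_PROTECT, OPERATION, PAGE, ON_OFF_CONFIG and VOUT_COMMAND are writable" ;;
        0)   echo "all commands are writable" ;;
        *)   echo "value not defined in the table" ;;
    esac
}

# read_status BUS ADDRESS: read STATUS_BYTE with i2cget (i2c-tools) and decode it.
# Untested on KnC hardware; shown only to connect the decoder to a real read.
read_status() {
    decode_status_byte "$(i2cget -y "$1" "$2" 0x78 b)"
}
```

For example, read_status 2 0x24 would attempt to read and decode the status of the converter at 0x24 on bus 2. The sketch deliberately never sends STORE_DEFAULT_ALL or RESTORE_DEFAULT_ALL, which the quoted spec warns can have unpredictable or even catastrophic results on a running device.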