olymps2020 (OP)
Newbie
Offline
Activity: 2
Merit: 0
|
|
January 15, 2020, 11:32:57 AM Last edit: January 16, 2020, 01:20:51 AM by frodocooper |
|
The miner is just over a month old and is showing issues with Chain[0]. The kernel log shows the first error:
temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 64, reg = 0
Then it shuts down that chip and gets stuck retrying the temperature check. It worked normally a few weeks ago, and even yesterday it worked with 2/3 of the chips at 2/3 of the hash rate. I have followed the five steps in the first post on this website, but still nothing. Could anyone help?
2020-01-15 11:31:28 thread.c:105:pic_heart_beat_thread: chain[0] heart beat fail 5 times.
2020-01-15 11:31:30 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:31:32 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.781328
2020-01-15 11:31:33 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 168, reg = 1
2020-01-15 11:31:35 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.001758
2020-01-15 11:31:35 power_api.c:97:get_average_voltage: aveage voltage is: 11.927695
2020-01-15 11:31:35 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.93, more than 1.0v diff.
2020-01-15 11:31:36 power_api.c:124:check_voltage_multi: retry time: 6
2020-01-15 11:31:40 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 0
2020-01-15 11:31:48 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:31:50 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.787451
2020-01-15 11:31:53 temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 184, reg = 1
2020-01-15 11:31:53 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.014004
2020-01-15 11:31:53 power_api.c:97:get_average_voltage: aveage voltage is: 11.933818
2020-01-15 11:31:53 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.93, more than 1.0v diff.
2020-01-15 11:31:54 power_api.c:124:check_voltage_multi: retry time: 7
2020-01-15 11:31:54 thread.c:642:check_temperature: over max temp, pcb temp 69 (max 80), chip temp 107(max 103)
2020-01-15 11:31:54 driver-btm-api.c:201:set_miner_status: ERROR_TEMP_TOO_HIGH
2020-01-15 11:31:54 driver-btm-api.c:142:stop_mining: stop mining: over max temp
2020-01-15 11:31:54 thread.c:824:cancel_temperature_monitor_thread: cancel thread
2020-01-15 11:31:54 thread.c:834:cancel_read_nonce_reg_thread: cancel thread
2020-01-15 11:31:54 driver-btm-api.c:128:killall_hashboard: ****power off hashboard****
2020-01-15 11:31:56 thread.c:105:pic_heart_beat_thread: chain[0] heart beat fail 6 times.
2020-01-15 11:32:10 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:32:12 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.775205
2020-01-15 11:32:15 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.007881
2020-01-15 11:32:15 power_api.c:97:get_average_voltage: aveage voltage is: 11.927695
2020-01-15 11:32:15 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.93, more than 1.0v diff.
2020-01-15 11:32:16 power_api.c:124:check_voltage_multi: retry time: 8
2020-01-15 11:32:29 thread.c:105:pic_heart_beat_thread: chain[0] heart beat fail 7 times.
2020-01-15 11:32:30 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:32:32 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.799697
2020-01-15 11:32:35 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.050742
2020-01-15 11:32:35 power_api.c:97:get_average_voltage: aveage voltage is: 11.950146
2020-01-15 11:32:35 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.95, more than 1.0v diff.
2020-01-15 11:32:36 power_api.c:124:check_voltage_multi: retry time: 9
2020-01-15 11:32:48 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:32:50 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.799697
2020-01-15 11:32:53 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.050742
2020-01-15 11:32:53 power_api.c:97:get_average_voltage: aveage voltage is: 11.950146
2020-01-15 11:32:53 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.95, more than 1.0v diff.
2020-01-15 11:32:54 power_api.c:124:check_voltage_multi: retry time: 10
2020-01-15 11:32:56 thread.c:105:pic_heart_beat_thread: chain[0] heart beat fail 8 times.
2020-01-15 11:33:07 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:33:08 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.799697
2020-01-15 11:33:10 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.050742
2020-01-15 11:33:10 power_api.c:97:get_average_voltage: aveage voltage is: 11.950146
2020-01-15 11:33:10 power_api.c:110:check_voltage: target_vol = 17.90, actural_vol = 11.95, more than 1.0v diff.
2020-01-15 11:33:11 power_api.c:124:check_voltage_multi: retry time: 11
UPDATE: I disassembled the miner and disconnected the power from Chain[2]. Now the miner works fine, including Chain[0], which had the problem before. Does this mean that there is a power problem?
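A note on reading the voltage lines in the log: the averaged figure appears to be the three chain readings divided by three, so a single chain reporting 0 V is enough to push the average more than 1.0 V away from the target. The Python sketch below reproduces that arithmetic from three lines copied out of the log. The regular expression and the helper name chain_voltages are illustrative assumptions; this mirrors what the log output suggests and is not the firmware's actual code.

```python
import re

# Three lines copied from the kernel log in the post above.
LOG = """\
2020-01-15 11:31:30 power_api.c:86:get_average_voltage: chain[0], voltage is: 0.000000
2020-01-15 11:31:32 power_api.c:86:get_average_voltage: chain[1], voltage is: 17.781328
2020-01-15 11:31:35 power_api.c:86:get_average_voltage: chain[2], voltage is: 18.001758
"""

TARGET_VOLTAGE = 17.90  # "target_vol" in the check_voltage log line
MAX_DIFF = 1.0          # "more than 1.0v diff" in the same line

# Hypothetical pattern, written to match the log format shown above.
CHAIN_VOLTAGE = re.compile(r"chain\[(\d+)\], voltage is: ([\d.]+)")


def chain_voltages(log_text):
    """Return {chain index: reported voltage} for every matching log line."""
    return {int(chain): float(volts) for chain, volts in CHAIN_VOLTAGE.findall(log_text)}


voltages = chain_voltages(LOG)
average = sum(voltages.values()) / len(voltages)
dead_chains = [chain for chain, volts in voltages.items() if volts == 0.0]

print(f"average voltage: {average:.6f}")
print(f"difference from target: {abs(TARGET_VOLTAGE - average):.2f} V")
print(f"chains reporting 0 V: {dead_chains}")
```

With chain[0] at 0 V the average lands near 11.93 V, which matches the "aveage voltage" line in the log. The two healthy chains on their own sit close to the 17.90 V target.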
|
|
|
|
philipma1957
Legendary
Offline
Activity: 4298
Merit: 8833
'The right to privacy matters'
|
|
January 15, 2020, 12:42:58 PM Last edit: January 16, 2020, 01:21:59 AM by frodocooper |
|
By disconnecting 1 of 3 boards you may have shifted the board counter over. Hard to tell exactly what you did.
Disconnect two boards, boot, and post the full log. Do this three times: board 0 is the only one connected on the first try, board 1 on the second try, and board 2 on the third try. If you always get 1 board to work, and it is a different board each time, you may not have a board issue.
Then try with 2 boards connected: first with board 0 disconnected, then with board 1 disconnected, then with board 2 disconnected. If you always get 2 boards to work, you may not have a board issue.
Lastly, try all three. If board zero does not work, I suspect the PSU, which means ordering a PSU: https://shop.bitmain.com/product/detail?pid=0002019072316001724716dkNtX50679
It looks like it is sold out, so you may need to wait until February to get one. https://hmtech.co/ is in use; they got me a PSU for my M20S from Whatsminer, so maybe they have a T17 PSU. But do the troubleshooting before you buy a PSU.
There is also a cost analysis. Since you are getting 2 boards to work right now, spending 200 on a PSU may not pay off. If you are off 14 TH, that is about 2 dollars a day with free power. If your power costs a dollar a day, you net a dollar a day, so a 200 dollar PSU will take 200 days to pay off. The halving comes in 120 days. So using 2 boards may be your best choice.
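The test sequence described above (each board alone, then each pair, then all three) can be listed mechanically so that no combination gets skipped. This is only a sketch: the function name boot_plan is made up for illustration, and the board numbers are assumed to match the chain numbers in the miner's log. The order of runs within each group does not matter.

```python
from itertools import combinations

BOARDS = (0, 1, 2)  # assumed to correspond to chain[0], chain[1], chain[2]


def boot_plan(boards=BOARDS):
    """Yield the boards to leave connected for each run, smallest sets first."""
    for size in range(1, len(boards) + 1):
        for connected in combinations(boards, size):
            yield connected


for run, connected in enumerate(boot_plan(), start=1):
    disconnected = [board for board in BOARDS if board not in connected]
    print(f"run {run}: connect {list(connected)}, disconnect {disconnected}, save the full log")
```

That gives seven boots in total: three with a single board, three with two boards, and one with all three.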
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
January 15, 2020, 11:42:17 PM Last edit: January 16, 2020, 01:22:33 AM by frodocooper |
|
UPDATE:
I disassembled the miner and disconnected the power from Chain [2], now the miner works fine, including Chain[0] which had the problem before, does this mean that there is a power problem ?
This is a good sign that all 3 boards are still good. Here are the three possible reasons:
1- Voltage: if you are not feeding the miner 200-240 V, it could act weird and not be able to power on 3 hashboards.
2- The PSU is dying: due to unregulated voltage or simply bad luck, the PSU can't feed enough power to all boards.
3- The data cables: make sure you test all three of them by swapping them.
Also, can you tell us the exact voltage you are plugging the miner into? Please note that this new gear takes voltage very seriously; anything above 240 V or below 200 V is simply a free ride to RMA.
|
|
|
|
olymps2020 (OP)
Newbie
Offline
Activity: 2
Merit: 0
|
|
January 16, 2020, 02:52:06 AM Last edit: January 16, 2020, 02:53:18 AM by frodocooper |
|
... So using 2 boards may be your best choice.
I'm starting to think the same. I guess the last attempt is to disassemble the PSU and look for any obvious damage. Thank you both!
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
January 16, 2020, 09:04:08 PM Last edit: January 17, 2020, 01:46:09 AM by frodocooper |
|
If you have confirmed that all 3 hash boards work just fine, I think it would be best to buy a new PSU, for two main reasons. The first is that this PSU is not good and could eventually damage the hash boards, so there is a good amount of risk in keeping it. The second is economic: a T17 at 40 TH makes about $6.3 a day (before the electricity bill), so every board makes a good $60 a month. A T17 PSU costs $134, plus say $30 for shipping, so you are looking at about $160. If your power cost is 5 cents, the PSU ROI will be about 5 months. If you decide to keep mining with only 2 boards, at least sell the other one that's sitting there doing nothing. Just my two sats.
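The payback estimate above can be worked through as a short calculation. The revenue, PSU price, shipping and power price are the figures from the post, with "5 cents" read as 5 cents per kWh. The 2200 W wall draw for the whole miner is an assumption that does not appear anywhere in this thread, and the function name payback_days is made up for illustration. Treat it as a rough sketch: it ignores difficulty changes and the halving mentioned earlier in the thread.

```python
DAILY_REVENUE_USD = 6.3         # whole miner, before electricity (from the post)
BOARDS = 3
PSU_COST_USD = 134 + 30         # PSU price plus shipping (from the post)
POWER_PRICE_USD_PER_KWH = 0.05  # "5 cents", read here as per kWh

# ASSUMPTION, not from the thread: about 2200 W at the wall for the whole miner.
MINER_POWER_W = 2200


def payback_days(cost_usd, revenue_per_day, power_w, price_per_kwh):
    """Days until net income covers cost_usd; infinite if the net is not positive."""
    power_cost_per_day = power_w / 1000 * 24 * price_per_kwh
    net_per_day = revenue_per_day - power_cost_per_day
    if net_per_day <= 0:
        return float("inf")
    return cost_usd / net_per_day


# What the third board adds once a new PSU brings it back.
days = payback_days(
    PSU_COST_USD,
    DAILY_REVENUE_USD / BOARDS,
    MINER_POWER_W / BOARDS,
    POWER_PRICE_USD_PER_KWH,
)
print(f"payback: {days:.0f} days, about {days / 30:.1f} months")
```

Under these assumptions the result is a little over four months, in the same range as the "about 5 months" estimate in the post. A higher power price or a lower coin price stretches it quickly.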
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 08, 2020, 11:14:12 PM Last edit: April 09, 2020, 12:24:06 AM by frodocooper |
|
Did you manage to find any fix for this issue? One of my T17+ machines is acting the same. I get the same info in the log: after 1-2 hours of mining, one of the boards stops hashing. After a reboot it works fine for a few hours, then the issue repeats.
|
|
|
|
BitMaxz
Legendary
Online
Activity: 3430
Merit: 3168
|
|
April 09, 2020, 12:00:07 AM |
|
Did you manage to find any fix to this issue? One of my T17+ machines is acting the same. I get the same info in log, 1-2 hours of mining one of boards stops hashing. After a reboot it works fine for few hours then issue repeats.
Have you tried the other solutions above? If you have confirmed that the PSU is good, then I would suggest moving the miner to another outlet, or plugging it directly into the wall outlet, and then testing it again. Sometimes extension cords can't handle high wattage and can't deliver enough power to the miner.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 14, 2020, 02:29:22 PM Last edit: April 16, 2020, 03:30:18 AM by frodocooper Merited by frodocooper (3) |
|
I didn't, because I don't think it is either a hashboard or a PSU problem. Here is why:
I contacted Bitmain support. They provided me with the newest firmware: Antminer-T17+-user-OM-202002281759-sig_5446.tar.gz - File System Version - Fri Feb 28 17:59:06 CST 2020
The previous system was File System Version Fri Dec 6 10:46:34 CST 2019
The outcome is that each time the machine boots I get "Hardware Version Socket connect failed: Connection refused", and the machine takes 10-15 minutes to start hashing. However, it now works well: all 3 hashboards are visible and none of them disappears anymore. I tried re-flashing the board from an SD card with the zip file Bitmain sent me, but didn't manage to do it: both LEDs stay on for 30 minutes and nothing happens. The machine boots normally after the SD card is removed. I also tried a factory reset a few times. I tried downgrading the firmware to the 2019 version, but it won't let me go back.
So, long story short, I fixed the problem with the new firmware at the cost of the machine taking 15 minutes to boot, which I'm fine with. Hope my experience helps.
|
|
|
|
philipma1957
Legendary
Offline
Activity: 4298
Merit: 8833
'The right to privacy matters'
|
|
April 14, 2020, 03:55:12 PM Last edit: April 16, 2020, 03:31:39 AM by frodocooper |
|
A 15 minute boot time is pretty much what the T17 takes in the first place. The socket connect failure is also normal.
So you are looking good.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 14, 2020, 04:33:11 PM Last edit: April 16, 2020, 03:32:07 AM by frodocooper |
|
That 15 minute figure already excludes the basic boot time. I have more T17 machines, and this one boots with a 15 minute delay, showing the socket connect fail error before it starts hashing.
But yeah, I'm OK as long as all 3 boards are visible and hashing. So the main problem was fixed at the cost of a smaller problem appearing.
|
|
|
|
philipma1957
Legendary
Offline
Activity: 4298
Merit: 8833
'The right to privacy matters'
|
|
April 14, 2020, 08:53:10 PM Last edit: April 14, 2020, 09:10:39 PM by philipma1957 |
|
Of all the Bitmain gear I purchased last year, my T17+ has had the most issues.
I am glad you found that T17+ firmware. I will give it a spin on my T17+, since it has been running with 2 boards for more than 3 months.
I will post back if it gives me the third board back.
Antminer T17+
Hostname: antMiner
Model: GNU/Linux
Hardware Version: Socket connect failed: Connection refused
Kernel Version: Linux 4.6.0-xilinx-gff8137b-dirty #25 SMP PREEMPT Fri Nov 23 15:30:52 CST 2018
File System Version: Wed Apr 8 11:27:07 CST 2020
Just loaded it now; see above.
It booted in under five minutes and dropped the board after it ran for 1 minute.
I booted again. Both times the failed socket message came up, which is normal, but this time the tuning is slower: 7 minutes vs 4.75 minutes, and still tuning.
8 minutes in, we are running 3 boards. I will get back to you; if it holds, it is a score.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 15, 2020, 02:01:07 AM |
|
Did you try to re-flash the board with an SD card?
|
|
|
|
philipma1957
Legendary
Offline
Activity: 4298
Merit: 8833
'The right to privacy matters'
|
|
April 15, 2020, 02:06:37 AM |
|
Did you try to re-flash board with SD card?
I am in Howell, NJ and the gear is in Clifton, NJ. The coronavirus lockdown means I should not drive out to work on it. I can access it via TeamViewer, which lets me reboot it, upgrade the firmware, and change pools. It is numbered, so I can call the guy in the warehouse to shut it off or power it up. But hands-on card flashing, fan repair, and cleaning it out all have to wait until sometime in May. Maybe right after the halving.
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
April 15, 2020, 11:03:11 AM |
|
I think this is because the tuning works better on this firmware version. It probably drops the frequency or increases the voltage of the board, and so the board shows up. Every time you lose a board, you should really give a different firmware a try, preferably one that lets you manually tune each board. I had very good luck with my old S9s by flashing this firmware. The success rate is not that high, and in most cases the hash board never comes back to life, but you lose nothing by trying a different firmware.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 18, 2020, 05:20:38 AM Last edit: April 18, 2020, 05:34:11 AM by Scorpyy |
|
So long story short, i fixed the problem with new firmware at the cost of machine taking 15 mins to boot which im fine with. Hope my experience helps.
Update: the issue re-occurred after a few days. I also managed to re-flash the board (I had to format the SD card to FAT32) and it worked. Then I loaded the Antminer-T17+-user-OM-201911111409-sig_4637.tar.gz firmware. The machine runs fine for 2-3 days, then the issue repeats. It works fine again after a reboot, and so on. If it gets worse, I will try swapping PSUs between 2 T17+ machines. Also, did anyone try this method? https://bitcointalk.org/index.php?topic=5239515.0 By deleting some lines you can actually get into voltage control.
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
April 18, 2020, 10:04:30 PM |
|
No, you can't get access to voltage control just by altering the style="display:none". All you get is fan control (that's on all miners) and, on some models, the ability to choose the frequency (not on the T17, S17, and their family), the temperature threshold, and so on. But no voltage control whatsoever. To play with the voltage you need a firmware that allows it. The two known firmwares out there for the T17 are AwesomeMiner and Asic.to; both are Vnish based and give you the ability to change both voltage and frequency. I suggest you start by lowering the frequency first, then move on to voltage.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 19, 2020, 04:01:33 PM |
|
AwesomeMiner and Asic.to both are Vnish based and give you the ability to change both voltage and frequency ...
Do you know if switching firmware is reversible? I can try to see if the issue will be perma-fixed with those firmwares.
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
April 19, 2020, 04:50:13 PM |
|
Do you know if switching firmware is reversible? I can try to see if issue will be perma-fixed with those firmware's
Of course, they are reversible, so if you don't like the firmware you can go back to Bitmain's. There is a good chance that you will prefer it to Bitmain's even if it doesn't fix your issue, because you will be able to overclock the other two working boards to get something close to the initial hashrate of 3 working boards.
|
|
|
|
Scorpyy
Jr. Member
Offline
Activity: 43
Merit: 59
|
|
April 21, 2020, 04:17:06 AM Last edit: April 21, 2020, 11:36:00 PM by frodocooper |
|
OK, so I switched to the asic.to firmware. The same thing occurred. I noticed:
[2020/04/21 04:11:03] ERROR: src/temp.c:218 chain[0] sen[2] - Lost, no updates for 10 sec
[2020/04/21 04:11:03] ERROR: src/temp.c:218 chain[0] sen[3] - Lost, no updates for 10 sec
[2020/04/21 04:11:03] WARN: chain[0] - 2 sensor(s) reported their temps!
[2020/04/21 04:11:04] ERROR: src/temp.c:218 chain[0] sen[0] - Lost, no updates for 10 sec
[2020/04/21 04:11:04] ERROR: src/temp.c:218 chain[0] sen[1] - Lost, no updates for 10 sec
[2020/04/21 04:11:04] WARN: chain[0] - 0 sensor(s) reported their temps!
[2020/04/21 04:11:04] ERROR: driver-btm-chain.c:950 chain[0] - Failed to read temp from all sensors!
[2020/04/21 04:11:04] INFO: chain[0] - Shutting down the chain
Also, the main issue olymps2020 posted had the same errors regarding temperature sensors:
temperature.c:697:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 64, reg = 0
I am 90% sure this issue is related to the temperature sensor. Is there a way we can try to disable the sensor check on that chain? Yes, I am aware of the chance of the miner catching fire if a heatsink comes loose.
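To check whether the same chain and the same sensors drop out every time, the "Lost" lines can be tallied from a saved log. The sketch below uses the four sensor lines quoted above. The regular expression and the function name lost_sensors are illustrative assumptions based only on the log format shown in this post, not on anything documented for the asic.to firmware.

```python
import re
from collections import defaultdict

# The four "Lost" lines quoted from the asic.to log above.
LOG = """\
[2020/04/21 04:11:03] ERROR: src/temp.c:218 chain[0] sen[2] - Lost, no updates for 10 sec
[2020/04/21 04:11:03] ERROR: src/temp.c:218 chain[0] sen[3] - Lost, no updates for 10 sec
[2020/04/21 04:11:04] ERROR: src/temp.c:218 chain[0] sen[0] - Lost, no updates for 10 sec
[2020/04/21 04:11:04] ERROR: src/temp.c:218 chain[0] sen[1] - Lost, no updates for 10 sec
"""

# Hypothetical pattern, written to match the log format shown above.
LOST = re.compile(r"chain\[(\d+)\] sen\[(\d+)\] - Lost")


def lost_sensors(log_text):
    """Return {chain index: sorted list of sensor indexes reported as lost}."""
    lost = defaultdict(set)
    for chain, sensor in LOST.findall(log_text):
        lost[int(chain)].add(int(sensor))
    return {chain: sorted(sensors) for chain, sensors in lost.items()}


print(lost_sensors(LOG))
```

Running this over the logs from several failures would show whether it is always chain 0 losing all four sensors within a second or so, as in the excerpt above, or whether the pattern moves between chains.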
|
|
|
|
mikeywith
Legendary
Offline
Activity: 2408
Merit: 6618
be constructive or S.T.F.U
|
|
April 21, 2020, 04:11:09 PM |
|
I am 90% sure this issue is related to temperature sensor. Is there a way we can try and disable the sensor check on that chain? Yes i am aware of a chance of miner catching fire if heatsink becomes loose.
You can post in the asic.to thread or reach out to taserz. If he does not reply, let me know and I will contact him off-forum. Have you actually tried setting a lower frequency on the bad chain?
|
|
|
|
|