Bitcoin Forum
Author Topic: T17/S17 malfunction: cases, solutions, remedies, RMA history  (Read 6913 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
favebook
Sr. Member
****
Offline Offline

Activity: 604
Merit: 416


View Profile
November 13, 2020, 04:43:30 PM
 #241

Hi guys
Which hashboard is hashboard 1 (chain 0)?
Closest to the PSU, for example?
Maybe @mikeywith can help me?

Physical location does not matter. Check data connection cables.

IIRC order is:
0  3
1  2

They are marked on the control board, so you will have no trouble finding this out yourself. If you do have trouble, send a picture of the control board and I will circle it for you.
Breeze
Newbie
*
Offline Offline

Activity: 86
Merit: 0


View Profile
November 13, 2020, 04:52:19 PM
Last edit: November 16, 2020, 02:27:01 AM by frodocooper
 #242

Now that I have checked carefully with a flashlight, I can see J1, J2 and J3 on the control board. That should be it? Thanks.
favebook
Sr. Member
****
Offline Offline

Activity: 604
Merit: 416


View Profile
November 13, 2020, 05:12:04 PM
Last edit: November 16, 2020, 02:27:23 AM by frodocooper
 #243

Depends on what unit we are talking about. J1, J2 and J3 were used on the old S9 model as far as I know. I didn't see them on any of the S/T 17 miners.

Can you provide more info about your unit and your problem?
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 13, 2020, 05:46:39 PM
Last edit: November 16, 2020, 02:27:42 AM by frodocooper
 #244

Usually, the label on the control board will be shown in the kernel log. If it is not, the ordering is ascending, so your chain 0 will be J1. The labels are quite visible on the control board; you just need to follow the ribbon cable to see which hash board goes to which socket.
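
The fallback rule (ascending order, chain 0 on J1) amounts to a one-line mapping. Here is a minimal Python sketch of it. The function name is made up for illustration, and it only applies when the kernel log does not already print the connector labels:

```python
def connector_for_chain(chain_index: int) -> str:
    """Fallback mapping from chain index to control-board label.

    Assumes the ascending order described in this post:
    chain 0 -> J1, chain 1 -> J2, chain 2 -> J3.
    Trust the kernel log or the ribbon cable if they disagree.
    """
    if chain_index < 0:
        raise ValueError("chain index cannot be negative")
    return f"J{chain_index + 1}"


for chain in range(3):
    print(f"chain {chain} -> {connector_for_chain(chain)}")
```

Following the ribbon cable remains the reliable check; the mapping is only a starting guess.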

Breeze
Newbie
*
Offline Offline

Activity: 86
Merit: 0


View Profile
November 13, 2020, 06:02:43 PM
Last edit: November 16, 2020, 02:29:00 AM by frodocooper
 #245

Depends on what unit we are talking about. J1, J2 and J3 were used on the old S9 model as far as I know. I didn't see them on any of the S/T 17 miners.

Can you provide more info about your unit and your problem?

S17 Pro. The labels are very hard to see on the S17 control board. You can see J1, but you have to look with a flashlight between the tightly packed cable connectors to see J2 and J3. Chain 0 dropped out. I tried all the tricks in this thread to get it working, so I decided to open it up and take a look at my heatsinks.

Usually, the label on the control board will be shown in the kernel log. If it is not, the ordering is ascending, so your chain 0 will be J1. The labels are quite visible on the control board; you just need to follow the ribbon cable to see which hash board goes to which socket.

Thanks. I wanted to clarify that.
CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 22, 2020, 09:41:05 PM
Last edit: November 22, 2020, 11:12:34 PM by frodocooper
 #246

Turns out my host has about 25 dead S17 pros, not sure what % of his total that is. We've struck a deal so I'm going to attempt to fix his dead miners, I'll update the thread on what I find.

Did you figure out a way to find the bad chip? I have about 20 boards from S17+ 73 TH/s units, and all of them show a temp sensor error and then report 0 asic found. I don't know how to find the bad chip. Which chip do I need to check the solder on?
wndsnb
Hero Member
*****
Offline Offline

Activity: 544
Merit: 589


View Profile
November 22, 2020, 10:16:16 PM
 #247

This is a good place to start: https://www.zeusbtc.com/NewsDetails.asp?ID=182

There is a download link on the page for the full repair manual, but it is in Chinese. You can load it into Google Translate to get a somewhat understandable translation.

If you want to try to fix them yourself, you'll need a test fixture, a multimeter, an adjustable heat gun, a soldering iron, etc. An oscilloscope is also helpful. I wouldn't recommend it unless you already have a background in electronics and some experience doing surface-mount rework.

Have some dead Bitmain 17 series hashboards or full miners?
I'll buy them ... send me a PM with what you have and I'll make you an offer!
CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 22, 2020, 10:29:46 PM
 #248

I have this error on all my bad boards. I have 20+ bad boards from S17+ 73 TH/s units.
read temp sensor failed
After some time the board shows 0 asic found.
After a lot of research, the problem seems to be bad contact between the heat sink and the chip.
So my question is: how do I find that bad contact? I tried tapping lightly on the heat sinks to see if any would come off. I tried lightly banging the board to get some heat sinks to come off. But all the heat sinks are still on. I have all the tools to fix the problem: solder, heat gun, tin, watt meter, oscilloscope, PSU, extra chips and so on. I just don't know which chip is at fault. What are the ways to find that bad connection? Thank you.
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 22, 2020, 11:17:49 PM
 #249

I have all the tools to fix the problem: solder, heat gun, tin, watt meter, oscilloscope, PSU, extra chips and so on. I just don't know which chip is at fault. What are the ways to find that bad connection? Thank you.

You will need a fixture tool like this one. What the tool does is tell you where the signal was interrupted; to double-check, you can then measure the voltage and/or resistance of that chip. Keep in mind that if all 3 hash boards throw that temp sensor error, the problem is most likely a bad PSU.

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 22, 2020, 11:50:43 PM
 #250

I have that test fixture. It tells me that all temp sensors are bad. Also, I'm testing one board at a time, and I'm using a brand new PSU for testing. How do I test the voltage and resistance of the chip? And where do I start: chip 1 or somewhere else? I have no idea which chip to start at, since none of the heat sinks came loose. Is there a quick way to test each chip? I'm 100% sure the problem is that a heat sink is not fully on. All I need is to find it, and I have all the tools to reseat it.
wndsnb
Hero Member
*****
Offline Offline

Activity: 544
Merit: 589


View Profile
November 23, 2020, 12:24:23 AM
Last edit: November 23, 2020, 12:50:32 AM by frodocooper
 #251

Can you post the log from the test fixture run showing the bad temp sensors?

If the issue is only a poorly connected heat sink, then it would only fail when the chip overheats. Going to be hard to find that.

You might try feeling the temperature of the individual heat sinks right after it fails to see if a heatsink is warmer or cooler than the rest.
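
If you can put numbers on it (an IR thermometer, say) instead of judging by touch, the comparison can be sketched as a simple outlier check against the board's median. This is only an illustration: the readings, the position numbering and the 5 °C tolerance are all made-up assumptions, not values from any Bitmain documentation.

```python
from statistics import median


def odd_heatsinks(temps_c, tolerance_c=5.0):
    """Return positions whose temperature differs from the board median
    by more than tolerance_c (an arbitrary, assumed threshold)."""
    mid = median(temps_c.values())
    return {pos: t for pos, t in temps_c.items() if abs(t - mid) > tolerance_c}


# Made-up readings in degrees C, keyed by heatsink position on the board.
readings = {1: 61.0, 2: 62.5, 3: 60.0, 4: 48.0, 5: 63.0}
print(odd_heatsinks(readings))
```

A heatsink that reads much cooler than its neighbours may not be pulling heat from its chip, which is the bad-contact case being discussed here.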

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 23, 2020, 12:30:49 AM
 #252

Here is the log from the test fixture.
http://servervideos.hopto.org/error.jpg
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 23, 2020, 05:20:14 PM
 #253

Here is the log from the test fixture.
http://servervideos.hopto.org/error.jpg

I am not familiar with the fixture tool, but it looks like it is telling you that chip no. 50 is bad, no?

Anyway, you should contact zeusbtc and ask for the voltage reference range. Each domain (group of chips) has its own normal voltage values; you just need a voltmeter and the reference table.
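
Once you have the reference table, the check is just "is each measured domain voltage inside its range". A rough Python sketch of that comparison follows. Every number in it is a placeholder I made up for illustration; the real ranges have to come from the vendor's table.

```python
# Hypothetical reference table: domain number -> (min_volts, max_volts).
# Placeholder values only, NOT real S17 figures.
REFERENCE_RANGE = {
    1: (0.30, 0.34),
    2: (0.30, 0.34),
    3: (0.30, 0.34),
}


def out_of_range_domains(measured, reference):
    """Return (domain, volts) pairs whose reading falls outside the range."""
    suspects = []
    for domain, volts in sorted(measured.items()):
        low, high = reference[domain]
        if not (low <= volts <= high):
            suspects.append((domain, volts))
    return suspects


# Example voltmeter readings (also made up).
readings = {1: 0.32, 2: 0.12, 3: 0.33}
for domain, volts in out_of_range_domains(readings, REFERENCE_RANGE):
    print(f"domain {domain}: {volts:.2f} V is outside the reference range")
```

A domain that reads well outside its range is where to start looking at individual chips.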

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 23, 2020, 08:46:50 PM
 #254

Another question: on some of my other boards, when I take off a heat sink there seems to be very little solder on it. It looks like not enough solder was applied at manufacturing. I want to add more solder to the heat sink and put it back on the chip. What do I use for this, and how do I do it? I have low-temp solder paste, but is there a procedure for adding more solder to the heat sink? Maybe some flux or some other chemical? I tried adding flux and then some low-temp solder paste, but it does not stick to the heat sink edges where the solder is missing. The solder paste just melts off and joins the solder that is already there. How can I make it stick? Thank you.
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 23, 2020, 09:39:57 PM
Last edit: November 24, 2020, 11:14:29 PM by frodocooper
 #255

You can use something like https://www.amazon.com/Arctic-Silver-Premium-Adhesive-ASTA-7G/dp/B0087X7262, or use the black glue from the same website where you got the tool. Adding more solder on top of the existing solder isn't a good idea; you should clean the chip's surface and start fresh, making sure the solder paste is spread evenly across the whole chip.

Watch this video https://youtu.be/5WH7g61d90w, it's helpful.

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 23, 2020, 09:49:50 PM
 #256

So you put the thermal solder on the chip? Do I need to put anything on the heat sink such as flux or something else?
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 23, 2020, 11:01:35 PM
Last edit: November 24, 2020, 11:14:58 PM by frodocooper
 #257

I am not really an expert in this field, but why would you use flux? It's not like you are soldering the chip onto the hash board. Maybe you could use flux to clean the heatsink before gluing it onto the chip, but I don't think you need to put anything on besides the thermal adhesive. By the way, here is a slightly different way of doing it: https://www.youtube.com/watch?v=378FPjkHQJc.

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 24, 2020, 04:44:23 AM
Last edit: November 24, 2020, 11:15:29 PM by frodocooper
 #258

I wouldn't want to use any permanent adhesive. I'm looking to apply solder. Do you know what solder I can use for this that can be taken off and put back on?

Also, I have a whole bunch of boards that show the exact same errors when I run the test fixture. All the ASICs are found, but I still get a temp sensor error.

Code:
1970-01-01 00:00:52 main.c:45:main: Ready for test
1970-01-01 00:00:59 single_board_test.c:2336:get_eeprom_info: get EEPROM info success!
1970-01-01 00:00:59 single_board_test.c:2585:single_board_test: g_test_level 7, pattern_test_time 1
1970-01-01 00:00:59 single_board_test.c:2375:do_single_board_test: Begin test
1970-01-01 00:00:59 fan.c:276:front_fan_power_on: Note: front fan is power on!
1970-01-01 00:00:59 fan.c:288:rear_fan_power_on: Note: rear fan is power on!
1970-01-01 00:00:59 driver-btm-api.c:1165:miner_device_init: Detect 256MB control board of XILINX
1970-01-01 00:00:59 driver-btm-api.c:1106:init_fan_parameter: fan_eft : 0  fan_pwm : 0
1970-01-01 00:01:05 driver-btm-api.c:1090:init_miner_version: miner ID : 805445801c20881c
1970-01-01 00:01:05 driver-btm-api.c:1096:init_miner_version: FPGA Version = 0xB031
1970-01-01 00:01:06 board.c:36:jump_and_app_check_restore_pic: chain[0] PIC jump to app
1970-01-01 00:01:08 board.c:40:jump_and_app_check_restore_pic: Check chain[0] PIC fw version=0x88
1970-01-01 00:01:08 thread.c:807:create_pic_heart_beat_thread: create thread
1970-01-01 00:01:12 power_api.c:228:set_higher_voltage_raw: higher_voltage_raw = 2100
1970-01-01 00:01:12 power_api.c:278:set_to_higher_voltage: Set to voltage raw 2100, one step.
1970-01-01 00:01:14 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:15 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.926828
1970-01-01 00:01:15 power_api.c:53:_get_avg_voltage: average_voltage = 20.926828
1970-01-01 00:01:15 power_api.c:71:check_voltage: target_vol = 21.00, actural_vol = 20.93, check voltage passed.
1970-01-01 00:01:15 uart.c:71:set_baud: set fpga_baud to 115200
1970-01-01 00:01:15 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 115200, chip_divider = 26
1970-01-01 00:01:26 driver-btm-api.c:1030:check_asic_number_with_power_on: Chain[0]: find 65 asic, times 0
1970-01-01 00:01:29 driver-hash-chip.c:266:set_uart_relay: set uart relay to 0x330003
1970-01-01 00:01:29 driver-btm-api.c:363:set_order_clock: chain[0]: set order clock, stragegy 3
1970-01-01 00:01:29 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:01:29 driver-hash-chip.c:502:set_clock_delay_control: core_data = 0x34
1970-01-01 00:01:29 driver-hash-chip.c:517:set_clock_delay_control: singe chain mode
1970-01-01 00:01:30 temperature.c:320:calibrate_temp_sensor_one_chain: chain 0 temp sensor NCT218
1970-01-01 00:01:31 temperature.c:488:temp_statistics_show:   pcb temp 17~20  chip temp 18~20
1970-01-01 00:01:31 uart.c:71:set_baud: set fpga_baud to 12000000
1970-01-01 00:01:31 driver-hash-chip.c:245:dhash_chip_set_baud_v2: chain[0]: chip baud = 12000000, chip_divider = 3
1970-01-01 00:01:31 temperature.c:488:temp_statistics_show:   pcb temp 18~19  chip temp 19~21
1970-01-01 00:01:31 power_api.c:222:set_working_voltage_raw: working_voltage_raw = 1950
1970-01-01 00:01:31 frequency.c:808:inc_freq_with_fixed_vco: chain = 255, freq = 625, is_higher_voltage = true
1970-01-01 00:01:42 power_api.c:348:set_to_voltage_by_steps: Set to voltage raw 2070, step by step.
1970-01-01 00:01:44 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:45 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 20.217949
1970-01-01 00:01:45 power_api.c:53:_get_avg_voltage: average_voltage = 20.217949
1970-01-01 00:01:45 power_api.c:71:check_voltage: target_vol = 20.70, actural_vol = 20.22, check voltage passed.
1970-01-01 00:01:45 driver-btm-api.c:666:set_timeout: freq = 625, percent = 10, hcn = 4915, timeout = 7
1970-01-01 00:01:45 power_api.c:306:set_to_working_voltage_by_steps: Set to voltage raw 1950, step by step.
1970-01-01 00:01:50 power_api.c:85:check_voltage_multi: retry time: 0
1970-01-01 00:01:52 power_api.c:40:_get_avg_voltage: chain = 0, voltage = 19.097436
1970-01-01 00:01:52 power_api.c:53:_get_avg_voltage: average_voltage = 19.097436
1970-01-01 00:01:52 power_api.c:71:check_voltage: target_vol = 19.50, actural_vol = 19.10, check voltage passed.
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:53 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 1
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:54 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 54, reg = 28
1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 1
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:55 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 50, reg = 28
1970-01-01 00:01:56 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 1
1970-01-01 00:01:56 single_board_test.c:1659:wait_warm_up: temper sensor bad

I have this exact same error on over 10 boards. The error names chips 50, 54, 10 and 14 as bad, but I get this exact same error on many boards. They can't all have the exact same chips with the exact same error. What could this be? My boards were in perfect condition and all of a sudden stopped working one day. They showed temp sensor errors and then stopped working completely. The test fixture says the temp sensor is bad, but that can't be right either. Something is wrong and I don't know how to find the problem. Please advise. Thank you.
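
A log like this is easier to read once the repeated timeout lines are collapsed. The Python sketch below counts register timeouts per chip and lists the sensors that failed. The regular expressions are written against the line formats visible in the log in this post and are not guaranteed to match other firmware or fixture versions. The sample is a short excerpt of that log.

```python
import re
from collections import Counter

# Patterns based on the two message formats seen in the fixture log.
TIMEOUT_RE = re.compile(
    r"read asic reg timeout: expect chain = (\d+), chip = (\d+), reg = (\d+)")
SENSOR_RE = re.compile(
    r"read temp sensor failed: chain = (\d+), sensor = (\d+), chip = (\d+), reg = (\d+)")


def summarize(log_text):
    """Count register timeouts per (chain, chip) and collect failed sensors."""
    timeouts = Counter()
    failed_sensors = set()
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            chain, chip, _reg = map(int, m.groups())
            timeouts[(chain, chip)] += 1
            continue
        m = SENSOR_RE.search(line)
        if m:
            chain, sensor, chip, _reg = map(int, m.groups())
            failed_sensors.add((chain, sensor, chip))
    return timeouts, failed_sensors


sample = """\
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 14, reg = 28
1970-01-01 00:01:52 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0
1970-01-01 00:01:53 register.c:185:read_asic_reg_with_addr: read asic reg timeout: expect chain = 0, chip = 10, reg = 28
1970-01-01 00:01:53 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 0
"""

timeouts, failed = summarize(sample)
for (chain, chip), n in sorted(timeouts.items()):
    print(f"chain {chain} chip {chip}: {n} register timeouts")
for chain, sensor, chip in sorted(failed):
    print(f"chain {chain} sensor {sensor} (chip {chip}): read failed")
```

Running the same summary over the logs from several boards makes it easy to confirm whether they really all report the same four chips.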
mikeywith
Legendary
*
Offline Offline

Activity: 2226
Merit: 6367


be constructive or S.T.F.U


View Profile
November 24, 2020, 10:06:31 AM
Last edit: November 24, 2020, 11:16:39 PM by frodocooper
 #259

Nothing is permanent. That Arctic adhesive won't hold against a hammer or a heat gun aimed directly at the heatsink, and the same applies to any solder you might use.

What matters in your kernel log is only these 4 lines:

Code:
1970-01-01 00:01:52 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 0, chip = 14, reg = 0

1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 1, chip = 10, reg = 1

1970-01-01 00:01:54 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 2, chip = 54, reg = 0

1970-01-01 00:01:55 temperature.c:744:get_temp_info: read temp sensor failed: chain = 0, sensor = 3, chip = 50, reg = 0

The kernel log can be a bit confusing. It isn't saying that those 4 chips are bad; it's only trying to tell you that the temp sensors next to those 4 chips are bad. Each board has temp sensors located near the chips mentioned: 10, 14, 50 and 54.

But even this isn't accurate, because it's unlikely that 4 temp sensors would die at once. The actual cause must be one of two:

1- If all temp sensors across 3 hash boards (12 temp sensors in total) show "failed", then the problem is the PSU.
2- If only one hash board's temp sensors show "failed", then one or more heatsinks/chips isn't making 100% contact and needs replacement, and more often than not the first chip (chip 0) is the bad one.

Note that the PSU theory still stands even if only 1 hash board is having a hard time reading the temp sensor. It's hard to explain, but take it as is.
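
The two cases can be written down as a small decision helper. This is a rough Python sketch of the rule of thumb only, not a diagnostic tool. The function name and the wording of the messages are mine, and the two-out-of-three-boards case, which the rule does not cover, is simply treated like the single-board case here.

```python
def triage(failed_chains, total_chains=3):
    """Apply the two-case rule of thumb from this post.

    failed_chains: set of chain indices whose temp sensors all read "failed".
    """
    if not failed_chains:
        return "no temp sensor failures reported"
    if len(failed_chains) >= total_chains:
        return "all boards failing: suspect the PSU first"
    boards = ", ".join(str(c) for c in sorted(failed_chains))
    return (f"chain(s) {boards} failing: suspect heatsink/chip contact there "
            "(starting with chip 0), but do not rule out the PSU")


print(triage({0, 1, 2}))
print(triage({0}))
```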

CryptoLLC
Newbie
*
Offline Offline

Activity: 25
Merit: 11


View Profile
November 24, 2020, 08:23:56 PM
Merited by frodocooper (3), mikeywith (1)
 #260

I spent all night playing with the boards, so here is somewhat of a fix. The fixture is not accurate at all. I put these same boards into my miner, put asic.to or Bitmain firmware on it and ran it as is. It worked!!! The test fixture shows that the temp sensors are bad, but that's not true. They are good. I would double-check that the chips have their voltages in order, such as clock and RO; check the rest as well, but clock and RO are a good way to find a bad chip. Anyway, I found some chips that had solder balls next to them. Remove those balls. Once the chips all have good voltages, run asic.to firmware and it should work. The test fixture is lying, lol. Don't believe it. I fixed 6 boards last night once I stopped believing the test fixture. They are still working as of this morning. I will update you when I fix more boards.