Bitcoin Forum
Author Topic: Hacking Antminer S17's and T17's because.... are we up to these already????  (Read 514 times)
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3178
Merit: 2260


I fix broken miners. And make holes in teeth :-)


View Profile
June 23, 2021, 05:40:58 PM
Merited by philipma1957 (4), vapourminer (3), mikeywith (1), n0nce (1)
 #1

Hi all!

Well, people are sending in T17's and other mining gear and I thought I'd start another thread for them. Oddly enough I even get occasional messages to help with KNC Neptunes, but that's another thread.

I'll cover some of the issues I encounter with these things here, along with my own tips and thoughts on fixing them, keeping them alive, the occasional rant about quality control, and so forth. With the never-ending chip shortages in the world and the odd fact that even S9's and DragonMint T1's are still profitable, I suspect these will be around for a while.

And as always I will try to make information I find freely accessible. I do charge for actually fixing stuff, but that is more for my skills than any info I have in my head. The more people who can fix these miners the better, in my view. So feel free to contribute your own thoughts on what works and how.

Lightfoot
lightfoot (OP)
June 23, 2021, 05:41:12 PM
 #2

Reserved for summaries
lightfoot (OP)
June 23, 2021, 05:49:11 PM
Merited by mikeywith (1)
 #3

Ok, so a first thought: When shipping these boards I have a simple recommendation:

Don't.

Seriously: The design of the S17/S19 series boards is different from the S7 and S9 of yore. The biggest reason is the new heat sinks: I see that Ant realized you need the biggest sinks furthest away from the intake fans, but the problem is the sinks are long and only supported in the middle by the chip. Thus if you press on one end of them (or if your box gets deformed by the local postage dinosaur) the force acts through a lever arm, torques the chip and quite probably breaks it off the board.

Another problem is adhesion: On old boards the top heat sinks were glued on with epoxy. This was a mixed bag: the epoxy could be too thick (poor thermal transfer) or too thin (damn thing falls off and shorts the board next door), but a knock would typically break the epoxy before breaking the chip off the board. On these units, however, the top of the chip is metal and the heat sink is soldered on. Nice idea: that ensures an excellent thermal connection to the chip, and if the sink falls off it indicates a serious temp error. But at the same time you are just as likely to rip the pads off the board, and fixing those sucks a *lot*.

Best way to ship is to put the board in a crush-proof box, maybe wood or the like. Even with soft/firm packing material I think it's too easy to knock chips off. Best thing to ship them in would be an S17/S19 case. Yes, heavier, but might be worth it.

Next thought: Power.
mikeywith
Legendary
*
Offline Offline

Activity: 2408
Merit: 6588


be constructive or S.T.F.U


View Profile
June 23, 2021, 09:45:34 PM
 #4

Let me chime in with another tip: do not attempt to clean this gear with a strong air compressor, especially when the boards are hot. The other day I wanted to give the gear a good dusting, so I took one perfectly working S17 Pro, took out the hashboards, put them on a clean table and blew the life out of them. Heatsinks started flying like mosquitoes. I know that since they came off that easily it was only a matter of time before they started falling off anyway, but I could have used another month or two of the gear running stable.

lightfoot (OP)
June 23, 2021, 10:37:39 PM
 #5

Yeah, the solder can get a bit soft, and I have run across boards where the problem wasn't a bad chip so much as either extruded solder balls or the like.

Did the sinks go or sinks plus chips?
mikeywith
June 24, 2021, 12:26:03 AM
 #6

Did the sinks go or sinks plus chips?

For this particular gear, I can see the chips are still in place; it was only the heatsinks that went. Of course, I am not 100% sure the chips are undamaged since the gear isn't running now, but the kernel log passed the chip count for all three hash boards, and before starting to mine it shows an over-temp error in the kernel log and shuts down. That most likely indicates that all chips are fine; they just (obviously) can't run when their heatsinks are in a plastic bag.

lightfoot (OP)
June 24, 2021, 02:19:25 AM
 #7

Ok. I just took a few sinks off here with air heat (360c) and no pre-heat. They come off pretty easily, and the solder layer is fairly thin so you should be able to put them back on with hot air at around 300c and a bit of flux.
lightfoot (OP)
June 26, 2021, 03:58:04 PM
 #8

Well, that wasn't too complex: this T17+ is back up and running at about 45 TH/s. So I thought I'd post a summary of what went wrong and how I fixed it.

Scenario: Board 1 and 2 would come up, then drop to zero, then come up, then down over and over. Board 3 was hard dead with 33 dies reporting.

Diagnosis: As always, try to isolate the problem. So I took out all three boards, labeled them (with the client order number and A/B/C), checked them over (no burns or loose heatsinks) and cleared any dust off them.

Then I put them in one at a time:
  • Board 1: This one hashed fine at about 14 TH/s using the autotuner. Didn't drop out, worked well for 4 hours. Ok, it's probably fine.
  • Board 2: This one was odd: It was starting, running for a minute, then stopping. Logs showed the autotuner resetting the voltage every 2 minutes. Disabled autotune; works fine at 500 MHz, but not at 600 MHz. Fair enough, I can disable autotune.
  • Board 3: Dead as a doornail. See below.

The next step was to run them two at a time: Put boards 1 and 2 into the unit, set autotune to off with 600 MHz for board 1 and 550 MHz for board 2. Ran fine at about 32 TH/s for 12 hours. Ok, we have two boards running rather happily.

Now it's time for board 3. This one was reporting 33 dies, and since the T17 models report via the chip serial bus we probably had a failed die 34. Fair enough, that happens; normally people replace the chip. However sometimes it's not that, and I'll post the results in the next write-up on how to fix these chips.

(Hint: It's a lot better than the crummy S9 boards which is a serious improvement....)
philipma1957
Legendary
*
Online Online

Activity: 4298
Merit: 8768


'The right to privacy matters'


View Profile WWW
June 26, 2021, 11:50:59 PM
Last edit: June 27, 2021, 12:11:00 AM by philipma1957
 #9

Nice work  Grin

I want to talk about cleaning the S17.

The proper way involves removing 25-27 screws and fully disassembling the unit. Make sure you are anti-static.

Depending on your room they will be dusty.

They will have a lot of dust under the controller case, enough to micro-short the PSU.

The PSU can take a strong air blower to clean it, since the small fans in the PSU can spin really fast.


I have found my s17s get large dust right at the intake heat sinks.

I found a fast cleaning method. Removing the four screws on the intake fans exposes the heat sinks on the boards,

the very heat sinks that get the airflow-blocking dust bunnies.

I have a high-quality shop vac from Fein. It has a 1 ⅜ inch hose; I use an adapter that takes a 1 ¼ inch horsehair brush. The brush and the adapter are 7 bucks on eBay.

Now these are always the coolest heat sinks, since the intake fans are blowing cool air on them.
They are also almost always the dustiest heat sinks.
So it's four screws for access to clean them,
and if you want, four more screws to blow the controller clean.

the time saved is huge.


You only need to clean the coolest heat sinks.

You don't have to do 25-27 screws and a full disassembly.

With just the intake plate (and of course the attached fans) removed, you can vacuum the lead heat sinks.

They are cool and the rest is all protected.

It is pretty easy to inspect the unit with that plate removed. Simply use a flashlight and shine it inside the unit.

You will be able to see any deep dust or bugs.

I find 90 to 95% of the dust and 97% of the blocked airflow is right at the intake heat sinks.

I turned a two job at the farm into Four or five hours work.

I was inspired by lightfoot and this thread.
I will link the horse hair brush and adapter.

https://www.ebay.com/itm/384228713329?hash=item5975d0e371:g:rjUAAOSw4RRgzAs7

lightfoot (OP)
June 28, 2021, 07:33:42 PM
 #10

Another note before I write up how to reflow these chips: Braiins really, really wants to auto-tune your chips.

This is nice, but the auto-tuner doesn't always work right and winds up shutting down boards for long periods of time while it tries every combination of voltage and frequency. It also likes to spit out messages like "glitches is above threshold" without any clue or definition of what a glitch is, or why there is a threshold.

Anyway, when testing boards I like to turn auto-tuning off and run at nice comfortable fixed frequencies. 500 MHz is a nice reasonable speed, 400 MHz if it's really hot outside and 600 MHz if it's cool. But as I mentioned, Braiins loves to auto-tune, and it will even try to tune if you turn the tuning to *off*.

It will even be tuning when the web console explicitly says tuning is disabled....

From what I can see: If you have *any* configuration option in the autotuning or dynamic power scaling sections set to any value, it will assume tuning and will drive your boards nuts. Even if you have the checkboxes for tuning cleared....

This post brought to you by 2 hours of troubleshooting and wondering why boards started dropping offline for no reason :-)
philipma1957
June 28, 2021, 09:45:50 PM
 #11

this is good news in the sense that it will be reported as a bug.

also I am not sure if it was the december the april or the june firmware.

they may have fixed it already.


lightfoot (OP)
June 28, 2021, 10:01:33 PM
 #12

Well, I'm not up to reporting it as a bug, but even with *every* tuning option switched off, it still wants to shut down boards in the name of "tuning". When I run the boards at, say, 400 MHz they run at hashrates well below what is "expected" (the nominal hashrate). This is enough to trip the tuner that runs every 30 minutes:

Code:
Jun 28 20:52:29.793 INFO Tune/all: ----- TUNER ITERATION -----
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: Jun 28 20:52:29.794 INFO Tune/1: Evaluated configuration result[iter=2]: voltage:[18.29 V] hashrate:[8645.45 GH/s / 11827.20 GH/s=0.731], chips[underperf/max_expected]:[26/0], power[now/required/limit]:[537 W/598.78/611 W], result:[REJECTED], reason:[Underperforming chip count is above threshold]
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: Jun 28 20:52:29.794 INFO Tune/all: --- 1 ==> RESULTS stage=6 iter=2 voltage=18.2875 ---
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: mcr=73.1% measured_hr=8645.45 calculated_hr=11827.20 avg_freq=400000000
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: power_limit=611 calculated_power=537 measured_power=None
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: chip_below_ratio=59% chip_count=44 chip_count_below_threshold=26
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: satisfactory=false config_from_iter=0
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  0   71        71        83        78     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  1   80        83        72        77     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  2   53        72        80        66     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  3   81        81        73        52     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  4   69        81        78        68     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  5   82        62        71        70     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  6   72        70        72        81     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  7   70        83        74        71     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  8   64        52        73        74     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  9   80        76        63        78     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400 10   83        72        73        81     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: Jun 28 20:52:29.799 INFO Tune/3: Evaluated configuration result[iter=2]: voltage:[18.29 V] hashrate:[8913.19 GH/s / 11827.20 GH/s=0.754], chips[underperf/max_expected]:[19/0], power[now/required/limit]:[537 W/598.78/611 W], result:[REJECTED], reason:[Underperforming chip count is above threshold]
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: Jun 28 20:52:29.799 INFO Tune/all: --- 3 ==> RESULTS stage=6 iter=2 voltage=18.2875 ---
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: mcr=75.4% measured_hr=8913.19 calculated_hr=11827.20 avg_freq=400000000
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: power_limit=611 calculated_power=537 measured_power=None
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: chip_below_ratio=43% chip_count=44 chip_count_below_threshold=19
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: satisfactory=false config_from_iter=0
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  0   71        78        72        82     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  1   83        74        81        70     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  2   73        78        82        82     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  3   83        71        84        73     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  4   73        78        76        82     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  5   82        80        77        64     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  6   72        76        76        82     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  7   70        73        79        63     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  8   67        74        77        79     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400  9   75        81        74        78     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]:   400 10   77        77        43        73     
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: Jun 28 20:52:29.799 INFO Tune/all: Status: measuring performace for 1800 seconds
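The per-board summary lines in that log are regular enough to pull apart with a short script, which helps when comparing boards across several tuner runs. The following is only a sketch: the regular expressions are written against the exact line shapes shown above and are an assumption about the format, not something bosminer documents, and `summarize` and `SAMPLE` are names made up for this example.

```python
import re

# Patterns written against the summary lines shown in the log above.
MCR_LINE = re.compile(
    r"mcr=(?P<mcr>[\d.]+)% measured_hr=(?P<measured>[\d.]+) "
    r"calculated_hr=(?P<calculated>[\d.]+) avg_freq=(?P<freq>\d+)"
)
CHIP_LINE = re.compile(
    r"chip_below_ratio=(?P<ratio>\d+)% "
    r"chip_count=(?P<count>\d+) "
    r"chip_count_below_threshold=(?P<below>\d+)"
)


def summarize(log_text):
    """Return one dict per board summary found in the log text."""
    boards = []
    for line in log_text.splitlines():
        m = MCR_LINE.search(line)
        if m:
            boards.append({
                "mcr_pct": float(m["mcr"]),
                "measured_ghs": float(m["measured"]),
                "calculated_ghs": float(m["calculated"]),
                "freq_mhz": int(m["freq"]) // 1_000_000,
            })
            continue
        c = CHIP_LINE.search(line)
        if c and boards:
            # The chip line follows the mcr line of the same board.
            boards[-1]["chips"] = int(c["count"])
            boards[-1]["chips_below"] = int(c["below"])
    return boards


# Three lines copied from the first board in the log above.
SAMPLE = """\
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: mcr=73.1% measured_hr=8645.45 calculated_hr=11827.20 avg_freq=400000000
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: power_limit=611 calculated_power=537 measured_power=None
Mon Jun 28 20:52:29 2021 daemon.err bosminer[8761]: chip_below_ratio=59% chip_count=44 chip_count_below_threshold=26
"""

boards = summarize(SAMPLE)
for b in boards:
    print(b)
```

Feeding it the whole log would give one entry per board, so a board that keeps getting rejected stands out without reading every line.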

T17's tend to have "binned" chips that weren't good enough for the expensive S17's and the like. That's part of the reason they are cheaper, and kind of a hard knock life sort of thing. But if I run the chips at 500 MHz I get 13.51 and 13.41 out of 14.78 "nominal" (roughly 9% off nominal) as opposed to about 25% off "nominal" at 400 MHz. Also I cut the voltage down to a fixed 17 volts instead of the default of 18.25 and it's running cooler. I might try 16v just to see how it works.
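The shortfall against nominal can be recomputed from the figures in the log and the post. This sketch assumes the nominal hashrate scales linearly with clock frequency, which is consistent with the two data points given (11827.20 GH/s at 400 MHz and 14.78 TH/s at 500 MHz) but is not something the firmware states. The function names are made up for the example.

```python
# calculated_hr reported by the tuner at avg_freq=400 MHz (from the log above)
NOMINAL_GHS_AT_400 = 11827.20


def nominal_ghs(freq_mhz):
    """Nominal per-board hashrate, assuming linear scaling with frequency."""
    return NOMINAL_GHS_AT_400 * freq_mhz / 400


def shortfall_pct(measured_ghs, freq_mhz):
    """How far the measured hashrate falls below nominal, in percent."""
    return 100 * (1 - measured_ghs / nominal_ghs(freq_mhz))


readings = [
    ("board 1 @ 400 MHz", 8645.45, 400),
    ("board 3 @ 400 MHz", 8913.19, 400),
    ("13.51 TH/s @ 500 MHz", 13510.0, 500),
    ("13.41 TH/s @ 500 MHz", 13410.0, 500),
]
for label, measured, freq in readings:
    print(f"{label}: {shortfall_pct(measured, freq):.1f}% below nominal")
```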

It is interesting to note that the boards have very limited ability to regulate voltage *on the board* so mismatched boards in a miner box can probably cause chaos. Voltage regulation is done once at the power supply which is nice (you can put big juicy FETs in the power supply and cool the hell out of them) but makes this thing much more of a unit than say an S9 (where the +12 is regulated to 9 or so on a per board basis).

philipma1957
June 28, 2021, 10:13:15 PM
 #13

Yeah I wondered if all three boards got the exact same volts.

I think you can do

 freq 500
freq 500
freq 400

if one board is crappy.

I have an s17 running that way.

and it can't auto-tune at all,

as one board overheats on any auto-tune.

but if it runs at

freq 500 for good board 1
freq 500 for good board 2
freq 400 for hot board 3

the gear will stay up.

lightfoot (OP)
June 28, 2021, 11:58:30 PM
 #14

Any voltage drop would probably be a good indicator of a loose power connection. That would result in hilarity pretty quickly; I see why they went with bolts instead of those tabs on the S15's. And why you find burned boards around the top on ole Ebay...

Anyway, just an update: Cutting the voltage really makes a nice difference in temps and power usage. Currently at 16 volts per board and it seems a lot happier than at 17. Now at a 95% real to nominal hash rate ratio (remember back at 18.25v I was at 75%) and running at 75c with fans at 50% instead of the earlier 100%.

Wonder if anyone else figured this out. Ok, back to writing about how to fix the boards....
philipma1957
June 29, 2021, 12:18:14 AM
 #15

I have six units with PC access on Braiins.

All S17 Pros.

I will try doing this ASAP.

So freq 400 set to 17 volts was doing 37.5 TH/s at 1434 watts.

Now I have freq 400 set to 16.1 volts and it is doing 38.1 TH/s at 1310 watts, with a large temp drop from 86c to 81c.
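To put those two readings on one scale, it helps to convert them to joules per terahash (watts divided by TH/s). A quick sketch using only the numbers reported above; the function name is made up for the example.

```python
def joules_per_th(watts, ths):
    """Energy cost per terahash: W / (TH/s) = J/TH."""
    return watts / ths


before = joules_per_th(1434, 37.5)   # freq 400 at 17 volts
after = joules_per_th(1310, 38.1)    # freq 400 at 16.1 volts
saving_pct = 100 * (before - after) / before

print(f"17.0 V: {before:.2f} J/TH")
print(f"16.1 V: {after:.2f} J/TH")
print(f"saving: {saving_pct:.1f}%")
```

Lower is better here, so the undervolted setting is ahead on both hashrate and power at the same frequency.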


next unit

freq at 480
volts at 17.1
temps are 86.6c
watts are 1671

going to only change the volts to 16.1


lightfoot (OP)
June 29, 2021, 12:33:03 AM
 #16

Ok, so how did I fix that board? It was reporting only 33 chips and was dead in the water.

The first clue is the chip count. The board was able to start at the bottom left and count its way up on chips. The count is in a serpentine pattern from the big heat sink on the bottom (1) then up, then over, then down, then over then up.... If you count carefully you see that chip 33 is not at the edge of the board on the big side heat sinks, but one chip in.

Given that the hottest spots on the board are the chips where the exhaust fans are (that's why they have bigger heat sinks than the ones by the intake) it's a pretty safe bet to assume that chip 34 is probably open. So we remove the heat sink on chip 34.
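If you don't fancy counting by hand, the serpentine order can be turned into a position lookup. This is a sketch under a stated assumption: the board is treated as 4 columns of 11 chips (44 total, the same 11 x 4 shape as the per-chip table in the tuner log earlier in the thread), with the count running up the first column, down the second, and so on. The real physical layout may differ, so check it against your board before trusting it. `chip_position` and `first_missing_chip` are names made up for the example.

```python
def chip_position(chip, rows=11, cols=4):
    """Map a 1-based chip number to (column, row), both 0-based.

    Assumed layout: counting starts at the bottom of column 0, runs up,
    steps over one column, runs down, and so on (serpentine).
    """
    if not 1 <= chip <= rows * cols:
        raise ValueError(f"chip must be between 1 and {rows * cols}")
    col, offset = divmod(chip - 1, rows)
    # Even columns count upward, odd columns count downward.
    row = offset if col % 2 == 0 else rows - 1 - offset
    return col, row


def first_missing_chip(reported_count):
    """If enumeration stops after N chips, chip N + 1 is the first suspect."""
    return reported_count + 1


suspect = first_missing_chip(33)
print("last chip seen:", chip_position(33))
print("first suspect: chip", suspect, "at", chip_position(suspect))
```

Under that assumed layout chip 33 lands one column in from the edge and chip 34 sits in the edge column at the same end, which matches the description above.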

To pull a heat sink you want to heat the sink but not the chip under it. So you do not use pre-heating and instead set your air tool to about 350c, low air flow (so the air doesn't just rush off the sink) and heat the sink for about 60 to 90 seconds. When you tap the heat sink lightly with a pliers and it moves you're done.

When you pull a sink, mark its position on the heat sink (say the chip number). That way when you put it back you have exactly the same amount of solder on the chip top and the sink. I'd recommend against adding solder, too much and it will drop onto the chip with the usual results.

With the sink off, let the board cool down and take a look at the chip with your loupe or 10x magnifying goggles. You'll probably see crud on the pins on the side of the chip; clean that off with isopropyl alcohol and a wooden splinter. Don't use metal tools, you will scratch the board or damage the traces.

Once the chip is nice and clean, look at it. If it doesn't look burned or cracked you can try a reflow. Now I see lots of people pulling chips with a set of tweezers and an air tool blasting hot air straight down.

I think this sucks.

You have to heat the chip up *and* the board underneath. Blasting it with air like that will heat the chip up well beyond the solder melting point while it is trying to heat the pads underneath. Because the pads are cooler the solder will not flow as well and you will probably BBQ the chip. Not to mention blowing the smaller components off.

So what do you do?
First, you flux. After cleaning those pin landing pads and getting all the grunge out from the spaces between the pads you put on a *small* amount of flux on each side of the chip. Think "top of toothpick" amount of flux per side. Don't leave a lot, it's supposed to conduct heat, burn off any impurities and help the solder flow.

Next step: You warm the board up first with a pre-heater.

For a board like this I recommend a nice Aoyue 863 IR preheater. Not only is it big enough to hold the whole board, not only is it nice looking, it also has two external temp sensors so you can measure the *board* temperature and keep it from getting too hot.

How hot?
Well, I like to put the control sensor (B) under the board and touching one of the lower heat sinks. The other sensor (C) I put on top of the board under a heat sink near the chip I want to reflow. That way I control the temperature to not get the bottom too hot (which can cause heat sinks to drop off, embarrassing) while watching the top temp to come close to my set point (the heater will cycle on and off but the heat will soak through the board.

So how hot?
I like to pre-heat the board to 100c. It's well below the melting point of the solder, but at the same time it means the air tool only has to bring everything up another 150 or so C (to the peak temperature of the solder) quickly and evenly. Plus since the board is already warm the solder will easily flow onto both the board pads and the chip pads to make a nice proper connection that Antminer just can't seem to do right all the time...

Then you set your air tool to a low airflow (you don't need to blow things off, more make a little bubble of warm above the chip), about 320c for about a minute and a half tops. Watch the top temp sensor, it's under a nearby heat sink so it's not getting the blast but it should climb up to around 200c or so.

S17's and such have a nice little extra feature: They have that copper top that has solder on it, remember? When it starts turning shiny then the solder is flowing and since you have preheated the board the solder on the landings will turn shiny as well. Hold for a few seconds, then let off the heat.

Let the board cool down with it still on the preheater. Then turn off the preheater after a minute or two and leave the board alone. Don't touch it, screw with it, etc. It will take a good 15-20 minutes for the board to cool down, let it be.

Then set your miner to 50 MHz, put the board in without the heat sink, and try it. You will probably see 44 chips; if you do, pull the plug immediately. Then take out the board and put the heat sink on. Remember, no pre-heat: put a bit of flux on the center of the heat sink, line up the sink perfectly (the flux will hold it in place), then use low flow heat from the air tool to secure the heat sink again.

That's it. Yes it takes time and you really want a pre-heater. I found the value of those when I was fixing KNC titan and neptune boards: The amount of copper in there would wick away the heat from a blowtorch. Pre-heat is your special friend.
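For anyone who wants the numbers above in one place, here is a rough Python sketch of the sequence. The class and function names are made up for illustration, and the thresholds are just the figures from this post. The real cue for stopping the air is still the solder turning shiny, not a sensor reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflowSettings:
    """Figures quoted in the post; adjust for your own bench."""
    preheat_board_c: float = 100.0      # board pre-heat set point
    air_tool_c: float = 320.0           # hot-air tool temperature, low airflow
    air_max_seconds: float = 90.0       # "about a minute and a half tops"
    top_sensor_target_c: float = 200.0  # sensor C, under a nearby heat sink
    cooldown_minutes: float = 15.0      # low end of the 15-20 minute rest


def next_action(top_sensor_c: float, air_seconds: float,
                s: ReflowSettings = ReflowSettings()) -> str:
    """Suggest the next step from the top sensor reading and air-on time.

    air_seconds == 0 means the air tool has not been started yet.
    """
    if air_seconds == 0:
        if top_sensor_c >= s.preheat_board_c:
            return "start air"
        return "keep preheating"
    if air_seconds >= s.air_max_seconds:
        return "stop air: time limit"
    if top_sensor_c >= s.top_sensor_target_c:
        return "watch for shiny solder, then stop air"
    return "keep air on"


if __name__ == "__main__":
    print(next_action(60, 0))    # board still warming up
    print(next_action(150, 30))  # air on, not yet at target
```

Treat it as a checklist rather than a controller: it only restates the order of operations and the limits, so you still have to watch the board.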

NotFuzzyWarm
Legendary
*
Offline Offline

Activity: 3808
Merit: 2697


Evil beware: We have waffles!


View Profile
June 29, 2021, 12:40:26 AM
 #17

Quote
<snip>
Anyway, just an update: Cutting the voltage really makes a nice difference in temps and power usage. Currently at 16 volts per board and seems a lot happier than 17. Now at 95% real to normal hash rate ratio (remember back at 18.25v I was at 75%) and running 75c with fans at 50% instead of the earlier 100%.

Wonder if anyone else figured this out. Ok, back to writing about how to fix the boards....
If you mean the 'one Vcore for all' boards, I've mentioned that several times in the past. Using the PSU to do all regulation saves several % eff vs each board having final on-board Vcore regulators and is a large part of the higher power eff of modern miners.

Quote
I will try doing this asap.

So freq 400 set to 17 volts was doing 37.5 th at 1434 watts
now have freq 400 set to 16.1 volts and it is doing? 38.1 th at 1310 watts with large temp drop from 86 c to 81 c

next unit

freq at 480
volts at 17.1
temps are 86.6c
watts are 1671

going to only change the volts to 16.1
I assume what is happening is that, like Canaan does, the software is slightly tweaking the speed per-chip based on tracking some majik error-rate parameter. Canaan likened it to fiddling with a radio receiver s/n ratio

- For bitcoin to succeed the community must police itself -    My info useful? Donations welcome!  3NtFuzyWREGoDHWeMczeJzxFZpiLAFJXYr
 -Sole remaining active Primary developer of cgminer, Kano's repo is here
-Support Sidehacks miner development. Donations to:   1BURGERAXHH6Yi6LRybRJK7ybEm5m5HwTr
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3178
Merit: 2260


I fix broken miners. And make holes in teeth :-)


View Profile
June 29, 2021, 01:32:31 AM
 #18

Quote
If you mean the 'one Vcore for all' boards, I've mentioned that several times in the past. Using the PSU to do all regulation saves several % eff vs each board having final on-board Vcore regulators and is a large part of the higher power eff of modern miners.

I was thinking dropping the voltages manually results in better efficiency. I did wonder if they had additional regulators on the board itself to fine tune the voltage for each chip bank, but whatever they have on there is exceptionally small compared to a real power supply (I should buzz that out someday).

A higher base voltage to the boards is always great because it means less current. At the same time, the little TO series fets they put on boards were always a weak point. Using a nice big-o TO247 in the power supply allows you to have higher switching frequencies, better multi-phase buckers, and overall a lot fewer problems.

Quote
I assume what is happening is that, like Canaan does, the software is slightly tweaking the speed per-chip based on tracking some majik error-rate parameter. Canaan likened it to fiddling with a radio receiver s/n ratio

Probably: We long ago noticed on the Titans (which were great, they had 8 nice power supplies on the board running in an imploder mode to power each corner of the die in a split phase interleave) that if you cut voltage on the die the efficiency went up right to the point where the chip would start throwing a lot of errors. Tarkin made his mark by writing some code that would step the voltage down on a die a notch, then watch the error rate. If the errors went over 1% it backed the voltage up a notch and called it a day.
Then you bump the clock rates up a tick and see how it works at a higher rate.  Properly tuned, a pile of Titans are *STILL* mining at a profit 7 or so years later.
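A minimal sketch of that step-down loop in Python, as I understand the description above. This is not Tarkin's actual code: the function name, the millivolt units, and the `error_rate` callback (which stands in for running the die at a given voltage and measuring hardware errors) are all assumptions for illustration.

```python
from typing import Callable


def tune_voltage(start_mv: int, step_mv: int, floor_mv: int,
                 error_rate: Callable[[int], float],
                 max_error: float = 0.01) -> int:
    """Step the voltage down one notch at a time.

    Stops at the last voltage whose error rate stayed at or below
    max_error (1% in the post), or at floor_mv, whichever comes first.
    """
    mv = start_mv
    while mv - step_mv >= floor_mv:
        candidate = mv - step_mv
        if error_rate(candidate) > max_error:
            break  # too many errors: stay one notch up and call it a day
        mv = candidate
    return mv


if __name__ == "__main__":
    # Fake die for demonstration: fine down to 16000 mV, noisy below that.
    def fake_error_rate(mv: int) -> float:
        return 0.002 if mv >= 16000 else 0.05

    print(tune_voltage(17000, 250, 15000, fake_error_rate))
```

Once the voltage has settled you would repeat the same idea for the clock: raise it a tick, watch the errors, and keep the last setting that stayed clean.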

Lower voltage at the chip = less heat, which means the chips can run at a higher clock rate. What's weird is a higher base voltage causes the chip to mine *slower*. Maybe there is some internal regulation in the chip that turns off dies if the voltage is too high or something; wish I could go into Braiins' mining code and have it display dies and cores like BFGMINER did. Maybe I can.....
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3178
Merit: 2260


I fix broken miners. And make holes in teeth :-)


View Profile
June 29, 2021, 01:35:20 AM
 #19

Quote
I have six units with pc access on brains.

all s17 pros.

I will try doing this asap.

So freq 400 set to 17 volts was doing 37.5 th at 1434 watts

now have freq 400 set to 16.1 volts and it is doing? 38.1 th at 1310 watts with large temp drop from 86 c to 81 c


next unit

freq at 480
volts at 17.1
temps are 86.6c
watts are 1671

going to only change the volts to 16.1



Hurray! I'm helping reduce global warming by making miners more efficient with this thread! When I get the Nobel prize I'll be sure to mention this forum...
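To put a number on it, here is the arithmetic for the first unit quoted above (37.5 TH at 1434 W before, 38.1 TH at 1310 W after). The function name is just for illustration; W per TH/s is the same thing as J/TH.

```python
def watts_per_th(watts: float, terahash: float) -> float:
    """Efficiency in W per TH/s (equivalently J/TH); lower is better."""
    return watts / terahash


before = watts_per_th(1434, 37.5)  # 17 V setting
after = watts_per_th(1310, 38.1)   # 16.1 V setting
saving = (before - after) / before

print(f"before: {before:.2f} W/TH")
print(f"after:  {after:.2f} W/TH")
print(f"saving: {saving:.1%}")
```

That works out to roughly 38.2 W/TH before and 34.4 W/TH after, or about 10% less energy per terahash for that one unit.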
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3178
Merit: 2260


I fix broken miners. And make holes in teeth :-)


View Profile
June 30, 2021, 09:28:54 PM
Last edit: June 30, 2021, 10:44:11 PM by lightfoot
 #20

Meantime I've been watching the logs on this T17 and noticed that board 1 will occasionally throw this error:

Code:
syslog.old.6:Wed Jun 30 01:24:59 2021 daemon.err bosminer[5684]: Jun 30 05:24:59.640 INFO CHAIN/1: Discovered 14 chips

Looks like the chip after 14 is having an intermittent connection. Great, I love intermittent connections. So I took the board out, put it on the preheater, took off the heat sink and here's a few pics:

On the preheater with the sink for chip 15 removed:


Close up of the chip. The side closest to the intake fans had crud on the pads and of course solder balls from a bad factory flow. Bad factory!


After a reflow: The pads no longer have solder balls as the solder is adhered to both the chip *and* the board. Preheat will do that...


And now the board is back in and running with 44 chips. I'm letting it tune as all three boards are now in good shape.
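If you want to catch this kind of thing without reading logs by hand, a small Python sketch like the one below can scan syslog lines for chains that came up short. The regex is based only on the one log line shown above, and the expected count of 44 is the figure from this thread, so treat both as assumptions and adjust for your own miner and firmware.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "CHAIN/1: Discovered 14 chips" from the bosminer line above.
CHIP_LINE = re.compile(r"CHAIN/(\d+): Discovered (\d+) chips")


def short_chains(log_lines: Iterable[str],
                 expected_chips: int = 44) -> List[Tuple[int, int]]:
    """Return (chain, chips_found) for every line reporting too few chips."""
    found = []
    for line in log_lines:
        match = CHIP_LINE.search(line)
        if match and int(match.group(2)) < expected_chips:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


if __name__ == "__main__":
    sample = [
        "syslog.old.6:Wed Jun 30 01:24:59 2021 daemon.err bosminer[5684]: "
        "Jun 30 05:24:59.640 INFO CHAIN/1: Discovered 14 chips",
    ]
    for chain, chips in short_chains(sample):
        print(f"chain {chain}: only {chips} chips, suspect chip {chips + 1}")
```

The "suspect chip" hint follows the same reasoning as above: if discovery stops at chip 14, the first place to look is the next one in the chain.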