Bitcoin Forum
April 26, 2024, 08:05:16 PM *
Author Topic: Hacking Bitmain Antminers (S7 & S9) because man a lot of these break......  (Read 2182 times)
sixxkilur
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile WWW
January 06, 2018, 06:53:15 AM
 #41

Bitmain should go back to the drawing board when it comes to heat sinks. They should utilize the same type Gridseed blades use. Gridseed and others used a full aluminum block heat sink for each hashing board.
It made them very heavy but their blades were reliable and stable.
 
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 06, 2018, 09:50:16 PM
 #42

Those have their advantages, but I don't know if they do quite as well on airflow/heat transfer. Properly lined up individual heat sinks can dissipate a lot of heat, but when they shadow each other you get a turbulent mess.

Avalon A6's for example have a full plate heat sink and don't drop heat quite as well.

I think Bitmain just needs to pay more attention to quality control. Maybe it's not such an issue when they have no competition and people are literally shoving money at them, but a crappy product is still a crappy product. And some of these seem to be lesser in terms of quality.

C
dslr11
Newbie
*
Offline Offline

Activity: 15
Merit: 3


View Profile
January 11, 2018, 12:05:43 AM
 #43

KIND OF related to this topic (Antminers breaking and then hacking them...):

I didn't realize until now that different versions of the S9 and their hashing boards MAY not be interchangeable due to the fixed frequency vs. auto tune versions, as well as other minor changes.

I searched all over the net to try to figure out the version history of the S9, and what's compatible and what isn't but I came up empty.

Does anyone know how the different versions of the S9 control boards and the hashing boards match up? E.g. my S9s are version 1.6. and I just bought a version 3.75 hashing board to replace a failed one, but realizing my ignorance I thought I would find out if they are compatible before I drop in the new hashing board.

I think this info (like a compatibility chart) would be very useful for anyone who needs to replace a hashing board or a control board. They come up for sale on eBay every once in a while but it's better to be sure in advance if what you are buying will actually work...

Any words of wisdom on this would be greatly appreciated.
p7mining
Newbie
*
Offline Offline

Activity: 2
Merit: 0


View Profile
January 20, 2018, 05:19:08 PM
 #44

I have a new antminer - hashing board will start hashing and then around 8-48 hrs later one of them just stops hashing with all asics reporting healthy. - i swapped the controller cable with another hashing board and the problem followed the hashing board.  I pulled the hashing board and everything looked new and aligned with no solder breaks (that i could see) - any ideas lightfoot

https://raw.githubusercontent.com/blockopsmining/minerimages/master/hashingissues.png

The only way to recover it now is either a reboot, which works sometimes, or a factory reset
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 20, 2018, 06:06:13 PM
 #45

Quote from: p7mining
I have a new antminer - hashing board will start hashing and then around 8-48 hrs later one of them just stops hashing with all asics reporting healthy. - i swapped the controller cable with another hashing board and the problem followed the hashing board.  I pulled the hashing board and everything looked new and aligned with no solder breaks (that i could see) - any ideas lightfoot

The only way to recover it now is either a reboot, which works sometimes, or a factory reset
Hm. Can you watch the temps with the board running, or try moving the board to the middle and see if the problem is in the position as opposed to the board.
p7mining
Newbie
*
Offline Offline

Activity: 2
Merit: 0


View Profile
January 20, 2018, 06:56:09 PM
 #46

I have several screen shot and temps are between 69-77 degrees reported at the time of the failure...i will try moving the hashing board to the middle slot now
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 02, 2018, 04:10:11 AM
Last edit: February 03, 2018, 05:29:36 PM by lightfoot
 #47

Anyway, been awhile, been working on things. Some thoughts about S9's:

There are 7 strings of chips in the S9, and each string has 9 chips, for a total of 63.

When you power up a S9 board without any controller, the voltage between neutral and the choke should be about 9 volts or so. This is the resting voltage, and comes out to be about 1v per hashing chip. About right.

Now, when the board is connected to a controller and you power up, you should see only 200mv or so while the controller boots. Thus the controller keeps the FETs in a mostly off condition. Not totally off, as we shall see, but mostly off.

As the unit starts to boot you will see the voltage on the choke go to about 9.5 volts, then stay there. The LED on the edge should flash (this is the reset command), then flash quickly for a few seconds (loading the hashes), then go solid on once running.

If you see the voltage go to 9.5, then drop to zero 7 times this is the controller trying to reset the chips. Problem is the chips are not responding, which is the most common failure.

Once the strings have checked out (each time a string is checked the LED flickers briefly) the voltage goes to about 9.6v, the lights stay on, and the unit starts hashing.

Note: You need solid power supplies to keep an S9 going and to let it start; if it doesn't see a solid 12v voltage it will refuse to start.

Note: You need two fans running. if you have only one it will start to hash then shut down within a few seconds.

Note: Slushpool works fine with S9 miners.
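
For anyone logging the choke voltage with a meter or ADC while the board boots, the sequence above can be turned into a rough check. This is only a sketch: the function names, the thresholds (9.0 V for "up", 0.5 V for "down") and the two sample traces are made up for illustration and are not measurements from a real board.

```python
def count_reset_cycles(samples, high=9.0, low=0.5):
    """Count how many times the voltage rises above `high` and then falls below `low`."""
    cycles = 0
    armed = False  # True once we have seen the rail go high
    for v in samples:
        if v >= high:
            armed = True
        elif v <= low and armed:
            cycles += 1
            armed = False
    return cycles


def diagnose(samples):
    """Very rough reading of a boot trace, following the description in the post."""
    cycles = count_reset_cycles(samples)
    if cycles >= 7:
        return "controller retried the reset 7 times: chips not responding"
    if samples and samples[-1] >= 9.5:
        return "voltage settled high: board likely hashing"
    return "inconclusive"


# Hypothetical traces: ~0.2 V while the controller boots, then the rail activity.
dead = [0.2] + [9.5, 0.0] * 7
good = [0.2, 9.5, 9.5, 9.6, 9.6]

print(diagnose(dead))
print(diagnose(good))
```

The thresholds would need adjusting to whatever your own meter reports on a known good board.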
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 03, 2018, 12:50:55 AM
 #48

Other interesting things:

I'm wondering if the white data/comm plugs are being damaged and not making contact: The soldering job for those headers is really poor to be honest, and I've noticed that it seems to be easy to deform if you plug and unplug a few times. Plus when I try to reflow the solder I often find one or two pins that wiggle around when the solder is molten.

Likewise on one board finger pressure on the cable is enough to have the led light to come on 7 times (when it is testing each string) and stay on for awhile.

Anyone know the digi key part number for that plug? I'm thinking of pulling one and seeing what happens. Wouldn't be the first time something like this has happened......
HagssFIN
Legendary
*
Offline Offline

Activity: 2422
Merit: 1706


Electrical engineer. Mining since 2014.


View Profile WWW
February 03, 2018, 11:01:32 AM
 #49

Quote from: lightfoot
Anyone know the digi key part number for that plug? I'm thinking of pulling one and seeing what happens. Wouldn't be the first time something like this has happened......

I remembered seeing this old post by MarkAz.

It is most likely some connector model manufactured by JST.

Quote from: MarkAz
I can't tell you the exact model, because there are literally a ton, but I can tell you who is probably the manufacturer, and that is JST - here's some of the family of products that I thought looked like close matches:

http://www.jst-mfg.com/product/detail_e.php?series=583
http://www.jst-mfg.com/product/detail_e.php?series=645
http://www.jst-mfg.com/product/detail_e.php?series=105
http://www.jst-mfg.com/product/detail_e.php?series=191
http://www.jst-mfg.com/product/detail_e.php?series=275

If you try contacting them and sending pictures, they might be able to tell you for sure...
https://bitcointalk.org/index.php?topic=1077661.msg11516859#msg11516859

lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 03, 2018, 05:32:44 PM
 #50

Interesting. I took the plug off a wrecked S7 spare, then put it on this S9 I'm working on. Now the unit flashes the led 7 times then goes solid for a minute, then goes out, never hashes. This is an improvement, but not perfect. I did however clear the pins 100%, so unless we have broken vias in there this is not fully the problem.

However reflowing the pins did get one other board working, so I now have a reference point. Next step is to check all resistances to hot and ground, to see if there are differences between a good board, a sorta not working board, and a dead board.

Drat.
the_electronrancher
Jr. Member
*
Offline Offline

Activity: 112
Merit: 4


View Profile
February 03, 2018, 06:04:02 PM
 #51

The short heatsinks are soldered to the chips' ground planes, meaning that after the first tier they are "hot".

Don't set a board on a metal workbench, lol.

But the main point is I wanted to tell you, lightfoot, that you might measure all the backside heatsinks to get yourself a voltage map of the board.  I don't agree that it's 7x9, nor that 28nm chips have a 1v core voltage - that's a little high for 28nm.

Flip a board so the short sinks are up, apply 12v with the ribbon disconnected so you get the 9.6v idle voltage from DC-DC, and measure all the heatsinks.  You should get a nice voltage map of the tiers which could help you debug if you get a board with blown tiers someday.

PM me if your numbers aren't consistent, I can send it but since you are digging in to these devices I think it will be a useful exercise and good technique to practice.

Best!
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 03, 2018, 06:58:12 PM
 #52

Yep, you're right, I forgot that the bottom (power plane) heat sinks are hot (pulls heat from the chip through the board). I'll make a map, and will post that and the ground fuzzing as a start.

Ultimate question is of course what is shutting down the board? We have three items:

1) hashing chips themselves
2) Power circuitry
3) Signal and support circuitry.

If it's the chip itself then it would fail either open or short. Short can be found using the map technique: Look at the voltages and find the one that reads 0 between adjacent chips. Open is easier, one string will show voltage on the first chip in the chain but no others. Finding the exact chip would then be done by measuring chip to chip resistance, one of them is going to read zero.

A side question in my brain is what's the clock and signaling circuitry like: If they all share a common clock signal then a shorted chip would ground out the clock, which could be measured at the 25 MHz crystal. Likewise if they daisy chain the data signal, then a shorted chip will not pass the signal or will ground it.

Back to the drawing board after I get some other stuff done. I'll see if these other two boards have a dead chip.

If it is a chip, then it's possible to remove with air tools and a fair bit of preheat. These look a bit easier than normal QFN chips, as they are thinner (warm up more quickly) and they have those nice big power strips on the side and center which should auto-center them. Hm.

Pulling the heat sinks is not too hard, just warm up the board then use air at a lower temp to soften the glue, then pull sink then clean top of chip to remove. Do you have a pin map of the chip itself? I could hot wire a diode and try it out.

63 chips would be 9*7 or 3*3*7. So we either have 9 strings of 7 (no), 7 strings of 9 (maybe) or 3 strings of 21 (don't know about that). One way to find out....
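
As a quick sanity check on the arithmetic, here is a small sketch that lists every way to split 63 chips into equal strings and what each layout would imply per chip. It assumes the strings sit in parallel across the same supply with the chips inside a string in series, and it uses the 9.6v idle figure mentioned earlier in the thread; the helper name is just for illustration.

```python
def layouts(total_chips):
    """All (strings, chips_in_series) pairs that divide total_chips evenly."""
    return [(strings, total_chips // strings)
            for strings in range(1, total_chips + 1)
            if total_chips % strings == 0]


IDLE_VOLTS = 9.6  # idle DC-DC output quoted earlier in the thread (assumption)

for strings, in_series in layouts(63):
    per_chip = IDLE_VOLTS / in_series
    print(f"{strings} strings of {in_series}: ~{per_chip:.2f} V per chip")
```

Only the layouts with 9 or 21 chips in series land anywhere near a plausible core voltage, which is why measuring the heat sinks should settle it.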
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 03, 2018, 08:53:18 PM
 #53

Ok, it's 3 strings of 21. If you follow the chips from the 3 on the left all the way around you see voltages like (all referenced to ground)

8.58, 8.15, 7.75, 7.27, 6.44, 6.02, 5.605, 5.164, 4.74, 4.2, 3.7, 3.2, 2.8, 2.4, 1.6, 1.2, .8, .4, 0.
(9.1 volts at source)

Or each chip pulling about .4 volts. Makes sense.
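
If you want to check a voltage map like this without eyeballing it, a rough sketch follows. It takes the readings posted above, works out the step between neighbours, and flags any step that is much wider or much narrower than the typical one. The 1.5x and 0.25x cutoffs are arbitrary assumptions, not anything from Bitmain.

```python
from statistics import median

# Heat-sink voltages to ground as posted above, in the order measured.
readings = [8.58, 8.15, 7.75, 7.27, 6.44, 6.02, 5.605, 5.164, 4.74, 4.2,
            3.7, 3.2, 2.8, 2.4, 1.6, 1.2, 0.8, 0.4, 0.0]


def tier_drops(volts):
    """Voltage drop between each consecutive pair of readings."""
    return [a - b for a, b in zip(volts, volts[1:])]


def odd_steps(volts, wide=1.5, narrow=0.25):
    """Return the typical step and a list of (index, kind) for unusual steps."""
    drops = tier_drops(volts)
    typical = median(drops)
    flagged = []
    for i, d in enumerate(drops):
        if d > wide * typical:
            flagged.append((i, "wide"))       # bigger than normal drop
        elif d < narrow * typical:
            flagged.append((i, "near zero"))  # what a shorted tier might look like
    return typical, flagged


typical, flagged = odd_steps(readings)
print(f"typical step: {typical:.2f} V")
for i, kind in flagged:
    print(f"{readings[i]} -> {readings[i + 1]}: {kind} step")
```

On these numbers it flags the 7.27 to 6.44 and 2.4 to 1.6 steps as wide, each about double the usual step. That could simply mean a heat sink was skipped while probing, but that is a guess.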

Likewise it looks like the three chips are run in parallel, so you get .5 ohms from one chip to another in the three chip set. Have to think before I do a chip to chip test, I don't want my multimeter to back-feed voltage and damage anything...

However when we fire up the board we get:
Miner Type = S9
set_reset_allhashboard = 0x0000ffff
set_reset_allhashboard = 0x00000000
set_reset_allhashboard = 0x0000ffff
set_reset_allhashboard = 0x0000ffff
Check chain[5] PIC fw version=0x03
Fix freq=550 Chain[5] voltage_pic=6 value=940
set_reset_allhashboard = 0x0000ffff
set_reset_allhashboard = 0x00000000
Chain[J6] has 0 asic
set_reset_hashboard = 0x00000020
set_reset_hashboard = 0x00000000
retry Chain[J6] has 0 asic

So if the chips are not shorted then they probably are not the source of the problem. A dead shorted chip would also raise the voltages around the string and would probably blow up pretty quickly. An open chip would not be spotted by this test, as the ground planes are probably wired together and would mask an open chip.

I did leave one board powered up for a bit, and the heat sinks eventually warm up. Didn't see a temp differential on the top or bottom heat sinks from sink to sink, so all engines are probably up and idle.

Hm..... Next question is it's either the support chips, or the signal line is cut. But if cut, where.....
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 03, 2018, 09:02:06 PM
 #54

This is interesting. On one of the boards from last night's test I see that it did come up once briefly....

Code:
Check chain[7] PIC fw version=0x03
Fix freq=550 Chain[7] voltage_pic=6 value=940
set_reset_allhashboard = 0x0000ffff
set_reset_allhashboard = 0x00000000
Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
set_reset_hashboard = 0x00000080
set_reset_hashboard = 0x00000000
retry Chain[J8] has 62 asic
Chain[J8] has no freq in PIC, set default freq=550M
Chain[J8] has no core num in PIC

Miner fix freq ...
read PIC voltage=940 on chain[7]
Chain:7 chipnum=62
Asic[ 0]:550
Asic[ 1]:550 Asic[ 2]:550 Asic[ 3]:550 Asic[ 4]:550 Asic[ 5]:550 Asic[ 6]:550 Asic[ 7]:550 Asic[ 8]:550
Asic[ 9]:550 Asic[10]:550 Asic[11]:550 Asic[12]:550 Asic[13]:550 Asic[14]:550 Asic[15]:550 Asic[16]:550
Asic[17]:550 Asic[18]:550 Asic[19]:550 Asic[20]:550 Asic[21]:550 Asic[22]:550 Asic[23]:550 Asic[24]:550
Asic[25]:550 Asic[26]:550 Asic[27]:550 Asic[28]:550 Asic[29]:550 Asic[30]:550 Asic[31]:550 Asic[32]:550
Asic[33]:550 Asic[34]:550 Asic[35]:550 Asic[36]:550 Asic[37]:550 Asic[38]:550 Asic[39]:550 Asic[40]:550
Asic[41]:550 Asic[42]:550 Asic[43]:550 Asic[44]:550 Asic[45]:550 Asic[46]:550 Asic[47]:550 Asic[48]:550
Asic[49]:550 Asic[50]:550 Asic[51]:550 Asic[52]:550 Asic[53]:550 Asic[54]:550 Asic[55]:550 Asic[56]:550
Asic[57]:550 Asic[58]:550 Asic[59]:550 Asic[60]:550 Asic[61]:550
Chain:7 max freq=550
Chain:7 min freq=550

max freq = 550
set baud=2
Chain[J8] set working voltage=940 [6]
setStartTimePoint total_tv_start_sys=167 total_tv_end_sys=168
restartNum = 2 , auto-reinit enabled...
do read_temp_func once...
do check_asic_reg 0x08

get RT hashrate from Chain[7]: (asic index start from 1-63)
Asic[01]=71.4200 Asic[02]=58.5860 Asic[03]=63.6860 Asic[04]=63.3500 Asic[05]=64.3570 Asic[06]=61.4880 Asic[07]=60.3810 Asic[08]=61.5380
Asic[09]=63.5520 Asic[10]=56.8910 Asic[11]=59.5420 Asic[12]=63.0320 Asic[13]=60.8500 Asic[14]=57.7470 Asic[15]=62.7970 Asic[16]=64.5750
Asic[17]=64.4070 Asic[18]=66.4200 Asic[19]=56.5890 Asic[20]=64.8940 Asic[21]=62.4950 Asic[22]=63.6860 Asic[23]=60.8840 Asic[24]=59.5080
Asic[25]=61.9070 Asic[26]=64.7930 Asic[27]=60.9180 Asic[28]=63.6350 Asic[29]=58.8710 Asic[30]=60.3810 Asic[31]=63.7700 Asic[32]=65.3300
Asic[33]=59.5750 Asic[34]=60.9340 Asic[35]=58.5020 Asic[36]=65.6150 Asic[37]=67.2430 Asic[38]=63.7700 Asic[39]=69.1550 Asic[40]=67.3430
Asic[41]=63.2830 Asic[42]=66.5380 Asic[43]=64.1890 Asic[44]=61.3540 Asic[45]=59.8100 Asic[46]=65.2960 Asic[47]=67.3770 Asic[48]=61.2700
Asic[49]=61.7560 Asic[50]=61.7560 Asic[51]=61.7230 Asic[52]=65.2800 Asic[53]=64.1720 Asic[54]=65.3470 Asic[55]=64.3400 Asic[56]=60.3300
Asic[57]=59.3910 Asic[58]=63.2660 Asic[59]=67.0080 Asic[60]=66.5710 Asic[61]=60.9340 Asic[62]=62.4610
Check Chain[J8] ASIC RT error: (asic index start from 1-63)
Done check_asic_reg
do read temp on Chain[7]
Done read temp on Chain[7]
set FAN speed according to: temp_highest=0 temp_top1[PWM_T]=0 temp_top1[TEMP_POS_LOCAL]=0 temp_change=0 fix_fan_steps=0
set full FAN speed...
FAN PWM: 100
read_temp_func Done!
CRC error counter=6567
In other words it came up with 62 Asics briefly, and a high CRC error number. So maybe this is a chip problem. If so, which one.......

Hm.
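
For anyone comparing several boards, pulling the chip counts and the CRC counter out of the kernel log by hand gets old fast. Below is a minimal sketch that does it with two regular expressions. The sample text is a handful of lines copied from the two logs in this thread and stitched together for illustration, and the expected count of 63 is the chips-per-board figure used above.

```python
import re

# A few lines taken from the logs posted in this thread (combined for illustration).
SAMPLE = """\
Chain[J6] has 0 asic
retry Chain[J6] has 0 asic
Chain[J8] has 62 asic
retry Chain[J8] has 62 asic
CRC error counter=6567
"""

ASIC_RE = re.compile(r"Chain\[(J\d+)\] has (\d+) asic")
CRC_RE = re.compile(r"CRC error counter=(\d+)")


def summarize(log, expected=63):
    """Return (last count per chain, chains below `expected`, CRC counters seen)."""
    counts = {}
    for chain, n in ASIC_RE.findall(log):
        counts[chain] = int(n)  # keep the last count reported for each chain
    crc = [int(n) for n in CRC_RE.findall(log)]
    short = {c: n for c, n in counts.items() if n < expected}
    return counts, short, crc


counts, short, crc = summarize(SAMPLE)
print("chips found:", counts)
print("chains missing chips:", short)
print("CRC error counters:", crc)
```

It only looks at the line formats shown here, so other firmware versions may need different patterns.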
^.^
Member
**
Offline Offline

Activity: 81
Merit: 10


View Profile
February 03, 2018, 10:55:25 PM
 #55

Here's one for you: I think the temperature sensor on my S7 board has gone faulty. Will this stop it mining, and more importantly, where the hell is the damn thing? I can find it on the older boards but not on this one. Is there also a way to bypass it? Plenty of cooling, so it's not going to overheat.
the_electronrancher
Jr. Member
*
Offline Offline

Activity: 112
Merit: 4


View Profile
February 04, 2018, 01:32:22 AM
 #56

Nice work!

I had a board like that with intermittent 0 and not 0 asic. 

So if the chips have core vcc (which they do, your number is right) but they don't talk, what's next?

They need IO vcc, they need clock, and they need an unbroken connection to rx and tx on the header.

And they need to be alive, but since they came up once it's likely they are, and are just suffering an intermittent issue with one of the other items.

How good's your scope?
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 04, 2018, 09:19:09 PM
 #57

Quote from: the_electronrancher
Nice work!

I had a board like that with intermittent 0 and not 0 asic. 

So if the chips have core vcc (which they do, your number is right) but they don't talk, what's next?

We're assuming they do (well, at least 62 did); my guess is the backplane is series/parallel with all three chips in parallel on the power and ground plane as opposed to three true serial strings tied together at both ends. Nice because the power plane is more stable and uniform, bitch because an open chip would be masked by its neighbors (although you might see this in heat maps, as the chip would not be running at idle and its partners would be a bit warmer because they are carrying more current through them to the next series of three. Hm, where is my peek....)

Quote
They need IO vcc, they need clock, and they need an unbroken connection to rx and tx on the header.
Hm. Is each chip wired to rx/tx on the header, or do they daisy chain between the chips? There are advantages to either way, but if they were all in parallel and one chip grounded it would sink the whole line (and rx/tx would read zero). If series any one chip could sink the string if it went open. Hm.

Quote
And they need to be alive, but since they came up once it's likely they are, and are just suffering an intermittent issue with one of the other items.
Maybe. If one of the 63 put the tx/rx signal to ground or if it broke the chain that would show up as a dead board. The question is which one is doing it?

On titans as a comparison, the 4 main dies on each chip are connected to a common signal bus that can be isolated per chip by removing a 0 ohm jumper. However the hotel power and ground cannot, therefore if a die shorts hard the board is junk. If it shorts soft you can isolate the signal, and if it fails open you just have three dies running.

Back to the S9, there's also a second supply on this board, looks to be a 14.5 volt supply, I was wondering if that was series shared hotel power for the hashing chips.

Quote
How good's your scope?
Pretty good, it's an older Tektronix T922. Main problem is it's only a 15 MHz scope, I should upgrade it one of these days.
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 04, 2018, 09:27:10 PM
 #58

So anyway, time to break out the real fun debugging tools. As we used to say in thermodynamics, heat is the ultimate bullshit generator. Thus if you have unusual amounts of heat or lack thereof somewhere on a board, something is up.....

So let's plug our 62 chip board into a power supply with no connection to a controller board (steady voltage, nothing hashing) and then take a look at our board here under the eye of a Predator.....



Look at that. Yes the chips on the inside will be warmer than the outside ones, and yes that big blob of heat is the FETs for the power supply. Normal, but what the hell is that heat blob over on the left there. It looks like one chip is not like the others.....

Time to remove that heat sink and see what's going on there. It's one of the chips that doesn't have a second sink on the bottom (they have a delta V of chips without sinks, maybe airflow improvement but very stupid in a series/parallel arrangement).

Mr. Thermal is your friend.
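
If your thermal camera can export a grid of temperatures, the "one of these is not like the others" test can be roughed out in a few lines. This is a sketch only: the grid values are invented, not taken from the board in the picture, and the 10 degree margin over the median is an arbitrary assumption.

```python
from statistics import median


def hot_spots(grid, margin=10.0):
    """Return (row, col, temp) for cells more than `margin` degrees above the grid median."""
    flat = [t for row in grid for t in row]
    baseline = median(flat)
    return [(r, c, t)
            for r, row in enumerate(grid)
            for c, t in enumerate(row)
            if t - baseline > margin]


# Made-up readings in degrees C: inner chips slightly warmer, one odd cell on the left.
frame = [
    [41, 43, 44, 43, 41],
    [58, 44, 46, 44, 42],
    [41, 43, 44, 43, 41],
]

print(hot_spots(frame))
```

On a real board you would want to leave the power supply FET area out of the grid first, since that blob is expected to be hot.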
Dave64
Newbie
*
Offline Offline

Activity: 8
Merit: 0


View Profile
February 05, 2018, 05:15:49 AM
 #59

Hey lightfoot, my account won't let me send more than 1 PM an hour, please text me on the number I PM'ed you a few weeks ago.  My zip is 74008.
Thanks,
Dave
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 05, 2018, 03:10:00 PM
 #60

Gotcha Dave.

In the meantime here is the predator-vision view of an S7 board that powers up the chips but doesn't hash.....



Note the chips glowing normally, and the one chip glowing red. One of these things is not like the others, and in this case it's probably a shorted chip. I've pulled it for review, will swap in another chip this week and see if that fixes it.

Note: The orientation of the chips is weird, they alternate 180 degrees as you go from chip to chip on the board. Probably to better line up signal pins, but a bit confusing regardless.