Bitcoin Forum
Author Topic: Hacking KNC Titan / Jupiter / Neptune miners back to life. Why not?  (Read 75612 times)
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 04, 2016, 05:50:42 AM
 #301

Sure. I can take a look at it. No guarantees I can get it working, but the fact that pressure on the chip makes differences is something worth taking a deeper look into. I'll PM you information, working on getting a few things out today.
FineHash
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
March 04, 2016, 05:58:43 AM
 #302

Sure. I can take a look at it. No guarantees I can get it working, but the fact that pressure on the chip makes differences is something worth taking a deeper look into. I'll PM you information, working on getting a few things out today.


Thank you. Immensely appreciated. Worst case scenario, you send it back to me and it gets introduced to fireworks. :-)

Bump question:
Anyone found a source for huge .6mm pitch reballing stencils?
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 04, 2016, 08:09:22 PM
 #303

Sent out the unit with 3/4 dies restored, that's nice. It's actually not that bad, you can run the 3 dies to 60mh total without drawing 200w from the supply so you still won't blow up anything as long as the DC-DC temps are reasonable.

And doing it for bitcoin works, it's fast, reliable, works across borders, etc. Maybe I should also accept litecoin...

Now to spend the weekend tackling these two other Titans. I'm going to try to clear the short on the first one after seeing if there is any way I can disconnect the line from the bottom of the board. Then on to replacing all the signal components on the second board to see if I can get more dies working (it's at one die, and minimal quality at that. We'll see).

C
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 05, 2016, 02:03:02 AM
 #304

So in the meantime I figured out two other things:

1) How to fix a completely wedged Beaglebone: The normal recovery image doesn't work, the thing does nothing but light the one blue LED and you feel sad. Solution is to build a beaglebone/debian image, set it to flash to the emmc. It will pound for 30 mins with the cylon light pattern, then go solid. Reboot, then install the recovery image and restart. Presto, emmc is back, beaglebone is up, miller time!

2) Fix a display. If you have a display that's crapped out but still lights, check the little connectors on the bottom that go to the board. On this one it turned out that the connectors had literally not been soldered down properly to the board's pads. A little work with flux first, then a soldering iron at 700°F (about 370°C) and a quick touch to each pad reflows the solder and makes for a happy display.

On to other stuff.
edgar
Legendary
*
Offline Offline

Activity: 1722
Merit: 1001


View Profile
March 05, 2016, 03:53:26 AM
 #305

Impressive!!
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 05, 2016, 05:33:36 PM
 #306

Interesting development on the pin 6 short thing this morning.

Got another board, this time it's a Neptune that had a destroyed power plug that someone repaired... poorly. The cube does not run. So as a part of my inspection I checked the resistance of the pins.

Pin 6 is VERY unusual: It is supposed to read 10k or so ohms, Titans that have had power plug issues can read 0 ohms (kills the controller board). This one doesn't read 0 ohms.

It reads 17 ohms. *VERY* *VERY* interesting. Every time I have seen this, it's been on a unit with a burned power connector. Not all burns do this, but I haven't found a unit that wasn't in a burned situation with a short on those lines.

So I take a chance, clean up the plug solder connections (cold solder joints do not conduct properly, you NEED board heat to do this...) and plugged it in. Get this at 100mhz clock:

KNC 0a:       |   0.0/  0.0/  0.0 h/s | A:  0 R:0+0(none) HW:  0/none
KNC 0b:       | 34.42/36.08/36.08Gh/s | A:  4 R:0+0(none) HW:  1/.02%
KNC 0c:       | OFF  /  0.0/  0.0 h/s | A:  0 R:0+0(none) HW:  0/none
KNC 0d:       |  7.70/14.28/128.5Mh/s | A:  0 R:0+0(none) HW:128/ 88%

0b is running, 0d is trying to run but wrecked, but look at 0c. It's "off".

I think the short here is in cube die 0c. I think in this case the short isn't to ground, it's to the chip engine's +vcc so chip control is shorting to chip engine power. And I think this is happening on die "C"....

Now to rip apart anything around die C to see if I can clear this. Finally I have a location on the chip, I can't get this with Titans because they short the controller. But in this case I have a chance to find this.

Food time.

(Edit: Look at this insanity at 200mhz)
 KNC 0a:       |   0.0/  0.0/  0.0 h/s | A:0 R:0+0(none) HW:   0/none
 KNC 0b:       | 114.2/173.7/346.7Gh/s | A:0 R:0+0(none) HW:4750/ 59%
 KNC 0c:       | OFF  /  0.0/  0.0 h/s | A:0 R:0+0(none) HW:   0/none
 KNC 0d:       | OFF  /  0.0/  0.0 h/s | A:0 R:0+0(none) HW:   0/none

It was at 400+gh for a second there. All through one die? Nope, short.....
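
For anyone who wants to watch for this pattern without staring at the screen, here is a minimal sketch that reads status lines shaped like the two dumps above and labels each die. The regular expression is fitted only to the lines pasted in this post, and the 5% hardware-error cutoff is an assumption chosen for illustration, not a KNC or cgminer figure.

```python
import re

# Pattern fitted to the status lines pasted above; other cgminer builds
# may format these lines differently.
LINE_RE = re.compile(
    r"KNC\s+(?P<die>\w+):\s*\|\s*(?P<now>OFF|[\d.]+)\s*/\s*(?P<avg>[\d.]+)"
    r"\s*/\s*(?P<max>[\d.]+)\s*(?P<unit>[GMk]?)h/s\s*\|\s*A:\s*(?P<acc>\d+)"
    r"\s+R:\S+\s+HW:\s*(?P<hw>\d+)\s*/\s*(?P<pct>\S+)"
)

HW_SUSPECT_PCT = 5.0  # assumed cutoff for "too many hardware errors"


def classify(line):
    """Return (die, state, hw_percent) for one status line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    pct = None if m["pct"] == "none" else float(m["pct"].rstrip("%"))
    if m["now"] == "OFF":
        state = "off"        # die reported as switched off
    elif float(m["avg"]) == 0.0:
        state = "idle"       # enabled but producing nothing
    elif pct is not None and pct > HW_SUSPECT_PCT:
        state = "suspect"    # hashing, but mostly hardware errors
    else:
        state = "ok"
    return m["die"], state, pct


SAMPLE_100MHZ = [
    "KNC 0a:       |   0.0/  0.0/  0.0 h/s | A:  0 R:0+0(none) HW:  0/none",
    "KNC 0b:       | 34.42/36.08/36.08Gh/s | A:  4 R:0+0(none) HW:  1/.02%",
    "KNC 0c:       | OFF  /  0.0/  0.0 h/s | A:  0 R:0+0(none) HW:  0/none",
    "KNC 0d:       |  7.70/14.28/128.5Mh/s | A:  0 R:0+0(none) HW:128/ 88%",
]

if __name__ == "__main__":
    for line in SAMPLE_100MHZ:
        print(classify(line))
```

On the 100mhz dump this labels 0a as idle, 0b as ok, 0c as off and 0d as suspect, which matches the reading given above.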
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 05, 2016, 06:16:32 PM
 #307

Close. I can see how to clear the lines for the pin 6 connection to the chip. There's a reason why I can't blow this open with voltage, it's eight connections into the chip off the transfer line. Given the chance it will destroy the board transfer lines before the chip core opens.

Unfortunately it's a one-way trip to clear the lines so I need to be able to find the *right* die. Back to work.

lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 05, 2016, 09:22:36 PM
 #308

Question: Does anyone know how many engines are in each core of a Titan? Got a theory here....
Maxumark
Full Member
***
Offline Offline

Activity: 139
Merit: 104



View Profile
March 05, 2016, 09:58:35 PM
 #309

Question: Does anyone know how many engines are in each core of a Titan? Got a theory here....

Each miner will include four ASICs featuring a "record-breaking" 2,284 cores each. Each chip will be capable of running 18,727 threads.

Is this the information you are looking for?

Mining LTC and other alts since 2014 when I thought I missed the BTC train.
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 06, 2016, 01:25:16 AM
 #310

Question: Does anyone know how many engines are in each core of a Titan? Got a theory here....

Each miner will include four ASICs featuring a "record-breaking" 2,284 cores each. Each chip will be capable of running 18,727 threads.

Is this the information you are looking for?
That's interesting. 571 (2284/4) is a prime number.

Why would they ever have a prime number of cores per die?

Hm.

Ok, here's the thought: Pin 6 is power to the chip's housekeeping circuits. Each die has capacitors on the bottom that go to ground and to this common line 6. The line comes out to a pad on the bottom, then goes via two small traces to two pins on the chip. There are 5 pads per die, the outside two power two pins each, and the inside three power 5 pins together making 9 connections to the chip.

This is different from the actual engine power, which is supplied by the big DC-DC's. Think of this power as what runs the signal thing that polls the engines, puts work in the engine memory, that sort of thing. Each housekeeper is responsible for some number of engines, and they all then communicate on the low voltage spi bus which goes back to the fpga for figuring out.

What I think happens is that when the chip shorts due to a bad surge, everything on that die welds to ground: the die's housekeeping power as well as the engine power. This is what we see when a board burns. Worse is if one board loses its ground; then the ground gets pulled over the lines from the other chips, which makes a real mess of either burned ribbon cables or blown dies.

So what to do?

Well, isolating the dead die is the right thing. Since there exists no stencil on earth for us to re-ball the die and remove the balls for the blown connections (the simple solution) we have to cut the lines.

The common line for those dies is very thick and runs all along the back of the board. However with a very sharp knife we should be able to cut the 9 traces to the pin vias. I'm going to need to order some very sharp surgical knives with fine points; copper dulls steel quickly and this will require very precise cuts so as not to hit the layer underneath.

Hm.
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 06, 2016, 01:27:09 AM
 #311

And while 571 is prime, 63*9 is 567, which leaves four cores left over. So they could have 9 strings of 63 per die. Hm. And once again it would REALLY be helpful to have ANY specs on these things....
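
The arithmetic here is easy to check. A small sketch, using only the figures quoted in this thread (2,284 cores per ASIC, four dies per ASIC, and the guessed 9 strings of 63); the variable names are just for illustration:

```python
def is_prime(n):
    """Trial division; fine for numbers this small."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


cores_per_asic = 2284   # figure quoted from the KNC announcement above
dies_per_asic = 4

cores_per_die, remainder = divmod(cores_per_asic, dies_per_asic)
print(cores_per_die, remainder)      # 571 0
print(is_prime(cores_per_die))       # True

# The "9 strings of 63" guess from the post above
strings, cores_per_string = 9, 63
leftover = cores_per_die - strings * cores_per_string
print(leftover)                      # 4
```

Nothing here says what the four leftover cores are for; it only confirms the numbers.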
junkspam
Newbie
*
Offline Offline

Activity: 2
Merit: 0


View Profile
March 07, 2016, 09:38:27 PM
 #312

Can this neptune be fixed? It worked perfectly until 1 week ago.

Can new custom firmware help?
What voltage/frequency is best to use for long life?
Do I start with lowering the voltage or frequency to get temps down? I have no clue...
is temp 70 ok for DC/DC?

ASIC 2 - die 2 dead? Possible to revive?

DC/DC   Voltage (V)   Current (A)   Power (W)   Temperature (°C)
0   0.8093   6.4766   5.242   35.875
1   0.8104   6.3125   5.116   35.875
2   0.7855   35.7500   28.082   48.375
3   0.7881   35.8125   28.224   47.375
4   0.8131   1.3066   1.062   30.594
5   0.8113   0.8281   0.672   32.313
6   0.7871   35.3125   27.794   45.000
7   0.7864   35.1250   27.622   46.125


ASIC 4 - OK?

DC/DC   Voltage (V)   Current (A)   Power (W)   Temperature (°C)
0   0.7866   34.3125   26.990   56.688
1   0.7839   34.8750   27.339   58.125
2   0.7797   44.8750   34.989   68.250
3   0.7802   44.0625   34.378   69.000
4   0.7792   43.8125   34.139   61.688
5   0.7775   44.2500   34.404   66.375
6   0.7780   42.9375   33.405   66.500
7   0.7803   43.0000   33.553   63.438


ASIC 6 - all dies dead? What could be wrong?

DC/DC   Voltage (V)   Current (A)   Power (W)   Temperature (°C)
0   0.8250   0.8779   0.724   24.688
1   0.8257   0.8369   0.691   24.000
2   0.8251   0.8252   0.681   24.031
3   0.8226   1.5840   1.303   23.156
4   0.8243   0.8311   0.685   23.031
5   0.8254   1.0566   0.872   25.563
6   0.8246   1.7578   1.449   24.313
7   0.8231   1.2168   1.002   23.969
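
A rough way to read tables like these is to sanity-check that power matches voltage times current and then sort each DC/DC by how much current it draws. The sketch below does that for the ASIC 2 table. The 2 A and 20 A cutoffs are assumptions picked to separate the three groups visible in this data; they are not KNC specifications, and the mapping of DC/DC numbers to dies is not stated anywhere in this thread.

```python
# ASIC 2 readings copied from the table above:
# index, voltage (V), current (A), power (W), temperature (C)
ASIC2 = """\
0 0.8093 6.4766 5.242 35.875
1 0.8104 6.3125 5.116 35.875
2 0.7855 35.7500 28.082 48.375
3 0.7881 35.8125 28.224 47.375
4 0.8131 1.3066 1.062 30.594
5 0.8113 0.8281 0.672 32.313
6 0.7871 35.3125 27.794 45.000
7 0.7864 35.1250 27.622 46.125
"""

IDLE_A = 2.0    # assumed: below this the rail is feeding nothing that hashes
LOW_A = 20.0    # assumed: below this the rail is loaded far less than a healthy one


def parse(table):
    """Turn the whitespace-separated table into a list of numeric rows."""
    rows = []
    for line in table.strip().splitlines():
        idx, volts, amps, watts, temp = line.split()
        rows.append((int(idx), float(volts), float(amps), float(watts), float(temp)))
    return rows


def classify(current_a):
    """Bucket a DC/DC by output current using the assumed cutoffs."""
    if current_a < IDLE_A:
        return "idle"
    if current_a < LOW_A:
        return "low"
    return "loaded"


rows = parse(ASIC2)
# Reported power should agree with V * I to within rounding
power_ok = all(abs(v * i - p) < 0.01 for _, v, i, p, _ in rows)
states = {idx: classify(i) for idx, _, i, _, _ in rows}

if __name__ == "__main__":
    print("power consistent:", power_ok)
    for idx, state in states.items():
        print(idx, state)
```

With these cutoffs, DC/DCs 4 and 5 on ASIC 2 come out idle, 0 and 1 come out low, and the rest loaded. Every DC/DC in the ASIC 6 table draws under 2 A, so that whole ASIC would come out idle.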
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 09, 2016, 01:52:32 AM
 #313

Try running all dies at 50mhz, then 100mhz. See what happens then.

lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 09, 2016, 01:59:33 AM
 #314

And in the "Take some pictures of what's up" mode, we have the following:


This is the connector on a display module that had some of the pins not soldered properly so it worked intermittently. Notice the tilt. Bit of love with a soldering iron and some flux and all is well again.


A bad repair job that was sent to me. Note the bottom right pin on the molex and how it is distended. That via is blown clear of the board, so this board would only use 2 +12v pins. Note the black mess. That is due to using a big soldering iron without pre-heat. Sad. However this is the one where the chip core is shorted to +hashing power instead of fucking ground, so there is a chance of a snowball in hell that I can figure this out.


A nice old Jupiter controller board retrofitted with two new 10 pin plugs to even it out. Take your time, desolder all the holes, then use hot air to allow the plugs to come through without pressure.


Yet another reason not to push cubes to the limit. That is a stock KNC splitter, the problem is the damn plug isn't rated for that kind of power draw....

Now I have a Titan controller on the bench that blew up. Oddly enough looks to be the Rpi went bad as the controller baseboard is happily hashing with a bbb board on it.
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 09, 2016, 02:31:27 AM
 #315

Well, here is how the Titan bridge boards blow up:



I can fix that. More to the point, the failure chain is this:

Raspberry Pi shorts out its internal DC-DC converter circuits.

Results in high resistance on +5/gnd on pins 1/2, but as soon as power is applied the board shorts as it tries to bring up the low voltage supplies.

Big +5 supply from the controller board shorts, blows up via line.

Fixable.
FineHash
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
March 09, 2016, 03:13:27 AM
 #316

One of my bridge boards did this, exactly as you described.  I've got a new (identical) replacement pi.  Would you say that the copper line repair is more of a novice or advanced job?  One of my copper lines seems to have survived, the other smoked.
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 09, 2016, 03:27:08 AM
 #317

One of my bridge boards did this, exactly as you described.  I've got a new (identical) replacement pi.  Would you say that the copper line repair is more of a novice or advanced job?  One of my copper lines seems to have survived, the other smoked.
Eh, not that bad. To do it right you need to remove one of the sockets with air heat, then clear the fault and run a nice 24 gauge wire-wrap wire between that post and pin 3 on the bottom pin set. But you could just glue down the bad bits and jumper directly with said wire if you're in a hurry. And who isn't in a hurry.

But you need to keep that burned line secured with glue or something in the center of the connector, otherwise it is going to short to a gpio line and all hell *will* break loose.
Sweminer777
Hero Member
*****
Offline Offline

Activity: 656
Merit: 500

Because your are good, you are treated bad."Jebus"


View Profile
March 09, 2016, 04:50:37 PM
 #318

How can I flash the firmware from 1.06 back to 1.00?

The current Neptune firmware persists every time.


How can I flash it via SSH? I want to "downgrade" for a Jupiter.

16evXY9azcbYtLdKCAn1SeNN7gGxcRGRi6

Save the Chickens!
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 10, 2016, 02:35:31 AM
 #319

Not sure. Why would you want to go to 1.00? 1.06 runs fine.
lightfoot
Legendary
*
Offline Offline

Activity: 1694
Merit: 1052


I fix broken miners. And make holes in teeth :-)


View Profile
March 13, 2016, 03:45:22 AM
 #320

Had to rebuild another board with a burned +12 supply line. One pin was gone, one pin had blown the inside open, the third pin had 20 ohms of resistance. Not too great.

Up and running right now at 200mhz, purrs along. Tomorrow the new Rpi comes in, so we can test if the transfer boards can be fixed. Should be do-able, just need to find my wire-wrap wire.....
