Bitcoin Forum
Author Topic: 3 of 8 Rigs down - GPU Fault Detected 147. HELP PLEASE!!!  (Read 347 times)
dsomc6 (OP)
July 11, 2018, 02:45:49 AM
 #1

Hi guys,

Need some help: I have two rigs down, and a friend's rig is down as well. All went down around the same time, all with the same issue: GPU Fault Detected 147 0x03ca8802, GPU Fault Detected 147 0x0f824802, etc.
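To see whether every card reports the same fault code, the log lines can be tallied. Below is a minimal Python sketch, assuming the message looks the way it is quoted in this post; the optional colon, the case-insensitive match and the sample log text are assumptions for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Pattern for the message as quoted in this thread, e.g.
# "GPU Fault Detected 147 0x03ca8802". The optional colon and the
# case-insensitive match are assumptions to tolerate log variations.
FAULT_RE = re.compile(
    r"GPU fault detected:?\s+(\d+)\s+(0x[0-9a-fA-F]+)", re.IGNORECASE
)

def summarize_faults(log_text):
    """Return a Counter mapping (fault_number, status_hex) -> occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            key = (int(match.group(1)), match.group(2).lower())
            counts[key] += 1
    return counts

# Hypothetical sample: only the fault message itself comes from the post.
sample = """\
miner started
GPU Fault Detected 147 0x03ca8802
GPU Fault Detected 147 0x03ca8802
"""

if __name__ == "__main__":
    for (number, status), n in summarize_faults(sample).most_common():
        print(number, status, n)
```

If one status value dominates across all rigs, that points at a shared cause (software, settings) rather than many independent hardware failures.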

Have tried the following:

1. Formatted the USB drive and refreshed SMOS
2. Flashed the original BIOS back to the GPUs
3. Tried loading a single card on a PC with a fresh SMOS stick

All with the same results. Does anyone know a solution, or did I just lose 18 cards to some kind of surge?

Thanks in advance for the help!
Metroid
July 11, 2018, 05:05:11 AM
 #2

Also, they do not honor the warranty if there was a surge. Mining is risky: you can theoretically lose everything if a short circuit happens in your house. Even with every kind of protection in place it is still possible to fry everything; my friend has excellent ground line protection and even that is not 100% fry-proof.

Try the GPU on a Windows PC.

dsomc6 (OP)
July 11, 2018, 03:52:56 PM
 #3

And now a 4th is down with the same issue. All have been running for months with no issues.

Same problem: GPU Fault Detected. Do I have bad PSUs? Are the faults driven by power distribution issues?
AIO Inc
July 11, 2018, 04:01:02 PM
 #4

How long have they been up and running previously? Or are you trying to get this set up? It's unlikely that you destroyed all of those GPUs at the same time with a power surge as your PSU would be the first thing to short out before it gets to everything else.
dsomc6 (OP)
July 11, 2018, 05:07:07 PM
 #5

All of the rigs had been working flawlessly for over 6 months at this point. Three went down yesterday, all with the GPU Fault 147 issue. Now another one is down today with the same issue.

It would be weird to lose 4 PSUs or all 28 GPUs within 24 hours, right?
damba_
July 11, 2018, 06:22:10 PM
 #6

I found this video explaining the same error you are getting: https://www.youtube.com/watch?v=lflY1BzE5JY
He said he changed the core clock frequency and the problem was solved.
This guy said he solved his problem by updating ethminer: https://community.amd.com/thread/203158
You can try these two solutions; maybe one of them helps.
Does this error appear as soon as you start the miner, or does it appear after some time?
NetopyrMan2
July 11, 2018, 06:57:13 PM
 #7

This is really weird.

I know the GPU Lost error, and it seems you have a similar problem... but on 28 cards at the "same time"?

Did you try NVSMI? It is just a monitoring tool for GPUs (an easy .bat). If this tool reports GPU Lost, you can only RMA the card and HOPE they WILL accept the RMA.
And there is still a problem: I tried to check those bad GPUs in common tests like FurMark, Heaven Benchmark and 3DMark (here I need to mention that you must set default clocks and TDP!!! because the seller can send you to hell if they discover the GPU was overclocked and/or under/overvolted). All of these programs ran without any errors or artifacts.

The only way to describe the error and hope for a positive RMA was to describe it as something they are "unable" to test, like: "AI testing with heavy use of CUDA cores produces a GPU Lost error". I think you can use those words for a "positive RMA" on all cards at once. I did this about 3 weeks ago and am still waiting for the RMA (it should come within another week).

But still check the PSUs and test those bad GPUs in any PC with FurMark etc. If the cards produce artifacts or similar issues, then RMA in the usual way (but don't forget to reset clocks and TDP to default).
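For the monitoring step mentioned above, here is a rough Python sketch that polls nvidia-smi and parses its CSV output. It only applies to NVIDIA cards, since nvidia-smi is NVIDIA's tool. The function names and the sample row are made up for illustration; the query fields used (index, name, temperature.gpu) are standard nvidia-smi query properties.

```python
import csv
import io
import subprocess

# Command line for a compact, header-less CSV report.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]

def parse_gpu_rows(csv_text):
    """Turn 'index, name, temperature' CSV lines into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a data row
        index, name, temp = fields
        try:
            rows.append({"index": int(index), "name": name, "temp_c": int(temp)})
        except ValueError:
            continue  # a field was not numeric (for example an N/A marker)
    return rows

def read_gpus():
    """Run nvidia-smi once; raises CalledProcessError if the tool fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)

if __name__ == "__main__":
    # Hypothetical sample row, so the parser can be tried without a GPU.
    print(parse_gpu_rows("0, GeForce GTX 1070, 61\n"))
```

If the number of rows returned by read_gpus() is lower than the number of cards installed, or the command itself fails, that is a hint that a card has dropped off the bus.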
Xazax310
July 11, 2018, 07:50:25 PM
 #8

You need to apply basic troubleshooting to your rigs. Take one rig that is getting the GPU FAULT error (assuming HiveOS or Linux?) and test the basics: integrated graphics, RAM, motherboard etc. with no GPUs. If it boots, move on: add 1 GPU, see if you get the error, and so on. IMHO it seems like you may be having riser issues/faults.
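The add-one-at-a-time procedure can be written down as a small Python sketch. The callback boots_cleanly is hypothetical: it stands in for the manual step of installing exactly the listed GPU/riser pairs, booting, running the miner and checking that no GPU fault shows up.

```python
def find_faulty_units(units, boots_cleanly):
    """Add units one at a time; return those whose addition causes a failure.

    units          -- labels for each GPU+riser pair, in the order to test
    boots_cleanly  -- hypothetical callback: given the list of installed
                      units, returns True if the rig runs without a fault
    """
    # Baseline: the rig must work with no GPUs at all (integrated graphics).
    if not boots_cleanly([]):
        raise RuntimeError(
            "rig fails with no GPUs installed: check RAM, motherboard, PSU, OS"
        )

    good, suspects = [], []
    for unit in units:
        if boots_cleanly(good + [unit]):
            good.append(unit)      # keep it installed for the next round
        else:
            suspects.append(unit)  # pull it out again and note it
    return suspects

if __name__ == "__main__":
    # Simulated rig where the second card (or its riser) is bad.
    bad = {"gpu1"}
    print(find_faulty_units(
        ["gpu0", "gpu1", "gpu2"],
        lambda installed: not (bad & set(installed)),
    ))
```

To tell a bad riser from a bad card, a suspect can be retested with a known-good riser before writing the card off.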
swogerino
July 11, 2018, 07:57:09 PM
 #9

I think it is definitely a riser fault. I manage a rig for some colleagues, and after checking the risers (they were using version 007c) I saw strange behaviour: some of the GPUs couldn't mine, and NiceHash always ended with "terminated" and could not benchmark.

At first I suspected a bad BIOS or a card fault, but after testing everything I decided to change the risers to the latest version, and after that everything went back to normal.

deskless
July 11, 2018, 09:48:18 PM
 #10

Check air circulation. Half of my GPUs are down due to that reason.
bitrpc.com
July 11, 2018, 10:21:44 PM
 #11

What OS are you using and what mining software?
dsomc6 (OP)
July 11, 2018, 10:38:36 PM
 #12

SMOS. I think the problem is solved: I needed to move up a few versions of Claymore.