Bitcoin Forum
Author Topic: Im convinced mining or some settings while mining destroys 5970's VRAM... =*(  (Read 4733 times)
Hippie Tech
aka Amenstop
Legendary
*
Offline Offline

Activity: 1624
Merit: 1001


All cryptos are FIAT digital currency. Do not use.


View Profile WWW
December 21, 2012, 06:55:54 AM
 #21

It's not the mining or the settings. It's the heat.. x2.. lol

There is also a good chance that either the sensors are off and/or there are parts of the GPU that have no sensor and are much hotter than any reading you see.

I've called AMD and MSI tech support and asked if underclocking the memory was harmful. Both said no. :)

Happy end of the Mayan calendar! pEACe

zvs
Legendary
*
Offline Offline

Activity: 1680
Merit: 1000


https://web.archive.org/web/*/nogleg.com


View Profile WWW
December 23, 2012, 02:51:04 PM
 #22

Quote
do we have to worry about nvidia cards too :O  :'( :o

Well so far Ive only seen this behaviour in 5970's .. my 5850's & 5830 seem to be fine still, as well as my 6950.
Who knows bout nv cards =P

yeah, same.  i only have two 5970s left.  not sure about the second one, but the first one artifacts all over the place.  it'll mine fine, but i wouldn't want to do anything else on it.  i actually reduced the res mode on that monitor to make it less irritating

i've never had a problem with 5830's or 5870's being 'destroyed' like that

oh, i run mine at 205 memory also
niko
Hero Member
*****
Offline Offline

Activity: 756
Merit: 501


There is more to Bitcoin than bitcoins.


View Profile
December 23, 2012, 04:45:56 PM
 #23

I used to mine on 5830s, all of them ended up doing this during normal (non-mining) use:



Temperature when mining was typically 62-68C. Sometimes they would be ok for days or weeks, then go crazy with checkerboard artifacts. Hardware acceleration in Firefox or 3D applications typically makes things worse. ATI driver crashes, etc.

Not sure how reliable it is, but MemtestCL reports

Code:
Test summary:
-----------------------------------------
50 iterations over 128 MiB of memory on device Cypress
      Moving inversions (ones and zeros): 0 failed iterations
                                         (0 total incorrect bits)
                 Memtest86 walking 8-bit: 0 failed iterations
                                         (0 total incorrect bits)
              True walking zeros (8-bit): 0 failed iterations
                                         (0 total incorrect bits)
               True walking ones (8-bit): 0 failed iterations
                                         (0 total incorrect bits)
              Moving inversions (random): 0 failed iterations
                                         (0 total incorrect bits)
             True walking zeros (32-bit): 0 failed iterations
                                         (0 total incorrect bits)
              True walking ones (32-bit): 0 failed iterations
                                         (0 total incorrect bits)
                           Random blocks: 3 failed iterations
                                         (2961 total incorrect bits)
                     Memtest86 Modulo-20: 0 failed iterations
                                         (0 total incorrect bits)
                           Integer logic: 0 failed iterations
                                         (0 total incorrect bits)
                 Integer logic (4 loops): 0 failed iterations
                                         (0 total incorrect bits)
            Integer logic (local memory): 0 failed iterations
                                         (0 total incorrect bits)
   Integer logic (4 loops, local memory): 0 failed iterations
                                         (0 total incorrect bits)
Final error count: 3 test iterations with at least one error; 2961 errors total

and stuff like

Code:
Error at [3864886C]: must be 00000004, but found 04000004 (bits: 00000100000000000000000000000000)
Error at [0009EB60]: must be 00000100, but found 44060100 (bits: 01000100000001100000000000000000)
Error at [0009EB64]: must be 00000100, but found 12100100 (bits: 00010010000100000000000000000000)
Error at [0009EB68]: must be 00000100, but found 22060100 (bits: 00100010000001100000000000000000)
Error at [0009EB6C]: must be 00000100, but found 74020100 (bits: 01110100000000100000000000000000)
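
For anyone who wants to sanity-check lines like these, here is a minimal Python sketch (not part of MemtestCL; the regex simply assumes the line layout shown above) that XORs the expected and found words to recover which bits flipped. Run on the five sample lines, it reproduces the "bits:" field and shows every flip landing in the upper 16 bits of the word, though five samples are far too few to draw a hardware conclusion from.

```python
import re

# Assumed line layout, copied from the error lines quoted above:
# "Error at [ADDR]: must be EXPECTED, but found FOUND (bits: ...)"
LINE_RE = re.compile(
    r"Error at \[([0-9A-Fa-f]{8})\]: must be ([0-9A-Fa-f]{8}), "
    r"but found ([0-9A-Fa-f]{8})"
)

SAMPLE_LOG = """
Error at [3864886C]: must be 00000004, but found 04000004
Error at [0009EB60]: must be 00000100, but found 44060100
Error at [0009EB64]: must be 00000100, but found 12100100
Error at [0009EB68]: must be 00000100, but found 22060100
Error at [0009EB6C]: must be 00000100, but found 74020100
"""


def flipped_bits(line):
    """Return (address, xor_mask, flipped_bit_positions) for one error line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unrecognised line: " + line)
    addr, expected, found = (int(group, 16) for group in match.groups())
    mask = expected ^ found  # a 1 bit marks a position that differs
    positions = [bit for bit in range(32) if (mask >> bit) & 1]
    return addr, mask, positions


if __name__ == "__main__":
    for line in SAMPLE_LOG.strip().splitlines():
        addr, mask, positions = flipped_bits(line)
        print(f"{addr:08X}  {mask:032b}  flipped bits: {positions}")
```

Bit 0 is the least significant bit, so positions 16 and up are the upper half of each 32-bit word.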

I've sold locally all but one card, and never heard back from the buyers, even though the deal was to hold onto their money for a week until they check if cards work in their systems. Is it VRAM? Is it something about my motherboard or power supply? My system RAM tests ok.  No idea. I'll RMA this card and see what they find out.


They're there, in their room.
Your mining rig is on fire, yet you're very calm.
Hippie Tech
aka Amenstop
Legendary
*
Offline Offline

Activity: 1624
Merit: 1001


All cryptos are FIAT digital currency. Do not use.


View Profile WWW
December 24, 2012, 04:40:51 AM
 #24

What method are you guys using to downclock the GPU's RAM?

With voltage control/monitoring enabled, I've noticed that MSI Afterburner does not set the voltage to the number I dialed in.

Use HWMonitor, GPU-Z and/or SpeedFan to monitor the GPU's voltage to see if it is where it should be under load.

crazyates
Legendary
*
Offline Offline

Activity: 952
Merit: 1000



View Profile
December 24, 2012, 06:09:18 AM
 #25

I always used CGMiner to underclock the RAM, as it works great on 5xxx cards. I always kept it at the stock 300MHz tho, and I never had any problems.

The idle speeds for those cards were 150MHz core and 300MHz mem. I know it doesn't make any sense, but I never liked going lower than the stock minimum, so I kept it there. Worked well for me, I guess, as I never had any cards go bad.

Tips? 1crazy8pMqgwJ7tX7ZPZmyPwFbc6xZKM9
Previous Trade History - Sale Thread
Gatorhex
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
December 26, 2012, 10:41:18 PM
 #26

Quote
had to replace a few fans, but no other issue

Are your fans blowing down into the cards? No air-flow over the board components would cause this.
Other than that, you can get mini RAM heatsinks off ebay.

Also happens to cards with water blocks...

Video card water block and copper RAM heatsinks
http://www.youtube.com/watch?v=VXJ0u1J9mRU
witherworth
Full Member
***
Offline Offline

Activity: 155
Merit: 100


View Profile
December 27, 2012, 02:02:18 AM
 #27

Quote
do we have to worry about nvidia cards too :O  :'( :o

Well so far Ive only seen this behaviour in 5970's .. my 5850's & 5830 seem to be fine still, as well as my 6950.
Who knows bout nv cards =P

I run 14 5850 cards, and not a single one of them has had video/display issues. 2 of those I use for active gaming while mining. (too lazy to turn off mining when I want to play games, so they've got really low priorities set) I'm selling off a few of the cards to more gamers, so I guess they'll let me know if they've found issues through prolonged gaming sessions.
Littleshop
Legendary
*
Offline Offline

Activity: 1386
Merit: 1003



View Profile WWW
December 27, 2012, 11:46:57 AM
 #28

Quote
do we have to worry about nvidia cards too :O  :'( :o

Well so far Ive only seen this behaviour in 5970's .. my 5850's & 5830 seem to be fine still, as well as my 6950.
Who knows bout nv cards =P

I run 14 5850 cards, and not a single one of them has had video/display issues. 2 of those I use for active gaming while mining. (too lazy to turn off mining when I want to play games, so they've got really low priorities set) I'm selling off a few of the cards to more gamers, so I guess they'll let me know if they've found issues through prolonged gaming sessions.

I can now after a long period agree with the OP.  I run all kinds of cards and only the 5970's have done this.  I have pretty conservative settings, underclocked on most and undervolted.  Cards now fail on stock settings, fans are good.  I am going to attempt to re-pad and replace thermal compound. 

Hippie Tech
aka Amenstop
Legendary
*
Offline Offline

Activity: 1624
Merit: 1001


All cryptos are FIAT digital currency. Do not use.


View Profile WWW
December 27, 2012, 05:15:29 PM
 #29

This type of hardware failure is to be expected when you consider how hot the VRMs run on the HD 5900/5800 series.

Anything over 65C will severely shorten your gear's lifespan. It is a complete farce that the parts makers want us to believe that 70C+ is acceptable.

420
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500



View Profile
December 27, 2012, 08:51:18 PM
 #30

Anyone mine LTC and get the same result?

Donations: 1JVhKjUKSjBd7fPXQJsBs5P3Yphk38AqPr - TIPS
the hacks, the hacks, secure your bits!
hardcore-fs
Full Member
***
Offline Offline

Activity: 196
Merit: 100


View Profile WWW
December 28, 2012, 12:37:37 AM
 #31

Quote
Like I said, the damage is not seen in mining btc. It shows up when trying to do gaming =/

note to self: may be limited to bitcointalk.org marketplace when re-selling 5970's when mining with them becomes unprofitable

Run them at lower speeds and voltages and they will be fine even for gaming so long as they don't run in a hot environment and you keep the fans working.

Im thinking the lower speeds + the constant heat may be whats killing the RAM... I dont know, its just weird and really sucks.
Ive tested my other 2 5970's and their RAM is still good. So, thats 2 thats been bad and 2 thats been good - for me.

Ive never, NEVER, overvolted them either. Ran them stock or lower voltage.

did you follow Anti-static procedures/ power procedures when installing/removing them?

BTC:1PCTzvkZUFuUF7DA6aMEVjBUUp35wN5JtF
Eastwind
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
May 27, 2013, 02:26:16 PM
 #32

Quote
Anyone mine LTC and get the same result?

My 5970 works on BTC, but not on LTC.
crashoveride54902
Hero Member
*****
Offline Offline

Activity: 784
Merit: 504


Dream become broken often


View Profile
May 28, 2013, 01:37:52 AM
 #33

I've bought 2 5970 cards off ebay with the knowledge that they were already artifacting...he said he mined on them and they wouldn't hold up anymore...so i got them n put them in a bamt box...1 card mined for about a month before it was throwing off bamt so much i had to remove it...other card just kept chugging away...for giggles i put it in a machine and sure enough...couldn't make out anything on the screen trying to play a game...bought them around the new years i think...

well finally couple days ago i noticed my bamt box was acting funny...restarts...hangs...took out the other 5970 and it works like a charm again :( but least i got some life outta them before they bit the dust...I don't think any video card is meant to be run 24/7 and with the 5970 that's 2 vid cards crammed into a tiny space...just like ppl that laptop mine kill their laptops...just ain't meant to be abused like that...oh well...time to put them up on ebay n hope they sell for a decent price :D

Dreams of crypto solving everything are slowly slipping away...Replaced by scams/hacks :(
niko
Hero Member
*****
Offline Offline

Activity: 756
Merit: 501


There is more to Bitcoin than bitcoins.


View Profile
May 28, 2013, 02:08:56 AM
 #34

Quote
did you follow Anti-static procedures/ power procedures when installing/removing them?

This may very well explain the syndrome. Periodic removal for cleaning without careful ESD management will eventually lead to damage.

They're there, in their room.
Your mining rig is on fire, yet you're very calm.
ISAWHIM
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500



View Profile
May 28, 2013, 02:24:56 AM
 #35

Undervolting can "potentially" cause issues, because...

Lower voltage = higher amps @ "any wattage"
Regulators direct voltage to "dump" it as heat. The higher the voltage, the less it is "dumping", thus, cooler regulators.

However... using any card for "x-hours" is what ultimately kills it. When you use it in a game-system, you are using it intermittently. Thus, it may take three years to get one year's worth of "circuit-wear" on the board. As opposed to using it "constant duty cycle", where a year of wear happens in 9-12 months.
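
The "three years for one year's worth of wear" figure is just duty-cycle arithmetic. A small Python sketch, assuming (as a simplification) that wear scales linearly with powered-on hours and ignoring heat-up/cool-down cycling; the 8 hours a day is a made-up gaming figure, not something measured:

```python
def calendar_years_for_wear(powered_hours_per_day, wear_years=1.0):
    """Calendar years needed to rack up `wear_years` of 24/7-equivalent run time.

    Assumes wear is proportional to powered-on hours only (a simplification).
    """
    if not 0 < powered_hours_per_day <= 24:
        raise ValueError("powered_hours_per_day must be in (0, 24]")
    return wear_years * 24 / powered_hours_per_day


if __name__ == "__main__":
    # Hypothetical gaming box, on 8 hours a day.
    print("gaming, 8 h/day:", calendar_years_for_wear(8), "years")
    # Mining rig, on around the clock.
    print("mining, 24 h/day:", calendar_years_for_wear(24), "years")
```

Under that assumption a card gamed on 8 hours a day takes three calendar years to accumulate what a 24/7 miner accumulates in one.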

Also, these are SMT chips. They are mounted to the board with small amounts of "low lead" solder paste. This type of solder is prone to "shearing", or "pulling off", from cold-shock cycling. Thus, try applying an SMT solder-temp heat source to the chip to "rebond" the "cold welds" that have pulled off the contacts. (That is the "x-box" oven trick. However, do NOT put these in your oven. Not all the components are "oven safe". The boards are later populated with non-SMT components that will be destroyed if you attempt to oven-bake them.)

Also, as stated above... the heat from the GPU is stupidly spread to the VRMs by the heat-sink, and the hot air is also blown directly onto all the components of the board. That will degrade capacitors, resistors, and any other non-sunk transistor that normally would not be exposed to these higher and constant heat levels.

Also, there is the issue of "BIOS corruption". BIOS flashes are not that great. They still use cheap programmable memory, which decays in heat. (Thus the use of many "removable" and "reflashable" BIOS ROMs.) If the BIOS is corrupt, it can make it seem as if portions of the card are not functioning correctly, when they are perfectly fine.

However, this is the first time I have ever heard (with serious conviction), of "one card" having this issue. So, it may be realistically related to a design fault, or physical stress fault. As it has been confirmed over and over, that "mining", does not "cause damage". You don't have that much actual control over the cards components. With the exception of physical failures, which would kill an unmodified card, without any discrimination to user settings.

If there was a 100% "yes, this killed my card", then I would worry. However, I have barely seen 5% with this issue, which is actually indicative of "manufacturing error" and "physical hardware failure". Completely unrelated to "mining", other than the issue about "constant duty" and "cold shock", which happens when you play a game for 4-20 hours in a row also.

Not to mention, you have no idea if the cards had a previous bios-mod, or "driver glitches", that could have contributed to the shortness of life. (On anything other than the cards you purchased from the store directly.)

The number one killer of video-cards, is running them trapped inside of a heat-trap box, called a CPU tower. (That and water-coolers, which do not adequately cool the "rest of the components", and contribute to heat-stress-shearing, which fans reduce by keeping everything a constant temperature.)
ReCat
Sr. Member
****
Offline Offline

Activity: 406
Merit: 250



View Profile WWW
May 28, 2013, 03:57:11 AM
 #36

I think the lesson learned here is: a hot GPU is a dying GPU. Keep your GPU as cool as freaking possible by all means, even if it doesn't seem to be running too hot.

BTC: 1recatirpHBjR9sxgabB3RDtM6TgntYUW
Hold onto what you love with all your might, Because you can never know when - Oh. What you love is now gone.
420
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500



View Profile
June 04, 2013, 06:26:10 AM
 #37

Quote
I think the lesson learned here is: a hot GPU is a dying GPU. Keep your GPU as cool as freaking possible by all means, even if it doesn't seem to be running too hot.

how hot is too hot?

Donations: 1JVhKjUKSjBd7fPXQJsBs5P3Yphk38AqPr - TIPS
the hacks, the hacks, secure your bits!
Eastwind
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
June 04, 2013, 06:31:37 AM
 #38

Quote
Undervolting can "potentially" cause issues, because...

Lower voltage = higher amps @ "any wattage"
Regulators direct voltage to "dump" it as heat. The higher the voltage, the less it is "dumping", thus, cooler regulators.


The wattage is determined by voltage. The lower the voltage, the lower the wattage. I agree with other parts of your arguments.
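
To put rough numbers on this: to first order, the switching power of a CMOS chip scales as P ~ C * V^2 * f, so at a fixed clock a lower core voltage means less power and less current, not more. The Python sketch below is only that first-order model (it ignores leakage and regulator efficiency), and the 1.05 V and 0.95 V figures are made-up illustration values, not 5970 specifications.

```python
def relative_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of switching power after/before, using P ~ C * V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


def relative_current(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of core current after/before, from I = P / V."""
    return relative_power(v_new, v_old, f_new, f_old) / (v_new / v_old)


if __name__ == "__main__":
    v_old, v_new = 1.05, 0.95  # hypothetical stock and undervolted core voltages
    print(f"power:   {relative_power(v_new, v_old):.1%} of stock")
    print(f"current: {relative_current(v_new, v_old):.1%} of stock")
```

"Lower voltage = higher amps" only holds if the wattage is forced to stay constant, which is not what happens when you undervolt a chip at the same clock.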
HellDiverUK
Hero Member
*****
Offline Offline

Activity: 1246
Merit: 501



View Profile
June 04, 2013, 01:13:04 PM
 #39

As I said in another thread, my 5870 finally gave up yesterday.  It was an old, used Powercolor I got cheap off eBay.  I had to replace the TIM on it when I got it, as the person who owned it before had used some crap "silver" compound that basically ran off when it was heated.

After about 4 months of solid mining, I noticed the temp going up to 92C.  Checked the video output and it was suffering massive corruption.  Recleaned the TIM, made sure everything was 100%, but it still went to 92C almost instantly.  When cgminer stopped mining, temps dropped quickly, proving the heat sink was working OK.

GenTarkin (OP)
Legendary
*
Offline Offline

Activity: 2450
Merit: 1002


View Profile
June 04, 2013, 02:57:09 PM
 #40

Quote
As I said in another thread, my 5870 finally gave up yesterday.  It was an old, used Powercolor I got cheap off eBay.  I had to replace the TIM on it when I got it, as the person who owned it before had used some crap "silver" compound that basically ran off when it was heated.

After about 4 months of solid mining, I noticed the temp going up to 92C.  Checked the video output and it was suffering massive corruption.  Recleaned the TIM, made sure everything was 100%, but it still went to 92C almost instantly.  When cgminer stopped mining, temps dropped quickly, proving the heat sink was working OK.



Wow! Now, I've never seen that type of behaviour when a GPU / video card "dies" .. that suxxors!
As of a few months ago, I managed to get that severely damaged GPU working somewhat reliably again on the 5970 - I lowered the thread count in cgminer to 1 and that GPU now only gets HW errors 10% of the time on avg rather than 50% of the time (I think because it's using slightly less RAM on one thread).
Also, my other 5970, which I thought still had good RAM, is getting faulty now. After 2 years of mining I had to lower its RAM from 300MHz down to 200MHz to prevent HW errors =( on one of its GPUs...

So, the official failure rate for my 5970's: 100%, and these ran at temps that are well within acceptable limits (VRMs consistently below 90C, all other temps below 70C). NEVER overvolted, always undervolted GPU.
As for my 5800's - in my period of owning them they have not deteriorated at all.

GenTarkin's MOD Kncminer Titan custom firmware! v1.0.4! -- !!NO LONGER AVAILABLE!!
Donations: bitcoin- 1Px71mWNQNKW19xuARqrmnbcem1dXqJ3At || litecoin- LYXrLis3ik6TRn8tdvzAyJ264DRvwYVeEw