Bitcoin Forum
Author Topic: Nvidia GPU Mining Problems  (Read 6997 times)
tbearhere (OP)
July 27, 2016, 12:13:00 PM
 #61


EDIT: I do have the temperatures of the cards set at a max.. to not exceed 79c ect.
And I use to get this  but was going to talk about it later.
Going to myr-gr  also on neoscrypt  so not related to algo but memory or ccminer? Maybe intensity setting.


Looks like a null pointer dereference. That's usually software but in your case it could be excess heat in the CPU or RAM.
How is the ventilation around the mobo? Maybe heat from the GPUs is destabilizing the CPU.

Edit: It could also be bad RAM. Make note if they are always the same, especially the instruction address.
It could be a bug in ccminer for neoscrypt, lyra2v2, etc. Not sure.
I have a fan running at high speed, but the room temperature was so high that I sometimes turned the rig off from 1pm to 7pm... cooler temps are on their way.
There's not much I can do until temps drop... it broke records.
joblo
July 27, 2016, 04:11:16 PM
 #62


EDIT: I do have the temperatures of the cards set at a max.. to not exceed 79c ect.
And I use to get this  but was going to talk about it later.
Going to myr-gr  also on neoscrypt  so not related to algo but memory or ccminer? Maybe intensity setting.


Looks like a null pointer dereference. That's usually software but in your case it could be excess heat in the CPU or RAM.
How is the ventilation around the mobo? Maybe heat from the GPUs is destabilizing the CPU.

Edit: It could also be bad RAM. Make note if they are always the same, especially the instruction address.
It could be a bug in ccminer for neoscrypt lyra2v2 ect. not sure.
A fan at high speed but the temperatures were so high, room temp, that I turned off the rig at 1pm to 7pm some times...cooler temp on its way.
There's not much I can do until temps drop... it broke records.

If it was a bug others would likely also see it, but AFAIK no one else has seen this crash. You're getting corruption in the CPU domain
(core, cache, RAM), either due to a HW fault or heat induced. If there is a pattern to the fault addresses it's probably a HW fault.
If they're random, it's probably heat induced.

Due to the extremely high ambient temperature your sensors may not detect overheating in places where it isn't expected.
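
One rough way to check for a pattern in the fault addresses is to write down the address from each crash dialog and tally them. The sketch below is only an illustration: the function name, the 50% threshold, and the sample addresses are all made up for the example, not taken from any real crash report.

```python
from collections import Counter

def classify_fault_addresses(addresses, repeat_threshold=0.5):
    """Label a list of crash addresses as 'patterned' or 'scattered'.

    'patterned' means a single address accounts for at least
    repeat_threshold of all recorded crashes (an arbitrary cutoff
    chosen for this example).
    """
    if not addresses:
        return "no data"
    # Normalize case so 0xABC and 0xabc count as the same address.
    counts = Counter(a.lower() for a in addresses)
    _, top_count = counts.most_common(1)[0]
    if top_count / len(addresses) >= repeat_threshold:
        return "patterned"
    return "scattered"

# Hypothetical addresses copied by hand from four crash dialogs.
samples = [
    "0x00007FF6A1B2C3D4",
    "0x00007ff6a1b2c3d4",
    "0x00007FF6A1B2C3D4",
    "0x0000000012345678",
]
print(classify_fault_addresses(samples))
```

A "patterned" result would point toward the HW fault case, a "scattered" one toward heat, but with only a handful of crashes this is a hint at best.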

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
tbearhere (OP)
July 27, 2016, 07:46:31 PM
 #63


EDIT: I do have the temperatures of the cards set at a max.. to not exceed 79c ect.
And I use to get this  but was going to talk about it later.
Going to myr-gr  also on neoscrypt  so not related to algo but memory or ccminer? Maybe intensity setting.


Looks like a null pointer dereference. That's usually software but in your case it could be excess heat in the CPU or RAM.
How is the ventilation around the mobo? Maybe heat from the GPUs is destabilizing the CPU.

Edit: It could also be bad RAM. Make note if they are always the same, especially the instruction address.
It could be a bug in ccminer for neoscrypt lyra2v2 ect. not sure.
A fan at high speed but the temperatures were so high, room temp, that I turned off the rig at 1pm to 7pm some times...cooler temp on its way.
There's not much I can do until temps drop... it broke records.

If it was a bug others would likely also see it, but AFAIK no one else has seen this crash. You're getting corruption in the CPU domain
(core, cache, ram), either due to a HW fault or heat induced. If there is a pattern to the fault addresses it's probably a HW fault.
If the're random, probably heat induced.

Due to the extremely high ambient temperature your sensors may not detect overheating in places where it isn't expected.
joblo, what is AFAIK? As for neoscrypt or lyra2v2, it crashes on those once in a great while showing this sign; some people have reported that too, I think... but it also happens at room temp. There may be too many cards for Windows memory to handle. But I will look into what you said.
Someone did say they were having problems with those private releases.
Thx
joblo
July 27, 2016, 10:45:07 PM
 #64


What is AFAIK? joblo  and when that neoscrypt or lyra2v2 that it crashes on once in a great while showing this sign, some people have reported that I think... but also happens at room temp too. There maybe to many cards for windows memory to handle. But will look into what you said.
Someone did say they were having problems with those private releases.
Thx

http://www.urbandictionary.com/define.php?term=afaik

This is the first time you've posted this symptom. Has it always crashed this way?
If you think it's the miner, try a different one.

JaredKaragen
July 28, 2016, 01:15:33 PM
 #65

AFAIK=as far as I know

If it's a temperature problem, I would think there's a system-level component that's having an issue.

Run a motherboard monitor and a CPU monitor... but a voltage issue could still be the case. I hope you are figuring your power availability per rig at 120-160% of expected draw? If I ran a 4-card machine I would for sure be running a 1600W power supply...

I wonder if you are drawing too much +12V off the same rail that supplies the processor, and that is causing all this to happen.

I have seen many strange configurations when opening up power supplies and seeing what is tapping which available rail. Many PC power supplies have 3-4 independent +12V supply circuits at roughly 50A each, give or take... As for knowing how they are distributed, you often have to open the power supply to know the real truth...

You don't happen to be CPU mining at the same time, are you?

Link to my batch and script resources here.  

DO NOT TRUST YOBIT  -JK

Donations: 1Q8HjG8wMa3hgmDFbFHC9cADPLpm1xKHQM
tbearhere (OP)
July 28, 2016, 04:25:12 PM
 #66


What is AFAIK? joblo  and when that neoscrypt or lyra2v2 that it crashes on once in a great while showing this sign, some people have reported that I think... but also happens at room temp too. There maybe to many cards for windows memory to handle. But will look into what you said.
Someone did say they were having problems with those private releases.
Thx

http://www.urbandictionary.com/define.php?term=afaik

This the first time you posted this symptom, has it always crashed this way?
If you think it's the miner try a different one.
Thx. This happens about once every 2 weeks, unpredictably, and I think it may be the ccminer app, not sure.
I just wanted to post that as one of the secondary issues for now.
Right now I'm mining a straight algo and no crashes at all... very, very smooth at 93°F room temp.
klondike_bar_recovery
July 28, 2016, 04:34:28 PM
 #67

 I hope you are figuring your power availability per rig @ 120-160% expected draw?  If I ran a 4 card machine I would be for sure running a 1600w power supply..... 

I wonder if you are drawing too much +12V off the same rail that supplies the processor and are causing this all to happen.

I have seen many strange configurations once opening up power supplies and seeing what is tapping which available rail.  Many PC power supplies have 3-4 independent +12V power supply circuits at roughly 50A each...  give or take....    As far as knowing how they are distributed...... you have to open the power supply often to know the real truth...   


I don't really agree with the above (per se):

1) I agree with giving about 20% headroom, but a 1600W PSU for 4 cards isn't necessary (unless they are 290/390 cards or some other variant that would be drawing 300W/card). With the RX 480 or GTX 1070 you could run 6 cards on a 1200-1300W PSU just fine.

2) Most quality power supplies have a single 12V rail, and the ones with multiple rails are normally more like 2-3 rails at 30A each (3 x 30A x 12V = 1080W). Your suggested 3 x 50A x 12V PSU would be an 1800W+ beast.

3) You DON'T need to (or want to) open up your PSU and start poking around. You'll void the warranty, risk damage, and waste your time. Any half-decent power supply will have the power rating and rail ratings marked on it and also on its packaging. If not, use Google.

Pretty much any PSU that is gold-rated and costs >$100 should have a single 12V rail that's rated at about 95% of the actual PSU specification.

For example, the Corsair AX1200 has 1202W on a single 12V rail: http://www.corsair.com/en/professional-series-gold-ax1200-80-plus-gold-certified-fully-modular-power-supply  (click on the technical specs tab)
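
The rail and headroom arithmetic in points 1) and 2) can be checked with a few lines. This is just a sketch: the function names are made up, and the 150W per card and 100W for the rest of the system are assumed round figures for illustration, not measurements from anyone's rig.

```python
def rail_watts(rails, amps_per_rail, volts=12):
    """Nominal power of several identical rails: P = V * I per rail."""
    return rails * amps_per_rail * volts

def psu_size_with_headroom(expected_draw_w, headroom_pct=20):
    """PSU rating needed to leave headroom_pct spare above expected draw."""
    return expected_draw_w * (100 + headroom_pct) / 100

print(rail_watts(3, 30))   # the 3 x 30A x 12V case from point 2
print(rail_watts(3, 50))   # the 3 x 50A x 12V case from point 2

# Assumed figures: 6 cards at 150W each plus 100W for board, CPU, drives.
expected = 6 * 150 + 100
print(psu_size_with_headroom(expected))
```

With those assumed numbers the result lands at the low end of the 1200-1300W range mentioned above.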
tbearhere (OP)
July 28, 2016, 04:37:40 PM
 #68

AFAIK=as far as I know

If its a temperature problem, theres a system level component that's having an issue I would think.

Run a motherboard monitor and CPU monitor.... but a voltage issue could still be the case.   I hope you are figuring your power availability per rig @ 120-160% expected draw?  If I ran a 4 card machine I would be for sure running a 1600w power supply..... 

I wonder if you are drawing too much +12V off the same rail that supplies the processor and are causing this all to happen.

I have seen many strange configurations once opening up power supplies and seeing what is tapping which available rail.  Many PC power supplies have 3-4 independent +12V power supply circuits at roughly 50A each...  give or take....    As far as knowing how they are distributed...... you have to open the power supply often to know the real truth...   

You dont happen to be CPU mining at the same time are you?
Yes, I did have that issue... I can't add another 970 without sli'ing the 2 PSUs (1300 watts each).
I did try taking 1 980ti off that rail and adding it to the 2nd PSU along with the 970.
It crashed... but I should try the new 970 all by itself. That may be a bad card, not sure... I did an RMA on it and it also crashed.
I think the main problem is that the asrock btc 81 pro can't handle that many high-end cards running Windows 8.1.
And the 2nd biggest thing was the temp spin-down time when changing algos, which is fixed, I think.
tbearhere (OP)
July 28, 2016, 04:40:01 PM
 #69

AFAIK=as far as I know

If its a temperature problem, theres a system level component that's having an issue I would think.

Run a motherboard monitor and CPU monitor.... but a voltage issue could still be the case.   I hope you are figuring your power availability per rig @ 120-160% expected draw?  If I ran a 4 card machine I would be for sure running a 1600w power supply.....  

I wonder if you are drawing too much +12V off the same rail that supplies the processor and are causing this all to happen.

I have seen many strange configurations once opening up power supplies and seeing what is tapping which available rail.  Many PC power supplies have 3-4 independent +12V power supply circuits at roughly 50A each...  give or take....    As far as knowing how they are distributed...... you have to open the power supply often to know the real truth...    

You dont happen to be CPU mining at the same time are you?
oooo on the overloaded rail... yes I did...now fixed.
That was when the psu was shutting off.
No CPU mining at all. I have to use    --cpu-affinity 1 --cpu-priority 0
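
For anyone wondering what the 1 in --cpu-affinity 1 selects: assuming the value is read as a bitmask of CPU cores (which is how ccminer's help text describes it, as far as I know), each set bit picks one core. A tiny sketch with a made-up helper name:

```python
def cores_from_affinity_mask(mask):
    """Return the CPU core indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print(cores_from_affinity_mask(1))    # [0]
print(cores_from_affinity_mask(0x3))  # [0, 1]
```

Under that assumption, a mask of 1 pins the miner's host thread to core 0 only.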
tbearhere (OP)
July 28, 2016, 04:45:59 PM
 #70

 I hope you are figuring your power availability per rig @ 120-160% expected draw?  If I ran a 4 card machine I would be for sure running a 1600w power supply..... 

I wonder if you are drawing too much +12V off the same rail that supplies the processor and are causing this all to happen.

I have seen many strange configurations once opening up power supplies and seeing what is tapping which available rail.  Many PC power supplies have 3-4 independent +12V power supply circuits at roughly 50A each...  give or take....    As far as knowing how they are distributed...... you have to open the power supply often to know the real truth...   


I dont really agree with the above (per-se):

1) I agree with giving about 20% headroom, but a 1600W PSU for 4 cards isnt necesary (unless they are 290/390 cards or some other varient that would be drawing 300w/card). the rx480 or gtx1070 you could run 6 cards with a 1200-1300W PSU just fine.

2) most quality power supplies have a single 12V rail, and the ones wit multiple rails normally are more like 2-3 rails at 30A each (3x30Ax12V=1080W)  Your suggested 3x50Ax12V PSU would be a 1800W+ beast

3) you DONT need to (or want to) open up your PSU and start poking around. youll void the warranty, risk damage, and waste your time. any half-decent power suply will have the power rating and rail ratings marked on it and also on its packaging. If not, use google.

pretty much any PSU that is gold-rated and costs >$100 should be a single 12V rail thats rated at about 95% of the actual PSU specification.

for example, the corsair ax1200 has 1202W on a single 12V rail: http://www.corsair.com/en/professional-series-gold-ax1200-80-plus-gold-certified-fully-modular-power-supply  (click on the technical specs tab)
When the PSU was shutting off I was trying to draw 1085 watts on a 1300W continuous PSU, which really equals about 1200 available.
JaredKaragen
July 29, 2016, 01:17:21 AM
 #71

When the psu was shutting off I was trying to draw 1085 watts on a continuous 1300 psu which really equals 1200 available.

Yep; that's what I was thinking. When you start drawing too much, internal voltage drops and amperage goes up. This causes more heat in the components and a snowball effect ensues.

My comment about several 50A rails was literal, and is a pretty close and very realistic situation. When I give these values, I'm not joking... they are coming from my Coolmax 1600W. I know the math says there's ~1920W of just 12V power available (P = V × I), because the sticker says 110A and 50A of 12V power. But the sticker shows that it only allows 1600W total from the PSU, with a max of this and a max of that on each designated supply line (12V#1 and 12V#2 together are stickered for a max draw of 1560W, even though P = V × I says more).

I have seen a few people burn down S7s and S7 power supplies because the two sets of PCIE plug rails were only capable of 110A total when they gave the appearance of being separate circuits, yet the power supply said there was another ~40A of overhead (1560W of 12V between the two is available in total, if you trust what's on the sticker)... This is not an 1800W power supply, but it allows a total of 1600W to be drawn from it across all voltages, in a combination which allows a massive, nearly 1600W of +12V usage in theory. The problem is that 12V rail 1 only supplies PCIE (110A), and 12V rail 2 only supplies the motherboard plug, CPU plugs, and accessories such as SATA power and the like. You would never know this unless you took it apart. One of my Coolmax PSUs is from a person who burned down an S7 drawing too much from it. I had to open it up and hack-job-rewire the modular plugs in the back of the PSU to bypass and cut out the old melted PCIE cable plugs and fix the remaining good ones to have a good usable PSU again. I will also rig up some PCIE plug adaptors to utilize the FDD/HDD power ports, and the motherboard connectors as well.

To finish:
You not only have to obey the sticker, but you have to use a little extra sense when trying to figure it all out. Don't ever assume any one thing is correct unless it's totally verified ;)
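
The sticker math above can be written out as a small sketch. The numbers (110A and 50A rails, 1560W combined 12V cap) are the ones quoted from the Coolmax sticker; the function names are made up for the example, and the idea that the usable figure is simply the smaller of the two limits is an assumption, not something from a datasheet.

```python
def nominal_12v_watts(rail_amps, volts=12.0):
    """Sum of per-rail limits using P = V * I."""
    return sum(rail_amps) * volts

def usable_12v_watts(rail_amps, combined_cap_w, volts=12.0):
    """Assumed model: the combined sticker cap wins when it is lower."""
    return min(nominal_12v_watts(rail_amps, volts), float(combined_cap_w))

rails = [110, 50]                        # amps on 12V#1 and 12V#2
print(nominal_12v_watts(rails))          # 1920.0
print(usable_12v_watts(rails, 1560))     # 1560.0
print(nominal_12v_watts([110]))          # 1320.0, the PCIE-only rail
```

The last line is the part that bites: if the PCIE plugs all hang off the 110A rail, the cards cannot draw on the other rail's capacity no matter what the total says.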

tbearhere (OP)
July 30, 2016, 12:40:45 PM
 #72

When the psu was shutting off I was trying to draw 1085 watts on a continuous 1300 psu which really equals 1200 available.

Yep;  that's what I was thinking.   When you start drawing too much, internal voltage drops and amperage turns up.  This causes more heat in the components and a snowball effect ensues.

My comment about several 50A rails was literal, and is a pretty close and very realistic situation.  When I give these values, i'm not joking.... They are coming from my Coolmax 1600w.    I know the math says there's ~1920W of just 12v power available (E/I*R) cause the sticker says 110A and 50A of 12V power:  But the sticker, shows that its only allowing 1600W total from the PSU; with a max of this, and a max of that on each designated supply line (12V#1 and 12V#2 are stickered to be a max draw of 1560W even though the math of E/I*R says more).  

I have seen a few people burn down S7's and S7 power supplies because the the two sets of PCIE plug rails were only capable of 110A total when they were under the appearance of being both separate circuits;  Yet the power supply said there was another ~40A of overhead. (1560W of 12V between the two is available total if you trust whats on the sticker)... This is not an 1800W power supply;  but it allows for a total of 1600W to be drawn from it across all voltages in a combination which allows a massive; nearly 1600w of +12V usage in theory.  The problem is 12v rail 1 only supplies PCIE (110A), and 12v rail 2 only supplies the motherboard plug, CPU plugs, and accessories such as SATA power and the like. You would never know this is true unless you took it apart.  One of my coolmax PSU's are from a person that burned down an S7 drawing too much from it.   I had to open it up, and hack-job-rewire the modular plugs in the back of the PSU to bypass and cut out the old melted PCIE cable plugs and fix the remaining good ones to have a good usable PSU again.   I will also rig up some PCIE plug adaptors to utilize the FDD/HDD power ports, and the motherboard connectors as well.

To finish:
You not only have to obey the sticker, but you have to use a little extra sense when trying to figure it all out.    Don't ever assume any one thing is correct unless its totally verified Wink
Yes, exactly.
I even called Antec about that, and they said they did a test run with 3 980ti cards, and that, along with supplying power to the motherboard etc., is the max. A 12V rail at 50 amps is for 1 980ti only, no more. They say they have it down as drawing 39 amps, even though it doesn't.
But now the PSU problems have been solved by having 2 of them sli'd. :)
But at the moment they are not sli'd.
And the extra 970gtx is still in the box.
PS: Been running an algo for days now, no crashes, no OC'ing... if I do OC... crash.
Since 2 problems out of maybe 6 may be out of the way, I may try the sli'd PSUs again and put the other 970 on this same algo with no OC. If that mines stable, then on to testing the other problems.
Thx  JK
JaredKaragen
July 30, 2016, 08:00:10 PM
 #73

Glad to help.

I've noticed most people reporting issues have been with the middle model cards (970, etc)... so I tend to stay away from them myself.

It's almost as if the 70 model is a batch of high-end chips on which not all the CUDA cores worked or something, so they make that die run at a lower rate with fewer cores rather than lose manufactured but failed high-end product...

Can't wait to get up to SF and pick up that 960 off the shelf and get it back to work. We took that duck 960 machine apart to build out a system for a customer, and the second video card has been sitting waiting for its new home. He planned to go 10 series on the next machine, so it's not a big deal for him to trade me 2 power supplies for the 1 card.

JaredKaragen
July 30, 2016, 08:19:52 PM
 #74

https://www.youtube.com/watch?v=gS1hyzkVk5w

Just saw this.  Still need to watch, but maybe might be something u want to look at given the title....

tbearhere (OP)
August 02, 2016, 08:48:57 PM
 #75

https://www.youtube.com/watch?v=gS1hyzkVk5w

Just saw this.  Still need to watch, but maybe might be something u want to look at given the title....
Yes I can get money I believe ... have 2 970 ... thx
tbearhere (OP)
August 03, 2016, 03:06:42 PM
Last edit: August 03, 2016, 05:55:17 PM by tbearhere
 #76

OK, on my 970 G1s: trying now to add the 2nd one.
MSI Afterburner's display won't show the extra 970, but it is recognized in the control panel.
And the clocks while mining in the normal P2 state are showing a 1400 core clock... so is something overclocking them and causing them to crash? Going to delete all monitoring software to see if the clocks go back to normal.
Also going to manually set the clocks to 1178 MHz, where they are supposed to be, if necessary.
Even though the rig has been mining for days now with no crashes without the extra card.
2 days ago I tried putting the extra card in and the clock defaulted to an overclocked 1413, too high.
With the extra card it will mine for exactly 3 minutes, then it crashes the drivers.
Reboot and mine, and after 3 minutes the drivers crash.
Gigabyte says to download OC GURU and try the other, non-gaming BIOS to see if that helps.
Be back asap with the results.
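
A monitoring-software-free way to see what the cards are actually clocked at is to ask nvidia-smi directly and flag anything above the expected clock. The sketch below assumes nvidia-smi is on the PATH and that the driver reports clocks.sm for these cards (some GeForce/Windows setups return N/A, which this simple parser does not handle). The function names, the 25 MHz tolerance, and the sample text are made up for illustration.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,clocks.sm",
    "--format=csv,noheader,nounits",
]

def parse_clocks(csv_text):
    """Turn 'index, name, clock' CSV lines into (int, str, int) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, clock = [field.strip() for field in line.split(",")]
        rows.append((int(index), name, int(clock)))
    return rows

def over_expected(rows, expected_mhz, tolerance_mhz=25):
    """Return the rows whose clock exceeds expected_mhz plus a tolerance."""
    return [row for row in rows if row[2] > expected_mhz + tolerance_mhz]

def read_clocks():
    """Query the live cards (not called here; needs nvidia-smi installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_clocks(result.stdout)

# Hypothetical output for two 970s, using the clocks mentioned in the post.
sample = "0, GeForce GTX 970, 1178\n1, GeForce GTX 970, 1413\n"
print(over_expected(parse_clocks(sample), expected_mhz=1178))
```

On a live rig, `over_expected(read_clocks(), expected_mhz=1178)` would do the same check against the real cards.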

This is what sometimes happens on a fast reboot; it is the 1st 970, with the 2nd card not installed.
After rebooting a 2nd time this doesn't happen.
I think I need to switch my asrock 81 btc pro from fast boot to normal boot.

This is the first 970 crash below.



This is with 5 cards installed; the 5th card is not recognized by Windows.


 
JaredKaragen
August 04, 2016, 02:44:40 AM
 #77

This is the problem with installing more than one GPU type per system:

You'd have to do an Nvidia driver "clean install" for the device's specific driver to fix that issue... which would remove the other versions of the driver for the other cards...

Any chance on getting the 2 970's in their own box by themselves to test this?

JaredKaragen
August 04, 2016, 02:46:36 AM
 #78

Also:   Any special edition cards such as "OC" or "SSC" should never be overclocked.   The manufacturers have already done the legwork for us for the max reliability/stability on their card build.   Keep this well in mind.....   I never overclock a SSC or OC edition card.

klondike_bar_recovery
August 04, 2016, 04:59:50 AM
 #79

Also:   Any special edition cards such as "OC" or "SSC" should never be overclocked.   The manufacturers have already done the legwork for us for the max reliability/stability on their card build.   Keep this well in mind.....   I never overclock a SSC or OC edition card.

Probably why the "non-OC" versions can easily match or beat the OC cards when actually tuned manually. My strix cards (base model) can exceed the specs of the OC version (~8% faster for a $50 premium).
JaredKaragen
August 04, 2016, 05:26:49 AM
 #80

Also:   Any special edition cards such as "OC" or "SSC" should never be overclocked.   The manufacturers have already done the legwork for us for the max reliability/stability on their card build.   Keep this well in mind.....   I never overclock a SSC or OC edition card.

probably why the "non-oc" versions can easily match or beat the OC cards wen actually tuned manually. my strix cards (base model) can exceed the specs of the OC version (~8% faster for a $50 premium)


agreed.  my GTX980 is a strix.

I just added an EVGA 960 SSC 2GB card to my machine that has the Asus GTX 980. Still solid on lbry, went from 154Mh to 248Mh. Happy. Driver install was painless. Though on the X58 motherboard, for some reason I can't use the PCI port when running two video cards. I may try stacking the cards up (not in the optimal config) in different slots to see if I can get a PCI card to work with both of these video cards. I use this machine day in and day out and it mines 24/7 whether I play games or not.


Been pretty solid so far. Let's see how this mix of video cards treats me in the near future.
