Bitcoin Forum
Topic: Nvidia GPU Mining Problems
tbearhere (OP)
Legendary
*
Offline Offline

Activity: 3220
Merit: 1003



View Profile
July 18, 2016, 12:14:10 AM
 #41

Doing good.

tbearhere (OP)
July 18, 2016, 12:16:57 AM
 #42

I'm wondering if it is a heat issue.  Can you run a temperature monitoring program and watch the temperatures up to and during a crash?  Also try not using Miner Control: mine one selected coin and see if you can make 12 hours.  One card at a time, repeat the process.  If you're using a script like JK's, then you aren't the only one getting crashes every 6 hours (mine does as well on 1.03).  My theory on the script switching is that it fails to close the non-profitable miner before opening the other one, and with high intensities that causes a memory overlap and a crash.  High intensity is another issue you might be running into; if you're mining with -i intensity, try removing that altogether.
I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.



If the 970 is not mining, why is it so hot?
It's in a room at 95 F (35 C) and next to the other cards.
At room temp that card mines very cool, about 68c on some algos.
If I run the other GTX 970 I have all by itself, with all other cards removed, it crashes immediately.
I think I have a bad batch of GTX 970 cards.

Which temp is for which card? It looks like the third card is at 79, isn't that ccminer GPU #2?
Anyway if the cards crash at stock settings and reasonable temps they are defective.
Double faults can easily get you going in circles trying to troubleshoot.
The 3rd card is a 980ti. 79c is what I set as the max; going to lower it.
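
For anyone wanting to log temperatures up to a crash as suggested above, here is a minimal sketch. It parses the CSV text that `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits` prints and flags any card at or above a limit. The 79c limit comes from this post; the sample readings and the function names are made up for illustration.

```python
def parse_temps(csv_text):
    """Parse 'index, temperature' lines into a {gpu_index: temp_c} dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def cards_over_limit(temps, limit_c=79):
    """Return the GPU indexes whose temperature is at or above limit_c."""
    return sorted(i for i, t in temps.items() if t >= limit_c)


# Hypothetical sample of what the query above might print for three cards.
sample = "0, 68\n1, 71\n2, 79\n"
print(cards_over_limit(parse_temps(sample)))
```

Running the query on a timer and appending each reading to a file would leave a record of the last temperatures before a crash.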
antonio8
Legendary
*
Offline Offline

Activity: 1400
Merit: 1000


View Profile
July 18, 2016, 01:01:49 AM
 #43

Not sure if this will help you at all, but I was having a similar issue.

I had one 970, two 960s and one 750ti in a rig, and one card would always crash while mining certain algos. It was always the same 960, sometimes after a few minutes and sometimes after a few hours.

I used Nvidia Inspector to determine this, as it always showed the same card with the fan down to 0% after the crash. I tried new risers (powered USB) and got the same thing. I was starting to think I had a bad card, so I swapped only the 960s between their risers to narrow it down. Lo and behold, the card that had been having no issues started crashing and the card that had been crashing stopped. Same thing after switching and same algos. The card that was crashing would not crash at all any more.

I was perplexed and ran out of ideas, and then I had one last thought. I just swapped the riser connector on the motherboard for the slot that was crashing with one of my 750ti PCI-E slots. I looked in Nvidia Inspector and noticed that the 960 that had been crashing had changed PCI Interface numbers. Both 960 cards read 3.0@1.1x1 before the switch, and after it the card that was crashing read 3.0@2.0x1. Now it reads the same as my 750ti. So the two 960s read different values, and I have not had a single issue since.

I am no expert and have no idea why it fixed it but it did.

Hope that might help and hope it made sense.

If you are going to leave your BTC on an exchange please send it to this address instead 1GH3ub3UUHbU5qDJW5u3E9jZ96ZEmzaXtG, I will at least use the money better than someone who steals it from the exchange. Thanks Wink
joblo
Legendary
*
Offline Offline

Activity: 1470
Merit: 1114


View Profile
July 18, 2016, 01:26:02 AM
 #44

I'm just guessing, but it looks like the notation means <slot version>@<running version>x<lanes>. That would mean the slots were running at PCIe v1.1 even though the slots and cards support v3. I have no idea why swapping slots would change that.
Glad you got it fixed.
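
To make that guess concrete, here is a small sketch that splits a reading such as 3.0@1.1x1 into supported version, running version and lane count. The field meanings follow the guess above, not any Nvidia Inspector documentation, and the function name is hypothetical.

```python
import re


def parse_pci_interface(text):
    """Split a reading such as '3.0@1.1x1' into its three assumed parts."""
    match = re.fullmatch(r"(\d+\.\d+)@(\d+\.\d+)x(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized PCI interface string: {text!r}")
    supported, running, lanes = match.groups()
    return {"supported": supported, "running": running, "lanes": int(lanes)}


# The two readings reported for the 960 before and after the slot swap.
before = parse_pci_interface("3.0@1.1x1")
after = parse_pci_interface("3.0@2.0x1")
print(before["running"], "->", after["running"])
```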

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
tbearhere (OP)
July 18, 2016, 09:50:30 AM
 #45

Still doing great on lyra2v2, straight mining for almost 20 hours. Next is to run Miner Control without the 970 in the config file.
Due to things I must attend to, I may not be able to post until Thursday.
Thx

tbearhere (OP)
July 18, 2016, 10:23:32 AM
 #46

Thank you antonio8.
Glad you found your problem. Smiley
I have done the switching, but I need to look into it in detail.
But I have had three 970s, I think all from the same batch or lot, and none of them mined in the P2 state at 1178 core. They ran at about 1423 core clock. Going to call Gigabyte again asap.
Will look into switching slots again.
It's as if they were oc'd without oc'ing them.
Thx
tbearhere (OP)
July 21, 2016, 08:41:20 AM
Last edit: July 22, 2016, 10:07:46 PM by tbearhere
 #47

A note  7-21-2016
My rig mined all day with all cards with MC. Smiley
Room temp 92f, humidity 35%.
For the four days before that, the rig crashed at only 87f, humidity 85%.
Possibly the humidity played some role in the crashing with the GTX 970 that mines in P2 at a default 1413 core clock.
Maybe micro dust.
This week's temps are supposed to be record breaking, 100f......... humidity?
Slowly, one thing at a time, to pin down the 3 things contributing to the crashing.
Due to the heat I will have to shut down the rig for some time during the day for the next 5 days.

EDIT 7-22-16: The rig mined all day today with MC. Room temp max 95f, humidity 45%. The only change I made was to turn off oc'ing and change "delay": 15 to "delay": 30.
So one problem is heat related: when changing algos, the cards go from the P2 state to the P8 state before they can mine again, which some call spin-down time.
The hotter it is, the longer the spin-down takes.
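
A rough sketch of the idea behind the longer delay: stop the old miner, wait, then start the next one. This is not Miner Control's code. Its "delay" setting is only assumed here to be a pause between stopping one miner and starting the next, and all names below are hypothetical.

```python
import time


def switch_miner(stop_current, start_next, delay_s=30, sleep=time.sleep):
    """Stop the running miner, wait delay_s seconds, then start the next one."""
    stop_current()
    sleep(delay_s)  # give the cards time to spin down before relaunching
    return start_next()


# Stand-in callables so the sketch runs without any real miner.
events = []
switch_miner(
    stop_current=lambda: events.append("stopped old miner"),
    start_next=lambda: events.append("started new miner"),
    delay_s=30,
    sleep=lambda s: events.append(f"waited {s}s"),
)
print(events)
```

The point of the ordering is that the old miner is fully stopped before the wait begins, so the wait covers the whole spin-down.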
JaredKaragen
Legendary
*
Offline Offline

Activity: 1848
Merit: 1166


My AR-15 ID's itself as a toaster. Want breakfast?


View Profile WWW
July 23, 2016, 11:33:12 AM
 #48

https://bitcointalk.org/index.php?topic=1260863.msg15551454#msg15551454

Link to my batch and script resources here.  

DO NOT TRUST YOBIT  -JK

Donations: 1Q8HjG8wMa3hgmDFbFHC9cADPLpm1xKHQM
tbearhere (OP)
July 24, 2016, 10:35:14 AM
Last edit: July 24, 2016, 10:51:01 AM by tbearhere
 #49

Here's another useful cut and paste.    I was able to kick my live mining machine from P2 into P0, no interruptions.

I am thinking of building this command into my batch every time it re-launches the miner apps to make sure P-state is zero.

Code:
Run this in cmd in admin mode:
c:\progra~1\nvidia~1\NVSMI\
nvidia-smi -q -d SUPPORTED_CLOCKS | more
(that's a pipe before more)

Take the top number for memory, something like 3505 (GTX970), and the first number for graphics, something like 1531. Again in admin mode, enter your numbers in this format:

nvidia-smi -ac 3505,1531

Your card's memory will now run in compute mode (P0).

nvidia-smi.exe -ac 3505,1443  For my ASUS GTX 980.

I've also noticed X11evo is a power-hungry algo.  One machine used to sit at 80-84*C when running only X11EVO.  Now running Lyra2 full-time, it sits at 65*C...


Thx JK I have done this.   Will that keep them in the p0 state on reboot?
EDIT: I'm going to give this a try again asap. thx
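
The "take the top number" step quoted above can be sketched as a small parser. It assumes the `nvidia-smi -q -d SUPPORTED_CLOCKS` listing looks like the sample below, with the highest clocks listed first; the sample text (including the 3304 entry) is illustrative, not captured from a real card, and the function name is hypothetical.

```python
import re


def top_application_clocks(smi_output):
    """Return (memory_mhz, graphics_mhz) from the first Memory/Graphics pair."""
    memory = None
    for line in smi_output.splitlines():
        match = re.match(r"\s*(Memory|Graphics)\s*:\s*(\d+)\s*MHz", line)
        if not match:
            continue
        kind, mhz = match.group(1), int(match.group(2))
        if kind == "Memory" and memory is None:
            memory = mhz  # first memory clock listed
        elif kind == "Graphics" and memory is not None:
            return memory, mhz  # first graphics clock under it
    raise ValueError("no Memory/Graphics clock pair found")


# Assumed layout of the SUPPORTED_CLOCKS section (illustrative values).
sample = """
    Supported Clocks
        Memory                      : 3505 MHz
            Graphics                : 1531 MHz
            Graphics                : 1518 MHz
        Memory                      : 3304 MHz
            Graphics                : 1531 MHz
"""
mem, gfx = top_application_clocks(sample)
print(f"nvidia-smi -ac {mem},{gfx}")
```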
joblo
July 24, 2016, 11:48:14 AM
 #50

Interesting observation. I never really considered that it would be more susceptible to crash when switching algos, or that
spin down time could mitigate.  Good to see you're making progress with your "heat chamber" testing.

Greenbat
Newbie
*
Offline Offline

Activity: 39
Merit: 0


View Profile
July 24, 2016, 03:30:48 PM
 #51

so damn...
great idea. this is what I want first but until now they have not found a good example
my gpu need big upgrade for this Undecided
JaredKaragen
July 25, 2016, 12:51:43 AM
 #52

Thx JK I have done this.   Will that keep them in the p0 state on reboot?
EDIT: I'm going to give this a try again asap. thx

No. The admin rights requirement has to be deactivated first (check the zpool thread for the command), then reboot, and it shouldn't need admin rights anymore.

Then add the path to the NVSMI folder to the system PATH and you can just call it from any old command prompt/provision allotment =)

Issue the memory and clock set command every reboot, or better right before the miner app launches.
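
A minimal sketch of "set the clocks right before the miner launches", assuming nvidia-smi is on the PATH as described above. The miner command line is a hypothetical placeholder, and the function names are made up; only the `-ac memory,graphics` form comes from this thread.

```python
import subprocess


def launch_sequence(memory_mhz, graphics_mhz, miner_cmd, smi="nvidia-smi"):
    """Build the commands to run, in order: set clocks, then start the miner."""
    set_clocks = [smi, "-ac", f"{memory_mhz},{graphics_mhz}"]
    return [set_clocks, list(miner_cmd)]


def run_sequence(commands, run=subprocess.run):
    """Run each command in turn; check=True raises if a command fails."""
    for cmd in commands:
        run(cmd, check=True)


# Hypothetical miner command line; substitute the real one from the batch file.
commands = launch_sequence(3505, 1531, ["ccminer", "-a", "lyra2v2"])
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_sequence(commands)` would execute both steps for real; the sketch only prints them so it can be tried on a machine without the cards.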

tbearhere (OP)
July 25, 2016, 10:32:12 AM
Last edit: July 25, 2016, 12:31:28 PM by tbearhere
 #53

Thx joblo. Yes, still in a severe heat wave.. never seen this before. One day I turned off the miners until the evening. Today same thing, I think 100f  Undecided

EDIT: I do have the temperatures of the cards set at a max, to not exceed 79c etc.
And I used to get this, but was going to talk about it later.
It happens going to myr-gr and also on neoscrypt, so not related to the algo but to memory or ccminer? Maybe the intensity setting.
tbearhere (OP)
July 25, 2016, 10:38:20 AM
Last edit: July 25, 2016, 11:07:08 AM by tbearhere
 #54

Thx JaredKaragen  looking into it. And where is that command plz on keeping the rig in the administrative mode on reboot. Thx
tbearhere (OP)
July 25, 2016, 11:08:09 AM
 #55

And where is that command plz on keeping the rig in the administrative mode on reboot. Thx
JaredKaragen
July 26, 2016, 12:26:26 AM
 #56

And where is that command plz on keeping the rig in the administrative mode on reboot. Thx
"c:\progra~1\nvidia\NVSMI\nvidia-smi.exe -acp UNRESTRICTED" removes the admin requirements.....  Run as administrator in command prompt.

I haven't tried rebooting to see if this needs to be repeated on boot.   If so, it will need to be automated at boot-up somehow.

But one P-state issue I've tracked to the neoscrypt algo.   Try removing it for a time and see how much it helps.

tbearhere (OP)
July 26, 2016, 10:48:19 AM
 #57

Thx JK see my post on zpool plz.
https://bitcointalk.org/index.php?topic=1260863.msg15712991#msg15712991

But I still may give this a try.  Smiley
joblo
July 26, 2016, 09:28:33 PM
 #58


EDIT: I do have the temperatures of the cards set at a max, to not exceed 79c etc.
And I used to get this, but was going to talk about it later.
It happens going to myr-gr and also on neoscrypt, so not related to the algo but to memory or ccminer? Maybe the intensity setting.


Looks like a null pointer dereference. That's usually software but in your case it could be excess heat in the CPU or RAM.
How is the ventilation around the mobo? Maybe heat from the GPUs is destabilizing the CPU.

Edit: It could also be bad RAM. Make note if they are always the same, especially the instruction address.

JaredKaragen
July 27, 2016, 04:04:29 AM
 #59


Try a different power supply.

You'd be amazed how many are bad and have you point the finger somewhere else.

tbearhere (OP)
July 27, 2016, 12:08:10 PM
 #60


Thx. That is my 2nd PSU, and I have a third to sli for my extra 970, but it is hard for Windows to recognize it and it crashes within 3 minutes. Since I have made some changes, I will try again soon.
I have a fan at high speed on the rig.
This only comes up once in a great while.