Bitcoin Forum
May 13, 2024, 10:50:39 PM *
Author Topic: Stability Issues  (Read 220 times)
N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 19, 2018, 02:47:57 AM
Last edit: January 19, 2018, 04:08:20 AM by N1kon
 #1

Hey All!

This forum has been super helpful while putting my rig together (thanks guys!), but I still can't iron out these last few issues I am having.

It seems that every 3-4 hrs or so my rig hard crashes. The PC can only be recovered by completely unplugging it. With my luck, this usually happens as soon as I leave for work. I have had to resort to purchasing a WiFi outlet in case this happens while I am away. Currently, the rig is mining 24/7 only with the aid of a BAT file I made that auto reboots every 2.5 hours. This seems to have started shortly after adding the 6th card, but I also changed from Skunkhash to Neoscrypt around that time, so I am not sure. The only thing that sticks out to me is that I get weird power fluctuations (see pic) in MSI AB that seem to lower my hashrate while they happen. After about 5 mins, the fluctuation goes away. Am I missing something?

The Satoshi Smasher
https://i.imgur.com/jdLW5f8.jpg

Fluctuation
https://i.imgur.com/QG8kmNf.png


MSI AfterBurner
Power 84%
Temp Limit 85%
1080-Core +45, Mem +450
1070-Core +70, Mem +375

Power:
~940W via Kill-a-Watt

Algo:
Neoscrypt ~6.15 MH/s

Miner
Ccminer KlausT Cuda 8

System Specs:
Celeron G3920
Asus B250 Mining Expert MB
Crucial DDR4 2400MHz 8GB
250GB HDD (SSD to come)
1200W 87% PSU
Virtual Mem= 98GB
Nvidia Driver-388.43
 
Cards:
1x Gigabyte 1080
3x Gigabyte 1070 Mini
1x Gigabyte G1 1070
1x Asus Strix Rog 1070
gotminer
Member
**
Offline Offline

Activity: 644
Merit: 24


View Profile
January 19, 2018, 02:51:56 AM
 #2

What does the system event log tell you is going on at the time of a crash?

Ok, I want you to walk back in there and very calmly, very politely tell the risk assessors to fuck off! -Mark Baum
Junkbarman
Full Member
***
Offline Offline

Activity: 168
Merit: 100


View Profile
January 19, 2018, 03:06:51 AM
 #3

I use KlausT and had similar issues.

First, I had crashing issues as well when I combined different card types (a 1070 with 1060s in my case) in the same mining program.

Run them in separate instances and then see if one of them crashes. If it does, then you know which cards aren't liking the oc'ing you're doing. If not, then they simply didn't like being together, at least that's what I've come up with.

This technique has worked for me until I can fix the problem completely.

Also, to add what I've found stable for my system: I have two rigs. I run 4 1060s in the same instance with no issues. On the 2nd rig I have 2 1060s running together and a single 1070 running in a solo instance. I found that the 1070 didn't like my aggressive oc'ing, so I backed it down a bit, and so far 6 hours and no crashes.

Edit:
I just noticed you said the WHOLE rig locks up? Damn, maybe a short or power issue?
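
For anyone wanting to try the separate-instances idea, here is a rough sketch in Python of how the split could be worked out. It only builds one command line per card model; it does not launch anything. The device order in `cards` is hypothetical, and the `-a` and `-d` options are assumed to behave as in common ccminer builds (algorithm and comma-separated device list), so check `ccminer --help` on your own build before relying on them. Pool and wallet options are left out on purpose.

```python
from collections import defaultdict


def group_by_model(cards):
    """Map each card model to the list of device indices that use it."""
    groups = defaultdict(list)
    for index, model in enumerate(cards):
        groups[model].append(index)
    return dict(groups)


def build_commands(cards, miner="ccminer", algo="neoscrypt"):
    """Build one miner command line (as an argument list) per card model."""
    commands = []
    for model, indices in group_by_model(cards).items():
        device_list = ",".join(str(i) for i in indices)
        # "-a" and "-d" are assumed flags; verify them against your miner build.
        commands.append([miner, "-a", algo, "-d", device_list])
    return commands


# Hypothetical device order; the miner's startup log shows the real one.
cards = ["1080", "1070", "1070", "1070", "1070", "1070"]
for command in build_commands(cards):
    print(" ".join(command))
```

If one instance keeps dying while the other stays up, that narrows the problem down to the cards (or overclock settings) in that group.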

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 19, 2018, 03:19:00 AM
 #4

Besides the virtual memory exhaustion issue, which I have already addressed, this is what the event log shows:


The highlighted event is the approximate time it crashed
https://i.imgur.com/r5gJy4K.png




Error Description:

The computer restarted from a bugcheck  - (I'm looking at the .DMP next.)

The server {AB8902B4-09CA-4BB6-B78D-A8F59079A8D5} did not register with DCOM within the required timeout.

The WarpJITSvc service terminated unexpectedly.  It has done this 1 time(s).
N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 19, 2018, 03:27:02 AM
 #5


Quote from: Junkbarman
Edit:
I just noticed you said the WHOLE rig locks up? Damn, maybe a short or power issue?



Ya, I am lucky if the screen displays the cursor when it goes down. The concerning thing is that the card fans keep going at mining speed.

I think the power issue could be from a bad riser or maybe I am pushing the PSU too far. Another PSU is already on the way.
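
A quick back-of-the-envelope check on the "pushing the PSU too far" question, as a small Python sketch. It assumes the Kill-a-Watt reading is AC power at the wall and that the 87% figure is the PSU's efficiency at this load; both are assumptions, and the function names are made up for illustration.

```python
def dc_load_watts(wall_watts, efficiency):
    """Estimate the DC power delivered to the components from the wall reading."""
    return wall_watts * efficiency


def load_fraction(wall_watts, efficiency, psu_rated_watts):
    """Estimate how much of the PSU's rated output is in use (0.0 to 1.0)."""
    return dc_load_watts(wall_watts, efficiency) / psu_rated_watts


wall = 940     # W, Kill-a-Watt reading from the first post
eff = 0.87     # assumed efficiency at this load
rated = 1200   # W, PSU rating from the first post

print(round(dc_load_watts(wall, eff)))               # about 818 W of DC load
print(round(load_fraction(wall, eff, rated) * 100))  # about 68 % of rated output
```

Under those assumptions the average total load is well inside the rating, but this says nothing about short spikes, per-cable limits, or a single bad riser, so it does not rule out a power problem.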
jillscarbrough
Sr. Member
****
Offline Offline

Activity: 588
Merit: 335


Steady State Finance


View Profile
January 19, 2018, 03:31:32 AM
 #6

My guess is a power shortage to the GPUs. Try checking the cables from the PSU to the risers, or if you have another PSU, try that. It could also be a corrupted OS. I've experienced the same thing; I tried the steps above and they fixed it.

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 19, 2018, 07:18:55 PM
 #7

I have stopped all overclock and lowered TDP to 80% 

From what I can tell, Windows appears more stable, but now ccminer keeps crashing (wtf?). I have switched to ccminer KlausT Cuda 9 to see if this clears things up.

You take one issue down and another pops up. Sheesh.
CryptoWatcher420
Sr. Member
****
Offline Offline

Activity: 462
Merit: 258

Small Time Miner, Rig Builder, Crypto Trader


View Profile
January 19, 2018, 07:40:16 PM
 #8

Quote from: N1kon
Besides the virtual memory exhaustion issue, which I have already addressed, this is what the event log shows:

The highlighted event is the approximate time it crashed

Error Description:

The computer restarted from a bugcheck  - (I'm looking at the .DMP next.)

The server {AB8902B4-09CA-4BB6-B78D-A8F59079A8D5} did not register with DCOM within the required timeout.

The WarpJITSvc service terminated unexpectedly.  It has done this 1 time(s).

You've got 98 GB of virtual memory and you're wondering about stability? Maybe you need to look up what a pagefile does on an SSD or hard drive. Any IT person knows a large page file hurts performance badly, to the point that it can affect stability. There's NO need for anything larger than 16 GB of virtual memory, PERIOD. If you need to increase it, that means the system is using too much somewhere, and you need to figure out what and fix it instead of just adding to the problem.

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 19, 2018, 09:13:34 PM
 #9



Quote from: CryptoWatcher420
You've got 98 GB of virtual memory and you're wondering about stability? Maybe you need to look up what a pagefile does on an SSD or hard drive. Any IT person knows a large page file hurts performance badly, to the point that it can affect stability. There's NO need for anything larger than 16 GB of virtual memory, PERIOD. If you need to increase it, that means the system is using too much somewhere, and you need to figure out what and fix it instead of just adding to the problem.

Maybe I took some bad advice. From what I've heard around here, the rule is that pagefile + physical RAM should be greater than the sum of GPU RAM. Changing the max allotment actually helped Windows stability dramatically. The rig actually stays on now. All errors pertaining to memory have stopped. The miner application crash is the only thing happening now, and from what the Event Viewer is telling me it appears to be Nvidia driver related.

What the heck, I'll try turning pagefile down too.
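
The rule of thumb above (pagefile + physical RAM greater than total GPU RAM) is easy to work out for this rig. A minimal Python sketch, assuming the stock 8 GB of memory on both the GTX 1080 and GTX 1070; the function and variable names are just for illustration, and the rule itself is forum folklore rather than anything official.

```python
# Memory per card in GB (stock GTX 1080 and GTX 1070 both ship with 8 GB).
GPU_MEMORY_GB = {"GTX 1080": 8, "GTX 1070": 8}


def pagefile_floor_gb(cards, physical_ram_gb):
    """Return the pagefile size the rule of thumb says you must exceed."""
    total_vram = sum(GPU_MEMORY_GB[model] * count for model, count in cards.items())
    return max(0, total_vram - physical_ram_gb)


rig = {"GTX 1080": 1, "GTX 1070": 5}   # the six cards from the first post
print(pagefile_floor_gb(rig, physical_ram_gb=8))  # 40
```

So by that rule the pagefile on this rig would need to be somewhat above 40 GB, which is well under 98 GB but also well over 16 GB.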
gotminer
Member
**
Offline Offline

Activity: 644
Merit: 24


View Profile
January 20, 2018, 12:19:37 AM
 #10

Quote from: CryptoWatcher420
You've got 98 GB of virtual memory and you're wondering about stability? Maybe you need to look up what a pagefile does on an SSD or hard drive. Any IT person knows a large page file hurts performance badly, to the point that it can affect stability. There's NO need for anything larger than 16 GB of virtual memory, PERIOD. If you need to increase it, that means the system is using too much somewhere, and you need to figure out what and fix it instead of just adding to the problem.

98gb is a little much, but I get out of memory errors in pretty much any miner instantly on all of my rigs (6gpu vega 56, 6gpu 1070ti's, 6gpu 1080ti's), if I don't have my virtual memory set at 50gb.  I'm running 8gb of ddr4 in each.  The OS runs fine without that big of a page file, until I start the miner.

When I check to see what's using so much memory, it's the miner.  I don't know how I can get away with a 16gb page file, unless I upgrade them all to 32gb ram.

 

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 20, 2018, 01:13:55 AM
 #11

Quote from: gotminer
98gb is a little much, but I get out of memory errors in pretty much any miner instantly on all of my rigs (6gpu vega 56, 6gpu 1070ti's, 6gpu 1080ti's), if I don't have my virtual memory set at 50gb.  I'm running 8gb of ddr4 in each.  The OS runs fine without that big of a page file, until I start the miner.

When I check to see what's using so much memory, it's the miner.  I don't know how I can get away with a 16gb page file, unless I upgrade them all to 32gb ram.

After running virtual memory at 20 GB, I'm getting memory exhaustion errors again. I'll try your magic number and see how that goes. I have a feeling this is going to turn into a balancing act by the time I connect all 19 GPUs to this thing.
gotminer
Member
**
Offline Offline

Activity: 644
Merit: 24


View Profile
January 20, 2018, 01:37:11 AM
 #12

I just don't understand how CryptoWatcher420 is saying 16GB max virtual memory unless he meant you should be using way more than 8gb ram.  Maybe there is another issue going on here ... I just don't see what it is. 

My Vega 56 rig ran stable for 6 weeks with 4gb ram and a 60gb page file, but I've since upgraded it to 8gb ram just for the hell of it.  I left the page file at 60gb.  One of my 1070ti rigs ran stable for weeks at 4gb ram and a 60gb page file, but I've upgraded that one to 8gb as well.  I've never had a problem with either of those.

I do have a couple of problem rigs though.  They aren't really a huge problem, because they run, but I see disk controller errors in the event logs (which might be related to the large page file) and every once in a while one will hang.  I just upgraded those to 8gb ram last night and neither one has crashed since, but I still have to have the page file at 50gb min or CUDA throws out-of-memory errors.  And I still see disk controller errors in the event log.

I've done everything under the sun to the two problem rigs (one is six 1070ti's and one is six 1080ti's) and still can't get rid of the disk controller errors.  I've swapped the motherboard, sata cable, ssd, compared bios settings to my rigs that are running without errors, updated chipset drivers, reinstalled Win10 and started from scratch.  The only hardware that I haven't swapped out in those rigs are the risers and processor.

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 22, 2018, 02:10:28 AM
 #13

I am still having issues with crashing. I tested Skunkhash and it appeared to last a bit longer than Neoscrypt, but neither runs for more than 2.5 hours reliably. I have since swapped all cards to new risers from a different brand in case I had a bad riser in the mix.
nc50lc
Legendary
*
Offline Offline

Activity: 2408
Merit: 5601


Self-proclaimed Genius


View Profile
January 22, 2018, 02:59:40 AM
 #14

Don't blame the RAM or pagefile if the system crashes immediately; if it were a memory problem, it would freeze first.

It is either a temperature or power failure, but most likely a bad GPU.
Start by removing the 6th GPU and monitor your rig.

I've seen the same issue, and it pointed to a faulty video card.
The symptoms were misleading, but after replacing that card, the dilemma of finding the culprit ended.

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 23, 2018, 05:19:30 PM
Last edit: January 23, 2018, 06:08:58 PM by N1kon
 #15

It appears the system is now only soft crashing ccminer every so often due to non-responding video drivers. Video driver updated to 390.65.
Fingers crossed...

Otherwise I will have to try pulling the newest GPU out of the mix to see what happens.

N1kon (OP)
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
January 30, 2018, 10:45:48 PM
 #16

UPDATE:

Since my last update I have changed my miner to neoscrypt-hsrminer, changed video TDRDelay to 10 seconds, moved all cards to new risers, and moved 3 of the cards to my new DPS-1200 PSU. I was still having issues with Windows freezing. Absolutely no change.


but wait, there's more.

After uninstalling my Nvidia drivers and reinstalling them using Windows 7 compatibility mode, the rig is finally showing improvement. I can actually leave it on all day with little to no human intervention. I still have a babysitter script that restarts hsrminer if the application closes (just in case) and a 3-hour reboot cycle on my WiFi outlet. Hopefully I can wean the rig off of all my automated scripts and timers a little bit.
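
For reference, a babysitter like the one described can be sketched in a few lines of Python. This is not the actual script from this rig, just a minimal illustration of the idea: start the miner, wait for it to exit, and start it again, up to a restart limit. The `supervise` name, the restart limit, and the delay are all made up for the example, and the miner command in the trailing comment is hypothetical.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay_seconds=5.0):
    """Run command and relaunch it each time it exits.

    Returns the total number of launches (the first run plus restarts).
    """
    launches = 0
    while launches <= max_restarts:
        launches += 1
        # Blocks until the process exits, whether cleanly or from a crash.
        exit_code = subprocess.call(command)
        print(f"miner exited with code {exit_code} (launch {launches})")
        if launches <= max_restarts:
            # Give the driver a moment to recover before relaunching.
            time.sleep(delay_seconds)
    return launches


# Example usage (hypothetical miner command; adjust to your own setup):
# supervise(["hsrminer_neoscrypt.exe"], max_restarts=100, delay_seconds=30)
```

A restart limit is worth keeping so that a miner that dies instantly on every launch does not spin forever; a full freeze of Windows still needs the external reboot timer, since a script cannot recover from that.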

tadeus1
Member
**
Offline Offline

Activity: 140
Merit: 11


View Profile
January 30, 2018, 11:05:41 PM
 #17

I may not be much help, but I had similar issues with a mixed set of 1060/1070s, ccminer KlausT, and Neoscrypt.
I got frustrated and never found the issue. I just changed the algo in my case.
