Bitcoin Forum
Author Topic: [OS] nvOC easy-to-use Linux Nvidia Mining  (Read 417985 times)
papampi
Full Member
***
Offline Offline

Activity: 686
Merit: 140


Linux FOREVER! Resistance is futile!!!


View Profile WWW
April 29, 2018, 02:08:19 PM
 #7721

I am currently trying to get the best of nvOC into rxOC. I know this is the nvOC thread, but what I am doing is already very unusual, and in the rxOC thread it feels like no one reads, so I have a slight hope somebody has a key tip for me here.

I tried to implement the WTM switch on my own in my rxOC rig. I failed because too many things need to change in the watchdog. Then I had the idea to keep my rxOC image and simply copy everything from the nvOC home/m1 folder into the rxOC home/m1 folder, and I edited a couple of necessary things. Now when I insert one card, everything works fine and WTM switches to a profitable coin on rxOC!

Unfortunately, when inserting a second card I get this message:

https://ibb.co/ga8obx

I also got this once with one card. After a reboot everything worked fine again. I already switched off the integrated GPU in the BIOS. Maybe somebody can point me in the right direction, please?

btw: still no hashing on XMR on my original nvOC rig. Is this working?


You can't use nvOC files in rxOC.
Copy the rxOC 1bash, 3main, 2unix, watchdog and temp control to pastebin and send me the links.
I will check and see if I can do anything for rxOC.

Heguli97
Full Member
***
Offline Offline

Activity: 223
Merit: 101


View Profile
April 29, 2018, 02:44:12 PM
 #7722

Hi, could someone instruct me how to mine XPM with this OS?

The miner I'm looking at has a Linux version and it's located here: https://bitcointalk.org/index.php?topic=831708.0
and this is presumably the linux version of the miner: http://coinsforall.io/distr/xpmclient-10.2.2-beta.tar.gz

ha5hi5h
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
April 30, 2018, 04:47:33 AM
 #7723

Hi,

I noticed that there are a few double periods in the 0miner file:

Line 883 onwards:
if [ $COIN == "PASC" ]
then
  HCD='/home/m1/pasc/sgminer'
  ADDR="$PASC_ADDRESS..$PASC_WORKER"


Is this right? Or should I reduce it to a single period?




fk1
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
April 30, 2018, 08:37:19 AM
 #7724

oneBash: https://pastebin.com/CXGuffeU
2unix: https://pastebin.com/MPzG6tK5

That's all from rxOC. I guess one needs to separate all the 0miner and 3main stuff from the oneBash file, otherwise things will not work the same as with nvOC and the WTM switch. I tried but failed.
Stubo
Member
**
Offline Offline

Activity: 224
Merit: 13


View Profile
April 30, 2018, 09:08:33 AM
 #7725

If all of the double periods follow that example, where they sit between the address and the worker name, then yes, reduce them. There should only be a single period between the two.
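
A quick way to check for this is to search 0miner for address/worker pairs joined by two periods. The sketch below is only an illustration: the function names are made up here, and it assumes every affected line follows the `$X_ADDRESS..$X_WORKER` pattern from the PASC example. It prints the corrected text instead of editing the file, so the result can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helpers, not part of nvOC.

# List (with line numbers) lines where address and worker are joined by "..".
find_double_periods() {
  grep -nE '_ADDRESS\.\.\$[A-Za-z_]*WORKER' "$1"
}

# Print the file with ".." collapsed to "." on those lines only.
fix_double_periods() {
  sed -E 's/(_ADDRESS)\.\.(\$[A-Za-z_]*WORKER)/\1.\2/' "$1"
}

# Demo on a temporary copy of the PASC snippet from 0miner.
sample=$(mktemp)
cat > "$sample" <<'EOF'
if [ $COIN == "PASC" ]
then
  HCD='/home/m1/pasc/sgminer'
  ADDR="$PASC_ADDRESS..$PASC_WORKER"
EOF
find_double_periods "$sample"
fix_double_periods "$sample" | grep 'ADDR='
rm -f "$sample"
```

Run against the real file, `find_double_periods /home/m1/0miner` (path assumed) would list every line that needs the fix.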
papampi
Full Member
***
Offline Offline

Activity: 686
Merit: 140


Linux FOREVER! Resistance is futile!!!


View Profile WWW
April 30, 2018, 11:58:34 AM
 #7726

Here is the WTM auto switch for rxOC.
I don't have rxOC, so I cannot test it.
Check it and let me know how it goes.



fk1
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
April 30, 2018, 01:58:18 PM
 #7727

Thank you very much, I will try it today after work.
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
April 30, 2018, 02:33:35 PM
Last edit: April 30, 2018, 03:45:33 PM by urnzwy
 #7728

Hey guys,

Anyone else having an issue mining ETH (and ZCL now) and the rig crashing / freezing?

It seems that something happens in the ETH miner, the watchdog thinks it lost a GPU and tries to restart 3main. But on both of my rigs, when this happens the system freezes and I have to restart manually.

Errors I have seen -

GPU Utilization is low: restarting 3main...

Thread exited with code: 29

Is there a way to disable 3main restarting without disabling watchdog? The miner would start mining again once connected, but this freezing is my issue.

I'm mining hush temporarily with zero issues. Was on ZCL before the fork without issues as well, same settings and whatnot. And was mining ETH for about 3 weeks with no issues. Now it won't stop crashing after being up 12-24 hours each time. It's driving me nuts.

Two separate rigs, both 13 cards (1070s and 1070 Tis).

Ahhhhhh This is driving me nuts...

On Hush, ZERO issues. Mined for over a week. Switched to ZenCash yesterday ("figured Equihash works with Hush, so why not"). Freezes after 18 hours.

I seriously do not understand. These rigs went months running fine and now are nothing but problematic. Any ideas would be greatly appreciated.

Two rigs go down at the same time. There has to be some answer as to why.

Can't seem to figure out how to attach the photo to the thread, but here is one of the rigs.

https://imgur.com/a/WRfTksJ
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 01, 2018, 05:42:14 AM
 #7729

One of my rigs info. The other won't let me in for some reason. I believe it's an access issue, which I figured out in the past but can't remember now.

ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, PNY, GeForce GTX 1070 Ti, P2, 65, 60, 100, 114.30, 120.00, 217.00, 1607, 3874
1, PNY, GeForce GTX 1070 Ti, P2, 72, 70, 100, 118.06, 120.00, 217.00, 1556, 3874
2, PNY, GeForce GTX 1070 Ti, P2, 69, 60, 99, 121.58, 120.00, 217.00, 1569, 3874
3, EVGA, GeForce GTX 1080 Ti, P2, 68, 60, 100, 158.63, 175.00, 300.00, 1544, 5078
4, EVGA, GeForce GTX 1070 Ti, P2, 62, 60, 99, 116.47, 120.00, 217.00, 1594, 3874
5, EVGA, GeForce GTX 1070 Ti, P2, 63, 60, 93, 115.74, 120.00, 217.00, 1594, 3874
6, PNY, GeForce GTX 1070 Ti, P2, 70, 75, 100, 119.80, 120.00, 217.00, 1544, 3874
7, PNY, GeForce GTX 1070 Ti, P2, 73, 80, 99, 119.72, 120.00, 217.00, 1620, 3874
8, EVGA, GeForce GTX 1080, P2, 74, 75, 100, 174.96, 175.00, 217.00, 1898, 4590
9, PNY, GeForce GTX 1070, P2, 72, 80, 100, 122.47, 120.00, 170.00, 1645, 3874
10, PNY, GeForce GTX 1070 Ti, P2, 70, 60, 97, 118.81, 120.00, 217.00, 1544, 3874
11, PNY, GeForce GTX 1070 Ti, P2, 66, 60, 99, 92.56, 120.00, 217.00, 1556, 3874
12, EVGA, GeForce GTX 1070, P2, 62, 60, 100, 121.06, 120.00, 170.00, 1759, 3874
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 01, 2018, 05:48:57 AM
 #7730

The only time I have ever had a rig freeze was ultimately due to OC being too high. Lowering the GPU OC by just a bit (5 or 10) fixed my issue.

I am not sure I totally understand your question, but if you want to disable the 3main restart, the only way to do this is to not run the watchdog at all. To do that, change this in 1bash:

MINER_WATCHDOG="YES"

to

MINER_WATCHDOG="NO"
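
The same change can be made from the command line. This is a minimal sketch: the path /home/m1/1bash is an assumption about where the file lives on your image, and set_watchdog is just a name chosen here.

```shell
#!/bin/bash
# Hypothetical helper: set MINER_WATCHDOG to YES or NO in a 1bash-style file.
# A backup of the original is kept next to it with a .bak suffix.
set_watchdog() {
  local file=$1 value=$2
  case "$value" in
    YES|NO) ;;
    *) echo "value must be YES or NO" >&2; return 1 ;;
  esac
  sed -i.bak -E "s/^([[:space:]]*MINER_WATCHDOG=)\"(YES|NO)\"/\1\"$value\"/" "$file"
}

# Usage (path is an assumption):
#   set_watchdog /home/m1/1bash NO
```

The change only takes effect the next time 1bash is read, so the rig or the mining process has to be restarted afterwards.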



Shouldn't be an overclock issue. Running Hush right now for two days straight no issue.

Running 50 core and 200 mem at 80% TDP.

Def do not want to turn off the watchdog. I've run with these settings fine for multiple coins and long periods of time. ETH was running for over a month with zero issues, and now all of a sudden, with the same settings and the same miner, different rigs crash every 12ish hours within minutes of each other. Even switched servers and same issue. Something else is at play here.

Like I mentioned, on my personal desktop that also mines, I see ethminer restart randomly at exactly the time the rigs go down. It doesn't make much sense.

I had a similar issue to this a few months ago and was told it was an issue within nvOC: when the miner switched to "donation" mode it would freeze, and the solution was to switch miners. I've tried ETHMINER, GENOIL and CLAYMORE. All the same issue.

Oh, the miner you are referring to was an older version of the DSTM miner. The dev only had one pool configured for his donations and a network issue in Europe hosed a bunch of us for hours early one morning. That was fixed in a newer DSTM miner version, several versions ago. That was not a freeze. It was just the miner trying over and over to connect to something that it could not. When the watchdog saw that the GPUs were idle, it would restart the miner a few times and ultimately the box. This went on for hours and hours and even destroyed the boot USB drive for some folks running an older nvOC version.

Have you checked to see what the system logs (/var/log) say? I am assuming that when you say "freeze", the entire host becomes unresponsive and has to be hard rebooted.
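
When checking the logs, NVIDIA driver faults usually show up in the kernel log as lines containing `NVRM: Xid`. A small sketch for pulling those out; the log file paths are assumptions for an Ubuntu-style image and the function name is made up:

```shell
#!/bin/bash
# Hypothetical helper: print NVIDIA driver error lines (NVRM: Xid ...) from the
# given log files. Files that do not exist or cannot be read are skipped.
find_gpu_faults() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    grep -H 'NVRM: Xid' "$f"
  done
  return 0
}

# Usage (paths are assumptions; reading them may need sudo):
#   find_gpu_faults /var/log/syslog /var/log/kern.log
```

If the host froze hard, the last lines written before the freeze are the interesting ones, so compare their timestamps with the time the rig went down.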

Yeah, it completely freezes and becomes unresponsive. Here is the error after turning on the logs.

tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
m1@m1-desktop:~$ tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
Watchdog for nvOC v0019-2.0 - Community Release
Version: v0019-2.0.011

LOG FILE: (Showing the last 10 recorded entries)
|  7  |    112W     |  3.87 Sol/W  |
|  8  |    116W     |  4.26 Sol/W  |
|  9  |     98W     |  3.82 Sol/W  |
| 10  |    120W     |  3.65 Sol/W  |
| 11  |    121W     |  3.68 Sol/W  |
| 12  |    118W     |  3.58 Sol/W  |
+-----+-------------+--------------+
CRITICAL: Sun Apr 29 09:37:52 MST 2018 - GPU Utilization is too low: restarting 3main...
WARNING: Mon Apr 30 02:25:19 MST 2018 - Internet is down, checking...
WARNING: Mon Apr 30 09:38:05 MST 2018 - Internet is down, checking...
 
Mon Apr 30 22:34:23 MST 2018 - No mining issues detected.
GPU UTILIZATION:  100 96 100 100 100 99 97 100 100 98 100 99 99
      GPU_COUNT:  13
 
Mon Apr 30 22:34:43 MST 2018 - GPU 2 under threshold found - GPU UTILIZATION:   59
Mon Apr 30 22:34:43 MST 2018 - GPU 3 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:44 MST 2018 - GPU 4 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:44 MST 2018 - GPU 5 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:45 MST 2018 - GPU 6 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:45 MST 2018 - GPU 7 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:46 MST 2018 - GPU 8 under threshold found - GPU UTILIZATION:   10
Mon Apr 30 22:34:46 MST 2018 - GPU 9 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:46 MST 2018 - GPU 10 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:46 MST 2018 - GPU 11 under threshold found - GPU UTILIZATION:   0
Mon Apr 30 22:34:47 MST 2018 - GPU 12 under threshold found - GPU UTILIZATION:   0
Connection to google.com 443 port [tcp/https] succeeded!
Connection to google.com 443 port [tcp/https] succeeded!
WARNING: Mon Apr 30 22:34:47 MST 2018 - Found no miner, jumping to 3main restart


WARNING: Mon Apr 30 22:34:47 MST 2018 - Problem found: See diagnostics below:
Percent of GPUs bellow threshold: 84 %
name, pstate, temperature.gpu, fan.speed [%], utilization.gpu [%], power.draw [W], power.limit [W]
GeForce GTX 1070 Ti, P0, 63, 60 %, 27 %, 40.54 W, 120.00 W
GeForce GTX 1070 Ti, P0, 65, 75 %, 0 %, 32.57 W, 120.00 W
GeForce GTX 1070 Ti, P0, 68, 65 %, 0 %, 42.96 W, 120.00 W
GeForce GTX 1080 Ti, P0, 66, 60 %, 0 %, 60.21 W, 175.00 W
GeForce GTX 1070 Ti, P0, 60, 60 %, 0 %, 31.63 W, 120.00 W
GeForce GTX 1070 Ti, P0, 59, 60 %, 0 %, 27.96 W, 120.00 W
GeForce GTX 1070 Ti, P0, 65, 80 %, 0 %, 35.29 W, 120.00 W
GeForce GTX 1070 Ti, P0, 67, 90 %, 0 %, 41.77 W, 120.00 W
GeForce GTX 1080, P0, 61, 60 %, 0 %, 44.37 W, 120.00 W
GeForce GTX 1070, P0, 67, 80 %, 0 %, 36.41 W, 120.00 W
GeForce GTX 1070 Ti, P0, 67, 65 %, 0 %, 42.16 W, 120.00 W
GeForce GTX 1070 Ti, P0, 62, 60 %, 0 %, 34.22 W, 120.00 W
GeForce GTX 1070, P0, 60, 60 %, 0 %, 37.31 W, 120.00 W
+-----+-------------+--------------+
|  0  |    121W     |  3.50 Sol/W  |
|  1  |    122W     |  3.59 Sol/W  |
|  2  |    120W     |  3.69 Sol/W  |
|  3  |    138W     |  4.42 Sol/W  |
|  4  |    117W     |  3.83 Sol/W  |
|  5  |    118W     |  3.81 Sol/W  |
|  6  |    113W     |  3.81 Sol/W  |
|  7  |    125W     |  3.55 Sol/W  |
|  8  |    115W     |  4.23 Sol/W  |
|  9  |    120W     |  3.32 Sol/W  |
| 10  |    122W     |  3.49 Sol/W  |
| 11  |    118W     |  3.69 Sol/W  |
| 12  |    121W     |  3.40 Sol/W  |
+-----+-------------+--------------+
CRITICAL: Mon Apr 30 22:34:47 MST 2018 - GPU Utilization is too low: restarting 3main...
Mon Apr 30 22:37:07 MST 2018 - Back 'on watch' after miner restart
GPU UTILIZATION:  95 98 99 100 99 96 99 100 100 100 99 100 100
      GPU_COUNT:  13
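
For reference, the 84 % in the diagnostics is consistent with 11 of the 13 GPUs being under the threshold (11 * 100 / 13, rounded down). The sketch below reproduces that arithmetic. It is not the 5watchdog code: the threshold of 70 and the readings for GPU 0 and 1 are assumptions, since the log does not show them.

```shell
#!/bin/bash
# Hypothetical helper: percent_under THRESHOLD UTIL...
# Prints the integer percentage of readings that are below THRESHOLD.
percent_under() {
  local threshold=$1 under=0 total=0 u
  shift
  for u in "$@"; do
    total=$((total + 1))
    if [ "$u" -lt "$threshold" ]; then
      under=$((under + 1))
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $((under * 100 / total))
  fi
}

# GPU 2-12 use the values from the log; GPU 0 and 1 (100, 96) and the
# threshold (70) are assumed.
percent_under 70 100 96 59 0 0 0 0 0 10 0 0 0 0   # prints 84
```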
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 01, 2018, 05:52:20 AM
 #7731

Error on rig 2 - Two different rigs, crashing within a minute of each other. Tell me that isn't weird.

tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
m1@m1-desktop:~$ tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
Watchdog for nvOC v0019-2.0 - Community Release
Version: v0019-2.0.011

LOG FILE: (Showing the last 10 recorded entries)
| 12  |    120W     |  3.42 Sol/W  |
+-----+-------------+--------------+
INFO 09:34:50: GPU3 Accepted share 186ms [A:454, R:1]
INFO 09:34:51: GPU7 Accepted share 187ms [A:477, R:1]
CRITICAL: Sun Apr 29 09:35:17 MST 2018 - GPU Utilization is too low: restarting 3main...
Mon Apr 30 22:35:29 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:0F:00.0: GPU is lost.  Reboot the system to recover this GPU


Mon Apr 30 22:35:30 MST 2018 - reboot in 10 seconds
papampi
Full Member
***
Offline Offline

Activity: 686
Merit: 140


Linux FOREVER! Resistance is futile!!!


View Profile WWW
May 01, 2018, 06:11:45 AM
 #7732

If both rigs crash and freeze at the same time, it can be an electrical problem.
I had almost the same issue a while back, when some of my rigs were all crashing at the same time.
I found out that when one of the room ventilation fans turned on, it put high-frequency noise on the electrical supply, and 3-4 rigs would get the lost GPU error at the same time and reboot.
After a month of pulling my hair out trying to find the problem, I changed that fan and the problem was solved.


Open 5watchdog
Change:
Code:
        echo "$(date) - Lost GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}
To:
Code:
        echo "$(date) - Lost GPU $GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}

That way you can see the number of the GPU that is lost and check whether it is always the same GPU.
If it's always the same GPU, remove it from the rig and check it; it may be a faulty GPU, riser or power cable.
If the problem jumps to another GPU after you remove it, then it could be a power problem.
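
Even without that change, the nvidia-smi message already carries the PCI bus ID of the lost card, so the log can be tallied to see whether one slot keeps failing. A rough sketch, assuming the "Unable to determine the device handle for GPU ..." message format shown in the watchdog log; the function name is made up:

```shell
#!/bin/bash
# Hypothetical helper: count "GPU is lost" messages per PCI bus ID.
# Output: one "<count> <bus id>" line per GPU, most frequent first.
count_lost_gpus() {
  grep -hoE 'device handle for GPU [0-9A-Fa-f:.]+: GPU is lost' "$@" |
    awk '{ sub(/:$/, "", $5); print $5 }' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'
}

# Usage (log path taken from the posts in this thread):
#   count_lost_gpus /home/m1/nvoc_logs/watchdog-screenlog.0
```

A single bus ID dominating the list points at that card, riser or cable; an even spread points more toward power or something else shared.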

urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 01, 2018, 03:06:40 PM
 #7733

Hmmmm interesting about the electrical issue. I have two 8" hyperfans used as exhaust fans that run 100% 24/7. But could be a possibility.

Last night I recompiled the miners again, set the max fan limit to 90 from another one of your posts and set the power restore to 80 and changed EWBF to 3_3 from 3_4.

I will try this for a day and see if anything happens. It's just so strange that on Hush, it worked completely fine.

I had this kinda happen before and it was the mining server "disconnecting". Switched pools and all was good.

Any way to make the watchdog wait an extended period of time for errors to clear themselves before trying to restart 3main?

Also, if the watchdog is disabled, do the miners themselves keep a max temp limit? I notice when starting EWBF it says max temp 90*. While I don't want temps that high, if it keeps the miner going and safe then I will consider it.
Stubo
Member
**
Offline Offline

Activity: 224
Merit: 13


View Profile
May 01, 2018, 06:28:29 PM
 #7734

The watchdog and the temp control are 2 different scripts so even if you disable the watchdog, the temp control will still do its thing. If you want to expand the time between checks for the watchdog, change the interval of the main loop. At the bottom of the script, you will see this line:
Code:
sleep 10

Change this to a larger value like 15 or 20. NOTE that increasing this value on a rig with a lot of GPUs will dramatically increase the amount of time before the watchdog bounces the miner in the event that a problem is detected on a single GPU.
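
Another way to give transient errors time to clear, instead of only lengthening the sleep, is to require several consecutive bad checks before restarting. The following is a sketch of that idea, not a patch for 5watchdog: the function names, the threshold of 70, the limit of 3 strikes and the restart placeholder are all assumptions. It reads utilization with nvidia-smi's --query-gpu option.

```shell
#!/bin/bash
# Hypothetical sketch, not the nvOC watchdog.

# next_strikes CURRENT THRESHOLD UTIL...
# Prints CURRENT+1 if any reading is below THRESHOLD, otherwise 0.
next_strikes() {
  local current=$1 threshold=$2 u
  shift 2
  for u in "$@"; do
    if [ "$u" -lt "$threshold" ]; then
      echo $((current + 1))
      return 0
    fi
  done
  echo 0
}

# watch_loop THRESHOLD MAX_STRIKES INTERVAL
# Acts only after MAX_STRIKES consecutive bad checks.
watch_loop() {
  local threshold=$1 max=$2 interval=$3 strikes=0 readings
  while true; do
    readings=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits)
    # Word splitting of $readings is intended: one value per GPU.
    strikes=$(next_strikes "$strikes" "$threshold" $readings)
    if [ "$strikes" -ge "$max" ]; then
      echo "$(date) - $strikes bad checks in a row, restart would go here"
      strikes=0
    fi
    sleep "$interval"
  done
}

# Usage (values are assumptions): watch_loop 70 3 10
```

With a 10 second interval and 3 strikes, a pool hiccup shorter than about 20 seconds would not trigger a restart, while a GPU that stays idle still gets caught within half a minute.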
papampi
Full Member
***
Offline Offline

Activity: 686
Merit: 140


Linux FOREVER! Resistance is futile!!!


View Profile WWW
May 01, 2018, 06:57:03 PM
 #7735

I have two 20-inch fans with a thermostat that starts them when the room temp goes above 15C and stops them at 10C, and one of them was causing the problem at startup.
Anyway, I suggest you try out the dstm zm miner; it is much better in my experience.
What are your PSUs, and how much do you draw from them? And what about riser power?

If both rigs restart 3main at the same time when mining some coins and all is good with other coins, try changing the pool.
Recently I have been getting lots of pool disconnects from MPH, and all rigs restart 3main at almost the same time.
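
To tell a pool outage from a local problem, the pool's stratum port can be probed directly, similar to the google.com check that appears in the watchdog log. A minimal sketch using nc; the function names are made up, and the host:port in the usage line is taken from a miner log posted in this thread, so substitute your own pool:

```shell
#!/bin/bash
# Hypothetical helpers for probing a pool given as "host:port".

pool_host() { echo "${1%:*}"; }   # text before the last colon
pool_port() { echo "${1##*:}"; }  # text after the last colon

# check_pool HOST:PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
check_pool() {
  local target=$1 timeout=${2:-5}
  nc -z -w "$timeout" "$(pool_host "$target")" "$(pool_port "$target")"
}

# Usage:
#   if check_pool mining.miningspeed.com:3062; then echo up; else echo down; fi
```

Running this from cron every minute and logging the result gives a timeline that can be compared with the 3main restarts.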

urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 01, 2018, 11:14:37 PM
 #7736

So it looks like the server is temporarily disconnecting and crashing the miners. This is going to happen, so I am going to disable the watchdog and just let the miner reconnect on its own.

I was on MPH and switched to miningspeed. Same issue. Maybe my rigs just don't like to be restarted and like to work all the time. lol




Latest Errors -

LOG FILE: (Showing the last 10 recorded entries)
CUDA: Device: 8 Thread exited with code: 46
CUDA: Device: 7 Thread exited with code: 46
CUDA: Device: 11 Thread exited with code: 46
CUDA: Device: 12 User selected solver: 0
CUDA: Device: 3 User selected solver: 0
CUDA: Device: 4 User selected solver: 0
CUDA: Device: 12 Thread exited with code: 46
CUDA: Device: 3 Thread exited with code: 46
CUDA: Device: 4 Thread exited with code: 46
CRITICAL: Tue May  1 15:17:20 MST 2018 - GPU Utilization is too low: restarting 3main...



LOG FILE: (Showing the last 10 recorded entries)
+-------------------------------------------------+
INFO: Server: mining.miningspeed.com:3062
INFO: Solver Auto.
INFO: Devices: All.
INFO: Temperature limit: 90
INFO: Api: Disabled
---------------------------------------------------
ERROR: Cannot connect to the server. 1
CRITICAL: Tue May  1 04:00:44 MST 2018 - GPU Utilization is too low: restarting 3main...
WARNING: Tue May  1 16:03:20 MST 2018 - Internet is down, checking...




ha5hi5h
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
May 02, 2018, 06:33:50 AM
 #7737

Thanks for the clarification. Is there any way to make sure this gets included in the next update? I was wondering why my worker didn't show up.
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 02, 2018, 04:27:39 PM
 #7738

Well... Disabling watchdog didn't work. It still crashed.
urnzwy
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
May 02, 2018, 11:21:00 PM
 #7739

Error on rig 2 - Two different rigs, crashing within a minute of each other. Tell me that isn't weird.

tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
m1@m1-desktop:~$ tail -f  /home/m1/nvoc_logs/watchdog-screenlog.0
Watchdog for nvOC v0019-2.0 - Community Release
Version: v0019-2.0.011

LOG FILE: (Showing the last 10 recorded entries)
| 12  |    120W     |  3.42 Sol/W  |
+-----+-------------+--------------+
INFO 09:34:50: GPU3 Accepted share 186ms [A:454, R:1]
INFO 09:34:51: GPU7 Accepted share 187ms [A:477, R:1]
CRITICAL: Sun Apr 29 09:35:17 MST 2018 - GPU Utilization is too low: restarting 3main...
Mon Apr 30 22:35:29 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:0F:00.0: GPU is lost.  Reboot the system to recover this GPU


Mon Apr 30 22:35:30 MST 2018 - reboot in 10 seconds


If both rigs crash and freeze at the same time, it can be an electrical problem.
I had almost the same issue a while back: some of my rigs were crashing all at the same time.
I found out that when one of the room ventilation fans turned on, it put high-frequency noise on the power line, and 3-4 rigs would get the lost GPU error at the same time and reboot.
After a month of pulling my hair out trying to find the problem, I changed that fan and the problem was solved.


Open 5watchdog
Change:
Code:
        echo "$(date) - Lost GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}
To:
Code:
        echo "$(date) - Lost GPU $GPU so restarting system. Found GPU's:" | tee -a ${LOG_FILE}

That way you can see the number of the GPU that is lost and check whether it is always the same one.
If it's always the same GPU, remove it from the rig and check it; it may be a faulty GPU, riser, or power cable.
If the problem jumps to another GPU after you remove it, then it could be a power problem.
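
As a complement, the watchdog log already records the PCI bus ID in the "GPU is lost" message, so the check for "is it always the same GPU" can be sketched without editing 5watchdog. This assumes the message format stays exactly as in the log quoted above. count_lost_gpus is a made-up helper name, not something shipped with nvOC.

```shell
#!/usr/bin/env bash
# Tally "GPU is lost" messages per PCI bus ID from a log read on stdin.
# The pattern matches the message format shown in the watchdog log above.
count_lost_gpus() {
  grep -oE 'device handle for GPU [0-9A-Fa-f:.]+: GPU is lost' \
    | sed -E 's/^device handle for GPU (.*): GPU is lost$/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage (log path taken from the tail command earlier in this post):
# count_lost_gpus < /home/m1/nvoc_logs/watchdog-screenlog.0
```

Each output line is a count followed by a bus ID, most frequent first. One bus ID dominating points at that card, riser, or cable. Counts spread across several IDs fit the power explanation better.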

Hmmmm interesting about the electrical issue. I have two 8" hyperfans used as exhaust fans that run 100% 24/7. But could be a possibility.

Last night I recompiled the miners again, set the max fan limit to 90 from another one of your posts and set the power restore to 80 and changed EWBF to 3_3 from 3_4.

I will try this for a day and see if anything happens. It's just so strange that on Hush, it worked completely fine.

I had this kinda happen before and it was the mining server "disconnecting". Switched pools and all was good.

Is there any way to make watchdog wait an extended period of time for errors to clear themselves before trying to restart 3main?

Also, if watchdog is disabled, do the miners themselves keep a max temp limit? I notice that when EWBF starts it says max temp 90*. While I don't want temps that high, if it keeps the miner going and safe then I will consider it.


I have two 20-inch fans on a thermostat that starts them when the room temperature goes above 15C and stops them at 10C, and one of them causes the problem at startup.
Anyway, I suggest you try out the dstm zm miner; it is much better in my experience.
What are your PSUs, how much power do you draw from them, and how are the risers powered?

If both rigs restart 3main at the same time when mining some coins but all is good with other coins, try changing the pool.
Recently I've been getting lots of pool disconnects from MPH, and all rigs restart 3main at almost the same time.

I have two 1600W server PSUs for the cards (6 on each) and one 850W EVGA ATX with one card / riser.

All using 6-8pin, risers are split once (1 cable per 2 risers)

Any chance it's a memory issue? I am running 4GB, my memory says 3.2GB / 4GB in use (85%).

Since I disabled watchdog, it shouldn't be a server issue. The miner would just keep trying to reconnect.
Stubo
Member
**
Offline Offline

Activity: 224
Merit: 13


View Profile
May 03, 2018, 08:32:06 AM
 #7740

I have two 1600W server PSUs for the cards (6 on each) and one 850W EVGA ATX with one card / riser.

All using 6-8pin, risers are split once (1 cable per 2 risers)

Any chance it's a memory issue? I am running 4GB, my memory says 3.2GB / 4GB in use (85%).

Since I disabled watchdog, it shouldn't be a server issue. The miner would just keep trying to reconnect.

Check the logs (in /var/log). What does syslog say?
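
Something along these lines could narrow down what to look for; treat it as a sketch. The NVIDIA driver reports GPU errors in the kernel log as "NVRM: Xid" lines, and the kernel logs "invoked oom-killer" / "Out of memory" when RAM runs out, which would speak to the memory question. The choice of patterns and the helper name gpu_and_oom_events are my own assumptions, not an nvOC tool.

```shell
#!/usr/bin/env bash
# Print NVIDIA driver error reports and out-of-memory events from a
# syslog-style file. Defaults to /var/log/syslog when no file is given.
gpu_and_oom_events() {
  grep -E 'NVRM: Xid|invoked oom-killer|Out of memory' "${1:-/var/log/syslog}"
}

# Usage:
# gpu_and_oom_events                     # current syslog
# gpu_and_oom_events /var/log/syslog.1   # previous, rotated log
```

If nothing matches around the time of the crash, the last few lines logged before the reboot are still worth posting.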