Bitcoin Forum
August 03, 2020, 01:08:00 PM *
Author Topic: [OS] nvOC easy-to-use Linux Nvidia Mining  (Read 416857 times)
Maxximus007
Full Member
***
Offline Offline

Activity: 154
Merit: 100


View Profile
July 10, 2017, 06:09:08 PM
 #1721

Hey fullzero, i have a question,

Without a doubt, my biggest problem right now is that when my miner crashes it takes the whole rig down with it. Everything gets stuck, SSH barely works, the average system load jumps to 14.5, and Xorg takes up 100% of the CPU. It's so bad that none of the standard reboot commands work; they just do nothing. The only thing that actually reboots the rig in this state is "echo b > /proc/sysrq-trigger", so I've set up a script that checks the average system load and, if it's over 2, uses that command to reboot. It works, but I don't like this "solution": yesterday after a reboot nvOC got corrupted somehow, I lost my customized oneBash, and the whole system became read-only (thankfully I had a oneBash backup that was only a few days behind).

So the question is: what can I do to relieve this Xorg error? I run a 7-card rig and never plan on going higher. What can I do with Xorg that would fix this?

Thanks.
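
A minimal sketch of the load-average check described above, with some assumptions spelled out. LOAD_LIMIT and WATCHDOG_ARMED are made-up names, not nvOC settings. As far as I know, the sysrq "b" key reboots immediately without syncing or unmounting disks, which might be related to the corruption mentioned above, so the sketch calls sync first; treat that as a guess, not a diagnosis.

```shell
#!/bin/bash
# Sketch only: load-average watchdog with a dry-run guard.
# LOAD_LIMIT and WATCHDOG_ARMED are hypothetical names, not nvOC settings.
LOAD_LIMIT="${LOAD_LIMIT:-2}"

# Succeeds (exit 0) when the load value is above the limit.
# awk does the comparison because the shell cannot compare decimals.
load_exceeds() {
  awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load > limit) }'
}

# The 1-minute load average is the first field of /proc/loadavg.
current_load() {
  cut -d ' ' -f 1 /proc/loadavg
}

check_once() {
  local load
  load=$(current_load)
  if load_exceeds "$load" "$LOAD_LIMIT"; then
    echo "$(date) - load $load is above $LOAD_LIMIT"
    # Only reboot when explicitly armed, so a test run cannot take the rig down.
    if [ "${WATCHDOG_ARMED:-0}" = "1" ]; then
      sync                                              # flush pending writes first
      echo b | sudo tee /proc/sysrq-trigger >/dev/null  # immediate reboot
    fi
  fi
}
```

Calling check_once from a loop with a sleep in between gives the same behaviour as the script described in the post; leaving WATCHDOG_ARMED unset only logs.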

@ tempgoga

It seems that whenever a soft crash occurs, utilization on most of the cards drops to zero, so while the display/keyboard is unresponsive you can still catch the soft crash from nvidia-smi. The script below checks card utilization; if it drops below 90%, it counts down a minute, and if mining hasn't resumed it reboots the system.
This seems to have worked at least once in my case (I only got one soft crash this weekend), and the system recovered as expected.
The threshold values work for my setup, but others may find different values optimal.

Also, if anyone knows a way to iterate the if && statements, we can get the card count from "cards=$(nvidia-smi -L | wc -l); echo $cards". The way below also works, with manual editing to adjust the watchdog for the number of cards in your individual system.
___________
 
#!/bin/bash
#m1
threshold=90
i=12   # countdown: 12 checks x 5 seconds = about one minute
while sleep 5
 # Column 13 of the nvidia-smi table is assumed to be GPU utilization; this depends on the driver's output layout
 do number=$(nvidia-smi |grep % |awk '{print $13}' |tr -d %)
 set -- $number
 echo -e "$@"
# The "if and" statements below need to be manually adjusted to match the number of cards in your system
# If you have 5 cards, leave it as is; for a different number of cards, remove or add && statements as in the commented example below
        if [[ "$1" -gt "$threshold" ]] && \
           [[ "$2" -gt "$threshold" ]] && \
           [[ "$3" -gt "$threshold" ]] && \
           [[ "$4" -gt "$threshold" ]] && \
           [[ "$5" -gt "$threshold" ]]
# && \
#          [[ "$6" -gt "$threshold" ]]
         then i=12
         echo OK
         else echo $((i--))
        fi
        if [ "$i" -le 0 ]
         then echo "$(date) REBOOT due to soft crash" >>~/watchdog.log
         sleep 5
         sudo shutdown now -r
        fi
done
___________
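
On the question of iterating the && statements: one way, sketched here under the assumption that the readings arrive as a plain list of integers, is to loop over the values and stop at the first one that is not above the threshold. The function name all_above and the sample readings are made up for illustration.

```shell
#!/bin/bash
# Sketch: succeed only if every utilization value is above the threshold.
# Works for any number of cards, so no && chain needs editing.
all_above() {
  local threshold=$1 value
  shift
  [ "$#" -gt 0 ] || return 1        # no readings at all counts as a failure
  for value in "$@"; do
    # A non-numeric value makes the test fail, which also returns 1.
    [ "$value" -gt "$threshold" ] 2>/dev/null || return 1
  done
  return 0
}

# Sample readings stand in for real nvidia-smi output here.
readings="98 99 97 100 99 98 99"
if all_above 90 $readings; then echo OK; else echo LOW; fi
```

On a rig, the readings could come from the utilization.gpu query quoted later in this post, which should list one value per GPU when no -i index is given.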

Hey, that's funny: I just made a script doing something similar, although it checks the power draw.
Here it is:
Code:
#!/bin/bash

# Miner restart script V001
# By Maxximus007
# for nvOC by fullzero
#
# POWERLIMIT MUST BE SET IN oneBash

#########################
### BELOW CODE, NO NEED FOR EDITING
#########################
# Creating a log file to record restarts (LOG_FILE must be set before its first use)
LOG_FILE="/home/m1/restartlog.txt"
if [ ! -e "$LOG_FILE" ] ; then
    touch "$LOG_FILE"
fi
echo "$(date) - Starting miner restart script." | tee -a "${LOG_FILE}"

while true
do
sleep 60

GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | tail -1)

gpu=0
COUNT_LOW_POWER=0

while [ $gpu -lt $GPUS ]
do
  { IFS=', ' read POWERDRAW POWERLIMIT; } < <( nvidia-smi -i $gpu --query-gpu=power.draw,power.limit --format=csv,noheader,nounits)

  let POWER_DIFF=$( printf "%.0f" $POWERLIMIT )-$( printf "%.0f" $POWERDRAW )

  # If current draw is 30 Watt lower than the limit count them:
  if [ "$POWER_DIFF" -gt "30" ]
  then
    let COUNT_LOW_POWER=COUNT_LOW_POWER+1
  fi

  let gpu=gpu+1
done

if [ $COUNT_LOW_POWER -eq $GPUS ]
then
  echo "$(date) - Power draw is too low: kill miner and oneBash" | tee -a ${LOG_FILE}
  # If miner runs in screen 'miner' kill the screen
  screen -X -S miner kill
  # Best to restart oneBash - settings might be adjusted already
  kill $(ps -ef | awk '$NF~"oneBash" {print $2}')
else
  echo "$(date) - All good! Will check again in 60 seconds"
fi

done

You can combine the above with your code, and find the utilization like this:
Code:
nvidia-smi -i 1 --query-gpu=utilization.gpu --format=csv,noheader,nounits
You have to iterate over the GPU index, starting at 0, to get them all.
Okay I've combined the two, perhaps this will work for most of us:
Code:
#!/bin/bash

# Miner restart script V002
# By Maxximus007 && IAmNotAJeep
# for nvOC by fullzero
#

#########################
### BELOW CODE, NO NEED FOR EDITING
#########################
# Creating a log file to record restarts (LOG_FILE must be set before its first use)
LOG_FILE="/home/m1/restartlog.txt"
if [ ! -e "$LOG_FILE" ] ; then
    touch "$LOG_FILE"
fi
echo "$(date) - Starting miner restart script." | tee -a "${LOG_FILE}"

MIN_UTIL=90
RESTART=0

while true
do
sleep 60

GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | tail -1)

gpu=0
COUNT=0

while [ $gpu -lt $GPUS ]
do
  { IFS=', ' read UTIL; } < <( nvidia-smi -i $gpu --query-gpu=utilization.gpu --format=csv,noheader,nounits)

  let UTILIZATION=$( printf "%.0f" $UTIL )

  # If current utilization is lower than the limit, count this GPU:
  if [ $UTILIZATION -lt $MIN_UTIL ]
  then
    let COUNT=COUNT+1
  fi

  let gpu=gpu+1
done

if [ $COUNT -eq $GPUS ]
then
  if [ $RESTART -gt 1 ]
  then
    echo "$(date) - Utilization is too low: reviving did not work so restarting system" | tee -a ${LOG_FILE}
    sudo shutdown now -r
  fi
  echo "$(date) - Utilization is too low: kill miner and oneBash" | tee -a ${LOG_FILE}
  # If miner runs in screen 'miner' kill the screen
  screen -X -S miner kill
  # Best to restart oneBash - settings might be adjusted already
  kill $(ps -ef | awk '$NF~"oneBash" {print $2}')
  let RESTART=RESTART+1
else
  echo "$(date) - All good! Will check again in 60 seconds"
fi

done
salfter
Hero Member
*****
Offline Offline

Activity: 650
Merit: 500


My PGP Key: 92C7689C


View Profile WWW
July 10, 2017, 06:16:09 PM
 #1722

I've done some testing with shell scripts that shows a way forward: stop the previous miner, stop X, unload the nVidia driver, reload the driver, restart X, and start the next miner.  This puts the GPUs back to a known-good state before getting back to mining.  I've switched back and forth between a known-troublesome pair of miners, and it hasn't failed yet.  I'm going to put these changes into the switcher next and see how it goes.

This part seems to be working well, but now I'm running into a problem where ccminer (used by several algos) doesn't want to start within a screen session.  It works fine if started at the command line by itself.  It works fine inside an already-running screen session.  It falls on its face when the invocation is preceded by "screen -dmS miner":

/home/m1/SPccminer/ccminer: error while loading shared libraries: libcudart.so.8.0: cannot open shared object file: No such file or directory

libcudart.so.8.0 is in /usr/local/cuda-8.0/lib64.

There appears to be something different in the environment when screen is starting up vs. the rest of the time.  Here's a quick hack which fixes it, but I suspect this really shouldn't be necessary:

Code:
sudo ln -s /usr/local/cuda-8.0/lib64/libcudart.so.8.0 /usr/lib/

With this fix in place, most miners respond well to the switch...except the Pascal miner.  It takes its time responding to SIGTERM, and there's a higher likelihood of a GPU still falling off the bus, locking up, or whatever, necessitating a reboot.  Given that it's moving well into negative territory WRT profitability anyway (currently -$0.23 on my rig), I might just disable it and continue testing with the other miners.
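
Regarding the libcudart symlink hack above: one possible explanation, which is an assumption and not something verified on nvOC, is that screen is installed setuid or setgid, in which case the dynamic loader drops LD_LIBRARY_PATH and the miner started inside screen no longer sees the CUDA directory. Two alternatives to the symlink are sketched below. The file name cuda-8-0.conf and the helper build_miner_command are made up; the helper only prints the command so nothing is started by accident.

```shell
#!/bin/bash
# Sketch, assuming the CUDA path quoted above; adjust to your install.
CUDA_LIB_DIR="/usr/local/cuda-8.0/lib64"

# Option 1 (commented out because it needs root): register the directory
# with the loader cache so every process can find libcudart.
# echo "$CUDA_LIB_DIR" | sudo tee /etc/ld.so.conf.d/cuda-8-0.conf
# sudo ldconfig

# Option 2: set the variable inside the session, after screen has started.
# build_miner_command is a hypothetical helper that only builds the command line.
build_miner_command() {
  printf 'screen -dmS miner env LD_LIBRARY_PATH=%s %s\n' "$CUDA_LIB_DIR" "$*"
}

build_miner_command /home/m1/SPccminer/ccminer
```

Option 2 keeps the change local to the miner, while Option 1 behaves much like the symlink but without putting files into /usr/lib.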


Tipjars: BTC 1TipsGocnz2N5qgAm9f7JLrsMqkb3oXe2 LTC LTipsVC7XaFy9M6Zaf1aGGe8w8xVUeWFvR | My Bitcoin Note Generator | Pool Auto-Switchers: zpool MiningPoolHub NiceHash
Bitgem Resources: Pool Explorer Paper Wallet
IAmNotAJeep
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
July 10, 2017, 06:32:34 PM
 #1723

Pretty cool!  I'll try it tonight, lets hope this put the softcrash issues behind us.
Master89
Newbie
*
Offline Offline

Activity: 5
Merit: 0


View Profile
July 10, 2017, 08:38:54 PM
Last edit: July 10, 2017, 08:52:19 PM by Master89
 #1724

Ubuntu Server 16.04
driver nvidia_381

nvidia-settings does not work; it cannot connect to the X server:

sudo nvidia-settings
Failed to connect to Mir: Failed to connect to server socket
Unable to init server: Could not connect

ERROR: The control display is undefined; please run `nvidia-settings --help`
for usage information.

Reinstalling does not help.
salfter
Hero Member
*****
Offline Offline

Activity: 650
Merit: 500


My PGP Key: 92C7689C


View Profile WWW
July 10, 2017, 09:46:34 PM
 #1725

With this fix in place, most miners respond well to the switch...except the Pascal miner.  It takes its time responding to SIGTERM, and there's a higher likelihood of a GPU still falling off the bus, locking up, or whatever, necessitating a reboot.  Given that it's moving well into negative territory WRT profitability anyway (currently -$0.23 on my rig), I might just disable it and continue testing with the other miners.

So far, it's switched at least once without issue, as it's now running an equihash miner instead of a daggerhashimoto miner.  If I can get through another 24 hours without having to reset the miner, I think we can call the current iteration good.  Current profitability:

neoscrypt: 0.00083226 BTC/day (1.95 USD/day)
lyra2rev2: 0.00033541 BTC/day (0.79 USD/day)
daggerhashimoto: 0.00145476 BTC/day (3.42 USD/day)
lbry: 0.00025065 BTC/day (0.59 USD/day)
equihash: 0.00157431 BTC/day (3.70 USD/day)
sia: 0.00019036 BTC/day (0.45 USD/day)

fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 10:14:01 PM
 #1726

Hey fullzero, i have a question,

Without a doubt, my biggest problem right now is that when my miner crashes it takes the whole rig down with it. Everything gets stuck, SSH barely works, the average system load jumps to 14.5, and Xorg takes up 100% of the CPU. It's so bad that none of the standard reboot commands work; they just do nothing. The only thing that actually reboots the rig in this state is "echo b > /proc/sysrq-trigger", so I've set up a script that checks the average system load and, if it's over 2, uses that command to reboot. It works, but I don't like this "solution": yesterday after a reboot nvOC got corrupted somehow, I lost my customized oneBash, and the whole system became read-only (thankfully I had a oneBash backup that was only a few days behind).

So the question is: what can I do to relieve this Xorg error? I run a 7-card rig and never plan on going higher. What can I do with Xorg that would fix this?

Thanks.

Some members are working on a watchdog to help with this; I will add making a build which uses the integrated graphics to the list.


fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 10:18:09 PM
 #1727

Hi,

I have a few Gigabyte Z270-Gaming K3 mobos. They have Killer E2500 LAN, and nvOC boots without LAN. I assume there is no driver support in this distribution. Can you tell me how to install the LAN driver manually, or can it be included in the next version (18)?

Or any other solution...

Let me know

Best regards

Personally I dislike Killer Ethernet NICs.

I would get one of these or similar for each mobo and never use the Killer NICs.

https://www.amazon.com/Cable-Matters-Ethernet-Network-Adapter/dp/B00ET4KHJ2

Any of the usb 2.0 adapters should be more than enough for a mining rig.

fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 10:23:16 PM
 #1728

@fullzero, great to see the new oneBash with lots of stuff integrated!

I made a small error in the autotemp code:
Code:
 { IFS=', ' read CURRENT_TEMP CURRENT_FAN POWERDRAW POWERLIMIT; } < <( nvidia-smi -i 1 --query-gpu=temperature.gpu,fan.speed,power.draw,power.limit --format=csv,noheader,nounits)
This needs to be:
Code:
 { IFS=', ' read CURRENT_TEMP CURRENT_FAN POWERDRAW POWERLIMIT; } < <( nvidia-smi -i $gpu --query-gpu=temperature.gpu,fan.speed,power.draw,power.limit --format=csv,noheader,nounits)

In the current code it only checks the second GPU, and applies it for all. Sorry for that.

I will change this; to clarify the typo is:

-i 1

should be:

-i $gpu
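
To illustrate the difference, here is a small sketch that can run without a GPU. fake_smi is a made-up stand-in for nvidia-smi with invented readings, and show_gpu is a hypothetical helper; the point is only how the comma-separated line is split into variables and why the index must follow the loop.

```shell
#!/bin/bash
# Sketch: fake_smi is a hypothetical stand-in for nvidia-smi so this runs anywhere.
# It prints one made-up "temp, fan, draw, limit" line per GPU index.
fake_smi() {
  case "$1" in
    0) echo "61, 70, 118.20, 125.00" ;;
    1) echo "74, 85, 124.90, 125.00" ;;
  esac
}

# Split one CSV line on commas and spaces, as the autotemp line does with IFS=', '.
show_gpu() {
  fake_smi "$1" | {
    IFS=', ' read -r CURRENT_TEMP CURRENT_FAN POWERDRAW POWERLIMIT
    echo "GPU $1: temp=$CURRENT_TEMP fan=$CURRENT_FAN draw=$POWERDRAW limit=$POWERLIMIT"
  }
}

# With a fixed index, every pass would report the values of GPU 1;
# passing the loop variable gives each GPU its own reading.
for gpu in 0 1; do
  show_gpu "$gpu"
done
```

The sketch reads through a pipe to stay portable, so the variables only exist inside the braces. The autotemp code uses process substitution instead, which keeps them available for the rest of the loop body.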

fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 10:26:40 PM
 #1729

I have had a lot of requests for this; so here is a new oneBash and modded switch file which implement full integration of SALFTER_NICEHASH_PROFIT_SWITCHING

see the OP for links:

Replace your current oneBash with the new one.

extract switch and move it to the:
Code:
/home/m1

directory

(the one which opens when you click the Files icon on the left)

configure the following in oneBash

Code:
SALFTER_NICEHASH_PROFIT_SWITCHING="YES"

# LOCAL will attach the mining process to the guake terminal
# REMOTE will leave it unattached / ready for SSH
LOCALorREMOTE="LOCAL"       # LOCAL  or  REMOTE

CURRENCY=USD
POWER_COST=0.10
MINIMUM_PROFIT=0.0
# this is salfters BTC address:
PAYMENT_ADDRESS=1TipsGocnz2N5qgAm9f7JLrsMqkb3oXe2
WORKER_NAME=nv$IP_AS_WORKER

daggerhashimoto_POWERLIMIT_WATTS=125
__daggerhashimoto_CORE_OVERCLOCK=100
daggerhashimoto_MEMORY_OVERCLOCK=100
_______daggerhashimoto_FAN_SPEED=75

equihash_POWERLIMIT_WATTS=125
__equihash_CORE_OVERCLOCK=100
equihash_MEMORY_OVERCLOCK=100
_______equihash_FAN_SPEED=75

neoscrypt_POWERLIMIT_WATTS=125
__neoscrypt_CORE_OVERCLOCK=100
neoscrypt_MEMORY_OVERCLOCK=100
_______neoscrypt_FAN_SPEED=75

lyra2rev2_POWERLIMIT_WATTS=125
__lyra2rev2_CORE_OVERCLOCK=100
lyra2rev2_MEMORY_OVERCLOCK=100
_______lyra2rev2_FAN_SPEED=75

lbry_POWERLIMIT_WATTS=125
__lbry_CORE_OVERCLOCK=100
lbry_MEMORY_OVERCLOCK=100
_______lbry_FAN_SPEED=75

pascal_POWERLIMIT_WATTS=125
__pascal_CORE_OVERCLOCK=100
pascal_MEMORY_OVERCLOCK=100
_______pascal_FAN_SPEED=75

remember to thank salfter if you use this  Smiley



I haven't been on in a while, so I don't know whether this question has been asked already, but:

how do we set individual limits with this?

I don't want my 1060s running the same as a 1080 Ti in the same system.

Any help would be nice, thanks.

It doesn't currently support individual powerlimits.

salfter set this up as its own module; that being the case: it would be easiest if he added an individual powerlimit implementation to the switch file.
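
Until the switch file supports it, per-card limits could in principle be applied with one nvidia-smi call per GPU. The following is only a sketch of that idea, not part of salfter's module: the wattages are placeholders, DRY_RUN is a made-up safety switch, and by default the commands are printed instead of executed.

```shell
#!/bin/bash
# Sketch: one nvidia-smi power-limit call per GPU instead of one limit for all.
# The wattages are placeholders; DRY_RUN is a hypothetical safety switch.
DRY_RUN="${DRY_RUN:-1}"

# Arguments: one power limit in watts per GPU, in GPU index order.
apply_power_limits() {
  local gpu=0 watts
  for watts in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "sudo nvidia-smi -i $gpu -pl $watts"
    else
      sudo nvidia-smi -i "$gpu" -pl "$watts"
    fi
    gpu=$((gpu + 1))
  done
}

# Example: a 1080 Ti in slot 0 and two 1060s in slots 1 and 2.
apply_power_limits 200 90 90
```

Running this after the switcher has applied its per-algo settings would override the shared limit, at least until the next algo switch resets it.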

fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 11:18:09 PM
 #1730


1

First configure this section of oneBash:

Code:
SALFTER_NICEHASH_PROFIT_SWITCHING="YES"

# LOCAL will attach the mining process to the guake terminal
# REMOTE will leave it unattached / ready for SSH
LOCALorREMOTE="LOCAL"       # LOCAL  or  REMOTE

CURRENCY=USD
POWER_COST=0.10
MINIMUM_PROFIT=0.0
# this is salfters BTC address:
PAYMENT_ADDRESS=1TipsGocnz2N5qgAm9f7JLrsMqkb3oXe2
WORKER_NAME=nv$IP_AS_WORKER

daggerhashimoto_POWERLIMIT_WATTS=125
__daggerhashimoto_CORE_OVERCLOCK=100
daggerhashimoto_MEMORY_OVERCLOCK=100
_______daggerhashimoto_FAN_SPEED=75

equihash_POWERLIMIT_WATTS=125
__equihash_CORE_OVERCLOCK=100
equihash_MEMORY_OVERCLOCK=100
_______equihash_FAN_SPEED=75

neoscrypt_POWERLIMIT_WATTS=125
__neoscrypt_CORE_OVERCLOCK=100
neoscrypt_MEMORY_OVERCLOCK=100
_______neoscrypt_FAN_SPEED=75

lyra2rev2_POWERLIMIT_WATTS=125
__lyra2rev2_CORE_OVERCLOCK=100
lyra2rev2_MEMORY_OVERCLOCK=100
_______lyra2rev2_FAN_SPEED=75

lbry_POWERLIMIT_WATTS=125
__lbry_CORE_OVERCLOCK=100
lbry_MEMORY_OVERCLOCK=100
_______lbry_FAN_SPEED=75

pascal_POWERLIMIT_WATTS=125
__pascal_CORE_OVERCLOCK=100
pascal_MEMORY_OVERCLOCK=100
_______pascal_FAN_SPEED=75

ensure:

Code:
SALFTER_NICEHASH_PROFIT_SWITCHING="YES"

and replace salfters BTC address with your own:
Code:
PAYMENT_ADDRESS=1TipsGocnz2N5qgAm9f7JLrsMqkb3oXe2

salfter implemented this for nicehash only.  It makes a call to a nicehash api and receives the current profitability data. 

Using your input power cost (and what I am guessing are salfters benchmarks for each algo using 2x 1070s) it calculates which coin is currently the most profitable to mine.

It then stops any mining process, and starts a new one with the most profitable coin and your OC settings for that coin.
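
The selection step can be pictured roughly like this. It is a sketch of the general idea, not salfter's actual code: the revenue and wattage figures are invented, pick_algo is a made-up function, and the net profit is assumed to be daily revenue minus daily power cost.

```shell
#!/bin/bash
# Sketch of a net-profit pick. Revenue and wattage figures are invented
# for illustration; they are not salfter's benchmarks or live nicehash data.
POWER_COST=0.10       # currency units per kWh, as in oneBash
MINIMUM_PROFIT=0.0    # do not mine below this net profit per day

# Input lines: "<algo> <revenue per day> <rig watts>".
# Output: the best algo and its net profit, or "none" if nothing reaches the minimum.
pick_algo() {
  awk -v cost="$1" -v min="$2" '
    {
      net = $2 - ($3 / 1000) * 24 * cost   # revenue minus daily power cost
      if (best == "" || net > best_net) { best = $1; best_net = net }
    }
    END {
      if (best != "" && best_net >= min) printf "%s %.2f\n", best, best_net
      else print "none"
    }'
}

printf '%s\n' \
  "daggerhashimoto 3.90 250" \
  "equihash 4.30 300" \
  "lbry 1.10 280" | pick_algo "$POWER_COST" "$MINIMUM_PROFIT"
```

If the switcher behaves like this sketch, a MINIMUM_PROFIT set higher than any algo's net profit would leave no miner running at all.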

================================================================================

Code:
SALFTER_NICEHASH_PROFIT_SWITCHING="YES"

# LOCAL will attach the mining process to the guake terminal
# REMOTE will leave it unattached / ready for SSH
LOCALorREMOTE="LOCAL"       # LOCAL  or  REMOTE

CURRENCY=USD
POWER_COST=0.20
MINIMUM_PROFIT=2.5
# this is salfters BTC address:
PAYMENT_ADDRESS=1QJ6j3fY6fCRsN1WJqZ65U52Et4TVL9e7P
WORKER_NAME=$IP_AS_WORKER

daggerhashimoto_POWERLIMIT_WATTS=95
__daggerhashimoto_CORE_OVERCLOCK=150
daggerhashimoto_MEMORY_OVERCLOCK=1200
_______daggerhashimoto_FAN_SPEED=65

equihash_POWERLIMIT_WATTS=95
__equihash_CORE_OVERCLOCK=150
equihash_MEMORY_OVERCLOCK=1200
_______equihash_FAN_SPEED=65

neoscrypt_POWERLIMIT_WATTS=95
__neoscrypt_CORE_OVERCLOCK=150
neoscrypt_MEMORY_OVERCLOCK=1200
_______neoscrypt_FAN_SPEED=65

lyra2rev2_POWERLIMIT_WATTS=95
__lyra2rev2_CORE_OVERCLOCK=150
lyra2rev2_MEMORY_OVERCLOCK=1200
_______lyra2rev2_FAN_SPEED=65

lbry_POWERLIMIT_WATTS=95
__lbry_CORE_OVERCLOCK=150
lbry_MEMORY_OVERCLOCK=1200
_______lbry_FAN_SPEED=65

pascal_POWERLIMIT_WATTS=95
__pascal_CORE_OVERCLOCK=150
pascal_MEMORY_OVERCLOCK=1200
_______pascal_FAN_SPEED=65

I've done the above, but mining hasn't started (have I missed copying or including anything?). I'm using v0017 as is (my mobo is an Asus Z270P with 8 GTX 1060 6GB cards).

Saw this as output :

Code:
m1@m1-desktop:~$ screen -r miner
There is no screen to be resumed matching miner.
m1@m1-desktop:~$ screen -r miner
There is no screen to be resumed matching miner.
m1@m1-desktop:~$

================================================================================

Set:

MINIMUM_PROFIT=0

and tell me if there is a difference.




2

You can use the SALFTER_NICEHASH_PROFIT_SWITCHING, or you can use the:

Code:
NICE_ETHASH

COIN selection

I still need to add all the other nicehash algos as normal COIN selections.

Nicehash does use a BTC payout address, when using the NICE_ETHASH COIN selection set this in this area of the oneBash settings:

Code:
# if YES ensure you update BTC_ADDRESS
VTC_AUTOCONVERT_TO_BTC="YES"        #YES  NO
VTC_WORKER="nv$IP_AS_WORKER"
VTC_ADDRESS="VsvtYL2mz3YFM3fpt5pb28zHodTbnJodRc"
VTC_POOL="stratum+tcp://lyra2v2.mine.zpool.ca:4533"

BTC_ADDRESS="18Y5HYe3BAwAhTAkFLbD52o8NqtrN3DtpF"

# NICE_ETHASH autoconverts to BTC: ensure you update BTC_ADDRESS if you use NICE_ETHASH
NICE_ETHASH_WORKER="nv$IP_AS_WORKER"
NICE_ETHASH_POOL="stratum+tcp://daggerhashimoto.usa.nicehash.com:3353"
GENOIL_NICE_ETHASH_POOL="daggerhashimoto.usa.nicehash.com:3353"
NICE_ETHASH_EXTENTION_ARGUMENTS=""   # add any additional claymore arguments desired here

this line:

Code:
BTC_ADDRESS="18Y5HYe3BAwAhTAkFLbD52o8NqtrN3DtpF"


================================================================================

I've added my BTC address. If I run oneBash with the 'NICE_ETHASH' coin selection, which coin does it mine? (I know you said there is currently no coin/algo selection; I'm just curious what it selects and mines.)
No matter what it mines, will it just convert it into BTC and pay out to my BTC address?

I tried it with my BTC address for a while. It started mining (using Genoil), but I'm not sure what it was mining; most of the time I saw "ETH share accepted" messages ("stratum+tcp://daggerhashimoto.usa.nicehash.com:3353").

Also, where can I check stats such as shares per hour for my BTC address or miner while mining with 'NICE_ETHASH'?

================================================================================

It is mining whatever nicehash thinks is the most profitable Ethash coin at the time.

It will payout in BTC to the
 
BTC_ADDRESS

assuming: 

1QJ6j3fY6fCRsN1WJqZ65U52Et4TVL9e7P 

is your BTC address

you would go to:

https://new.nicehash.com/miner/1QJ6j3fY6fCRsN1WJqZ65U52Et4TVL9e7P

to see your nicehash stats





3


at the top of oneBash ensure COIN is set to:
Code:
COIN="DUAL_ETH_DCR"

then set your ETH settings here:

Code:
ETHERMINEdotORG="NO"

CLAYMORE_VERSION="9_5"    # choose 9_5  or  9_4  or  8_0

GENOILorCLAYMORE="GENOIL"  # choose GENOIL  or  CLAYMORE


ETH_WORKER="nv$IP_AS_WORKER"
ETH_ADDRESS="0xe12bdd454997e443ec0cae6bebb6bb3c74242aae"
ETH_POOL="eth-us-east1.nanopool.org:9999"
ETH_EXTENTION_ARGUMENTS=""    # add any additional claymore arguments desired here


then set your DCR settings here:
Code:
DCR_WORKER="nv$IP_AS_WORKER"
DCR_ADDRESS="fullzero22"
DCR_POOL="stratum+tcp://dcr.suprnova.cc:3252"

Note that with suprnova you need to create the worker name beforehand, so I recommend changing the worker name to one you have already set up rather than making a new worker with the auto-generated name.

replace:  DCR_ADDRESS="fullzero22" with your suprnova username


Let me know if all this makes sense.



================================================================================

Yes, it does make sense, some of it. I'm now able to mine DCR with suprnova, using Claymore 9_7. Thanks for that, Amigo Smiley

But I still have some questions regarding NiceHash mining: can we mine any coin with the NiceHash pools? How can we check the share rate while using NiceHash pools (daggerhashimoto?)?

I may still need some help to understand my 2nd question; I will try to google it and find some info.

Thank you so much for the replies, mate. That really means a lot to people like me. I wish I could give something back to the community like you are doing Smiley

================================================================================

Answer to 3 is in the response to 2.

fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 11:25:07 PM
 #1731

Since I had problems with W10, I decided to give this distro a try, and I am very surprised things went so easily. Good job on this one!

I use this on a headless rig, and at the moment the only way I know the rig is running is by checking the pool URL, or by logging into it with SSH and using nvidia-smi to check temps.

I am struggling with administration. A couple of questions:

1. How can I get the terminal output of the automatically started process (Claymore) over SSH?

2. I configured TeamViewer to start, but how can I connect to it when I can't see the desktop?

3. On Windows, when mining Monero with xmr-stak-cpu, I was able to achieve about 200-250 H/s with my AMD FX-6300. On the wolfminer it's 100~150. Are there any tweaking options, or is it possible to change to the stak miner?

Keep up the good work!

2:  As far as I know: you have to setup teamviewer via its GUI; So you are going to need to connect a monitor for that.  Could be wrong about this; I don't use teamviewer myself; so other members would know better than I.

3:  The CPU miner is intended to be used in a minimal capacity so as not to interfere with the GPU mining.  Depending on the GPU COIN you are mining, heavy CPU use will impact GPU mining.  I would look at the OP for the CPU client:  https://bitcointalk.org/index.php?topic=1326803.0 for optimal settings.



fullzero
Legendary
*
Offline Offline

Activity: 1246
Merit: 1009



View Profile
July 10, 2017, 11:39:28 PM
 #1732

Hi fullzero.
Thanks for your great work.
I am using your bash and I rewrite some parts for my own needs.
I have a suggestion: you can use "export INDIVIDUAL_POWERLIMIT_0 ....." instead of using

Quote
echo $INDIVIDUAL_POWERLIMIT_0 > '/home/m1/p0'
and
POWER_LIMIT[0]=$(cat /home/m1/p0 | sed '/ /d')
and
rm /home/m1/p0

Just write "export" for all the variables that you need in the next bash file, before running that file.
In the bash file you can then just use them as always.
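
A small sketch of the export approach, for anyone who wants to see it in isolation. The value 120 is an example, and the sh -c call merely stands in for the next bash file.

```shell
#!/bin/bash
# Sketch: pass a setting to a child script through the environment
# instead of writing it to /home/m1/p0 and reading it back.
export INDIVIDUAL_POWERLIMIT_0=120   # example value

# The child process (a stand-in for the next bash file) sees the variable directly.
child_output=$(sh -c 'echo "limit for GPU 0: $INDIVIDUAL_POWERLIMIT_0"')
echo "$child_output"
```

One caveat: exported variables only reach processes started from the shell that exported them, so a script launched separately (for example from another terminal) would still need a file or an argument.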


I added an email notification for when certain conditions are met.
I put all the control code in a separate file; the wallet settings, the card settings, the fan control, and the email sending are each in their own file as well.
I also made a web page which, using iframes and gotty https://github.com/yudai/gotty, lets you watch the GPU mining process, the CPU mining process, and card info (temp, fans, power draw and so on).
If anyone is interested in these features, write, and I will post my code.

Yes; in the auto Temp I could use export instead.

In general I would like to offer as many options to members as possible and encourage members to share their customizations. 

With any contribution, so long as it is not a blatant copy of a prior contribution; I will integrate it and add acknowledgement and BTC donation address to the top of oneBash. 

If you use or make an executable; you must provide me with the source code and I will examine and compile it myself. 

Your contribution must be free and open source.

I would like to avoid solutions that involve installing multiple programs or services to do something that can be done via SSH, SFTP, or a socket.

fullzero
Legendary
Activity: 1246
Merit: 1009
July 10, 2017, 11:44:58 PM
#1733

I've done some testing with shell scripts that shows a way forward: stop the previous miner, stop X, unload the nVidia driver, reload the driver, restart X, and start the next miner.  This puts the GPUs back to a known-good state before getting back to mining.  I've switched back and forth between a known-troublesome pair of miners, and it hasn't failed yet.  I'm going to put these changes into the switcher next and see how it goes.

This part seems to be working well, but now I'm running into a problem where ccminer (used by several algos) doesn't want to start within a screen session.  It works fine if started at the command line by itself.  It works fine inside an already-running screen session.  It falls on its face when the invocation is preceded by "screen -dmS miner":

/home/m1/SPccminer/ccminer: error while loading shared libraries: libcudart.so.8.0: cannot open shared object file: No such file or directory

libcudart.so.8.0 is in /usr/local/cuda-8.0/lib64.

There appears to be something different in the environment when screen is starting up vs. the rest of the time.  Here's a quick hack which fixes it, but I suspect this really shouldn't be necessary:

Code:
sudo ln -s /usr/local/cuda-8.0/lib64/libcudart.so.8.0 /usr/lib/

With this fix in place, most miners respond well to the switch...except the Pascal miner.  It takes its time responding to SIGTERM, and there's a higher likelihood of a GPU still falling off the bus, locking up, or whatever, necessitating a reboot.  Given that it's moving well into negative territory WRT profitability anyway (currently -$0.23 on my rig), I might just disable it and continue testing with the other miners.

When starting up, the environment should have:

sudo ldconfig /usr/local/cuda/lib64

this might get lost with a switch
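
Another option, besides the symlink and ldconfig, would be to hand the library directory to the miner explicitly when it is launched inside screen. This is only a sketch under assumptions: the CUDA 8.0 and ccminer paths are the ones quoted above, the helper name prepend_lib_path is made up, and the screen line is left commented out because it needs a real rig.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to a colon-separated library path
# without leaving a stray ':' when the existing value is empty.
prepend_lib_path() {
    local dir=$1 current=$2
    if [ -z "$current" ]; then
        echo "$dir"
    else
        echo "$dir:$current"
    fi
}

# Directory that holds libcudart.so.8.0 according to the post above.
CUDA_LIB_DIR=/usr/local/cuda-8.0/lib64
MINER_LIB_PATH=$(prepend_lib_path "$CUDA_LIB_DIR" "$LD_LIBRARY_PATH")
echo "miner would start with LD_LIBRARY_PATH=$MINER_LIB_PATH"

# On a rig, the miner could then be started in a detached screen like this:
# screen -dmS miner env LD_LIBRARY_PATH="$MINER_LIB_PATH" /home/m1/SPccminer/ccminer
```

Passing the variable through env on the screen command line means the setting travels with the miner invocation itself rather than depending on whatever environment screen starts with.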

I agree that Pascal is essentially useless right now.


fullzero
Legendary
Activity: 1246
Merit: 1009
July 10, 2017, 11:48:27 PM
#1734

Hey fullzero, I have a question.

Without a doubt my biggest problem right now is that when my miner crashes it takes the whole rig down with it. Everything gets stuck, SSH barely works, the average system load jumps to 14.5, and Xorg takes up 100% of the CPU. It's so bad that none of the standard reboot commands work; they just do nothing. The only thing that actually reboots the rig in this state is "echo b > /proc/sysrq-trigger", so I've set up a script that checks the average system load and, if it's over 2, uses that command to reboot. It works, but I don't like this "solution": yesterday after a reboot nvOC got corrupted somehow, I lost my customized oneBash, and the whole system became read-only (thankfully I had a oneBash backup that was only a few days behind).

So the question is: what can I do to relieve this Xorg error? I run a 7-card rig and never plan on going higher. What can I do with Xorg that would fix this?

Thanks.
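
For anyone stuck with the same last-resort reboot, a load check along those lines could look roughly like the sketch below. The helper name load_exceeds and the limit of 2 are illustrative, and the actual reboot is left commented out. One detail that may matter for the corruption described above: the sysrq key 'b' reboots immediately without syncing disks, while 's' (sync) and 'u' (remount read-only) can be triggered first.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a load average exceeds a limit.
# awk does the comparison because [ -gt ] only accepts integers.
load_exceeds() {
    local load=$1 limit=$2
    awk -v l="$load" -v m="$limit" 'BEGIN { exit !(l > m) }'
}

# The first field of /proc/loadavg is the 1-minute load average.
load1=0
[ -r /proc/loadavg ] && read -r load1 _ < /proc/loadavg

if load_exceeds "$load1" 2; then
    echo "load $load1 is over the limit, would reboot now"
    # On a rig, as root: sync, remount read-only, then reboot.
    # for key in s u b; do echo "$key" > /proc/sysrq-trigger; sleep 2; done
else
    echo "load $load1 is fine"
fi
```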

@ tempgoga

It seems that whenever a soft crash occurs most of the cards drop to zero utilization, so even while the display/keyboard is unresponsive you can catch the soft crash from nvidia-smi. The script below checks card utilization; if it drops below 90% it counts down a minute, and if mining hasn't resumed by then it reboots the system.
This seems to have worked at least once in my case (I only got one soft crash this weekend) and the system recovered as expected.
The threshold values work for my setup, but others may find different values optimal.

Also, if anyone knows a way to iterate the if && statements, we can get the card count from "cards=$(nvidia-smi -L | wc -l); echo $cards". The version below also works, but needs manual editing to adjust the watchdog for the number of cards in your individual system.
___________
 
#!/bin/bash
#m1
threshold=90
# Start with a full countdown: 12 checks x 5 seconds = one minute
i=12
while sleep 5
 do number=$(nvidia-smi |grep % |awk '{print $13}' |tr -d %)
 set -- $number
 echo -e "$@"
# The "if and" statements below need to be manually adjusted to match the number of cards in your system
# If you have 5 cards, leave as is; for a different number of cards, remove or add && statements as in the commented example below
        if [[ "$1" -gt "$threshold" ]] && \
           [[ "$2" -gt "$threshold" ]] && \
           [[ "$3" -gt "$threshold" ]] && \
           [[ "$4" -gt "$threshold" ]] && \
           [[ "$5" -gt "$threshold" ]]
# && \
#          [[ "$6" -gt "$threshold" ]]
         then i=12
         echo OK
         else echo $((i--))
        fi
        if [ "$i" -le 0 ]
         then echo "$(date) REBOOT due to soft crash" >>~/watchdog.log
         sleep 5
         sudo shutdown now -r
        fi
done
___________
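
On the question of iterating the if && statements: one way is to loop over however many values come back and bail out at the first one that is not above the threshold. A minimal sketch, where the function name all_above and the sample readings are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every utilization value passed in
# is above the threshold, whatever the number of cards.
all_above() {
    local threshold=$1
    shift
    [ "$#" -gt 0 ] || return 1        # no readings at all counts as a failure
    local util
    for util in "$@"; do
        [ "$util" -gt "$threshold" ] || return 1
    done
    return 0
}

# Sample readings standing in for the values parsed from nvidia-smi.
if all_above 90 100 99 100 98 100 100 97; then
    echo OK
else
    echo "at least one card is at or below the threshold"
fi
```

In the watchdog this would take the place of the whole chain: after "set -- $number", a single "if all_above "$threshold" "$@"" covers 5, 6 or 7 cards without editing.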

Hey, that's funny, I just made a script doing something similar, although it checks the power draw.
Here it is:
Code:
#!/bin/bash

# Miner restart script V001
# By Maxximus007
# for nvOC by fullzero
#
# POWERLIMIT MUST BE SET IN oneBash

#########################
### BELOW CODE, NO NEED FOR EDITING
#########################
# Creating a log file to record restarts (must be defined before first use)
LOG_FILE="/home/m1/restartlog.txt"
if [ ! -e "$LOG_FILE" ] ; then
    touch "$LOG_FILE"
fi
echo "$(date) - Starting miner restart script." | tee -a "${LOG_FILE}"

while true
do
sleep 60

GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | tail -1)

gpu=0
COUNT_LOW_POWER=0

while [ "$gpu" -lt "$GPUS" ]
do
  { IFS=', ' read POWERDRAW POWERLIMIT; } < <( nvidia-smi -i $gpu --query-gpu=power.draw,power.limit --format=csv,noheader,nounits)

  let POWER_DIFF=$( printf "%.0f" $POWERLIMIT )-$( printf "%.0f" $POWERDRAW )

  # If current draw is more than 30 Watt below the limit, count this GPU:
  if [ "$POWER_DIFF" -gt "30" ]
  then
    let COUNT_LOW_POWER=COUNT_LOW_POWER+1
  fi

  let gpu=gpu+1
done

if [ "$COUNT_LOW_POWER" -eq "$GPUS" ]
then
  echo "$(date) - Power draw is too low: kill miner and oneBash" | tee -a "${LOG_FILE}"
  # If miner runs in screen 'miner' kill the screen
  screen -X -S miner kill
  # Best to restart oneBash - settings might be adjusted already
  # (the PIDs to kill come from the ps | awk command substitution)
  kill $(ps -ef | awk '$NF~"oneBash" {print $2}')
else
  echo "$(date) - All good! Will check again in 60 seconds"
fi

done

You can combine the above with your code, and find the utilization like this:
Code:
nvidia-smi -i 1 --query-gpu=utilization.gpu --format=csv,noheader,nounits
You have to iterate the GPU, starting at 0 to get them all
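
As a side note, leaving out -i makes the same query print one line per GPU, so the per-card loop can also be fed from a single call. A small sketch of that idea, where the function name count_low and the sample values are assumptions and the printf stands in for nvidia-smi so it can be tried on any machine:

```shell
#!/bin/bash
# Hypothetical helper: read one utilization value per line on stdin and
# print how many of them are below the given minimum.
count_low() {
    local min=$1 util low=0
    while read -r util; do
        [ -n "$util" ] || continue          # skip blank lines
        if [ "$util" -lt "$min" ]; then
            low=$((low + 1))
        fi
    done
    echo "$low"
}

# Sample data standing in for a 4-GPU rig; on a real system the input would be
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
printf '100\n0\n97\n12\n' | count_low 90
```

With the sample input this reports 2 cards below the minimum.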
Okay I've combined the two, perhaps this will work for most of us:
Code:
#!/bin/bash

# Miner restart script V002
# By Maxximus007 && IAmNotAJeep
# for nvOC by fullzero
#

#########################
### BELOW CODE, NO NEED FOR EDITING
#########################
# Creating a log file to record restarts (must be defined before first use)
LOG_FILE="/home/m1/restartlog.txt"
if [ ! -e "$LOG_FILE" ] ; then
    touch "$LOG_FILE"
fi
echo "$(date) - Starting miner restart script." | tee -a "${LOG_FILE}"

MIN_UTIL=90
RESTART=0

while true
do
sleep 60

GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | tail -1)

gpu=0
COUNT=0

while [ "$gpu" -lt "$GPUS" ]
do
  { IFS=', ' read UTIL; } < <( nvidia-smi -i $gpu --query-gpu=utilization.gpu --format=csv,noheader,nounits)

  let UTILIZATION=$( printf "%.0f" $UTIL )

  # If current utilization is lower than the limit, count this GPU:
  if [ "$UTILIZATION" -lt "$MIN_UTIL" ]
  then
    let COUNT=COUNT+1
  fi

  let gpu=gpu+1
done

if [ "$COUNT" -eq "$GPUS" ]
then
  if [ "$RESTART" -gt 1 ]
  then
    echo "$(date) - Utilization is too low: reviving did not work so restarting system" | tee -a "${LOG_FILE}"
    sudo shutdown now -r
  fi
  echo "$(date) - Utilization is too low: kill miner and oneBash" | tee -a "${LOG_FILE}"
  # If miner runs in screen 'miner' kill the screen
  screen -X -S miner kill
  # Best to restart oneBash - settings might be adjusted already
  # (the PIDs to kill come from the ps | awk command substitution)
  kill $(ps -ef | awk '$NF~"oneBash" {print $2}')
  let RESTART=RESTART+1
else
  # Mining recovered: only consecutive failed revivals should lead to a reboot
  RESTART=0
  echo "$(date) - All good! Will check again in 60 seconds"
fi

done

Pretty cool!  I'll try it tonight, let's hope this puts the soft crash issues behind us.


I will try this out as well; good work.  :)

pixelizedchaos
Newbie
Activity: 18
Merit: 0
July 10, 2017, 11:49:40 PM
#1735

Suggestion: could we make a tutorial on adding new coins, especially as we get towards the end of ETH? I feel like there will be a lot of these newer coins that might just be great to hodl. It would also allow you to outsource the addition of coins a little bit, and you can fork it over from GitHub if the coin proves stable on nvOC.
fullzero
Legendary
Activity: 1246
Merit: 1009
July 10, 2017, 11:49:54 PM
#1736

ubuntu server 16.4
driver nvidia_381

does not work
X server nvidia-settings

sudo nvidia-settings
Failed to connect to Mir: Failed to connect to server socket
Unable to init server: Could not connect

ERROR: The control display is undefined; please run `nvidia-settings --help`
for usage information.

Reinstall does not help

Is this with nvOC or vanilla Ubuntu?


fullzero
Legendary
Activity: 1246
Merit: 1009
July 10, 2017, 11:55:36 PM
#1737

Suggestion: could we make a tutorial on adding new coins, especially as we get towards the end of ETH? I feel like there will be a lot of these newer coins that might just be great to hodl. It would also allow you to outsource the addition of coins a little bit, and you can fork it over from GitHub if the coin proves stable on nvOC.

If a coin uses the same algorithm as another; you can copy the code block for that COIN and change names.  Sometimes pools require changes in the syntax.

I think this is a good subject for a tutorial.

It would be nice if a member could make this tutorial; and I will link it on the OP.

There are also lots of other tutorials that would be helpful for new members / miners.

fk1
Full Member
Activity: 216
Merit: 100
July 11, 2017, 01:21:23 AM
Last edit: July 11, 2017, 01:37:17 AM by fk1
#1738

Since I had problems with W10 I decided to give this distro a try, and I am very surprised things went so easily. Good job on this one!

I use this on a headless rig, and at the moment the only ways I know the rig is running are checking the pool URL or logging into it with SSH and using nvidia-smi to check temps.

I am struggling with administration, so a couple of questions:

1. How can I get the terminal output of the automatically started process (claymore) over SSH?

2. I configured TeamViewer to start, but how can I connect to it when I can't see the desktop?

3. On Windows, when mining Monero with xmr-stak-cpu, I was able to achieve about 200-250 H/s with my AMD FX6300. On the wolfminer it's 100~150. Are there any tweaking options, or is it possible to change to the stak miner?

Keep up the good work!

2:  As far as I know, you have to set up TeamViewer via its GUI, so you are going to need to connect a monitor for that.  I could be wrong about this; I don't use TeamViewer myself, so other members would know better than I.

3:  The CPU miner is intended to be used in a minimal capacity so as not to interfere with the GPU mining.  Depending on the GPU COIN you are mining, heavy CPU use will impact GPU mining.  I would look at the OP for the CPU client:  https://bitcointalk.org/index.php?topic=1326803.0 for optimal settings.




Thanks for your reply. On point 3: I mine ETH/SIA with the GPUs and used 4/6 cores on Windows. I will try to tweak some settings, but I'd like to request this CPU miner for future releases: https://github.com/fireice-uk/xmr-stak-cpu

Since I'd like to give something back:

I am currently using just 3x 1050 Ti GPUs; one is a single-fan multimedia GPU, the other 2 are gaming cards with some extra power.

I run all of them stable at clocks 150/1300. I get 13 MH/s ETH plus 280 MH/s SIA per GPU @ 75 W.

What I really prefer is setting the power limits to the lowest values the cards allow:

The gaming cards can run at a 66 W limit and the multimedia edition even at 53.5 W, and they still do about 11 MH/s ETH and 270 MH/s SIA.

To do this individually for your cards you can use 'nvidia-smi'. Run on its own, it shows stats like fan speed, temp, usage and power limit. Set a power limit with 'nvidia-smi --power-limit=XX', where XX is the power limit in W.

If you choose an invalid limit it will show the allowed range. This sets the limit for all cards; to target a single card, add '--id=$GPUID', where $GPUID is the GPU number shown in the 'nvidia-smi' overview.
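
To make the per-card variant concrete, here is a small dry-run sketch. The GPU ids 0 to 2 and the helper name power_limit_commands are assumptions; the wattages are the ones mentioned above. It only prints the commands, and sudo is included because changing the limit needs root.

```shell
#!/bin/bash
# Sketch: print one nvidia-smi command per card from "id:watts" pairs.
# The ids are assumed; check the real ones in the nvidia-smi overview.
LIMITS="0:66 1:66 2:53.5"

power_limit_commands() {
    local pair
    for pair in $1; do
        # ${pair%%:*} is the GPU id, ${pair#*:} the limit in watts
        echo "sudo nvidia-smi --id=${pair%%:*} --power-limit=${pair#*:}"
    done
}

# Dry run: review the output first, then pipe it to bash on the rig.
power_limit_commands "$LIMITS"
```

Printing the commands instead of running them makes it easy to check the id-to-limit mapping before anything is applied.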

Hope it's useful for someone :) Now I wonder how to tweak clocks over SSH without killing the process, editing the config and restarting the terminal session?
IAmNotAJeep
Newbie
Activity: 44
Merit: 0
July 11, 2017, 01:53:12 AM
#1739

@ Maxximus007
Thanks for putting these together, great collab!
I'm not a bash expert, so maybe I'm reading this wrong, but here are some thoughts.
The combined code seems to evaluate each GPU individually, and the fault condition is only met when every card is below the threshold. So if one card fails while, say, 5 others keep working, the script keeps going until all of the cards show reduced output, since each of them has to fail to increment the counter. So if 5/6 fail we keep going? (Again, I'm just tracing it in my head, so maybe I'm reading it wrong.)
The way I was thinking about it, I wanted all the cards to work at above 90% utilization and to start the reboot countdown as soon as any card strays below the threshold. This is why I used the "if and" statement and didn't iterate through "if" statements alone (I didn't know how to iterate "if and" over an unknown number of cards lol). I had a version printing 6xOK and such, but I think it's more efficient to print 1xOK if ALL cards meet the 90% criterion, start the countdown as soon as anything is out of the norm, and flush the counter if the miner recovers. I observed a number of these conditions with Claymore, where it recovers half the time but then eventually craps out and the script kicks in. I haven't seen it on my Genoil rig yet, since my other script has kept it in check without any soft crash for 3 days now.

A thought about power draw as the threshold measure: it is specific to the power limit and the card, so I guess people would need to tune their power threshold to their power limit. I agree it's best to use GPU utilization. (My cards are at an 82 W limit, for example.)
Thoughts?
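
To make the "any card out of range starts the countdown, recovery flushes it" idea concrete, here is a minimal sketch of just the counter logic. The function name next_count, the "ok"/"low" labels and the replayed sequence are all made up for illustration; the count of 12 matches 12 checks at 5 second intervals.

```shell
#!/bin/bash
# Hypothetical helper: given the current countdown and whether all cards are
# healthy ("ok") or at least one is not ("low"), print the next countdown.
FULL_COUNT=12   # 12 checks x 5 s = one minute

next_count() {
    local count=$1 state=$2
    if [ "$state" = "ok" ]; then
        echo "$FULL_COUNT"          # recovery flushes the counter
    else
        echo $((count - 1))         # any card out of range keeps counting down
    fi
}

# Replay a made-up sequence: a dip that recovers, then a longer one.
count=$FULL_COUNT
for state in ok low low ok low low low; do
    count=$(next_count "$count" "$state")
    echo "$state -> $count"
done
```

A watchdog built this way would reboot once the value reaches 0, which only happens after a full minute of consecutive "low" checks.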


  
achalmersman
Newbie
Activity: 17
Merit: 0
July 11, 2017, 03:32:39 AM
#1740

Maybe slightly off topic, but I have finally gotten my GTX 970 overclocked.  Even in Windows 10 with Afterburner it took some creativity.  I am wondering about trying this in Linux, but I really don't want to take my Win10 rig down now since I am testing how long it stays stable.

If anybody wants to try a 970 overclock, here is what finally worked in Windows for me.  I'm not sure how the commands should be changed for Linux.  This MUST be done BEFORE any overclocking by Afterburner (and, I would assume, by NVIDIA X Server Settings).  After running these commands I can then overclock the card.  If you apply any overclock settings before running these commands it will not work: you must reset all overclock settings, run these commands, and then try overclocking with Afterburner again (I would assume NVIDIA X Server Settings would be treated the same?).  The commands must also be run in an elevated command prompt.  I created a scheduled task to run a .bat file with highest privileges on startup.

cd C:\Program Files\NVIDIA Corporation\NVSMI\
nvidia-smi -acp UNRESTRICTED
nvidia-smi -ac 3505,1455

The overclock values may not be compatible with other cards, but it doesn't matter: they will simply be ignored.  If nobody else tries this on Maxwell I will go back to nvOC, try it and report back.  Thanks for the work!!