Bitcoin Forum
November 18, 2017, 09:09:00 AM *
Author Topic: [OS] nvOC easy-to-use Linux Nvidia Mining v0019-1.4  (Read 322235 times)
fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 19, 2017, 11:15:57 PM
 #2021

I wrote a script that checks for incoming messages sent to your Telegram bot and answers them.
Save it as a separate file and run it at startup.
Quote
source ~/wallets      # import wallets
source ~/settings.sh  # import settings (TELEGRAM_API, TELEGRAM_CHAT)

cd /tmp
while true
do
  rm -f getUpdates
  wget -O getUpdates "https://api.telegram.org/bot$TELEGRAM_API/getUpdates"

  # text and timestamp of the newest message from our chat
  INCOME=$(grep "$TELEGRAM_CHAT" ./getUpdates | tail -1 | awk -F ":" '{print $13}' | cut -d \" -f 2)
  INCOME_TIME=$(grep "$TELEGRAM_CHAT" ./getUpdates | tail -1 | awk -F ":" '{print $12}' | cut -c -10)

  LAST_INCOME_TIME=$(cat /home/m1/last_inc_time)

  if [ "$INCOME_TIME" != "$LAST_INCOME_TIME" ]
  then
    if [[ $INCOME == "State" || $INCOME == "state" || $INCOME == "STATE" ]]
    then
      echo state of rig
      ~/mail.sh 9
    else
      echo invalid msg!
    fi
    # the first time, you must create this file yourself; put any number inside
    echo "$INCOME_TIME" > /home/m1/last_inc_time
  else
    echo no new messages!
  fi
  sleep 5
done

To change or add a new message: $INCOME is the text of the message, and ~/mail.sh 9 is the command to run.
Quote
if [[ $INCOME == "State" || $INCOME == "state" || $INCOME == "STATE" ]]
then
  echo state of rig
  ~/mail.sh 9
else
  echo invalid msg!
fi
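
A sketch of how the message check could be extended without listing every spelling: lower-case the text first, then branch with `case`. Only `state` and `~/mail.sh 9` come from the script above; `message_action`, `run_action` and the `uptime` command are made-up names for illustration.

```shell
#!/bin/sh
# Map the text of a message to an action name, ignoring case.
message_action() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    state)  echo state ;;    # from the original script
    uptime) echo uptime ;;   # hypothetical extra command
    *)      echo invalid ;;
  esac
}

# Run the command that belongs to an action name.
run_action() {
  case "$1" in
    state)  ~/mail.sh 9 ;;   # same command the original script runs
    uptime) uptime ;;        # hypothetical extra command
    *)      echo 'invalid msg!' ;;
  esac
}
```

Inside the loop this would be called as `run_action "$(message_action "$INCOME")"`, so adding a new message means adding one line to each `case`.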

Please add this to nvOC.

I will add this to my update stack.

fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 19, 2017, 11:19:58 PM
 #2022

Thank you very much for making such a wonderful OS.

Does nvOC v0018 write a log file when the miner restarts after a GPU soft crash?

In the v0018 1bash it does; in this updated version it only logs restarts.  This is because logging slightly decreases stability when using USB keys.  I will make watchdog logs a YES/NO option for the next 1bash.  For now you can open the watchdog file:

Code:
IAmNotAJeep_and_Maxximus007_WATCHDOG

go to line 86:

Code:
kill $target #| tee -a ${LOG_FILE}

and remove the # so it reads:

Code:
kill $target | tee -a ${LOG_FILE}

and it will log soft crashes.
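
Note that `kill` prints nothing when it succeeds, so the pipe into `tee` may add little to the log on its own. A minimal sketch of writing an explicit, timestamped line instead; `log_event` and the message text are assumptions, only `$target` and `${LOG_FILE}` come from the watchdog file:

```shell
#!/bin/sh
# Append a timestamped line to the log file and show it on the terminal.
# LOG_FILE is the variable used by the watchdog; log_event is a hypothetical helper.
log_event() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "${LOG_FILE}"
}

# Sketch of use in the watchdog, next to the kill on line 86:
#   log_event "soft crash detected, killing miner (pid $target)"
#   kill $target
```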

fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 19, 2017, 11:21:05 PM
 #2023

Okay, so you are using the autotemp, but did you set INDIVIDUAL_POWERLIMIT="NO"?
This can cause a problem: for autotemp to work correctly, it needs to be set to YES.

I set the temp control in line 51 and set the individual powerlimit to YES in line 145; I set all the lines below it to 100. I set all the target temps to 60 (this is the old v17 measured value) and set restore to 20 in line 237.

Question: what does "original power limit" mean? The card's default 125? Or powerlimit_watts in line 74?

Thank you!
The original power limit is the one read from the variables set in 1bash. You can see in the autotemp file: echo "INDIVIDUAL_POWERLIMIT_0:  ${POWER_LIMIT[0]}". That value is coming from the 1bash file, for instance INDIVIDUAL_POWERLIMIT_0=100.

Please run the autotemp script in a terminal and read the output.
Code:
/home/m1/Maxximus007_AUTO_TEMPERATURE_CONTROL

Maxximus007, thanks for helping :)

fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 19, 2017, 11:27:50 PM
 #2024

SSH users and the new 1bash: you can no longer start things up by using bash 2unix, as all the pastebin stuff is commented out.
You must now start up by SSHing in to the rig and running 1bash directly. Fullzero has placed the pastebin stuff in 1bash, so make sure you edit 1bash first to set it to remote and put your pastebin info in it.

For me personally the new v18.1 does not work. Many, many errors. I can't afford to have the rig down to diagnose it right now, so I am back to v17, which works flawlessly for my purposes.

I would like to suggest making version 17 the "base" build as it works so well and incorporate all the features in as "modules" (separate programs), instead of adding all the code to the 1bash file.

The file is getting so big it is hard to diagnose stuff. Just one noob's opinion lol...
Yes, 2unix has the pastebin part taken out, and it is indeed quite handy to have a fitting 1bash directly. As fullzero already said, it will be modularized and optimized pretty soon, once it's on GitHub. Currently there is just too much in one file, which makes it error-prone. I use the new v0018 as a template and remove the parts I don't use.

Perhaps the first step should be separating the variables from the code with something like: source myvariables
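
A minimal sketch of that split, assuming the settings live in a file called `myvariables` next to the script. The variable names in the comments are taken from elsewhere in this thread; `load_settings` and the error message are made up for illustration:

```shell
#!/bin/sh
# --- myvariables (settings only, no logic) would contain lines such as:
#   INDIVIDUAL_POWERLIMIT="YES"
#   INDIVIDUAL_POWERLIMIT_0=100
#   upPASTE_TIMEOUT_IN_MINUTES=30

# --- the script then loads the settings before doing anything else
load_settings() {
  settings_file="$1"
  if [ ! -r "$settings_file" ]; then
    echo "settings file not found: $settings_file" >&2
    return 1
  fi
  # '.' is the portable spelling of bash's 'source'
  . "$settings_file"
}
```

The main script would start with something like `load_settings /home/m1/myvariables || exit 1`, so an update can replace the code without touching the user's settings.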

It's still not a simple task for fullzero: There are many wishes, and for instance overclock is different per miner/coin etc..

TBH, I don't care too much to have all the possible (obscure) coins in it, if you want it, edit yourself (for now). Once it's split up in modules it's way easier to add additional coins.

I agree completely. Poor Fullzero is working overtime on this and giving it for free. I sure hope everyone is giving him some hash every once in a while. Hey Fullzero you should add your ETH or whatever you want to the code right at the top and comment it out. Most folks just open 1bash and delete yours and paste their address in. You are acknowledging everyone but yourself!
After reading the code: the new upPASTE will automatically update the default 1bash. So just change these lines in 1bash BEFORE booting (in the windows partition):
Code:
_Parallax_MODE="NO"             # YES NO

pasteBASH="np9FSHew"

upPASTE_TIMEOUT_IN_MINUTES=30
And your rig will update 1bash within 30 minutes. So you will send a few hashes to fullzero, not a big problem I believe..

My bad; I should have put the timeout at the bottom of the while loop so it executes at launch.  This morning when I was testing this I was using a 1 minute timeout so I wasn't thinking about this.

To make this change (running the update at launch) open the upPASTE file and cut line 18:

Code:
sleep $TIMEOUT

(you must cut, i.e. remove, this line so that line 18 is blank)

then go to the empty line 62, just before:

Code:
done
fi

and paste:

Code:
sleep $TIMEOUT

so the bottom of the file reads:
Code:
sleep $TIMEOUT
done
fi

save
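
The effect of the move, as a sketch: with `sleep` as the last statement of the loop body, the first check runs immediately at launch and the wait happens afterwards. `check_for_update`, `update_loop` and `MAX_RUNS` are placeholders for illustration, not names from upPASTE:

```shell
#!/bin/sh
TIMEOUT=${TIMEOUT:-1}     # seconds; upPASTE derives its own value
MAX_RUNS=${MAX_RUNS:-2}   # hypothetical limit so the sketch terminates

check_for_update() {      # placeholder for the real update logic
  echo "checking for 1bash update"
}

update_loop() {
  runs=0
  while [ "$runs" -lt "$MAX_RUNS" ]
  do
    check_for_update      # runs right away on the first pass
    runs=$((runs + 1))
    sleep "$TIMEOUT"      # wait only after the check has run
  done
}
```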


fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 19, 2017, 11:33:37 PM
 #2025

Which image for MSI Gaming 5 motherboard to support 7 GPU?  Not sure if it's an issue with the image or something else but it wouldn't boot up even for the BIOS screen with the GPUs and NVOS flash drive connected when using the TB85 motherboard image.

 

For an MSI Z170-A GAMING M5 use this image:


Currently each image is unique so I can only ensure they will work for the mobo listed to support.

Can I use another mobo with 6 PCIe slots?
Any brand?

Most Intel mobos should work; however, many require BIOS setting changes.  In general, boards with the Z170 chipset are the hardest to get working with all PCIe slots.

I have tested many motherboards and listed the bios settings that need to be changed with pictures on the OP. 

I highly recommend only getting a motherboard that has already been tested.  Specifically, I would get a motherboard that works out of the box with no BIOS changes if possible.

Last time I checked, the old but good ASRock H81 PRO BTC (6x GPU) mobo was in stock at Newegg.  I have a ton of these; even though they use an old chipset, they are rock solid and work out of the box.

TenaciousJ
Full Member
***
Offline Offline

Activity: 123


View Profile
July 20, 2017, 04:57:46 AM
 #2026

Beats me Fullzero. My version of 1bash v. 18 has no way of executing the bash file Watchdog until I added it.
Just saying "yes" to the switch won't start it.

The other stuff does not work as I stated which is why I wrote my own part and edited out some stuff.

Maybe just me and my rig..shrug dunno.

I don't wanna mess up anyone with my crazy changes so I'll just keep em to myself for now unless I see others with similar issues.

This is getting big and complex. Ever consider client side program running in background and controlling stuff via a webpage?

Yes this is planned: monitor / push / update / dashboard app; keep getting sidetracked adding contributions / new coins.

The new 1bash should solve problems / start watchdog and autotemp in a screen when in remote.

Love the v0018 release and all the functionality! 

However, POWERLIMIT NIGHTMARES! 

I have one major issue: I cannot lower the POWERLIMIT.  I run 8 rigs of 1050Ti and 125W is just way too high.  I have tried adjusting the baseline and the individual POWERLIMIT settings and I am still seeing maximum power being utilized in NVIDIA-SMI and TEMP CONTROL.  I thought maybe the TEMP CONTROL was trumping the setting, but I don't think that is the case (at least based on what my 46 year old brain and eyeballs looking at the 1bash code understand).  I thought maybe it was the correction in line 527, but that didn't change anything.

I tried "NO" for both WATCHDOG and TEMP CONTROL with POWERLIMIT set below MAX for the 1050Ti and I still see max power output.

I did notice during startup, of the three terminal screens that pop-up during startup that the second terminal session has the POWERLIMIT set correctly at 60.   However, something happens after the third terminal screen initiates (miner starting) that pushes the POWER back to MAX.

I added another rig of 1050Tis tonight and I saw more unusual behavior from POWER settings again where GPU0 goes to 125W as the max power limit and the rest of the GPUs all complied with my setting of 65Watts.   I have no idea what is causing this inconsistency in power limit settings.

I also noticed in the Guake terminal that the TEMP CONTROL module is displaying continuous notifications that 125W is not a valid power limit (even after changing the settings in the module to 60-65).

I normally run all my rigs at 60W, which keeps the current draw low enough to run 3 rigs of 8 GPUs on each 15 AMP circuit.  Also, extremely efficient.

I am still hunting for what is causing the forced 125W power setting.

Try the new 1bash and additional files posted on the OP.  Let me know if it doesn't solve this for you.


I tried updating to the newest posted 1bash files as you suggested to resolve a problem where I have set the powerlimits for my cards individually, but the script changes one of my 1080ti cards (250w) to the power limit set for the 1070s (140w) so the card is only pulling 550sol instead of the 750 it should be.

I've triple checked my individual power limit settings vs. the GPU ID from Nvidia xserver against the powerlimit ID in the script, and they match.  But it's not processing properly. 

So after updating to the new 1bash fileset, it gets to the point where the fan settings are modified, the script loads the EWBF miner (for HUSH), and promptly crashes with the 'screen is terminated' message.  I also have disabled autotemp and watchdog, but the problem persists.

Here's a screen shot of the problem that set this chain in motion showing the power draw of the cards vs. the powerlimit settings in 1bash and the IDs that were used in Xserver to match the powerlimits, and of the current problem. 

http://imgur.com/a/zwf2s


I can't figure out what might be causing the miner to crash immediately on load like it is.. I've tried zeroing overclocks in case it was related to that, but that was no help. I ruled out power mizer by disabling that as well, but it crashes still.

I'm also set to LOCAL mode. 
TenaciousJ
Full Member
***
Offline Offline

Activity: 123


View Profile
July 20, 2017, 05:01:17 AM
 #2027

Hi fullzero,

thank you for keeping this project alive and the constant updates.
I've been running 017 version on z270-hd3p gigabyte motherboard + 3 x 1080TI and a 1070 for almost 2 weeks now with no issues.

meanwhile does anyone have the issue with 018 version not working at all? ewbf does not even start. Most settings have been the same as from the onebash file in 017. Turned off most of the new additional features like watchdog and auto temp.
I've tried booting from an ssd as well as a 32gb sandisk ultra flair thumbdrive; I keep getting the error [Screen is terminating] at the end.

I understand the issue is most likely a configuration somewhere gone wrong, therefore it terminated before even trying to load ewbf miner, but was there such a drastic change from 017 to 018 that I missed out?

I would really like to find out if anyone has faced a similar issue, so I can iron it out and run ver 018.
Thanks!

I ran into the same problem after using the most current files.  No idea what's causing it.  I've disabled autotemp, watchdog, set to LOCAL, tried mizer on and off, etc. but nothing fixed it.  I see EWBF load for 1/2 a second then that 'screen is terminating' message pops up.  I think it might be related to watchdog, even though it's disabled in the 1bash file, but I can't figure out how exactly.
kw1k
Newbie
*
Offline Offline

Activity: 13


View Profile
July 20, 2017, 05:15:36 AM
 #2028

fullzero -- did you manage to get ccminer_alexis78 into this new v18 build?

I still need to add some more ccminer versions; v0018 doesn't have the version I believe you are looking for.



I have compiled the correct alexis78 version for nvoc with arch flags for 10x series cards.

https://mega.nz/#!p64lHS4Q!BpaOMyEx5pL8GhkEXx6WTfgILxMa5FjvreN7jwLxuVE
BaliMiner
Newbie
*
Offline Offline

Activity: 3


View Profile
July 20, 2017, 05:22:16 AM
 #2029


BaliMiner please provide a BTC address for the next version.


Hi Fullzero, this is my BTC address: 1HbzxQ6AVeWYvFm322KtxZcJJLAqfJHpN8
Avarets
Newbie
*
Offline Offline

Activity: 11


View Profile
July 20, 2017, 08:56:43 AM
 #2030

My configuration:
v0018
Biostar TB250-BTC PRO + 12 Zotac P106-100 cards (without output).
When I run it with LOCAL (GT 730 for monitor + 7 P106-100 cards) I see it works.
But when I remove GT 730 adapter and monitor and attach all 12 P106-100 cards and use REMOTE and connect by SSH it doesn't seem to be working.
I tried to run it manually but the OS was rebooted with Xorg error.
Any ideas how to fix it?

P.S. I tried new 1bash - still the same issue.

Code:
m1@m1-desktop:~$ pkill -e miner
m1@m1-desktop:~$ export DISPLAY=:0
m1@m1-desktop:~$ screen -r miner
There is no screen to be resumed matching miner.
m1@m1-desktop:~$ bash /home/m1/1bash


workername: nv045

Xorg PROBLEM DETECTED

Restoring Xorg

Rebooting in 5

First: ensure you have made the 2x BIOS changes as indicated in the OP for this mobo, and saved / restarted as directed.  If you have made additional BIOS changes then you should restore the default settings and perform the procedure in the OP.

Second: while troubleshooting I recommend attaching the GPU with output to the primary 16x slot and using 11 of the mining GPUs in the other slots.  Run in local mode.

If you have significantly changed the GPU configuration, especially in regard to the primary GPU, it is likely the system will need to restore the xorg and reboot.  If it does this once, it is expected; if it does this in a loop (i.e. multiple times in a row), there is a problem.

Let me know how this goes.

PS: I highly recommend using the ASRock 13x mobo for an easy, out-of-the-box setup.  If I was having a lot of trouble with this mobo, I would get one of the ASRock boards and then return the Biostar once I had the rig running with 13x.




I figured out this was because of a wrong xorg.conf.
Used this command:
Code:
sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
Also commented out this part and forced XORG to be OK:
Code:
XORG="OK"

#if grep -q "28800" /etc/X11/xorg.conf;
#then
#XORG="OK"
#fi
Now the script starts fine.

One more thing. The script doesn't support P106-100 overclocking because of this part:
Code:
___1050_or_1050ti="NO"

NORMAL="NO"

nvidia-smi -L > /tmp/tempa

if grep -q "1050" /tmp/tempa;
then
___1050_or_1050ti="YES"
fi

if grep -q "1060" /tmp/tempa;
then
NORMAL="YES"
fi

With P106-100 cards, the output of "nvidia-smi -L > /tmp/tempa" looks like this:
Code:
m1@m1-desktop:~$ cat /tmp/tempa
GPU 0: P106-100 (UUID: GPU-afea0b93-e083-bde7-f6dd-fb5b9f55ae98)
GPU 1: P106-100 (UUID: GPU-191d50dc-d599-de1d-fa4b-54493a9035c6)
GPU 2: P106-100 (UUID: GPU-2ae0b358-33bb-8438-f47b-2a2ce8088f88)
GPU 3: P106-100 (UUID: GPU-66bce3b8-51aa-9f9d-f3c5-fce4e667f994)
GPU 4: P106-100 (UUID: GPU-bae124b9-96ad-5086-20f4-32bdb6d2663f)
GPU 5: P106-100 (UUID: GPU-a9664776-7549-499a-6cfa-3b74a6c6c843)
GPU 6: P106-100 (UUID: GPU-4b57123b-20b9-20c6-ffb9-0203a51cf009)
GPU 7: P106-100 (UUID: GPU-f851be56-15e7-adf2-5a65-7508a25e6e66)
GPU 8: P106-100 (UUID: GPU-1249a132-7df6-a1d3-4794-947cd1e1887a)
GPU 9: P106-100 (UUID: GPU-f31fca46-13ad-4eee-5024-177de21d36f9)
GPU 10: P106-100 (UUID: GPU-4161850e-1f6a-7c6b-fda6-03d58826f758)
GPU 11: P106-100 (UUID: GPU-af9286f8-e0c5-2139-87f1-7019b8a1ccca)


So I manually set TI="2" and now the overclocking values are applied.

Code:
TI="2"

if [ $___1050_or_1050ti == "YES" ]
then
    TI="2"
if [ $NORMAL == "YES" ]
then
    TI="2 3"
fi
fi
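
A sketch of how the detection could treat P106-100 cards explicitly instead of hard-coding `TI`. It reads the same `nvidia-smi -L` listing; `pick_ti` is a made-up name, mapping P106-100 to `2` simply mirrors the manual setting above, and the default of `3` for other cards is an assumption. The UUID part is stripped first so that hex digits such as "1050" inside a UUID cannot match by accident.

```shell
#!/bin/sh
# Print the TI value for a GPU listing as produced by `nvidia-smi -L`.
pick_ti() {
  # keep only "GPU n: <model name>", drop " (UUID: ...)"
  names=$(printf '%s\n' "$1" | sed 's/ (UUID:.*//')
  if printf '%s\n' "$names" | grep -q "P106-100"; then
    echo "2"        # mirrors the manual TI="2" reported to work above
  elif printf '%s\n' "$names" | grep -q "1050"; then
    echo "2"        # same value the posted code uses for 1050 / 1050 Ti
  else
    echo "3"        # assumption: default for other cards
  fi
}
```

In 1bash this could be used as `TI=$(pick_ti "$(nvidia-smi -L)")`.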
fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 20, 2017, 01:02:26 PM
 #2031

Beats me Fullzero. My version of 1bash v. 18 has no way of executing the bash file Watchdog until I added it.
Just saying "yes" to the switch won't start it.

The other stuff does not work as I stated which is why I wrote my own part and edited out some stuff.

Maybe just me and my rig..shrug dunno.

I don't wanna mess up anyone with my crazy changes so I'll just keep em to myself for now unless I see others with similar issues.

This is getting big and complex. Ever consider client side program running in background and controlling stuff via a webpage?

Yes this is planned: monitor / push / update / dashboard app; keep getting sidetracked adding contributions / new coins.

The new 1bash should solve problems / start watchdog and autotemp in a screen when in remote.

Love the v0018 release and all the functionality! 

However, POWERLIMIT NIGHTMARES! 

I have one major issue: I cannot lower the POWERLIMIT.  I run 8 rigs of 1050Ti and 125W is just way too high.  I have tried adjusting the baseline and the individual POWERLIMIT settings and I am still seeing maximum power being utilized in NVIDIA-SMI and TEMP CONTROL.  I thought maybe the TEMP CONTROL was trumping the setting, but I don't think that is the case (at least based on what my 46 year old brain and eyeballs looking at the 1bash code understand).  I thought maybe it was the correction in line 527, but that didn't change anything.

I tried "NO" for both WATCHDOG and TEMP CONTROL with POWERLIMIT set below MAX for the 1050Ti and I still see max power output.

I did notice during startup, of the three terminal screens that pop-up during startup that the second terminal session has the POWERLIMIT set correctly at 60.   However, something happens after the third terminal screen initiates (miner starting) that pushes the POWER back to MAX.

I added another rig of 1050Tis tonight and I saw more unusual behavior from POWER settings again where GPU0 goes to 125W as the max power limit and the rest of the GPUs all complied with my setting of 65Watts.   I have no idea what is causing this inconsistency in power limit settings.

I also noticed in the Guake terminal that the TEMP CONTROL module is displaying continuous notifications that 125W is not a valid power limit (even after changing the settings in the module to 60-65).

I normally run all my rigs at 60W, which keeps the current draw low enough to run 3 rigs of 8 GPUs on each 15 AMP circuit.  Also, extremely efficient.

I am still hunting for what is causing the forced 125W power setting.

Try the new 1bash and additional files posted on the OP.  Let me know if it doesn't solve this for you.


I tried updating to the newest posted 1bash files as you suggested to resolve a problem where I have set the powerlimits for my cards individually, but the script changes one of my 1080ti cards (250w) to the power limit set for the 1070s (140w) so the card is only pulling 550sol instead of the 750 it should be.

I've triple checked my individual power limit settings vs. the GPU ID from Nvidia xserver against the powerlimit ID in the script, and they match.  But it's not processing properly. 

So after updating to the new 1bash fileset, it gets to the point where the fan settings are modified, the script loads the EWBF miner (for HUSH), and promptly crashes with the 'screen is terminated' message.  I also have disabled autotemp and watchdog, but the problem persists.

Here's a screen shot of the problem that set this chain in motion showing the power draw of the cards vs. the powerlimit settings in 1bash and the IDs that were used in Xserver to match the powerlimits, and of the current problem. 

http://imgur.com/a/zwf2s


I can't figure out what might be causing the miner to crash immediately on load like it is.. I've tried zeroing overclocks in case it was related to that, but that was no help. I ruled out power mizer by disabling that as well, but it crashes still.

I'm also set to LOCAL mode. 

I made an updated 1bash which should resolve these powerlimit / remote issues.  With the powerlimits the autotemp was not reinitializing unless explicitly killed or the rig was logged out or rebooted.  This had the effect of not allowing changes to the individual powerlimits until such killing or logout / reboot.

In your picture I can see another problem which is most likely what has been killing your 1bash prematurely.

By removing the unused individual powerlimit variables you have created a situation where later in 1bash those variables are undefined. 

For now; don't delete unused variables. 

I can add logic to check or otherwise avoid this type of problem in the future; but it is simple enough to leave the extra variables for now.
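
One way such a check could look, as a sketch: fall back to a common value when an individual variable is missing instead of using it undefined. `INDIVIDUAL_POWERLIMIT_n` follows the naming quoted earlier in the thread; `powerlimit_for` and passing the fallback as an argument are assumptions, not what 1bash does today:

```shell
#!/bin/sh
# Print the power limit for GPU number $1, or the fallback $2 when
# INDIVIDUAL_POWERLIMIT_<n> is unset or empty.
powerlimit_for() {
  gpu="$1"
  fallback="$2"
  case "$gpu" in
    ''|*[!0-9]*) echo "bad GPU index: $gpu" >&2; return 1 ;;
  esac
  # Indirect lookup of INDIVIDUAL_POWERLIMIT_<n>; safe because $gpu is numeric
  eval "value=\${INDIVIDUAL_POWERLIMIT_${gpu}:-}"
  if [ -z "$value" ]; then
    value="$fallback"
  fi
  echo "$value"
}
```

With this, deleting the line for an unused GPU would no longer leave a hole: `powerlimit_for 7 75` prints 75 when `INDIVIDUAL_POWERLIMIT_7` was removed.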


VoskCoin
Full Member
***
Offline Offline

Activity: 224


YouTube.com/VoskCoin


View Profile WWW
July 20, 2017, 01:06:41 PM
 #2032

I have a major issue, all of my miners just completely turned off 30 minutes ago,

Room was around 80 degrees, they never rebooted, breaker wasn't tripped, I have one asic miner in there and it was mining away when I walked in while every other machine was sitting there off?

Any idea on what happened? How can I figure out more and how can I prevent this from happening in the future?

Check out my Crypto YouTube channel
https://www.youtube.com/VoskCoin
If you enjoy my content click Subscribe
fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 20, 2017, 01:07:05 PM
 #2033

Hi fullzero,

thank you for keeping this project alive and the constant updates.
I've been running 017 version on z270-hd3p gigabyte motherboard + 3 x 1080TI and a 1070 for almost 2 weeks now with no issues.

meanwhile does anyone have the issue with 018 version not working at all? ewbf does not even start. Most settings have been the same as from the onebash file in 017. Turned off most of the new additional features like watchdog and auto temp.
I've tried booting from an ssd as well as a 32gb sandisk ultra flair thumbdrive; I keep getting the error [Screen is terminating] at the end.

I understand the issue is most likely a configuration somewhere gone wrong, therefore it terminated before even trying to load ewbf miner, but was there such a drastic change from 017 to 018 that I missed out?

I would really like to find out if anyone has faced a similar issue, so I can iron it out and run ver 018.
Thanks!

I ran into the same problem after using the most current files.  No idea what's causing it.  I've disabled autotemp, watchdog, set to LOCAL, tried mizer on and off, etc. but nothing fixed it.  I see EWBF load for 1/2 a second then that 'screen is terminating' message pops up.  I think it might be related to watchdog, even though it's disabled in the 1bash file, but I can't figure out how exactly.

I made an updated 1bash which should resolve these powerlimit / remote issues. 

With the powerlimits the autotemp was not reinitializing unless explicitly killed or the rig was logged out or rebooted.  This had the effect of not allowing changes to the individual powerlimits until such killing or logout / reboot.

With the remote issues; I was querying the existing processes incorrectly when in REMOTE causing duplicate watchdogs to be created.

I altered upPASTE to check for a 1bash update on launch.

I added logic based on Avarets' experience with P106-100 (I don't have any of these, so I am going on your report, Avarets; let me know if these changes work).

Please let me know if there are any issues with these updates.

fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 20, 2017, 01:21:43 PM
 #2034

I have a major issue, all of my miners just completely turned off 30 minutes ago,

Room was around 80 degrees, they never rebooted, breaker wasn't tripped, I have one asic miner in there and it was mining away when I walked in while every other machine was sitting there off?

Any idea on what happened? How can I figure out more and how can I prevent this from happening in the future?

If they are all powered off; I'm not sure what would have caused that.

If they are still on / but not mining:

I would look at one and see if there is a connection problem with the pool.  This is the most likely reason for a large number of rigs to simultaneously stop mining.

pool disconnect detection and mitigation / auto failover can be improved in a later version

Right now, if the pool server goes down, the rig will reinitialize 1bash 5 times, then reboot.  This will repeat in a larger loop until the pool server is reachable.

This occurs because when the pool is not providing work, the GPU utilization will be below 90.
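
The behaviour described above can be sketched as a small decision function. The threshold of 90 and the limit of 5 restarts come from this post; `watchdog_action` and its arguments are hypothetical, and the real watchdog may be structured differently:

```shell
#!/bin/sh
THRESHOLD=90      # utilization below this counts as "not mining"
MAX_RESTARTS=5    # 1bash restarts before the rig reboots

# $1 = current GPU utilization in percent, $2 = restarts so far
watchdog_action() {
  util="$1"
  restarts="$2"
  if [ "$util" -ge "$THRESHOLD" ]; then
    echo "ok"
  elif [ "$restarts" -lt "$MAX_RESTARTS" ]; then
    echo "restart_1bash"
  else
    echo "reboot"
  fi
}

# The utilization could be read with, for example:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
```

Because a pool outage also drops utilization below the threshold, this logic cannot tell a crashed miner from an unreachable pool, which is why the rig ends up in the restart / reboot cycle.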



VoskCoin
Full Member
***
Offline Offline

Activity: 224


YouTube.com/VoskCoin


View Profile WWW
July 20, 2017, 01:25:42 PM
 #2035

I have a major issue, all of my miners just completely turned off 30 minutes ago,

Room was around 80 degrees, they never rebooted, breaker wasn't tripped, I have one asic miner in there and it was mining away when I walked in while every other machine was sitting there off?

Any idea on what happened? How can I figure out more and how can I prevent this from happening in the future?

If they are all powered off; I'm not sure what would have caused that.

If they are still on / but not mining:

I would look at one and see if there is a connection problem with the pool.  This is the most likely reason for a large number of rigs to simultaneously stop mining.

pool disconnect detection and mitigation / auto failover can be improved in a later version

Right now, if the pool server goes down, the rig will reinitialize 1bash 5 times, then reboot.  This will repeat in a larger loop until the pool server is reachable.

This occurs because when the pool is not providing work, the GPU utilization will be below 90.




Any idea why they would stay turned off though? The pool did not go down: I called my buddy, and his rigs running your software on the same pool had no issues.

I have a whole-house surge protector, but could it still be a power surge?

fullzero
Legendary
*
Offline Offline

Activity: 1092



View Profile
July 20, 2017, 01:38:10 PM
 #2036

I have a major issue, all of my miners just completely turned off 30 minutes ago,

Room was around 80 degrees, they never rebooted, breaker wasn't tripped, I have one asic miner in there and it was mining away when I walked in while every other machine was sitting there off?

Any idea on what happened? How can I figure out more and how can I prevent this from happening in the future?

If they are all powered off; I'm not sure what would have caused that.

If they are still on / but not mining:

I would look at one and see if there is a connection problem with the pool.  This is the most likely reason for a large number of rigs to simultaneously stop mining.

pool disconnect detection and mitigation / auto failover can be improved in a later version

Right now, if the pool server goes down, the rig will reinitialize 1bash 5 times, then reboot.  This will repeat in a larger loop until the pool server is reachable.

This occurs because when the pool is not providing work, the GPU utilization will be below 90.




Any idea why they would stay turned off though? The pool did not go down: I called my buddy, and his rigs running your software on the same pool had no issues.

I have a whole-house surge protector, but could it still be a power surge?

If they aren't all on the same circuit / other electronic devices seem to be working correctly; and you have verified the pool server is up; there may be a problem with your router / switch. 

I would first try a hard reboot of all the rigs and see if they power on (switch all the PSUs to off, click atx power switch at least 2 times, switch the PSUs back on and click the atx power switches) 

Sometimes a powerstrip will trip before a circuit (this depends on the powerstrip) so if you are using a powerstrip or PDU I would check that it is still on / not tripped.

spiz0r
Sr. Member
****
Offline Offline

Activity: 336



View Profile
July 20, 2017, 03:05:12 PM
 #2037


I got a BIOSTAR TB250-BTC PRO (12x gpu) today Link

I made a 12x 1060 rig with it.

ensure Mining Mode is enabled in the bios. 

ensure Max TOLUD is set to 3.5 GB in the bios.

NOTE: you must first only connect 6x GPUs, boot, make Bios changes, save and reboot, shutdown, add the other 6x GPUs, boot





I like the 13x out the box + m2 ssd ready ASRock more; but this is also a good mobo. 

Biostar sadly still can't handle; making mining settings the default.

It's good to see somebody got this board to work. I have problems with this board: I get tons of PCIe bus errors. Are you sure you haven't changed anything in the BIOS? PCIe bus speeds auto or Gen2, or Above 4G MMIO? Also, do you use the IGFX or one of the mining cards?
Could you share your settings? :)

The settings are in the quote you posted; also on the OP:

ensure Mining Mode is enabled in the bios. LINK to PICTURE 

ensure Max TOLUD is set to 3.5 GB in the bios. LINK to PICTURE

NOTE: you must first only connect 6x GPUs, boot, make Bios changes, save and reboot, shutdown, add the other 6x GPUs, attach the USB or SSD and boot

Connect a monitor to the GPU connected to the 16x slot; nvOC and rxOC do not currently support integrated graphics.

Thanks, my problem was my Skylake CPU. I needed to add one extra line to the GRUB kernel settings and it worked fine. (I couldn't make nvOC work, but I could make SMOS work with 12 cards.)

DJ ACK
Newbie
*
Offline Offline

Activity: 11


View Profile
July 20, 2017, 03:07:45 PM
 #2038

Love the v0018 release and all the functionality!  

However, POWERLIMIT NIGHTMARES!  

I have one major issue: I cannot lower the POWERLIMIT.  I run 8 rigs of 1050Ti and 125W is just way too high.  I have tried adjusting the baseline and the individual POWERLIMIT settings and I am still seeing maximum power being utilized in NVIDIA-SMI and TEMP CONTROL.  I thought maybe the TEMP CONTROL was trumping the setting, but I don't think that is the case (at least based on what my 46 year old brain and eyeballs looking at the 1bash code understand).  I thought maybe it was the correction in line 527, but that didn't change anything.

I tried "NO" for both WATCHDOG and TEMP CONTROL with POWERLIMIT set below MAX for the 1050Ti and I still see max power output.

I did notice that, of the three terminal screens that pop up during startup, the second terminal session has the POWERLIMIT set correctly at 60.  However, something happens after the third terminal screen initiates (miner starting) that pushes the POWER back to MAX.

I added another rig of 1050Tis tonight and saw more unusual behavior from the POWER settings: GPU0 goes to 125W as the max power limit, while the rest of the GPUs all complied with my setting of 65W.  I have no idea what is causing this inconsistency in power limit settings.

I also noticed in the Guake terminal that the TEMP CONTROL module is displaying continuous notifications that 125W is not a valid power limit (even after changing the settings in the module to 60-65).

I normally run all my rigs at 60W per GPU, which keeps the current draw low enough to run 3 rigs of 8 GPUs on each 15 amp circuit.  It is also extremely efficient.
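As a rough check of that circuit budget, the arithmetic can be sketched in bash. The 120 V mains voltage and the 80% continuous-load rule of thumb are assumptions, not figures from the post, and the sum covers GPU power only (no CPU, motherboard, risers or PSU losses).

```shell
#!/usr/bin/env bash
# Rough circuit budget, GPU power only.
# ASSUMPTIONS (not from the post): 120 V mains, 80% continuous-load derating.

# usage: gpu_watts RIGS GPUS_PER_RIG WATTS_PER_GPU
gpu_watts() {
  echo $(( $1 * $2 * $3 ))
}

# usage: circuit_budget_watts AMPS VOLTS DERATING_PERCENT
circuit_budget_watts() {
  echo $(( $1 * $2 * $3 / 100 ))
}

echo "GPU draw at 60 W:  $(gpu_watts 3 8 60) W"
echo "GPU draw at 125 W: $(gpu_watts 3 8 125) W"
echo "Circuit budget:    $(circuit_budget_watts 15 120 80) W"
```

Under those assumptions, three rigs at 60W per GPU come to 1440 W, which matches the derated budget, while 125W per GPU would come to 3000 W.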

I am still hunting for what is causing the forced 125W power setting.
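While hunting for a reset like this, it can help to check which GPUs have drifted from the intended limit. The following is a minimal sketch, not part of 1bash: the function names `gpus_off_target` and `reapply_limit` are made up for illustration, and it only reads and re-applies limits through `nvidia-smi`; it does not explain what pushed the limit back up.

```shell
#!/usr/bin/env bash
# Print the index of every GPU whose power limit differs from TARGET.
# Reads "index, limit" lines on stdin, as produced by:
#   nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits
gpus_off_target() {
  awk -F', *' -v target="$1" '($2 + 0) != (target + 0) { print $1 }'
}

# Re-apply TARGET to each GPU index read from stdin (needs root).
# DRY_RUN=1 (the default in this sketch) only prints the commands.
reapply_limit() {
  local target="$1" idx
  while read -r idx; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sudo nvidia-smi -i $idx -pl $target"
    else
      sudo nvidia-smi -i "$idx" -pl "$target" </dev/null
    fi
  done
}

# Example on a live rig (commented out so the sketch has no side effects):
# nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits \
#   | gpus_off_target 65 | reapply_limit 65
```

With the behavior described above (GPU0 at 125W, the rest at 65W), `gpus_off_target 65` would list only index 0.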

Try the new 1bash and additional files posted on the OP.  Let me know if it doesn't solve this for you.


Fullzero, yes this solved all my POWERLIMIT problems.  All 8 rigs up and hashing away.  Thank you very much!
hatch789
Jr. Member
*
Offline Offline

Activity: 53


View Profile WWW
July 20, 2017, 06:15:02 PM
 #2039

Hi Guys,

I have been happily working with the new nvOC v0018 image for about 4 days now. It's going well and I plan to continue my work with it! I thank everyone for their hard work and contributions to the project, especially Fullzero of course.

Anyway, last night I tried to shut my rigs down for some changes I was going to make. I realized I couldn't shutdown! Every time it shut down it would boot right back up.

I have seen this before and it's an ACPI issue more often than not. But for kicks and giggles, I tried booting to my windows drive on the same hardware. It shut down without any issue at all. So I know that the problem is not a BIOS setting as some posts suggest. Windows shuts the rig down just fine.

I toyed with the kernel settings GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi=force"
( https://www.unixmen.com/fix-shutdown-power-computer-ubuntu-14-04/ )
but it didn't help...
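For anyone repeating this experiment, the mechanics of adding a kernel parameter can be sketched as below. `add_kernel_param` is a hypothetical helper, not something shipped with nvOC; it assumes GNU sed and a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line, and it does not handle parameters containing `|` or `&`.

```shell
#!/usr/bin/env bash
# Append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless it is already there.
# FILE defaults to /etc/default/grub; pass another path to try it on a copy.
add_kernel_param() {
  local param="$1" file="${2:-/etc/default/grub}"
  local line value
  line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || return 1
  value=${line#*=\"}      # strip the variable name and opening quote
  value=${value%\"}       # strip the closing quote
  case " $value " in
    *" $param "*) return 0 ;;   # already present, nothing to do
  esac
  sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}

# Example on the real file (commented out; needs root):
# sudo bash -c "$(declare -f add_kernel_param); add_kernel_param acpi=force"
# sudo update-grub    # regenerate grub.cfg, then reboot
```

The change only takes effect after `update-grub` and a reboot. As noted above, `acpi=force` did not fix the shutdown problem on this board, so this shows the procedure rather than a solution.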

I'm just wondering if anyone else has run into this? I'm using the Asus Z270-A motherboard:
https://www.newegg.com/Product/Product.aspx?item=N82E16813132936

These are nice MBs and let you run 7 cards on one rig.

Thanks,
-Hatch -= http://UbuMiner.com =-
Avarets
Newbie
*
Offline Offline

Activity: 11


View Profile
July 20, 2017, 08:55:39 PM
 #2040

I added logic based on Avarets' experience with P106-100 (I don't have any of these so I am going on your report Avarets; let me know if these changes work)

Please let me know if there are any issues with these updates.

The script doesn't seem to start automatically.
If I run it manually there are some errors, but the mining process starts:

Code:
Invalid MIT-MAGIC-COOKIE-1 keyFailed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
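Errors like these usually appear when `nvidia-settings` is started from a shell that cannot reach or authenticate against the running X session (for example over SSH). A minimal sketch of a workaround follows; `use_local_display` is a hypothetical helper, and both the display number `:0` and the `~/.Xauthority` cookie location are assumptions that the post does not confirm.

```shell
#!/usr/bin/env bash
# Point X clients such as nvidia-settings at the local desktop session.
# ASSUMPTIONS: the X server is display :0 and the auth cookie is the
# desktop user's ~/.Xauthority; neither is confirmed by the post.
use_local_display() {
  export DISPLAY="${DISPLAY:-:0}"
  export XAUTHORITY="${XAUTHORITY:-$HOME/.Xauthority}"
  if [ ! -r "$XAUTHORITY" ]; then
    echo "warning: cannot read $XAUTHORITY" >&2
    return 1
  fi
}

# Example (commented out): set the variables as the desktop user,
# then rerun the script from the same shell.
# use_local_display && echo "using DISPLAY=$DISPLAY"
```

If the variables are already set in the shell, the function leaves them alone, so it is safe to call from a terminal inside the desktop session as well.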