Stubo
Member
Offline
Activity: 224
Merit: 13
|
|
May 04, 2018, 07:51:26 PM |
|
Found them! There is a lot in there... But I am not sure what I am looking for. There are some warnings in there, and all the files look almost identical. Are you in the discord? I could send it to you on there? This is the last portion of it.
I am not in the discord. There is going to be a LOT of stuff in the syslog and most of it you can ignore. What you want to do is zero in on the entries just before it froze. This is all done by timestamps. You can also look for entries that are hard errors, but those may prove to be red herrings. You will notice there are syslogs going back about a week: syslog is from today, syslog.1 is going to be yesterday, syslog.2.gz is the archived syslog from the day before that, etc. For the gz ones, just use "gunzip <filename>" as root to unzip them so that you can read them. So start by recalling the day and time of the last freeze(s), and then figure out which log it would be in based upon the timestamps of the syslogs. Next, open whichever one(s) those are and find that day/time. You should then see additional clues (probably errors) that were logged. Once you find them, we can go from there.
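A minimal sketch of the "which log is it in" step, assuming the logs live in /var/log and that zgrep (shipped with gzip) is installed. The function name logs_for_day is made up for illustration. zgrep reads plain and .gz files alike, so nothing has to be unzipped first.

```shell
# Hypothetical helper: list the syslog files that contain entries for a given day.
# Usage: logs_for_day <month> <day> [log_dir]
logs_for_day() {
    local month="$1"
    local day="$2"
    local dir="${3:-/var/log}"
    # Traditional syslog pads single-digit days with an extra space,
    # so the pattern allows one or more spaces between month and day.
    # -l prints only the names of files that contain a match.
    zgrep -l -E "^${month} +${day} " "$dir"/syslog* 2>/dev/null
}

# Example (run as root if the logs are not world-readable):
#   logs_for_day May 2
```

Once you know the file, open it and search for the time of the freeze.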
|
|
|
|
urnzwy
Newbie
Offline
Activity: 44
Merit: 0
|
|
May 04, 2018, 07:58:29 PM |
|
Updated the previous post to include the last portion of the files. It froze every day and all the files look the same, so whatever this issue is it just happens over and over again.
|
|
|
|
Stubo
Member
Offline
Activity: 224
Merit: 13
|
|
May 04, 2018, 08:45:23 PM |
|
Let me see if I can make it more clear. Unlike some nvOC-specific logs, Ubuntu [any unix system] syslogs are written to constantly, so grabbing the last (most recent) part of one is unlikely to be helpful in diagnosing your problem. Instead, what you want to do is view the logs with an editor like vi and scroll through until you find the day and time when the freeze occurred.
For example, I had an issue with one of my miners on Wednesday morning. My pool indicated that it was not mining, so I tried to log in via ssh but could not. I physically checked on the machine and it was running, but I could not access it remotely (I don't have a display connected to it). So, I powered it down for a few minutes and then powered it back up. It came up just fine and started mining again.
Of course, I wanted to find out what had happened, so I started looking at my syslog for that day and at that time - sometime after 9:30am on May 2. When I found that part of the log, I saw this:
May 2 07:35:02 Miner1 anacron[30615]: Job `cron.daily' terminated
May 2 07:35:02 Miner1 anacron[30615]: Normal exit (1 job run)
May 2 08:17:01 Miner1 CRON[16925]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 2 08:24:20 Miner1 systemd[1]: Starting Cleanup of Temporary Directories...
May 2 08:24:20 Miner1 systemd-tmpfiles[19789]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
May 2 08:24:20 Miner1 systemd[1]: Started Cleanup of Temporary Directories.
May 2 09:17:01 Miner1 CRON[8112]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 2 09:46:17 Miner1 avahi-daemon[821]: Withdrawing address record for 192.168.1.165 on enp0s31f6.
May 2 09:46:17 Miner1 avahi-daemon[821]: Leaving mDNS multicast group on interface enp0s31f6.IPv4 with address 192.168.1.165.
May 2 09:46:17 Miner1 avahi-daemon[821]: Interface enp0s31f6.IPv4 no longer relevant for mDNS.
May 2 10:17:01 Miner1 CRON[23962]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 2 10:23:30 Miner1 systemd[1]: Starting Daily apt download activities...
May 2 10:23:30 Miner1 systemd[1]: Started Daily apt download activities.
May 2 11:17:01 Miner1 CRON[32329]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 2 12:17:01 Miner1 CRON[8511]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 2 12:43:29 Miner1 rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="846" x-info="http://www.rsyslog.com"] start
So for some still-to-be-determined reason, at 09:46:17 avahi-daemon withdrew the address record for my machine's network interface (the three avahi-daemon entries in the log), which resulted in loss of name resolution and connectivity. While not at all relevant to your situation, I just wanted to use this as an example of an issue that I had, how I found it, and what caused it. I hope this helps.
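If scrolling in vi is tedious, the same time-window search can be sketched with awk. This is only an illustration: log_window is a made-up name, and it assumes the traditional "Mon D HH:MM:SS" timestamp at the start of each line, as in my log. For a .gz log, gunzip it first.

```shell
# Hypothetical helper: print the entries from one day that fall between two times.
# Usage: log_window <file> <month> <day> <from HH:MM:SS> <to HH:MM:SS>
log_window() {
    local file="$1"
    local month="$2"
    local day="$3"
    local from="$4"
    local to="$5"
    # Zero-padded HH:MM:SS strings sort correctly as plain text,
    # so a string comparison on the third field is enough.
    awk -v m="$month" -v d="$day" -v from="$from" -v to="$to" \
        '$1 == m && $2 == d && $3 >= from && $3 <= to' "$file"
}

# Example: everything logged between 09:30 and 10:00 on May 2
#   log_window /var/log/syslog.1 May 2 09:30:00 10:00:00
```

On my log that window would contain only the three avahi-daemon lines.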
|
|
|
|
WaveFront
Member
Offline
Activity: 126
Merit: 10
|
|
May 06, 2018, 07:56:52 AM |
|
Bad news about Equihash: Bitmain launched an Equihash ASIC today. 10 KSol/s consuming only 300W! What do you think about this ASIC? https://shop.bitmain.com/product/detail?pid=00020180503154806494uGcSyiu806FD I should say that I am not here to promote them, and I did not buy an ASIC, but it is quite clear that this will pose a huge problem. Much worse than the ETH miner from the same company. Only the Monero community reacted in opposition to the Bitmain ASICs. I simply wish to have your opinion on this issue, which may make GPU mining obsolete on this algorithm rather quickly. At this rate, only Neoscrypt and Cryptonight V7 will remain to be mined via CPU and GPU. Not a good deal for decentralization as a goal of cryptocurrency.
One rig 22mhX6 for 132MH drawing 680w VS 10Kh @300W is going to be a threat? Where? Oh and each unit does cost 2K and you can only have one for now. Hmmmmm. thay
It's Equihash, not Ethash... For Ethash there is the Antminer E3. Its price and power consumption are not radically better than a GPU rig with an equivalent hash rate.
|
|
|
|
CryptAtomeTrader44
Full Member
Offline
Activity: 340
Merit: 103
It is easier to break an atom than partialities AE
|
|
May 06, 2018, 09:07:13 AM Last edit: May 06, 2018, 09:27:59 AM by CryptAtomeTrader44 |
|
Yes, a threat to the profitability of every coin that uses the Equihash algo. The coins are mass-produced cheaply, which will reduce the profitability of graphics card rigs that consume much more and produce much less (as you indicate). The difficulty will also increase very strongly in a very short time. Yes, it costs $2,000 for now, but how many KSol/s can you produce with a rig at this price? Certainly far fewer. So I'm not convinced by your answer, which is a little too evasive.
|
|
|
|
CryptAtomeTrader44
Full Member
Offline
Activity: 340
Merit: 103
It is easier to break an atom than partialities AE
|
|
May 06, 2018, 09:41:36 AM Last edit: May 06, 2018, 12:09:38 PM by CryptAtomeTrader44 |
|
Absolutely. The comparison with their E3 for Ethash does not hold. The E3's electrical consumption for Ethash is close to that of GPU rigs, for a roughly equivalent price. In fact the E3 only saves space, so it increases the mining density per square meter. But with the Z9, it is possible to win on almost every front (admittedly, we have no information about the noise of this new ASIC). You gain in surface density, in electrical efficiency (10,000 / 300 = 33.3 Sol/s per watt, while GPUs are at 5 Sol/s per watt maximum), and in raw speed (hashes per second).
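For anyone who wants to redo the efficiency arithmetic with their own numbers, here is a small sketch. The function name efficiency is made up; the inputs are just the advertised figures quoted above, not measurements.

```shell
# Hypothetical helper: hash rate per watt, rounded to one decimal place.
# Usage: efficiency <rate in Sol/s> <power in watts>
efficiency() {
    awk -v rate="$1" -v watts="$2" 'BEGIN { printf "%.1f\n", rate / watts }'
}

# Advertised Z9 figures from the post: 10 KSol/s at 300 W
efficiency 10000 300   # prints 33.3
```

Plug in your own rig's Sol/s and wall power to compare against the 5 Sol/s per watt GPU figure.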
|
|
|
|
terex
Newbie
Offline
Activity: 7
Merit: 0
|
|
May 07, 2018, 05:07:06 PM |
|
Hi Pap, I saw the MSFT ccminer for RVN in the miners update. I tried the code and it segmentation faults pretty quickly. I am running suprminer (no fee) and it runs without issue. Tried it on two different frames, it's been stable.
https://github.com/ocminer/suprminer.git
./suprminer/build.sh
Don
Test frame.
ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, GIGABYTE, P106-100, P0, 49, 50, 100, 67.39, 85.00, 140.00, 1493, 3905
1, ASUS, P106-100, P0, 54, 50, 100, 66.95, 85.00, 140.00, 1594, 3905
2, ASUS, P106-100, P0, 41, 50, 99, 55.56, 85.00, 140.00, 1493, 3905
3, ASUS, P106-100, P0, 54, 50, 100, 51.85, 85.00, 140.00, 1657, 3905
4, ASUS, P106-100, P0, 56, 50, 99, 96.83, 85.00, 140.00, 1721, 3905
5, ASUS, P106-100, P0, 54, 50, 100, 57.54, 85.00, 140.00, 1493, 3905
6, ASUS, P106-100, P0, 57, 50, 100, 69.79, 85.00, 140.00, 1620, 3905
7, ASUS, P106-100, P0, 55, 50, 99, 61.98, 85.00, 140.00, 1493, 3905
8, MSI, P106-100, P0, 52, 50, 100, 51.64, 85.00, 140.00, 1468, 3905
9, MSI, P106-100, P0, 51, 50, 100, 54.11, 85.00, 140.00, 1493, 3905
10, MSI, P106-100, P0, 49, 50, 100, 52.74, 85.00, 140.00, 1468, 3905
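A table like this is easy to scan with awk. The sketch below assumes the output has been saved or piped as comma-separated text in exactly the column order shown (POWER is column 8, POWERLIMIT is column 9). The function name over_limit is made up.

```shell
# Hypothetical helper: print the IDs of GPUs whose POWER reading exceeds POWERLIMIT.
# Reads the comma-separated table on stdin; header and separator rows are skipped
# because their first field is not a number.
over_limit() {
    awk -F', *' '$1 ~ /^[0-9]+$/ && $8 + 0 > $9 + 0 { print $1 }'
}

# Example, with the table saved to a file (file name is an assumption):
#   over_limit < gpu_table.txt
```

On the table above it would report GPU 4, which reads 96.83 against a limit of 85.00.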
|
|
|
|
urnzwy
Newbie
Offline
Activity: 44
Merit: 0
|
|
May 08, 2018, 05:45:15 AM |
|
I would like to give a special shout out to Stubo for helping me solve my crashing issue, through multiple replies within the forum and additionally through PM. Neither of my two 13-card Nvidia rigs has crashed in over 33 hours, up from 12-18ish hours before crashing and freezing.
I had both machines at a 0/0 core/memory overclock running at 120W of power (1070 / 1070 Ti) and 175W (1080 / 1080 Ti).
I had not renamed the rigs, so both were named m1@m1-Desktop. This may have been causing an issue with the DNS registration (correct me if I am wrong, Stubo).
The second thing I did was set my core to -100 and memory to +700 for mining ETH. The negative core was suggested by Stubo. I kept the same power limits. Nothing else was modified or changed.
I not only had an increase in my Mh/s (from 27 to 29.5), but the rigs seem stable thus far.
I will keep everyone posted if I come across anything else, as I plan on going back to the original settings with no overclock to see whether that had any effect or whether not renaming the rigs was the issue.
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 08, 2018, 07:42:07 AM |
|
Yes, suprminer is better than MSFT, and better than that is Z-Enemy:
X16R - RVN - Miner head to head test log
Zealot/Enemy (z-enemy) NVIDIA GPU miner.
|
|
|
|
kostik2022
Newbie
Offline
Activity: 2
Merit: 0
|
|
May 08, 2018, 10:48:14 AM |
|
Hello all. Got a strange issue. I have a new installation of nvOC. The farm is intended to run in headless mode. After running
nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --use-display-device="DFP-0" --connected-monitor="DFP-0"
the system reboots, and I get a message "Xorg problems", and then the system reboots automatically in 5 seconds. After the reboot, xorg.conf looks truncated and incorrect, and because of this no xorg processes are running. I have no idea how to fix it. Please help with advice.
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 08, 2018, 11:07:51 AM |
|
As soon as the rig starts, close gnome-terminal.
Set p106 headless mode in 1bash to yes, so 3main won't check the xorg.conf and restore it from backup.
Put XORG_UPDATED in /home/m1/xorg_flag:
echo "XORG_UPDATED" > /home/m1/xorg_flag
Then run your nvidia-xconfig and reboot.
Hope it helps.
Edit: If your cards are not p106, open 3main and change:
if grep -q "P106-100" /tmp/tempa; then
  ___1050_or_1050ti="YES"
  P106_100="YES"
fi
To:
if grep -q -E 'P106|P104|P102' /tmp/tempa; then
  ___1050_or_1050ti="YES"
  P106_100="YES"
fi
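To see what the -E pattern change does before editing 3main, you can try the same grep on a few model strings. This is just a sketch: matches_mining_card is a made-up name, and the sample strings stand in for whatever 3main writes to /tmp/tempa.

```shell
# Hypothetical helper: succeeds if stdin mentions a P106, P104 or P102 card,
# using the same extended-regex alternation as the edited 3main line.
matches_mining_card() {
    grep -q -E 'P106|P104|P102'
}

# Sample model strings (assumed for illustration)
for model in "P106-100" "P104-100" "GeForce GTX 1070"; do
    if printf '%s\n' "$model" | matches_mining_card; then
        echo "$model: matched"
    else
        echo "$model: not matched"
    fi
done
```

The original "P106-100" pattern would match only the first of these; the alternation also catches the P104.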
|
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 09, 2018, 06:04:45 PM |
|
I got a new problem.. ugh I get and it freezes and i got to manually restart. It helps if i close the miner_temp _control, until restarts the miner and loaded again.
Did you change a card's slot or add a new one? I think that's an xorg problem. Restore xorg with:
sudo wget -N https://raw.githubusercontent.com/papampi/nvOC_by_fullzero_Community_Release/19-2.1/xorg.conf -O /etc/X11/xorg.conf.default
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf'
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf.backup'
sudo reboot
|
|
|
|
infowire
Newbie
Offline
Activity: 96
Merit: 0
|
|
May 10, 2018, 03:58:32 PM Last edit: May 11, 2018, 02:13:18 AM by infowire |
|
Do threshold and utilization relate to memory, or to memory and core? I get "Utilization is too low" and "GPUs below threshold". I have disabled some cards with -di through claymore; maybe that is what causes the system to reboot. How do you disable cards in nvOC? Also, I get this:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 11, 2018, 08:19:25 AM |
|
Those errors are usually an xorg problem. Did you restore the nvOC default xorg as I posted?
Threshold is the minimum GPU utilization before watchdog triggers.
There is no option in nvOC to disable a GPU yet, but you can raise the number of GPUs allowed below threshold in the watchdog check.
Open 5watchdog, find this line, and change 0 to the number of cards you disabled in the miner:
if [ $NUM_GPU_BLW_THRSHLD -gt 0 ]
So if you disabled 2 cards in the miner command it will be:
if [ $NUM_GPU_BLW_THRSHLD -gt 2 ]
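To illustrate why the number matters, here is a standalone sketch of the comparison. It is not the actual 5watchdog code: count_below_threshold, DISABLED_CARDS, the threshold of 70 and the utilization readings are all made up for the example.

```shell
# Hypothetical helper: count how many utilization readings fall below a threshold.
# Usage: count_below_threshold <threshold> <util1> <util2> ...
count_below_threshold() {
    local threshold="$1"
    shift
    local count=0
    local util
    for util in "$@"; do
        if [ "$util" -lt "$threshold" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Five cards, two of them disabled in the miner and therefore idle at 0%
NUM_GPU_BLW_THRSHLD=$(count_below_threshold 70 100 99 0 0 98)
DISABLED_CARDS=2   # assumed: cards you excluded in the miner command

if [ "$NUM_GPU_BLW_THRSHLD" -gt "$DISABLED_CARDS" ]; then
    echo "more cards idle than expected: watchdog would act"
else
    echo "only the disabled cards are idle: watchdog stays quiet"
fi
```

With -gt 0 the two idle cards alone would trip the check; with -gt 2 it only trips when a third card drops out.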
|
|
|
|
infowire
Newbie
Offline
Activity: 96
Merit: 0
|
|
May 11, 2018, 12:59:06 PM |
|
I did do the xorg. I will try the GPU threshold number, thank you.
|
|
|
|
martyroz
|
|
May 11, 2018, 01:33:34 PM |
|
What's the current state of this community release? Is there support for X16R algo? (raven)
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 11, 2018, 01:50:01 PM |
|
Add to 0miner and 1bash:
0miner:
if [ $COIN == "RVN" ]
then
  HCD='/home/m1/z-enemy/z-enemy_miner'
  ADDR="$RVN_ADDRESS.$RVN_WORKER"
  screen -dmSL miner $HCD -a x16r -o stratum+tcp://$RVN_POOL:$RVN_PORT -u $ADDR -p $MINER_PWD -i $RVN_INTENSITY
fi
1bash:
RVN_WORKER="$WORKERNAME"
RVN_ADDRESS="Account name or RVN_address"
RVN_POOL="pool address without stratum+tcp:// "
RVN_PORT="pool port"
RVN_INTENSITY="19"
Run:
mkdir -p /home/m1/z-enemy/
wget -O- https://raw.githubusercontent.com/papampi/nvOC_miners/master/z-enemy/z-enemy-1.09a-cuda80.tar.gz | tar -xzC /home/m1/z-enemy/ --strip 1
chmod a+x /home/m1/z-enemy/z-enemy_miner
Change coin in 1bash, and restart miner with:
watchdog will catch no miner running and will restart it
|
|
|
|
infowire
Newbie
Offline
Activity: 96
Merit: 0
|
|
May 12, 2018, 01:36:04 AM Last edit: May 12, 2018, 02:52:16 AM by infowire |
|
Still getting the same errors after restoring the default xorg. It doesn't seem to do anything as far as crashes go.
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).
Is it ok if I leave the setting like this?
if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]
I tested my cards grouped by type and they seemed to be alright for 4 hours. Now I'm going to see if they play nice with each other.
EDIT: Temperature control for all the cards hung again. Blah.
|
|
|
|
papampi
Full Member
Offline
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
|
|
May 12, 2018, 05:47:45 AM Last edit: May 12, 2018, 09:19:34 AM by papampi |
|
That error is from tempcontrol, not watchdog. It's weird: I had this error once when I added some cards to a rig, but restoring the default xorg solved it. Maybe try to re-install the image.
Edit: Are you using salfter switcher?
|
|
|
|
|