papampi
October 28, 2017, 09:40:00 AM
> Hello,
> I am trying to find out which GPU is crashing ethminer (I am mining ETH and tuning the overclock for each card individually), and for this I would like to check the ethminer log files. Do I just need to look at the Ubuntu system logs, or is there a specific log file?
> I tried to collect the output of ethminer by editing 3main and appending > ~/ethminerlog.log to the line invoking ethminer, but the output is not redirected to the file.
> Any idea how to do this?
> Cheers

Have a look at the web info. You should get some info like this:

Watch Dog Alerts:
WARNING: Sat Oct 28 10:34:16 IRST 2017 - GPU under threshold found - GPU UTILIZATION: 99 99 100 99 96 99 86

Or, if you are on an older version, you can add L to the screen flags on the line that launches your miner, like this:

screen -dmSL miner $HCD ....

You can then check the live log data or the full log.

papampi
October 28, 2017, 09:45:23 AM
> Assuming your miner is launched with the "screen" command like most, you should be able to change that command to add the L flag for logging. Changing
> screen -dmS miner ....
> to
> screen -dmSL miner ....
> should do the trick. I haven't done it, but apparently it will write a file named 'screenlog.0'. Hope this helps.

@Stubo You were faster than me
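
A side note on why the plain > redirect in the original question may have captured nothing. This is an assumption, not something confirmed for this ethminer build: > redirects stdout only, and many miners write their log lines to stderr. The sketch below uses a made-up fake_miner function (a stand-in, not a real command) to show the difference between redirecting stdout alone and redirecting both streams.

```shell
#!/bin/sh
# fake_miner is a hypothetical stand-in for the real miner: it prints a
# banner to stdout and its log line to stderr.
fake_miner() {
    echo "miner starting"
    echo "GPU 3: hashrate dropped" >&2
}

log_dir=$(mktemp -d)

# stdout only: the stderr line goes to the terminal, not to the file.
fake_miner > "$log_dir/stdout_only.log"

# stdout and stderr together: both lines end up in the file.
fake_miner > "$log_dir/both.log" 2>&1

# Count how many log lines each file captured.
grep -c 'hashrate' "$log_dir/stdout_only.log"   # prints 0
grep -c 'hashrate' "$log_dir/both.log"          # prints 1
```

If the miner is started through screen, a redirect placed at the end of the screen line applies to screen itself rather than to the miner running inside the session, which is one more reason the L flag is the simpler route.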

stef_stef
October 28, 2017, 03:31:02 PM
Can someone let me know why I am greeted with this screen? I had some problems with the OS crashing and got this screen. Because I couldn't be bothered to resolve the issue, I made another USB (version 19), and again... the miner crashes, the PC restarts, and I cannot log in. I tried miner1, password, admin, root, and a blank password, but nothing works.

wi$em@n
October 28, 2017, 03:38:34 PM
> Someone to let me know why I am greeted with this screen? Had some problems with OS crashing, and got this screen. Cause I couldn't care less to resolve the issue, I made another usb (version 19), and again... Miner crashes, PC restarts and I cannot login. Tried using miner1, password, admin, root and a blank password, but nothing works.
> https://image.prntscr.com/image/DG-gXiXFTS6I2HgZUX4DiA.png

Look at page 246, SSH to your rig and run:

$ sudo dpkg --configure -a
$ sudo reboot

WaveFront
October 28, 2017, 08:19:05 PM
@papampi and @stubo
Thank you so much for your answers :-) I can see the logs perfectly now.
Just another question: in 5_restartlog I get a "GPU under threshold found" message every 3 to 5 seconds (see below). Can I do something about this?
I am running the latest version of nvOC, v0019-1.3.

GPU UTILIZATION: 97 100 93 100 100
GPU_COUNT: 5
GPU UTILIZATION: 99 97 100 97 100
GPU_COUNT: 5
GPU UTILIZATION: 93 100 100 93 91
GPU_COUNT: 5
GPU UTILIZATION: 96 99 100 97 82
Sat Oct 28 18:38:37 CEST 2017 - GPU under threshold found - GPU UTILIZATION: 96 99 100 97 82
GPU_COUNT: 5
GPU UTILIZATION: 96 100 100 100 89
Sat Oct 28 18:38:47 CEST 2017 - GPU under threshold found - GPU UTILIZATION: 96 100 100 100 89
GPU_COUNT: 5
GPU UTILIZATION: 100 96 98 100 98
GPU_COUNT: 5
GPU UTILIZATION: 100 100 100 100 96

Stubo
October 28, 2017, 08:44:30 PM
> In 5_restartlog I get an error "GPU under threshold found" message every 3 to 5 seconds (see below). Can I do something about this? [...]

That comes from the watchdog script 'IAmNotAJeep_and_Maxximus007_WATCHDOG' (screen -r wdog). On line 34, you will see the threshold set.
So you could lower the threshold (I am just throwing it out there) or, better yet, figure out why one GPU is not performing like the others. What are your GPUs, and are they all identical?
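
To see which card trips the check, something like the sketch below could be run over lines from 5_restartlog. It is not the watchdog's own code: the function name gpus_under_threshold is made up for illustration, and the threshold of 90 is an assumption that merely fits the log excerpt above (82 and 89 were flagged, 91 was not). The real value is whatever is set in the watchdog script.

```shell
#!/bin/sh
# Assumed threshold; the real one lives in the watchdog script.
THRESHOLD=90

# Print the zero-based positions of the GPUs whose utilization is below
# the threshold, given one log line containing "GPU UTILIZATION: ...".
# Plain POSIX sh, so the helper variables are global.
gpus_under_threshold() {
    line=$1
    threshold=$2
    values=${line#*GPU UTILIZATION: }   # keep only the numbers
    index=0
    under=""
    for value in $values; do
        if [ "$value" -lt "$threshold" ]; then
            under="${under:+$under }$index"
        fi
        index=$((index + 1))
    done
    printf '%s\n' "$under"
}

gpus_under_threshold "GPU UTILIZATION: 96 99 100 97 82" "$THRESHOLD"    # prints 4
gpus_under_threshold "GPU UTILIZATION: 96 100 100 100 89" "$THRESHOLD"  # prints 4
```

In the two flagged lines of the excerpt it is the last card (position 4) both times, which might be worth checking before concluding that the drops are spread randomly over all cards.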

WaveFront
October 28, 2017, 09:13:36 PM
> So you could lower the threshold (I am just throwing it out there) or better yet, figure out why one GPU is not performing like the others. What are your GPUs and are they all identical?

They are all GTX 1060, although from different manufacturers and versions: 2 with dual fans and 3 with a single fan.
The errors come up randomly for any of the cards. Not sure if that is normal or not.

Stubo
October 28, 2017, 09:30:57 PM
> They are all GTX 1060, although from different manufacturers and versions. 2 with double fan and 3 with single fan. The errors are coming randomly for any of the cards. Not sure if it is normal or not

That doesn't sound normal. Unfortunately, I don't have any 1060s, so I have no idea what the OC settings should be. In the meantime, just to keep mining with it as it is, you may want to disable the watchdog [in 1bash], which will otherwise keep restarting your miner whenever it detects low GPU utilization.
Hopefully somebody else can chime in and help you with the OC settings.

sergixc
October 28, 2017, 11:21:58 PM
Hi,
Please check http://prntscr.com/h3azap
What does that mean? The rig does not restart. Could you please advise what to fix?
Thank you in advance

codereddew12
October 28, 2017, 11:32:41 PM
After a restart, one of my rigs has a few GPUs where the minimum fan speed doesn't get set, i.e. min fan speed = 40, but 1-3 GPUs are spinning at 32%.
Granted, they don't go over their temp limit, but they're constantly within +/- 1° without the fan speeding up (I have max temp diff set to 2). Which card(s) are affected is totally random, so I don't think there's something intrinsically wrong with any of the cards (MSI GTX 1070 x 7). Any idea why this could be happening? Not a big deal, I just wanted to make sure that this isn't an ominous sign of anything.

kk003
October 29, 2017, 12:56:48 AM
> That doesn't sound normal. Unfortunately, I don't have any 1060's so I have no idea what the OC settings should be. [...] Hopefully somebody else can chime in and help you with the OC settings.

My 13 GPU rig, 1060 3GB OC (this runs on CentOS 7):

sudo nvidia-smi -pl 95
sudo nvidia-settings -a /GPUMemoryTransferRateOffset[3]=1500
sudo nvidia-settings -a [gpu:8]/GPUMemoryTransferRateOffset[3]=1400

and I don't touch the Graphics Clock here.

My other rig has 3 1060 + 2 970. The 1060 settings:

sudo nvidia-smi -pl 75
sudo nvidia-settings -a /GPUMemoryTransferRateOffset[3]=1500
sudo nvidia-settings -a /GPUGraphicsClockOffset[3]=-100

In 1bash this would be:

__CORE_OVERCLOCK_1=-100
MEMORY_OVERCLOCK_1=1500
__CORE_OVERCLOCK_2=-100
MEMORY_OVERCLOCK_2=1500
__CORE_OVERCLOCK_3=-100
MEMORY_OVERCLOCK_3=1300

for GPUs 1, 2, 3, mining ETC at around 24 MH/s per GPU.
Temp stays around 58-65 ºC.
Running time: 500 - 1000 hours.
Hope this helps as a reference ;-)
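
For per-card tuning, the same three commands can be generated from a small table, one row per GPU. This is only a dry-run sketch: oc_commands is a made-up helper that prints the commands instead of running them, the 75 W limit and the offsets are the ones quoted in the post, and it is an assumption that the _1, _2, _3 suffixes in 1bash correspond to the [gpu:N] index used by nvidia-settings on a given rig.

```shell
#!/bin/sh
# Print (do not run) the power-limit and clock-offset commands for one GPU.
# Arguments: GPU index, power limit in watts, core offset, memory offset.
oc_commands() {
    printf 'sudo nvidia-smi -i %s -pl %s\n' "$1" "$2"
    printf 'sudo nvidia-settings -a [gpu:%s]/GPUGraphicsClockOffset[3]=%s\n' "$1" "$3"
    printf 'sudo nvidia-settings -a [gpu:%s]/GPUMemoryTransferRateOffset[3]=%s\n' "$1" "$4"
}

# One "index watts core mem" row per card; the index column is assumed.
while read -r gpu watts core mem; do
    oc_commands "$gpu" "$watts" "$core" "$mem"
done <<EOF
1 75 -100 1500
2 75 -100 1500
3 75 -100 1300
EOF
```

Printing first makes it easy to review the lines for each card, and to lower the memory offset on just the one card that crashes, before pasting anything into a terminal.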

WaveFront
October 29, 2017, 02:53:25 AM
> My rig 3 1060 + 2 970. The 1060 settings: [...] in 1bash would be: __CORE_OVERCLOCK_1=-100 MEMORY_OVERCLOCK_1=1500 [...]

It's quite interesting. Which driver version do you use?
My GPUs are all GTX 1060 6GB, and in 1bash the settings are:

POWERLIMIT_WATTS=80
__CORE_OVERCLOCK=100
MEMORY_OVERCLOCK=1050

I cannot get anywhere close to your settings. When the memory overclock goes over 1150, I start to have crashes every 15 seconds or so. Unless it is a driver version problem, I cannot see what is wrong with my setup.

WaveFront
October 29, 2017, 08:11:17 AM
> I cannot get anywhere close to your settings. When the memory overclock get over 1150 I start to have crashes every 15 seconds or so. [...]

I just read that not all memory behaves the same way and that I might have non-Samsung memory on my GPUs.
Is there an easy way to check the memory manufacturer of the GPUs from Ubuntu?

papampi
October 29, 2017, 08:24:20 AM
> Is there an easy way to check for memory manufacturer on the GPUs from ubuntu?

Nope, only on Windows with GPU-Z.

WaveFront
October 29, 2017, 09:01:56 AM
> Nope, only on windows with gpuz

Hi Papampi,
Thanks for the help. It's time to add an external drive with a Windows partition.
At the price we are paying for the GPUs, they'd better have Samsung memory :-D

Stubo
October 29, 2017, 09:07:04 AM
> I just read that not all memory behaves the same way and that I might have non Samsung memory on my GPUs. Is there an easy way to check for memory manufacturer on the GPUs from ubuntu?
>
> Nope, only on windows with gpuz

Here is confirmation of that: https://devtalk.nvidia.com/default/topic/1018512/memory-brand-type/... and a few other things that we won't see on Linux.

kk003
October 29, 2017, 09:36:41 AM
> It's quite interesting. Which driver version do you use? [...]

Driver versions: 384.59 and 384.90.
My cards have Samsung memory.

woodl1
October 29, 2017, 12:00:42 PM
Hi again, fullzero and all,
I've got a question for you. Some of my rigs now have a problem booting from a USB stick. I have many different sticks, but after a month or two of normal booting, some of them started to fail during the boot process with I/O errors etc. I understand that this is a quality problem in most cases, but I also think it's a use-case problem. So now I'm thinking of an nvOC/rxOC mod that could do network booting, to rule out this kind of issue completely while using these distros.
The question is: has anyone tried to boot nvOC from the network? If yes, please share your experience with us!

papampi
October 29, 2017, 12:52:38 PM (Last edit: October 29, 2017, 01:09:03 PM by papampi)
> My 13 gpu rig 1060 3Gb OC (this runs on centos 7): [...]

Does the 1060 3GB have the same hash rate as the 6GB on both Equihash and Ethash?
Is it even possible to mine ETH with 3GB?

papampi
October 29, 2017, 12:56:18 PM
> Some of my rigs now have a problem booting from USB stick. [...] The question is - have someone tried to boot nvOC from network?

Just get a $30 SSD.