fullzero (OP)
Legendary
Offline
Activity: 1260
Merit: 1009
|
|
October 25, 2017, 08:42:35 PM |
|
Max GPUs i have in one rig is 13. In case this doesn't help we need to see nvidia-xconfig output and xorg.conf that it makes, so we can try to find what's wrong.
Also you can try to reflash the latest version of nvOC to usb stick, edit 1bash at windows partition (don't forget to enable headless mode) and make the first boot from usb stick without monitor attached. After the first boot in headless mode you need to wait for autoconfiguration and auto reboot. Then use network scanner or see DHCP leases on the server side to find the IP of your rig (in most cases it will be the same as in previous attempts), ssh to it and see if xorg was correctly configured.
Thank you very much for your help! My rig is now running with all those settings: new USB, headless, no monitor attached, known IP, SSH access, and mining.
Here's the weird thing: see below for the xorg.conf that is currently running. All 19 GPUs are mining, but only the 1st one is OC'd. When I run
sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
I get a new xorg.conf that does two things: the BusIDs lose the "@" symbol, and mining does not work. Any ideas why some BusIDs use the @ while most I've seen don't, and why this seems to be allowing mining to run?
Also weird: if you notice, there are only BusIDs for 0-3 (4 and 5 are missing), then it continues from 6-11 and that's it. What are your thoughts on this?
Current xorg.conf:
Section "ServerLayout" Identifier "layout" Screen 0 "nvidia" Inactive "intel" EndSection
Section "Device" Identifier "intel" Driver "modesetting" BusID "PCI:0@0:2:0" Option "AccelMethod" "None" EndSection
Section "Screen" Identifier "intel" Device "intel" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:1@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:2@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:3@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:6@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:7@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:8@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:9@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:10@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
Section "Device" Identifier "nvidia" Driver "nvidia" BusID "PCI:11@0:0:0" Option "ConstrainCursor" "off" EndSection
Section "Screen" Identifier "nvidia" Device "nvidia" Option "AllowEmptyInitialConfiguration" "on" Option "IgnoreDisplayDevices" "CRT" EndSection
current lspci:
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM
00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (
00:02.0 VGA compatible controller: Intel Corporation Device 1902
00:14.0 USB controller: Intel Corporation Device a2af
00:16.0 Communication controller: Intel Corporation Device a2ba
00:1b.0 PCI bridge: Intel Corporation Device a2eb (rev f0)
00:1b.5 PCI bridge: Intel Corporation Device a2ec (rev f0)
00:1b.6 PCI bridge: Intel Corporation Device a2ed (rev f0)
00:1b.7 PCI bridge: Intel Corporation Device a2ee (rev f0)
00:1c.0 PCI bridge: Intel Corporation Device a294 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Device a295 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device a296 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Device a297 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Device a298 (rev f0)
00:1d.1 PCI bridge: Intel Corporation Device a299 (rev f0)
00:1d.2 PCI bridge: Intel Corporation Device a29a (rev f0)
00:1d.3 PCI bridge: Intel Corporation Device a29b (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a2c8
00:1f.2 Memory controller: Intel Corporation Device a2a1
00:1f.4 SMBus: Intel Corporation Device a2a3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connecti
01:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
02:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
03:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
04:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
06:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
07:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
08:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
09:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0a:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0b:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0c:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0d:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0e:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0f:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
10:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
11:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
12:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
13:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
14:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
15:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
try:
sudo nvidia-xconfig -a --cool-bits=12 --allow-empty-initial-configuration
sudo reboot
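A side observation on the two listings in this post, offered as a reading aid rather than a confirmed diagnosis: lspci prints bus numbers in hexadecimal, while xorg.conf BusID values are decimal. The NVIDIA cards sit on buses 01-03 and 06-15 hex (6-21 decimal), but the posted xorg.conf only has Device sections up to bus 11, so only 9 of the 19 cards are listed there. The "@0" is, as far as I know, the optional PCI domain in the bus@domain form of a BusID. The sketch below converts lspci lines into BusID strings; the function name lspci_to_busid is made up for illustration, and it assumes plain lspci output without a domain prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvOC): read lspci output on stdin and
# print one decimal xorg.conf-style BusID per NVIDIA device.
# Assumes slots look like "0a:00.0" (bus:device.function in hex, no domain).
lspci_to_busid() {
    grep -i 'nvidia' | while read -r slot _rest; do
        bus=${slot%%:*}      # "0a"
        devfn=${slot#*:}     # "00.0"
        dev=${devfn%%.*}     # "00"
        fn=${devfn#*.}       # "0"
        # $((0x..)) converts the hex fields to decimal
        printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$fn"
    done
}

# Typical use on a rig:
#   lspci | lspci_to_busid
```

Comparing that list with the BusID lines in /etc/X11/xorg.conf shows at a glance which cards have no Device section.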
|
|
|
|
fullzero (OP)
|
|
October 25, 2017, 08:43:26 PM |
|
Vi is the best and most powerful text editor that I've ever seen. Notepad++ is a beautiful toy for a windowsian, but never, never as powerful as vi
:-)
Agreed. Regex is right there. I have been using vim as of late for syntax highlighting but I have to install it myself:
sudo apt-get install vim -y
It would be nice if @fullzero could include it in v. 20.
will do for -1.4
|
|
|
|
fullzero (OP)
|
|
October 25, 2017, 08:45:21 PM |
|
Hi fullzero, Hi Guys, (I was in read-only mode for a long time. This is my first post.)
What do you think about implementing nvidia-docker in nvOC and running a miner for each isolated GPU in its own docker container? It would be more fail-safe for rigs with many GPUs, more flexible for individual OC, and would give a way to launch several different miners.
For example: one of my rigs has 13 GPUs of different models, so the OC settings differ from card to card. If one GPU crashes, then all GPUs crash and the watchdog restarts the miner. In my case restarting 3main takes a very long time; most of that time is actually spent running nvidia-settings. I was forced to set the sleep timeout in wdog to 120s, otherwise I get a loop where wdog restarts 3main.
p.s. excuse my poor English
I would have to look at how well the docker containers interface with the hardware. I have used docker containers before; but I'm not sure if this is viable; my guess is it is not.
|
|
|
|
fullzero (OP)
|
|
October 25, 2017, 08:48:24 PM |
|
I am running into an issue mining VTC. I am running my own pool and for some reason it is not accepting any shares if there is a worker name present. If it is just the wallet address it works fine. I can't seem to get nvOC to use no worker name; I have tried to use the wallet as the name and to hash it out completely, and it still inputs "." as the name. Has anyone else run into this? Can you disable worker names somehow?
Thanks
The pool probably uses /workername instead of the .workername used in the 3main implementation. Open 3main in gedit and alter:
ADDR="$VTC_ADDRESS.$VTC_WORKER"
or
ADDR="$BTC_ADDRESS.$VTC_WORKER"
to:
ADDR="$VTC_ADDRESS/$VTC_WORKER"
or
ADDR="$BTC_ADDRESS/$VTC_WORKER"
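For pools that differ in how they join wallet and worker, the same edit can be written as a small helper. This is only a sketch: make_addr and its arguments are hypothetical and do not exist in 3main. It also covers the case asked about above, where the pool wants the bare wallet address with no worker name at all.

```shell
#!/bin/sh
# Hypothetical helper (not in 3main): build the login string for a pool.
#   $1 = wallet address
#   $2 = worker name (may be empty)
#   $3 = separator, "." by default; some pools expect "/"
make_addr() {
    sep=${3:-.}
    if [ -n "$2" ]; then
        printf '%s%s%s\n' "$1" "$sep" "$2"
    else
        # No worker name: send the wallet address only, without a trailing "."
        printf '%s\n' "$1"
    fi
}

# In 3main this could replace the fixed ADDR line, for example:
#   ADDR=$(make_addr "$VTC_ADDRESS" "$VTC_WORKER" "/")
```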
|
|
|
|
fullzero (OP)
|
|
October 25, 2017, 08:50:59 PM |
|
Hi, just moved to NVOC, loving it. I had to mess around with what miner the bash file was using as it was not installed. I have a really noob question if someone could help me please.
1. How do I set up supernova pools that require a password and user id an example would be great thanks
2. Can I set up spill over pools?
Many thanks
1: If you haven't already, join the nvOC discord; damNmad has examples there.
2: I will add auto pool failover + failover pool variables to 1bash for v0020
|
|
|
|
fullzero (OP)
|
|
October 25, 2017, 08:51:57 PM |
|
Need help: Rig has been running for like a month with no issues, then I got this error 2 days ago: https://ibb.co/i85CWR. The system will restart and go into Linux but won't get past the login screen. I entered the miner1 password; it works (it doesn't say the password is wrong) but it won't load. Fine, I reinstalled a fresh copy of nvOC on the hard drive and it worked for 24 hours, and now I'm getting the same exact message and it won't go past the login screen: https://ibb.co/i85CWR
see: https://bitcointalk.org/index.php?topic=1854250.msg23535706#msg23535706
|
|
|
|
martyroz
|
|
October 25, 2017, 10:09:12 PM |
|
EDIT: OK, it's working. I had to change to EWBF miner and remove the $ from the ZEN miner name.
How can I troubleshoot an issue where it seems that one or two GPUs aren't drawing as much power as they should? I have set all 1080tis to 175W and the single 1070 to 125W; there is one 1080ti running at 149W. My first instinct is that I'm asking too much of my 1000W PSU.
|
|
|
|
tomlev5
Newbie
Offline
Activity: 35
Merit: 0
|
|
October 25, 2017, 10:10:29 PM |
|
fullzero, you mentioned active cooling for 1080tis with 133 CFM 120mm fans. I’m interested in this solution, but I’m not sure if I will be able to put it together. Can you please share some details.
Do you remove original fans and cover so that only the cooler fins remain? How do you mount fans? How do you power the fans?
I leave the shroud (GPU fans and casing) on; although if you were super exacting it would probably be best to remove these; not worth the time IMO. The way my open frames are (see one of the demo videos), I just zip tie them to the front (blowing into the 1080tis) and power them via a 3x fan to molex adapter / then molex to whatever is available with the rig's PSU. I get fans from Hawkfish007 https://bitcointalk.org/index.php?action=profile;u=352509; he includes the 3x fan to molex adapters with the fans.
OK, I think I get it now. You use double active cooling: original fans directly on the 1080tis + 120mm 133 CFM fans in front of the 1080tis. I couldn't see this setup in the demo videos; I watched v0014demo, v0018demo and v0019demo. I think there are no 1080tis in these videos, but I think I get the picture.
Did anybody have to replace a failed original fan on a 1080ti? I think a good way would be to remove all fans and their casing and somehow mount a high pressure fan (like the Corsair SP120 High Static Pressure 2350 U/min). Some original fans on 1080tis are of low quality and will probably fail soon. Does anybody have experience with this type of modification?
|
|
|
|
tomlev5
|
|
October 25, 2017, 10:26:01 PM Last edit: October 25, 2017, 10:41:27 PM by tomlev5 |
|
To prevent automatic updates in the future: use your favorite editor, open /etc/apt/apt.conf.d/10periodic and change:
APT::Periodic::Update-Package-Lists "1";
to:
APT::Periodic::Update-Package-Lists "0";
and in /etc/apt/apt.conf.d/20auto-upgrades change:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
to:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
D
ps: Seriously, why was this ON?
Very nice suggestion Doodkeen. This is a community driven project and fullzero is doing the best job ever without asking for a dime (unlike other Linux mining distros). Sometimes in the middle of the road we miss some small points that are so obvious to others, so I kindly ask every Linux guru here to do their best to look for improvements and fixes for nvOC. Feel free to PM me or post any suggestion you have in the forum. Thank you all.
Yep, please feel free to provide valuable feedback, so that we can make nvOC even better. Coming to the error, no one expected this sort of failure on this scale!! Hope we identify these sorts of issues a bit earlier.
I think the problem is that while the Nvidia driver was updating, wdog restarted the rig because of low utilization, and that caused a misconfigured Nvidia driver. I think the solution is to prevent wdog from restarting the rig if dpkg is running.
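The last suggestion, holding back the watchdog while a package operation is running, could look roughly like the sketch below. It is not taken from wdog: the function name update_in_progress and the list of process names (dpkg, apt, apt-get) are assumptions made for illustration. The check reads command names on stdin so it can be fed from ps.

```shell
#!/bin/sh
# Hypothetical guard (not part of wdog): succeed if a package tool shows up
# in a list of command names read from stdin, one name per line.
# The names checked here are an assumption; extend the list as needed.
update_in_progress() {
    grep -Eq '^(dpkg|apt|apt-get)$'
}

# Possible use inside a watchdog loop before a restart is triggered:
#   if ps -e -o comm= | update_in_progress; then
#       echo "package update running, skipping restart"
#   fi
```

Reading from stdin keeps the check independent of how the process list is produced, which also makes it easy to try out by hand.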
Papampi, you are absolutely right; I had this happen to 2 of my rigs. I will change all the update settings for -1.4 except security updates. My bad.
fullzero, is there some way that I could completely disable all updates on nvOC:
- disable automatic downloads of updates (drivers, system, nvOC)
- disable automatic installation of updates (drivers, system, nvOC)
I have a rig behind a firewall and absolutely no important data on a nvOC rig, so all I care about is stability. And stability is more fragile because of USB key usage. The logging is disabled by default, so there would be no writing to the USB key if I disabled updates. nvOC is pretty stable even now, but I think that disabling all updates would help with the stability of the system. Of course I would also periodically update the USB key with the whole image of nvOC, but only every month or so, when you publish the new version of stable nvOC.
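The manual edit to 10periodic and 20auto-upgrades quoted earlier in this post can also be scripted. A minimal sketch, assuming the files contain the two settings exactly as quoted; disable_periodic is a made-up name, and it only touches the apt periodic settings, not driver or nvOC updates.

```shell
#!/bin/sh
# Hypothetical helper: flip the two apt periodic settings from "1" to "0"
# in the file given as $1. Other lines are left untouched.
disable_periodic() {
    f=$1
    tmp=$(mktemp) || return 1
    sed -e 's/^\(APT::Periodic::Update-Package-Lists\) "1";/\1 "0";/' \
        -e 's/^\(APT::Periodic::Unattended-Upgrade\) "1";/\1 "0";/' \
        "$f" > "$tmp" && cat "$tmp" > "$f"   # cat keeps the file's owner and mode
    rm -f "$tmp"
}

# Run as root on the rig, for example:
#   disable_periodic /etc/apt/apt.conf.d/10periodic
#   disable_periodic /etc/apt/apt.conf.d/20auto-upgrades
```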
|
|
|
|
cryptobadger666
Newbie
Offline
Activity: 10
Merit: 0
|
|
October 25, 2017, 11:32:21 PM |
|
It seems like most ETH pools require an email field, where you can put an email or passphrase in the miner; you then need to use that on the pool webpage to change payout settings. Is there a way to add this into the 1bash file, or is there already a default that this OS uses in that field? I'm beating my head against the wall trying to figure this out.
|
|
|
|
MentalNomad
Member
Offline
Activity: 83
Merit: 10
|
|
October 25, 2017, 11:47:15 PM |
|
How can I troubleshoot an issue where it seems that one or two GPU's aren't sucking as much power as they should. I have set all 1080ti's to 175W and the single 1070 to 125W; there is one 1080ti running at 149W. My first instinct is that I'm asking too much of my 1000W PSU
When you set a power number, that's a power limit, not a power setting.
If you're talking about GPU2 drawing less power... is it overclocked differently? And is it really identical hardware? Sometimes two cards look the same at a glance, but have different details and capabilities. I have a rig with all GTX 1070 cards, but they have different Min Power Limit and Max Power Limit capabilities.
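Since cards of the same model can report different minimum and maximum power limits, it can help to clamp a requested limit to what each card allows before applying it. A minimal sketch: clamp_power is a hypothetical helper, and it assumes whole-watt integer values (nvidia-smi prints decimals such as 125.00, which would need rounding first).

```shell
#!/bin/sh
# Hypothetical helper: keep a requested power limit inside a card's range.
#   $1 = requested limit in W, $2 = card minimum, $3 = card maximum
# All three are assumed to be whole numbers.
clamp_power() {
    if [ "$1" -lt "$2" ]; then
        echo "$2"
    elif [ "$1" -gt "$3" ]; then
        echo "$3"
    else
        echo "$1"
    fi
}

# The per-card range can be read with nvidia-smi, for example:
#   nvidia-smi --query-gpu=index,name,power.min_limit,power.max_limit --format=csv
# and a limit applied to one card with:
#   sudo nvidia-smi -i 0 -pl "$(clamp_power 175 125 300)"
```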
|
|
|
|
ComputerGenie
|
|
October 26, 2017, 01:18:00 AM |
|
Or, perhaps that 149W is the result of that Ti protesting being underpowered by 75W
|
If you have to ask "why?", you wouldn`t understand my answer. Always be on the look out, because you never know when you'll be stalked by hit-men that eat nothing but cream cheese....
|
|
|
martyroz
|
|
October 26, 2017, 01:33:24 AM |
|
When you set a power number, that's a power limit, not a power setting.
If you're talking about GPU2 drawing less power... is it overclocked differently? And is it really identical hardware? Sometimes two cards look the same at a glance, but have different details and capabilities. I have a rig with all GTX 1070 cards, but they have different Min Power Limit and Max Power Limit capabilities.
Thanks for the response. So the question then is: why is GPU2 not drawing more power? It is a 1080ti at 49°C / 590 sols.
They are not identical hardware. Due to various sales etc. I have plugged in:
2 * ASUS 1080ti ROG STRIX
1 * EVGA 1080ti SC Black
1 * MSI 1080ti Duke OC
1 * Gigabyte 1070 (GPU1)
All that is certain is that GPU1 is the 1070. I applied a modest OC of +50/+200 across the range.
|
|
|
|
Longsnowsm
|
|
October 26, 2017, 01:35:07 AM |
|
I have an older image of nvOC, and the machine is complaining it is running out of space on the USB. I looked and see a bunch of stuff in /var/tmp, but apparently do not have the perms to clean that stuff up. What is the process for cleaning up the drive so that it doesn't run out of space? Thanks.
open guake and enter:
sudo apt-get update
sudo apt-get autoremove
sudo apt-get autoclean
This should free up some space by removing unused system files.
Thank you Fullzero. I will give that a try!
|
|
|
|
ComputerGenie
|
|
October 26, 2017, 03:40:50 AM |
|
.... I applied a modest OC of +50/+200 across the range.
At the risk of sounding like an ass... You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result
|
|
|
martyroz
|
|
October 26, 2017, 03:53:16 AM |
|
At the risk of sounding like an ass... You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result
I have done the above for many other 1080tis and this is the first one performing poorly (under 600 sols). My Windows rig runs 1080tis at 175W / +75core / +600mem and is rock solid at 680 sols. 580 sols is a clear anomaly amongst 1080tis across 4 manufacturers.
|
|
|
|
ComputerGenie
|
|
October 26, 2017, 04:03:49 AM |
|
At the risk of sounding like an ass... You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result
I have done the above for many other 1080tis and this is the first one performing poorly (under 600 sols). My Windows rig runs 1080tis at 175W / +75core / +600mem and is rock solid at 680 sols. 580 sols is a clear anomaly amongst 1080tis across 4 manufacturers.
If you're going to undervolt and overclock, add 1 card at a time and find out what works best for that card; and then do so with each card (one at a time). Don't be afraid to lose 1 day doing benchmarks vs losing $800 because BrandX ModelY doesn't work with SettingW. You'll thank yourself in the long run.
|
|
|
martyroz
|
|
October 26, 2017, 04:09:31 AM |
|
If you're going to undervolt and overclock, add 1 card at a time and find out what works best for that card; and then do so with each card (one at a time). Don't be afraid to lose 1 day doing benchmarks vs losing $800 because BrandX ModelY doesn't work with SettingW. You'll thank yourself in the long run.
Thanks for the comment. To be fair, that screenshot was taken with no OC applied. I didn't expect a 1080ti to behave like that by just limiting power to 175W. I will test tonight.
|
|
|
|
woodl1
Newbie
Offline
Activity: 15
Merit: 0
|
|
October 26, 2017, 08:36:53 AM |
|
As for the BusID naming, I have no idea why your rig is using @, but if it works, let it be. The 4-5 PCI buses are skipped because other devices are using them; that's normal (you can see it with lspci: "04:00.0 PCI bridge: ASMedia Technology Inc. Device 1187" and so on).
What I want to point out in your current xorg.conf is that there is no Coolbits option in any Screen section. I suppose this option has to be set in order to overclock a GPU, so try what fullzero suggested:
sudo nvidia-xconfig -a --cool-bits=12 --allow-empty-initial-configuration
sudo reboot
If that doesn't help, try adding this option to each Screen section in your xorg.conf except the Screen section with the intel device. Here's the xorg.conf from one of my working rigs; it has 7 GPUs, all P106-100.
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 384.59 (buildmeister@swio-display-x64-rhel04-01) Thu Jul 20 01:03:28 PDT 2017
Section "ServerLayout" Identifier "Layout0" Screen 0 "Screen0" Screen 1 "Screen1" RightOf "Screen0" Screen 2 "Screen2" RightOf "Screen1" Screen 3 "Screen3" RightOf "Screen2" Screen 4 "Screen4" RightOf "Screen3" Screen 5 "Screen5" RightOf "Screen4" Screen 6 "Screen6" RightOf "Screen5" InputDevice "Keyboard0" "CoreKeyboard" InputDevice "Mouse0" "CorePointer" EndSection
Section "Files" EndSection
Section "InputDevice" # generated from default Identifier "Mouse0" Driver "mouse" Option "Protocol" "auto" Option "Device" "/dev/psaux" Option "Emulate3Buttons" "no" Option "ZAxisMapping" "4 5" EndSection
Section "InputDevice" # generated from default Identifier "Keyboard0" Driver "kbd" EndSection
Section "Monitor" Identifier "Monitor0" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor1" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor2" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor3" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor4" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor5" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Monitor" Identifier "Monitor6" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection
Section "Device" Identifier "Device0" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:1:0:0" EndSection
Section "Device" Identifier "Device1" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:2:0:0" EndSection
Section "Device" Identifier "Device2" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:4:0:0" EndSection
Section "Device" Identifier "Device3" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:5:0:0" EndSection
Section "Device" Identifier "Device4" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:6:0:0" EndSection
Section "Device" Identifier "Device5" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:7:0:0" EndSection
Section "Device" Identifier "Device6" Driver "nvidia" VendorName "NVIDIA Corporation" BoardName "P106-100" BusID "PCI:9:0:0" EndSection
Section "Screen" Identifier "Screen0" Device "Device0" Monitor "Monitor0" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen1" Device "Device1" Monitor "Monitor1" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen2" Device "Device2" Monitor "Monitor2" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen3" Device "Device3" Monitor "Monitor3" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen4" Device "Device4" Monitor "Monitor4" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen5" Device "Device5" Monitor "Monitor5" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
Section "Screen" Identifier "Screen6" Device "Device6" Monitor "Monitor6" DefaultDepth 24 Option "AllowEmptyInitialConfiguration" "True" Option "Coolbits" "28" SubSection "Display" Depth 24 EndSubSection EndSection
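To see which Screen sections lack the Coolbits option without reading the whole file, something like the sketch below could be used. The function name screens_without_coolbits is made up, and it assumes the file is laid out one keyword per line with the usual capitalisation, the way nvidia-xconfig writes it (not the flattened form pasted in this thread).

```shell
#!/bin/sh
# Hypothetical check: print the Identifier of every Screen section in the
# xorg.conf given as $1 that has no Coolbits option.
# Assumes one keyword per line and case-exact keywords.
screens_without_coolbits() {
    awk '
        /^[ \t]*Section[ \t]+"Screen"/ { in_screen = 1; has_cb = 0; ident = "?"; next }
        in_screen && $1 == "Identifier" { ident = $2; gsub(/"/, "", ident) }
        in_screen && /"Coolbits"/       { has_cb = 1 }
        in_screen && /^[ \t]*EndSection/ {
            if (!has_cb) print ident
            in_screen = 0
        }
    ' "$1"
}

# Typical use on a rig:
#   screens_without_coolbits /etc/X11/xorg.conf
```

A Screen section bound to the intel device would also be listed; that one can be ignored, as noted in the post.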
|
|
|
|
martyroz
|
|
October 26, 2017, 09:08:57 AM |
|
How can I troubleshoot an issue where it seems that one or two GPU's aren't sucking as much power as they should. I have set all 1080ti's to 175W and the single 1070 to 125W; there is one 1080ti running at 149W.
OK. It looks like nvOC thinks that a 'MSI 1080ti DUKE OC' has a TDP of 280W. So when I set power limit to 175W, it factors that this is 62.5% of TDP. But TDP for this card is actually 250W and 62.5% of that is 156W. Hence seeing 149W. So as a workaround, I set power limit to 200W (71.4% of TDP) and it will now limit it to 178W or so - which is what I want.
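The arithmetic behind this workaround can be written out as below. It only models the behaviour described in the post (the requested wattage being turned into a percentage of an assumed 280W TDP and then applied to the real 250W TDP); whether nvOC really works this way is not confirmed here, and the function names are made up.

```shell
#!/bin/sh
# Hypothetical model of the scaling described above.
#   $1 = watts requested, $2 = TDP the software assumes, $3 = card's real TDP
# Prints the limit that would actually be applied, rounded to whole watts.
scaled_limit() {
    awk -v req="$1" -v assumed="$2" -v real="$3" \
        'BEGIN { printf "%.0f\n", req * real / assumed }'
}

# Inverse: what to request so that the applied limit comes out at $1 watts.
request_for() {
    awk -v want="$1" -v assumed="$2" -v real="$3" \
        'BEGIN { printf "%.0f\n", want * assumed / real }'
}

scaled_limit 175 280 250   # 175W requested -> 156
scaled_limit 200 280 250   # 200W requested -> 179
request_for 175 280 250    # to end up at 175W, request 196
```

Under this model a request of about 196W would land on the intended 175W, which is close to the 200W workaround chosen in the post.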
|
|
|
|
|