Bitcoin Forum
Author Topic: [OS] nvOC easy-to-use Linux Nvidia Mining  (Read 417953 times)
fullzero (OP)
October 25, 2017, 08:42:35 PM
 #4961


The most GPUs I have in one rig is 13. If this doesn't help, we need to see the nvidia-xconfig output and the xorg.conf it generates, so we can try to find what's wrong.

You can also try reflashing the latest version of nvOC to a USB stick, editing 1bash on the Windows partition (don't forget to enable headless mode) and making the first boot from the USB stick without a monitor attached. After the first boot in headless mode you need to wait for autoconfiguration and the automatic reboot. Then use a network scanner or check the DHCP leases on the server side to find the IP of your rig (in most cases it will be the same as in previous attempts), ssh to it and see if xorg was configured correctly.


Thank you very much for your help!
My rig is now running with all those settings: New USB, Headless, no monitor attached, know IP, have SSH and mining. Here's the weird thing: See below for my current xorg.conf that is running and I'm getting all 19 GPUs mining but only the 1st one is OC'd. What's weird about it is that:

When I run
Code:
sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
I get a new xorg.conf that does two things:
Bus IDs lose the "@" symbol, and mining does not work.

Any ideas why some BusIDs use the @ while most I've seen don't, and why this seems to be allowing mining to run?
Also weird is that if you notice, there are only BusIDs for 0-3 (4 and 5 are missing), then it continues from 6-11 and that's it.

What are your thoughts on this?

Current xorg.conf:
Code:
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
    BusID "PCI:0@0:2:0"
    Option "AccelMethod" "None"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:2@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:3@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:6@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:7@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:8@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:9@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:10@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:11@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection



current lspci:
Code:
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM
00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (
00:02.0 VGA compatible controller: Intel Corporation Device 1902
00:14.0 USB controller: Intel Corporation Device a2af
00:16.0 Communication controller: Intel Corporation Device a2ba
00:1b.0 PCI bridge: Intel Corporation Device a2eb (rev f0)
00:1b.5 PCI bridge: Intel Corporation Device a2ec (rev f0)
00:1b.6 PCI bridge: Intel Corporation Device a2ed (rev f0)
00:1b.7 PCI bridge: Intel Corporation Device a2ee (rev f0)
00:1c.0 PCI bridge: Intel Corporation Device a294 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Device a295 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device a296 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Device a297 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Device a298 (rev f0)
00:1d.1 PCI bridge: Intel Corporation Device a299 (rev f0)
00:1d.2 PCI bridge: Intel Corporation Device a29a (rev f0)
00:1d.3 PCI bridge: Intel Corporation Device a29b (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a2c8
00:1f.2 Memory controller: Intel Corporation Device a2a1
00:1f.4 SMBus: Intel Corporation Device a2a3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connecti
01:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
02:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
03:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
04:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
06:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
07:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
08:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
09:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0a:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0b:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0c:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0d:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0e:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0f:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
10:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
11:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
12:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
13:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
14:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
15:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)



try:

Code:
sudo nvidia-xconfig -a --cool-bits=12 --allow-empty-initial-configuration
sudo reboot
fullzero (OP)
October 25, 2017, 08:43:26 PM
 #4962

Vi is the best and most powerful text editor I've ever seen. Notepad++ is a beautiful toy for a Windows user, but never, never as powerful as vi

:-)

Agreed. Regex is right there. I have been using vim as of late for syntax highlighting but I have to install it myself:

sudo apt-get install vim -y

It would be nice if @fullzero could include it in v. 20.

will do for -1.4
fullzero (OP)
October 25, 2017, 08:45:21 PM
 #4963

Hi fullzero, Hi Guys,

(I was in read-only mode for a long time. This is my first post.)

What do you think about integrating nvidia-docker into nvOC and running the miner for each isolated GPU in its own Docker container?
It would be more fail-safe for rigs with many GPUs, more flexible for individual OC, and would make it possible to launch several different miners.

For example: one of my rigs has 13 GPUs of different models, so the OC settings differ from card to card.
If one GPU crashes, all GPUs crash and the watchdog restarts the miner. In my case restarting 3main takes a very long time; most of that time is spent running nvidia-settings.
I was forced to set the sleep timeout in wdog to 120s, otherwise I get a loop where wdog keeps restarting 3main.

p.s. excuse my poor English  Wink


I would have to look at how well the docker containers interface with the hardware.  I have used docker containers before; but I'm not sure if this is viable; my guess is it is not.
fullzero (OP)
October 25, 2017, 08:48:24 PM
 #4964

I am running into an issue mining VTC.  I am running my own pool and for some reason it is not accepting any shares if there is a worker name present.  If it is just the wallet address it works fine.
I can't seem to get nvOC to use no worker name; I have tried using the wallet as the name and hashing it out completely, and it still inputs "." as the name.  Has anyone else run into this?
Can you disable worker names somehow?

Thanks

The pool probably uses /workername instead of the .workername used in the 3main implementation.  Open 3main in gedit,

press
Code:
ctrl + f

search for
Code:
VTC

and alter:

Code:
ADDR="$VTC_ADDRESS.$VTC_WORKER"

or

ADDR="$BTC_ADDRESS.$VTC_WORKER"

to:

Code:
ADDR="$VTC_ADDRESS/$VTC_WORKER"

or

ADDR="$BTC_ADDRESS/$VTC_WORKER"
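
The manual edit above could also be scripted. This is only a minimal sketch: it assumes the ADDR lines in 3main look exactly like the ones quoted, and `use_slash_separator` is a made-up helper name, not something that ships with nvOC. Try it on a copy of 3main first.

```shell
# Hypothetical helper: switch the VTC worker separator from "." to "/".
# Assumes ADDR lines shaped exactly like the ones quoted in the post.
use_slash_separator() {
    file=$1
    # Matches ADDR="$VTC_ADDRESS.$VTC_WORKER" and ADDR="$BTC_ADDRESS.$VTC_WORKER";
    # lines for other coins are left alone.
    sed 's|\(ADDR="\$[A-Z]*_ADDRESS\)\.\(\$VTC_WORKER"\)|\1/\2|' "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

Usage would be something like `use_slash_separator /path/to/copy/of/3main`, followed by a diff against the original before swapping it in.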
fullzero (OP)
October 25, 2017, 08:50:59 PM
 #4965

Hi, just moved to NVOC, loving it. I had to mess around with what miner the bash file was using as it was not installed. I have a really noob question if someone could help me please.

1. How do I set up supernova pools that require a password and user ID? An example would be great, thanks.

2. Can I set up spill over pools?


Many thanks

1: If you haven't already; join the nvOC discord and damNmad has examples there.

2:
I will add auto pool failover + failover pool variables to 1bash for v0020
fullzero (OP)
October 25, 2017, 08:51:57 PM
 #4966

Need help:

The rig has been running for about a month with no issues, then I got this error 2 days ago: https://ibb.co/i85CWR. The system will restart and boot into Linux but won't get past the login screen. I entered the miner1 password and it is accepted (it doesn't say the password is wrong), but the desktop won't load. So I reinstalled a fresh copy of nvOC on the hard drive; it worked for 24 hours, and now I'm getting the exact same message and it won't get past the login screen.

https://ibb.co/i85CWR

see:

https://bitcointalk.org/index.php?topic=1854250.msg23535706#msg23535706
martyroz
October 25, 2017, 10:09:12 PM
 #4967

EDIT: OK, It's working. I had to change to EWBF miner and remove the $ from ZEN miner name Wink

How can I troubleshoot an issue where it seems that one or two GPU's aren't sucking as much power as they should. I have set all 1080ti's to 175W and the single 1070 to 125W; there is one 1080ti running at 149W.

My first instinct is that I'm asking too much of my 1000W PSU Smiley



tomlev5
October 25, 2017, 10:10:29 PM
 #4968

fullzero, you mentioned active cooling for 1080tis with 133 CFM 120mm fans.
I’m interested in this solution, but I’m not sure if I will be able to put it together. Can you please share some details.

Do you remove original fans and cover so that only the cooler fins remain?
How do you mount fans?
How do you power the fans?

I leave the shroud (GPU fans and casing) on; although if you were super exacting it would probably be best to remove these; not worth the time IMO.

The way my open frames are (see one of the demo videos); I just zip tie them to the front (blowing into the 1080tis) and power them via a 3x fan to molex adapter / then molex to whatever is available with the rig's PSU.  I get fans from Hawkfish007 https://bitcointalk.org/index.php?action=profile;u=352509; he includes the 3x fan to molex adapters with the fans.
OK, I think I get it now. You use double active cooling: original fans directly on the 1080tis + 120mm 133 CFM fans in the front of 1080tis.
I couldn't see this setup on the demo videos; I watched v0014demo, v0018demo and v0019demo. I think there are no 1080tis in these videos, but I think I get the picture.

Has anybody had to replace a failed original fan on a 1080ti?
I think a good way would be to remove all fans and their casing and somehow mount a high-pressure fan (like the Corsair SP120 High Static Pressure, 2350 rpm).
Some original fans on 1080tis are of low quality and will probably fail soon.
Does anybody have experience with this type of modification?

tomlev5
October 25, 2017, 10:26:01 PM
Last edit: October 25, 2017, 10:41:27 PM by tomlev5
 #4969

To prevent automatic updates in the future:

use your favorite editor, open /etc/apt/apt.conf.d/10periodic and change:

Code:
APT::Periodic::Update-Package-Lists "1";

To:

Code:
APT::Periodic::Update-Package-Lists "0";


and /etc/apt/apt.conf.d/20auto-upgrades
Code:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

to

Code:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";


D

ps: Seriously, why was this ON?
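
The same change could be scripted for more than one rig. This is a rough sketch, not part of nvOC: `disable_apt_periodic` is a made-up helper name, and it only flips APT::Periodic switches that are currently "1". It has to run as root for files under /etc/apt/apt.conf.d.

```shell
# Hypothetical helper: set every APT::Periodic switch that is "1" to "0"
# in the given file, leaving all other lines untouched.
disable_apt_periodic() {
    file=$1
    sed 's|^\(APT::Periodic::[A-Za-z-]*\) "1";|\1 "0";|' "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

It would be called once per file, e.g. for /etc/apt/apt.conf.d/10periodic and /etc/apt/apt.conf.d/20auto-upgrades.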

Very nice suggestion Doodkeen,

This is a community-driven project and fullzero is doing the best job ever without asking for a dime (unlike other Linux mining distros).
Sometimes in the middle of the road we miss small points that are obvious to others.

So I kindly ask every Linux guru here to do their best to look for improvements and fixes for nvOC.
Feel free to PM me or post any suggestion you have in the forum.

Thank you all.


Yep, please feel free to provide your valuable feedback, so that we can make nvOC even better.

Coming to the error, no one expected this sort of failure on this scale!! Hopefully we can identify this sort of issue a bit earlier.

I think the problem is that wdog restarted the rig because of low utilization while the Nvidia driver was updating, and that caused a misconfigured Nvidia driver.
I think the solution is to prevent wdog from restarting the rig if dpkg is running.
Papampi, you are absolutely right  Smiley
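
One way the dpkg check could look is sketched below. This is not the actual wdog code: `process_running` and `safe_to_restart` are made-up names, and the sketch assumes a Linux /proc filesystem.

```shell
# Return 0 if a process with one of the given command names is running.
# Assumes Linux: reads the command name from /proc/<pid>/comm.
process_running() {
    for name in "$@"; do
        for comm in /proc/[0-9]*/comm; do
            if [ "$(cat "$comm" 2>/dev/null)" = "$name" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Hypothetical guard for a watchdog: succeeds only when no package
# installation appears to be in progress.
safe_to_restart() {
    ! process_running dpkg apt-get
}
```

A watchdog loop could then skip its reboot step whenever `safe_to_restart` fails and try again on the next pass.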


I had this happen to 2 of my rigs.  I will change all the update settings for -1.4 except security updates.  My bad.  
fullzero, is there some way that I could completely disable all updates on nvOC:
- disable automatic downloads of updates (drivers, system, nvOC)
- disable automatic installation of updates (drivers, system, nvOC)

I have a rig behind a firewall and absolutely no important data on an nvOC rig, so all I care about is stability.
And stability is more fragile because of USB key usage.
Logging is disabled by default, so there would be no writing to the USB key if I disabled updates.

nvOC is pretty stable even now, but I think that disabling all updates would help with the stability of the system.

Of course I would also periodically update the USB key with the whole nvOC image, but only every month or so, when you publish a new stable version of nvOC.
cryptobadger666
October 25, 2017, 11:32:21 PM
 #4970

It seems like most ETH pools require an email field, where you can put an email or passphrase in the miner; you then need to use that on the pool webpage to change payout settings. Is there a way to add this to the 1bash file, or is there already a default that this OS uses in that field? I'm beating my head against the wall trying to figure this out.
MentalNomad
October 25, 2017, 11:47:15 PM
 #4971



How can I troubleshoot an issue where it seems that one or two GPU's aren't sucking as much power as they should. I have set all 1080ti's to 175W and the single 1070 to 125W; there is one 1080ti running at 149W.

My first instinct is that I'm asking too much of my 1000W PSU Smiley





When you set a power number, that's a power limit, not a power setting.

If you're talking about GPU2 drawing less power... is it overclocked differently? And is it really identical hardware?  Sometimes two cards look the same at a glance, but have different details and capabilities. I have a rig with all GTX 1070 cards, but they have different Min Power Limit and Max Power Limit capabilities.
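
To see the limit and the actual draw side by side, nvidia-smi can print both per GPU. The sketch below filters that output for cards running well under their limit; `below_limit` is a made-up helper name and the 15 W margin in the usage line is an arbitrary choice.

```shell
# Reads lines like "2, 149.00, 175.00" (index, power.draw, power.limit) on stdin
# and reports GPUs drawing more than <margin> watts below their limit.
# Expected input comes from:
#   nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits
below_limit() {
    awk -F', *' -v margin="$1" '
        $2 + margin < $3 { printf "GPU %s draws %.0f W, limit %.0f W\n", $1, $2, $3 }
    '
}
```

Usage would be along the lines of `nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits | below_limit 15`. Adding power.min_limit and power.max_limit to the query shows the range each card accepts.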
ComputerGenie
October 26, 2017, 01:18:00 AM
 #4972

Or, perhaps that 149W is the result of that Ti protesting being underpowered by 75W  Roll Eyes

If you have to ask "why?", you wouldn`t understand my answer.
Always be on the look out, because you never know when you'll be stalked by hit-men that eat nothing but cream cheese....
martyroz
October 26, 2017, 01:33:24 AM
 #4973


When you set a power number, that's a power limit, not a power setting.

If you're talking about GPU2 drawing less power... is it overclocked differently? And is it really identical hardware?  Sometimes two cards look the same at a glance, but have different details and capabilities. I have a rig with all GTX 1070 cards, but they have different Min Power Limit and Max Power Limit capabilities.

Thanks for the response. So the question then is: why is GPU2 not drawing more power? It's a 1080ti at 49°C / 590 sols.

They are not identical hardware. Due to various sales etc I have plugged in;

2 * ASUS 1080ti ROG STRIX
1 * EVGA 1080ti SC Black
1 * MSI 1080ti Duke OC
1 * Gigabyte 1070 (GPU1)

All that is certain is that GPU1 is the 1070.
I applied a modest OC of +50/+200 across the range.
Longsnowsm
October 26, 2017, 01:35:07 AM
 #4974

I have an older image of nvOC, and the machine is complaining it is running out of space on the USB.  I looked and see a bunch of stuff in the /var/tmp, but do not have the perms apparently to clean that stuff up.  What is the process for cleaning up the drive so that it doesn't run out of space?  Thanks.

open guake and enter:

Code:
sudo apt-get update

sudo apt-get autoremove

sudo apt-get autoclean

should free up some space by removing unused system files.
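
To check how full the stick is before and after the cleanup, something like the following may help. It is only a sketch: `usage_percent` is a made-up helper name.

```shell
# Print the used-space percentage (digits only) of the filesystem holding a path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `usage_percent /` before and after the apt-get commands shows how much was freed, and `sudo du -sh /var/tmp` shows how much the temp directory itself holds.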

Thank you Fullzero.  I will give that a try!  Smiley
ComputerGenie
October 26, 2017, 03:40:50 AM
 #4975

....
I applied a modest OC of +50/+200 across the range.
At the risk of sounding like an ass...
You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result  Huh

martyroz
October 26, 2017, 03:53:16 AM
 #4976

At the risk of sounding like an ass...
You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result  Huh

I have done the above for many other 1080ti's and this is the first one performing poorly (under 600 sols)
My windows rig runs 1080ti's at 175W / +75core / +600mem and is rock solid at 680 sols.
580 sols is a clear anomaly amongst 1080tis across 4 manufacturers.
ComputerGenie
October 26, 2017, 04:03:49 AM
 #4977

At the risk of sounding like an ass...
You limited the power of the card by -75W, overclocked it by +50, and you are surprised by a lowered output result  Huh

I have done the above for many other 1080ti's and this is the first one performing poorly (under 600 sols)
My windows rig runs 1080ti's at 175W / +75core / +600mem and is rock solid at 680 sols.
580 sols is a clear anomaly amongst 1080tis across 4 manufacturers.
If you're going to undervolt and overclock, add 1 card at a time and find out what works best for that card; and then do so with each card (one at a time). Don't be afraid to lose 1 day doing benchmarks vs losing $800 because BrandX ModelY doesn't work with SettingW. You'll thank yourself in the long run. Wink

martyroz
October 26, 2017, 04:09:31 AM
 #4978

If you're going to undervolt and overclock, add 1 card at a time and find out what works best for that card; and then do so with each card (one at a time). Don't be afraid to lose 1 day doing benchmarks vs losing $800 because BrandX ModelY doesn't work with SettingW. You'll thank yourself in the long run. Wink

Thanks for the comment.
To be fair, that screenshot was taken with no OC applied. I didn't expect a 1080ti to behave like that by just limiting power to 175W.
I will test tonight.
woodl1
October 26, 2017, 08:36:53 AM
 #4979


The most GPUs I have in one rig is 13. If this doesn't help, we need to see the nvidia-xconfig output and the xorg.conf it generates, so we can try to find what's wrong.

You can also try reflashing the latest version of nvOC to a USB stick, editing 1bash on the Windows partition (don't forget to enable headless mode) and making the first boot from the USB stick without a monitor attached. After the first boot in headless mode you need to wait for autoconfiguration and the automatic reboot. Then use a network scanner or check the DHCP leases on the server side to find the IP of your rig (in most cases it will be the same as in previous attempts), ssh to it and see if xorg was configured correctly.


Thank you very much for your help!
My rig is now running with all those settings: New USB, Headless, no monitor attached, know IP, have SSH and mining. Here's the weird thing: See below for my current xorg.conf that is running and I'm getting all 19 GPUs mining but only the 1st one is OC'd. What's weird about it is that:

When I run
Code:
sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
I get a new xorg.conf that does two things:
Bus IDs lose the "@" symbol, and mining does not work.

Any ideas why some BusIDs use the @ while most I've seen don't, and why this seems to be allowing mining to run?
Also weird is that if you notice, there are only BusIDs for 0-3 (4 and 5 are missing), then it continues from 6-11 and that's it.

What are your thoughts on this?

Current xorg.conf:
Code:
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
    BusID "PCI:0@0:2:0"
    Option "AccelMethod" "None"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:2@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:3@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:6@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:7@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:8@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:9@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:10@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:11@0:0:0"
    Option "ConstrainCursor" "off"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection



current lspci:
Code:
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM
00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (
00:02.0 VGA compatible controller: Intel Corporation Device 1902
00:14.0 USB controller: Intel Corporation Device a2af
00:16.0 Communication controller: Intel Corporation Device a2ba
00:1b.0 PCI bridge: Intel Corporation Device a2eb (rev f0)
00:1b.5 PCI bridge: Intel Corporation Device a2ec (rev f0)
00:1b.6 PCI bridge: Intel Corporation Device a2ed (rev f0)
00:1b.7 PCI bridge: Intel Corporation Device a2ee (rev f0)
00:1c.0 PCI bridge: Intel Corporation Device a294 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Device a295 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device a296 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Device a297 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Device a298 (rev f0)
00:1d.1 PCI bridge: Intel Corporation Device a299 (rev f0)
00:1d.2 PCI bridge: Intel Corporation Device a29a (rev f0)
00:1d.3 PCI bridge: Intel Corporation Device a29b (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a2c8
00:1f.2 Memory controller: Intel Corporation Device a2a1
00:1f.4 SMBus: Intel Corporation Device a2a3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connecti
01:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
02:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
03:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
04:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
05:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
06:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
07:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
08:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
09:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0a:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0b:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0c:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0d:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0e:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
0f:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
10:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
11:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
12:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
13:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
14:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)
15:00.0 3D controller: NVIDIA Corporation Device 1c07 (rev a1)



As for the BusID naming, I have no idea why your rig is using @, but if it works, let it be Smiley The PCI bus numbers 4-5 are skipped because other devices are using those buses, which is normal (you can see it with lspci: "04:00.0 PCI bridge: ASMedia Technology Inc. Device 1187" and so on).
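
On the numbering itself: lspci prints bus numbers in hex while xorg.conf BusID values are decimal, so 0a:00.0 becomes PCI:10:0:0. As far as I know the X server also accepts a bus@domain form, which would make PCI:10@0:0:0 the same device in PCI domain 0, but treat that as an assumption to check against the xorg.conf man page. A small sketch of the conversion (`lspci_to_busid` is a made-up name, and it assumes plain lspci output without a domain prefix):

```shell
# Reads lspci output on stdin and prints an xorg.conf-style BusID
# (decimal bus:device:function) for every NVIDIA device.
lspci_to_busid() {
    grep -i 'nvidia' | while read -r addr rest_of_line; do
        bus=${addr%%:*}      # "0a" from "0a:00.0"
        devfn=${addr#*:}     # "00.0"
        dev=${devfn%%.*}     # "00"
        fn=${devfn#*.}       # "0"
        # lspci fields are hex; shell arithmetic converts them to decimal.
        printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
    done
}
```

Running `lspci | lspci_to_busid` on the rig above should list 19 BusIDs, which can be compared against the Device sections that nvidia-xconfig wrote.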

What I want to point out in your current xorg.conf is that there is no Coolbits option in any Screen section. I suppose this option has to be set to overclock the GPU, so try what fullzero suggested:
Code:
sudo nvidia-xconfig -a --cool-bits=12 --allow-empty-initial-configuration
sudo reboot

and if it doesn't help, try adding this option to each Screen section in your xorg.conf except the Screen section with the intel device.
Code:
Option         "Coolbits" "12"
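
To check which Screen sections still lack the option after either step, a quick scan of the file could look like this. It is a sketch only: `screens_missing_coolbits` is a made-up helper name, and it assumes the usual one-directive-per-line layout.

```shell
# Print the Identifier of every Screen section in an xorg.conf-style file
# that has no Coolbits option.
screens_missing_coolbits() {
    awk '
        /^[ \t]*Section[ \t]+"Screen"/ { in_screen = 1; id = "?"; found = 0; next }
        in_screen && /^[ \t]*Identifier/ { id = $2; gsub(/"/, "", id) }
        in_screen && /^[ \t]*Option[ \t]+"[Cc]ool[Bb]its"/ { found = 1 }
        in_screen && /^[ \t]*EndSection/ {
            if (!found) print id
            in_screen = 0
        }
    ' "$1"
}
```

Usage would be `screens_missing_coolbits /etc/X11/xorg.conf`; the intel screen is expected to show up in the output and can be ignored.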

Here's the xorg.conf from one of my working rigs; it has 7 GPUs, all P106-100.
Code:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 384.59  (buildmeister@swio-display-x64-rhel04-01)  Thu Jul 20 01:03:28 PDT 2017

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    Screen      2  "Screen2" RightOf "Screen1"
    Screen      3  "Screen3" RightOf "Screen2"
    Screen      4  "Screen4" RightOf "Screen3"
    Screen      5  "Screen5" RightOf "Screen4"
    Screen      6  "Screen6" RightOf "Screen5"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor2"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor3"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor4"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor5"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor6"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:2:0:0"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:4:0:0"
EndSection

Section "Device"
    Identifier     "Device3"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:5:0:0"
EndSection

Section "Device"
    Identifier     "Device4"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:6:0:0"
EndSection

Section "Device"
    Identifier     "Device5"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:7:0:0"
EndSection

Section "Device"
    Identifier     "Device6"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "P106-100"
    BusID          "PCI:9:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    Monitor        "Monitor2"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen3"
    Device         "Device3"
    Monitor        "Monitor3"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen4"
    Device         "Device4"
    Monitor        "Monitor4"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen5"
    Device         "Device5"
    Monitor        "Monitor5"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen6"
    Device         "Device6"
    Monitor        "Monitor6"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "28"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
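
Note that this config uses Coolbits 28 while the command above uses 12. Coolbits is a bitmask, so the two values differ by a single flag. A small Python sketch to decode a value; the flag descriptions are my short paraphrase of the NVIDIA driver README, so check the README for your driver version before relying on them:

```python
# Bit values as I understand them from the NVIDIA driver README (paraphrased).
COOLBITS_FLAGS = {
    1: "legacy overclocking (pre-Fermi GPUs)",
    2: "SLI with mismatched video memory",
    4: "manual fan control",
    8: "clock offset overclocking",
    16: "overvoltage",
}


def decode_coolbits(value: int) -> list:
    """Return the descriptions of all flags set in a Coolbits value."""
    return [name for bit, name in COOLBITS_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for value in (12, 28):
        print(value, "=", ", ".join(decode_coolbits(value)))
```

In other words, 12 = 4 + 8 (fan control and clock offsets) and 28 = 4 + 8 + 16 (the same plus overvoltage), so either value should be enough for clock offsets and fan speed.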

martyroz
Full Member
***
Offline Offline

Activity: 325
Merit: 110


View Profile
October 26, 2017, 09:08:57 AM
 #4980

How can I troubleshoot an issue where one or two GPUs don't seem to be drawing as much power as they should? I have set all the 1080ti's to 175W and the single 1070 to 125W, but there is one 1080ti running at 149W.

OK. It looks like nvOC thinks that a 'MSI 1080ti DUKE OC' has a TDP of 280W. So when I set the power limit to 175W, it treats this as 62.5% of TDP. But the TDP for this card is actually 250W, and 62.5% of that is 156W, hence the 149W I'm seeing.

So as a workaround, I set the power limit to 200W (71.4% of the assumed 280W TDP), and it now limits the card to 178W or so, which is what I want.
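
The arithmetic behind this, as a quick Python sketch. It assumes the limit really is applied as a fraction of TDP the way I described above (that is my guess from the numbers, not something I have confirmed in the nvOC scripts), and the function names are made up:

```python
def effective_limit(requested_w, assumed_tdp_w, actual_tdp_w):
    """Watts actually enforced if the request is applied as a fraction of TDP."""
    fraction = requested_w / assumed_tdp_w  # e.g. 175 / 280 = 0.625
    return fraction * actual_tdp_w


def compensated_request(target_w, assumed_tdp_w, actual_tdp_w):
    """Watts to request so that the enforced limit equals target_w."""
    return target_w * assumed_tdp_w / actual_tdp_w


if __name__ == "__main__":
    print(effective_limit(175, 280, 250))            # 156.25
    print(round(effective_limit(200, 280, 250), 1))  # 178.6
    print(compensated_request(175, 280, 250))        # 196.0
```

So under that assumption, asking for about 196W should land on a real 175W limit for this card; my 200W setting gives roughly 178W.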
