Bitcoin Forum
Topic: Ubuntu 11.04, Dual 5850 issues
area (OP)
July 08, 2011, 07:40:06 AM
Last edit: July 08, 2011, 09:07:46 AM by area
 #1

I'm trying to get dual 5850s up and running on Ubuntu 11.04 and having a great deal of trouble. As soon as I run

Code:
 sudo aticonfig --initial -f --adapter=all
towards the end of a guide such as this one and restart, I am no longer able to boot. Using an xorg.conf of my own, I am able to boot, and I even see both cards when I run

Code:
aticonfig --list-adapters
* 0. 01:00.0 ATI Radeon HD 5800 Series
  1. 02:00.0 ATI Radeon HD 5800 Series

* - Default adapter
However, any other command using aticonfig seems to balk at the second card:

Code:
aticonfig --odgc --adapter=all

Adapter 0 - ATI Radeon HD 5800 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    157           300
             Current Peak :    725           1000
  Configurable Peak Range : [550-775]     [900-1125]
                 GPU load :    0%
ERROR - Get clocks failed for Adapter 1 - ATI Radeon HD 5800 Series

poclbm also sees only one card that it is able to use. Both cards show up correctly in lspci:

Code:
01:00.0 VGA compatible controller: ATI Technologies Inc Cypress [Radeon HD 5800 Series]
01:00.1 Audio device: ATI Technologies Inc Cypress HDMI Audio [Radeon HD 5800 Series]
02:00.0 VGA compatible controller: ATI Technologies Inc Cypress [Radeon HD 5800 Series]
02:00.1 Audio device: ATI Technologies Inc Cypress HDMI Audio [Radeon HD 5800 Series]
At this point, I'm out of ideas. Any suggestions?

EDIT: Looking at Xorg.0.log I find the following:

Code:
[     8.579] (==) Using config file: "/etc/X11/xorg.conf"
....
[     9.590] (WW) fglrx: No matching Device section for instance (BusID PCI:0@2:0:0) found

This seems a likely culprit for the problem, though as far as I can tell I do have such a Device section in my xorg.conf (linked above).
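For anyone comparing these addresses by hand: lspci prints bus:device.function in hexadecimal, while the BusID in xorg.conf is written in decimal as PCI:bus:device:function. Below is a rough sketch of that conversion. The function name lspci_to_busid is made up for illustration, and it assumes the short address form shown above (no PCI domain prefix).

```shell
lspci_to_busid() {
    # $1: lspci-style address such as 02:00.0 (bus:device.function, hex).
    # Assumption: no leading PCI domain (i.e. not 0000:02:00.0).
    addr=$1
    bus=${addr%%:*}     # text before the first ':'
    rest=${addr#*:}     # text after the first ':'
    dev=${rest%%.*}     # text before the '.'
    fn=${rest#*.}       # text after the '.'
    # printf converts the 0x-prefixed hex values to decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For the second card above, `lspci_to_busid 02:00.0` gives the value to put in the BusID line of its Device section.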
swivel
July 08, 2011, 09:15:01 AM
 #2

You should have two Screen sections, one for each card. See my xorg.conf. Change the PCI bus ID for the second card: mine is on PCI:3:0:0, yours is PCI:2:0:0.
area (OP)
July 08, 2011, 09:32:09 AM
 #3

Adding a Screen section for my second card did not have an effect. Introducing the 'Monitor' sections causes my freeze-on-boot issue to return.
teukon
July 08, 2011, 11:20:43 AM
 #4

After adding a second Screen section for the other card, don't forget to add a matching Screen line to your ServerLayout section too.
area (OP)
July 08, 2011, 11:40:47 AM
 #5

Both with and without the additional line, I get the freeze on boot (no display, no response to ping) - this is the case even if I just use swivel's xorg.conf (after changing the PCI bus ID).
teukon
July 08, 2011, 12:01:04 PM
 #6

If you are able to access the filesystem then perhaps remove gdm from the boot process and see if you can at least get to the console.  For 11.04 I think you need to comment out

Code:
start on (filesystem
          and started dbus
          and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
               or stopped udev-fallback-graphics))

to do this.
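A minimal sketch of commenting that stanza out without editing by hand. It assumes the Upstart job file is /etc/init/gdm.conf (not stated above) and that the stanza ends on a line closing with "))" as quoted; the function name comment_out_start_stanza is hypothetical. It only prints the modified file, so the result can be reviewed before anything is overwritten.

```shell
comment_out_start_stanza() {
    # $1: path to an Upstart job file.
    # Prefix every line from "start on" through the line ending in "))"
    # with '#', and print the result to stdout. The file itself is untouched.
    sed '/^start on/,/))$/ s/^/#/' "$1"
}
```

Something like `comment_out_start_stanza /etc/init/gdm.conf > /tmp/gdm.conf.new` followed by a diff against the original would show exactly what changes before copying it into place with sudo.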
area (OP)
July 08, 2011, 05:25:50 PM
 #7

That indeed lets me boot, and allows me to SSH in. I still see both cards from

Code:
sudo aticonfig --lsa
* 0. 01:00.0 ATI Radeon HD 5800 Series
  1. 02:00.0 ATI Radeon HD 5800 Series

But I now get

Code:
sudo aticonfig --odgc --adapter=all
ERROR - X needs to be running to perform ATI Overdrive(TM) commands

Which is fair enough. Attempting to start X as my user informs me that I am not authorized, and if I sudo X, I get

Code:
(WW) fglrx: No matching Device section for instance (BusID PCI:0@1:0:1) found
(WW) fglrx: No matching Device section for instance (BusID PCI:0@2:0:1) found

which in itself is suspicious, as there are appropriate sections for both in the xorg.conf it claims to be using. After printing the above lines, it hangs.

EDIT: Jumped the gun there, those are the HDMI audio addresses, which there are not sections for - which makes sense.
teukon
July 08, 2011, 06:15:55 PM
Last edit: July 08, 2011, 06:37:02 PM by teukon
 #8

Forgive me if you find the following patronising but I'm unsure of your comfort with the Linux command line and have erred on the side of caution.

After logging in by ssh you'll need to start some basic form of X to use aticonfig.  I'd suggest running

Code:
xinit &
The "&" is to run the process in the background.  You may need to press enter once to have your prompt redrawn.

If you don't have an ".xinitrc" in your home directory this should just start a basic X session and spawn a single xterm window.  I personally include a ".xinitrc" file containing the single line "cat" to prevent the xterm from spawning.

You probably lack the authority to run X because you are accessing the system remotely.  Try running

Code:
sudo dpkg-reconfigure x11-common

and select 'Anybody' from the resulting list (I think the default in Ubuntu is 'Console Users Only').  Now try xinit again.  Remember to close any previous invocations of xinit.  To do this use

Code:
jobs

to list the background processes and

Code:
kill %n

to close background process [n].

cicada
July 08, 2011, 07:26:12 PM
 #9

Didn't see it here, but don't Crossfire the cards;  remove the jumper if you've got it attached, and ensure it's not enabled in the software.  The aticonfig commands to check this escape me at the moment.

When Crossfired, it may be trying to treat your cards as a single interface, and denying direct requests to the second card.

I don't have a whole lot of experience in this area, but I do know it caused some similar weirdness when I tried enabling it on my pair of 5830s.

area (OP)
July 08, 2011, 10:10:12 PM
 #10

No need to apologise about maybe being patronising. It's definitely better to err on the side of caution in such situations. In particular making it so that X could be run by any user is not something I'd have come across quickly, so thanks for that.

Nevertheless, it doesn't seem to change much. The complete output:

Code:
user@obelix:~$ xinit &
[1] 1550
user@obelix:~$
X.Org X Server 1.10.1
Release Date: 2011-04-15
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.24-29-server x86_64 Ubuntu
Current Operating System: Linux obelix 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.38-8-generic root=UUID=f7663715-218e-48c3-bc97-3803617636c3 ro quiet splash nomodeset vt.handoff=7
Build Date: 19 April 2011  03:40:45PM
xorg-server 2:1.10.1-1ubuntu1 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.20.2
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri Jul  8 23:05:49 2011
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(WW) fglrx: No matching Device section for instance (BusID PCI:0@1:0:1) found
(WW) fglrx: No matching Device section for instance (BusID PCI:0@2:0:1) found

at which point I lose my SSH connection, no response to ping, and I have to reset. I do not have the cards connected via the Crossfire bridge, or chained together in software.
teukon
July 08, 2011, 10:52:28 PM
 #11

Hmm...  It sounds like the fglrx driver is crashing almost as soon as you start X.  Perhaps try reinstalling the drivers.

Code:
chmod +x ati-driver-installer-11-6-x86.x86_64.run
sudo ./ati-driver-installer-11-6-x86.x86_64.run

I would select

Code:
2) Generate Distribution Specific Driver Package

and generate packages designed for Ubuntu (say no to Fedora and Suse package options).

Then uninstall the existing drivers and install these newly generated ones.
Code:
sudo dpkg -r fglrx fglrx-amdcccle fglrx-dev
sudo dpkg -i fglrx_8.861-0ubuntu1_amd64.deb fglrx-amdcccle_8.861-0ubuntu1_amd64.deb fglrx-dev_8.861-0ubuntu1_amd64.deb
If you installed the drivers last time directly rather than by generating packages then you'll have to run "sudo ./ati-driver-installer-11-6-x86.x86_64.run --uninstall" (or the appropriate installer for the version you have) and hope that it cleans up properly.  You may have to hunt through /lib/modules and remove old "fglrx.ko" files.
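As a rough sketch of the "hunt through /lib/modules" step: the helper below only lists candidate files and deletes nothing. The name find_fglrx_modules is made up for illustration, and the optional argument exists only so the search root can be changed.

```shell
find_fglrx_modules() {
    # $1: module tree to search; defaults to /lib/modules.
    # Prints the path of every regular file named fglrx.ko under that tree.
    find "${1:-/lib/modules}" -type f -name 'fglrx.ko' 2>/dev/null
}
```

Anything it prints after an uninstall is a leftover worth looking at before removing it by hand.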

Then reboot, log in, and try xinit again.  If you're not sure about your xorg.conf file then back it up and run

Code:
sudo aticonfig --initial

but judging by the output (which looks perfectly healthy to me) I don't think xorg.conf is the problem.

You might consider installing an older version of Catalyst instead of 11.6.  Versions before 11.6 respect BIOS limits, so if one of your cards has had ridiculous clock frequencies committed then this is an option.
  
Rob P.
July 09, 2011, 12:35:00 AM
 #12

Wow...

Okay, some basics for folks:

1)  If you are not logged in, but have a graphical login being displayed, then X is already running.  DO NOT start it again.
2)  If you are not logged in, then the "root" user is currently running X.
3)  If you ARE logged in as a user, then THAT USER is running X, not root.

So, for case #2 above, for all commands, you'll have to run them with "sudo", because you have to run them as root (hence the above error):

Code:
DISPLAY=:0 sudo aticonfig --odgc --adapter=all

Also, you need the "DISPLAY=:0" because over SSH you don't have a valid DISPLAY variable, so you need to tell the commands where to find X.

For case #3 above (assuming you are SSHing into the box as the same user logged in), you cannot run them as "root" (via "sudo") because X is already running as another user, so instead you use:

Code:
DISPLAY=:0 aticonfig --odgc --adapter=all

This logic needs to be applied to ANY aticonfig command.

To avoid this confusion, I set my Ubuntu boxes to auto-login as a user, I then use that same account to SSH in, and that way I never have to run commands as root.
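A minimal sketch of the DISPLAY handling described above, as it might look in a ~/.bashrc. The function name ensure_display is hypothetical, and :0 is assumed to be the display the local X server is running on.

```shell
ensure_display() {
    # Keep an existing DISPLAY if one is already set;
    # otherwise fall back to :0 (assumed to be the local X server).
    : "${DISPLAY:=:0}"
    export DISPLAY
}
```

After calling `ensure_display` in an SSH session, a plain `aticonfig --odgc --adapter=all` should behave like the `DISPLAY=:0` forms above, subject to the same root versus user distinction.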

Also, any time you add or delete a card, you MUST run (add sudo if you need to):

Code:
aticonfig -f --initial --adapter=all

In order to get a valid xorg.conf file built with the update.

Finally, make sure OpenCL is seeing all of your cards:

Code:
# This assumes 64-bit, change to 32-bit if needed
cd <path to>/AMD-APP-SDK-v2.4-lnx64/bin/x86_64
export AMDAPPSDKROOT=<path to>/AMD-APP-SDK-v2.4-lnx64/
export AMDAPPSDKSAMPLESROOT=<path to>/AMD-APP-SDK-v2.4-lnx64/
export LD_LIBRARY_PATH=${AMDAPPSDKROOT}lib/x86_64:${LD_LIBRARY_PATH}
# You should see one line for each GPU installed
./clinfo | grep TYPE_GPU

If you don't see one line for each card, then the problem may be with OpenCL.
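To make the "one line for each card" comparison less error-prone, here is a small sketch that counts matching lines instead of eyeballing them. The helper name count_matching is made up for illustration; the patterns are the strings already seen in this thread.

```shell
count_matching() {
    # Reads text on stdin and prints how many lines contain the pattern in $1.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -- "$1" || true
}
```

Comparing `lspci | count_matching 'VGA compatible controller'` with `./clinfo | count_matching TYPE_GPU` shows whether OpenCL sees as many GPUs as the PCI bus does.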

area (OP)
July 09, 2011, 07:24:51 AM
Last edit: July 09, 2011, 07:44:40 AM by area
 #13

Hmm...  It sounds like the fglrx driver is crashing almost as soon as you start X.  Perhaps try reinstalling the drivers.

...

You might consider installing an older version of Catalyst instead of 11.6.  Versions before 11.6 respect BIOS limits, so if one of your cards has had ridiculous clock frequencies committed then this is an option.

Identical behaviour with a regenerated Catalyst 11.6, as well as Catalyst 11.5 (though the cards are new, so they shouldn't have any weird clocking set).

Also, you need the "DISPLAY=:0" because over SSH you don't have a valid DISPLAY variable, so you need to tell the commands where to find X.

This is in my .bashrc already; either way I'm fairly sure the problem is occurring before having it set would be relevant.
teukon
July 09, 2011, 08:58:35 AM
 #14

This problem is defeating me too.  There are not many things in my experience which will crash Linux, but this appears to be happening immediately whenever you try running X.

Without more information I still think the drivers are causing the crash.  Please post a list of your running processes here using

Code:
ps aux

Have you tried removing all traces of ATI's drivers from your system and starting a dummy version of X?  We're not going for a usable X environment at the moment, just for Linux to load X and not crash horribly.

xorg.conf
Code:
Section "InputDevice"
        Identifier      "Null Mouse"
        Driver          "void"
EndSection

Section "InputDevice"
        Identifier      "Null Keyboard"
        Driver          "void"
EndSection

Section "Device"
        Identifier      "Dummy Device"
        Driver          "dummy"
EndSection

Section "Monitor"
        Identifier      "Dummy Monitor"
EndSection

Section "Screen"
        Identifier      "Screen"
        Device          "Dummy Device"
        Monitor         "Dummy Monitor"
EndSection
You'd need the packages xserver-xorg-input-void and xserver-xorg-video-dummy for this.

At least there would be no mention of fglrx (and "lsmod | grep fglrx" should return nothing at all times), so if the system still hangs then we know something else is causing it.  I know these methods are rather clumsy, and I would naturally try many things before taking such steps, but diagnosing a problem on a computer using a forum like this is a different game altogether.  If this doesn't work then I suggest reinstalling X.  If that fails, then flatten the drive and install Ubuntu fresh (although I think plain Debian is more suitable here, let's not start throwing more variables into the pot).
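A small sketch of that check in a form that can be scripted. The function name fglrx_loaded is hypothetical, and it simply looks for a module line starting with "fglrx" in whatever lsmod prints.

```shell
fglrx_loaded() {
    # Reads `lsmod` output on stdin.
    # Succeeds (exit 0) if a line for the fglrx module is present.
    grep -q '^fglrx[[:space:]]'
}
```

For example, `lsmod | fglrx_loaded && echo "fglrx is still loaded"` should print nothing once the driver is really gone.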

Don't worry about SDK for now, let's just try to stop Linux from crashing the moment you start X up.

Oh, and just to be crystal clear, by "log in" I of course mean via SSH.  There should certainly be no gdm or other graphical login manager at all.
teukon
July 09, 2011, 09:09:34 AM
 #15

Sorry, you may want to largely ignore my last post for now.  I was just reading over your initial post, and you said that you could start X up with your own xorg.conf.  Please try this first before trying to purge the system of fglrx.

If you can get to a stage where you are able to run

Code:
xinit &

and have X running in Linux without crashes then we're good and can work from there.

Probably the most basic xorg.conf which should initialise both cards is:

Code:
Section "Device"
        Identifier      "Card 0"
        Driver          "fglrx"
        BusID           "PCI:1:0:0"
EndSection

Section "Device"
        Identifier      "Card 1"
        Driver          "fglrx"
        BusID           "PCI:2:0:0"
EndSection

Section "Monitor"
        Identifier      "Dummy Monitor"
EndSection

Section "Screen"
        Identifier      "Screen 0"
        Device          "Card 0"
        Monitor         "Dummy Monitor"
EndSection

Section "Screen"
        Identifier      "Screen 1"
        Device          "Card 1"
        Monitor         "Dummy Monitor"
EndSection

Section "ServerLayout"
        Identifier      "Server Layout"
        Screen          "Screen 0"
        Screen          "Screen 1"
EndSection
You shouldn't need any special xserver-xorg packages, just Catalyst.
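For rigs with a different number of cards, the same bare-bones layout can be generated rather than typed out. This is only a sketch under the assumption that every card uses fglrx and shares one dummy monitor, as in the file above; the function name print_minimal_xorg_conf is made up for illustration.

```shell
print_minimal_xorg_conf() {
    # Arguments: one X.Org BusID per card, e.g. PCI:1:0:0 PCI:2:0:0
    n=$#
    i=0
    # One Device section per card.
    for busid in "$@"; do
        printf 'Section "Device"\n'
        printf '        Identifier      "Card %d"\n' "$i"
        printf '        Driver          "fglrx"\n'
        printf '        BusID           "%s"\n' "$busid"
        printf 'EndSection\n\n'
        i=$((i + 1))
    done

    # A single monitor shared by all screens.
    printf 'Section "Monitor"\n'
    printf '        Identifier      "Dummy Monitor"\n'
    printf 'EndSection\n\n'

    # One Screen section per card.
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'Section "Screen"\n'
        printf '        Identifier      "Screen %d"\n' "$i"
        printf '        Device          "Card %d"\n' "$i"
        printf '        Monitor         "Dummy Monitor"\n'
        printf 'EndSection\n\n'
        i=$((i + 1))
    done

    # The ServerLayout lists every screen.
    printf 'Section "ServerLayout"\n'
    printf '        Identifier      "Server Layout"\n'
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '        Screen          "Screen %d"\n' "$i"
        i=$((i + 1))
    done
    printf 'EndSection\n'
}
```

Running `print_minimal_xorg_conf PCI:1:0:0 PCI:2:0:0 > xorg.conf.test` should produce the same sections as the file above. Dropping the last argument gives the single-card variant, which is handy when looking for the one change that triggers the hang.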

Try to isolate a small simple change to xorg.conf which will cause Linux to crash and look at the log if possible.

Make sure Linux really is crashing if possible.  Connect a keyboard and monitor if you have them and run "xinit &" on the machine using the console.  If Ctrl+Alt+F2 does not take you to a second terminal (enabled by default in Ubuntu) then things really are bad.  If Ctrl+Alt+F2 does give you a responsive console then log in and look at "top" and "ps aux" for clues.  Try killing all "xinit" processes and see if responsiveness returns to the first terminal (Ctrl+Alt+F1).
Zagitta
July 09, 2011, 08:17:22 PM
 #16

--snip--

1)  If you are not logged in, but have a graphical login being displayed, then X is already running.  DO NOT start it again.
2)  If you are not logged in, then the "root" user is currently running X.
3)  If you ARE logged in as a user, then THAT USER is running X, not root.

--snip--

I'm pretty sure which user X is running as is irrelevant if you just run
Code:
xhost +
at least that works perfectly for me when having to access aticonfig from PHP :)

area (OP)
July 09, 2011, 11:43:54 PM
 #17

The system really does lock properly - Ctrl+Alt+F2 has no effect.

Removing the line

Code:
        Screen          "Screen 1"

in your 'bare-bones' xorg.conf is the difference between hanging and not, so it really is that second card. Upon further investigation, I think that second card has died at some point during this process - if I only have that physical card in slot 1 present, the system refuses to POST. It boots fine with just the first card in slot one. I might tinker a little more later, but I suspect it's RMA time.
teukon
July 10, 2011, 12:11:51 AM
 #18

Ah, a faulty card!  If only my knowledge of Linux were sufficient to help you fix that; alack.
being
September 11, 2011, 11:43:48 AM
 #19

Hey.


I have two cards and one monitor. I want to be "stuck" on one screen only (so my mouse cursor doesn't slide onto the other screen), so I made two different xorg.conf files, one initializing the first card and the other the second. I use gdm for my desktop and "xinit -config xorg2.conf -- :1" to "activate" the second card.

The problem is that when I run poclbm.py it hangs until I press ALT+CTRL+F9 to activate the second card's screen, :1.
So basically I'm only able to use the main desktop card with this setup.


Any advice to make it function?


Thanks.
teukon
September 11, 2011, 12:33:31 PM
 #20

Have you tried linking the first card to your monitor and the second to a dummy monitor with a single xorg.conf file?  Scroll up to find my example of pointing my cards to a dummy monitor and compare the file with the one generated by
Code:
aticonfig --initial