Cray-1
Newbie
Offline
Activity: 36
Merit: 0
|
|
June 22, 2012, 05:15:32 AM |
|
I am having trouble with one of my rigs. Approximately once per 24 hours it stops mining and displays the “Red Screen” stating "A fault has been found in gpu x" for all five GPUs.
I am running an MSI 890FXA-GD70 with 5 Sapphire 5830s and a Cooler Master 1200W under BAMT v0.5c. I have 3 other near-identical rigs that do not seem to be having any problems. The rig will be running fine and then just stop mining and give me the Red Screen. I can determine no pattern as to when it stops working. I have been having this problem for a while, so the first thing I did was stop all overclocking. This did not help.
Here are some clues; any help would be appreciated.
1. While in the red screen I went to BAMT control and clicked on Stop Mining (this seemed to act normal). Then I clicked on Start Mining and the start screen seemed to be acting normal. When I went to the GPU status screen it displayed all 5 cards producing 0 MH/s.
2. While in the red screen I went to a Windows computer on the same network and ran PuTTY. PuTTY was not able to communicate with the problem rig.
I have switched out the BAMT USB drive, which did not help. I am not overclocking. It must be hardware (motherboard?) or the power supply, I guess. The BIOS on the mobo has been updated to the latest approved version. What tests can I run to narrow this down? Thanks
When it "goes red" do you still have access to the internet? For example, can you open Iceweasel and go to Google? I was having this issue too; it turned out to be this strange "lost the DHCP lease" issue that has been going around but is hard to pin down. I put an old Intel Pro/100 card in and the issue has left that rig.
Next time it fails I will test the internet. Thanks for the advice!!
|
|
|
|
lodcrappo (OP)
|
|
June 23, 2012, 09:01:32 AM |
|
One bad GPU can lock up the bus and prevent the rest from mining. Remove or swap between another machine until the problem goes away or moves to the other machine.
|
|
|
|
lodcrappo (OP)
|
|
June 23, 2012, 09:11:57 AM |
|
I think Lod said it was funded. I'm waiting to hear from him. I've got a Diamond HD7970 I was gonna send him for 80BTC (well, 72BTC now since the price went up). Unless someone promised to pay and has not come through yet.
We have enough btc donated, just been busy. Amazon has 'em for $450 plus a lame $20 rebate that I'll never get around to submitting; seems to be the price pretty much everywhere. http://www.amazon.com/dp/B006UACSZ4/ref=asc_df_B006UACSZ42067048?smid=ATVPDKIKX0DER&tag=hyprod-2072 Seems high, but maybe btc moved since that estimate. I'd rather keep it in btc, so lemme know if you can match Amazon's price or come close, and if so we'll do this thing.
|
|
|
|
Joshwaa
|
|
June 23, 2012, 12:43:07 PM |
|
If you want the Diamond HD7970 that I have, which has been in a machine for a week, I will give it to you for 67.5BTC. If you want a new-in-box one, yeah, the best I can do is 72BTC.
|
|
|
|
lodcrappo (OP)
|
|
June 24, 2012, 12:33:32 AM |
|
If you want the Diamond HD7970 that I have, which has been in a machine for a week, I will give it to you for 67.5BTC. If you want a new-in-box one, yeah, the best I can do is 72BTC.
So here's a question. I was thinking to myself: well, used is fine so long as it has a warranty. So then I'm thinking "how do you submit a warranty claim with no receipt?" But then I realized it would be the same whether I bought the new one or the used one. Anyway, I'll buy the used one from you if it's got a warranty that I can use in the event it dies. Not sure if that is even possible tho.
|
|
|
|
nolo200
Member
Offline
Activity: 98
Merit: 10
|
|
June 24, 2012, 02:30:44 AM |
|
Hey guys, I just set up a new rig with BAMT. I'm running 6 5830s in it, all running stable at ~63C. Problem is there are two different models of Sapphire 5830 Xtremes. They look exactly the same, but the components on the PCB are different and in different locations.
I had been using phatk2 with phoenix on my 5970 with great success. phatk2 with phoenix on the one model of 5830 gets me a solid 320MH/s @ 1000:300:1.168V. On the other model, same clocks/voltage, I get 250MH/s. So here is the funny thing. I switched the 250MH/s cards into my gaming computer and I get the same thing with phatk2, but if I use guiminer with poclbm, I get an easy 310MH/s... which is ass backwards if you ask me.
So I'm trying to run 3 cards on phatk2 and 3 on poclbm in BAMT with no success. The phatk2 cards start mining no problem, but the poclbm ones don't start, and the CPU peaks and stays peaked trying to start them.
As far as software: I installed BAMT 0.5c, ran apt-get update (don't do upgrade, it screws EVERYTHING up), and ran the fixer. I tried installing pyopencl and it seemed to screw things up too.
Hardware: Sempron 140, MSI 890FXA-GD70, 1GB Kingston, 2x Cooler Master 750W supplies; 3 extenders are powered by molex.
Please help guys! How do I get this poclbm working? Thanks in advance.
|
|
|
|
BitMinerN8
|
|
June 24, 2012, 02:34:35 AM |
|
See if there are any files located in this directory: /live/image/BAMT/CONTROL/ACTIVE/ That dir should be empty, so if you see anything like a noOCGPUx in there, remove it, then restart mining.
|
|
|
|
nolo200
|
|
June 24, 2012, 02:38:20 AM |
|
/live/image/BAMT/CONTROL/ACTIVE is empty
|
|
|
|
BitMinerN8
|
|
June 24, 2012, 02:54:51 AM |
|
/live/image/BAMT/CONTROL/ACTIVE is empty
Not sure then. I have not run into that, but I don't push them that far either. On one of my rigs I have (5) 5830s, but I only take them to 995/280 (core_speed_2: 995, leave 0 and 1 commented out) and just left all the core_voltage_x commented out. So maybe try backing down on the OC and see if the hash rate goes up on it. Good luck.
|
|
|
|
Joshwaa
|
|
June 24, 2012, 12:01:27 PM |
|
If you want the Diamond HD7970 that I have, which has been in a machine for a week, I will give it to you for 67.5BTC. If you want a new-in-box one, yeah, the best I can do is 72BTC.
So here's a question. I was thinking to myself: well, used is fine so long as it has a warranty. So then I'm thinking "how do you submit a warranty claim with no receipt?" But then I realized it would be the same whether I bought the new one or the used one. Anyway, I'll buy the used one from you if it's got a warranty that I can use in the event it dies. Not sure if that is even possible tho.
They have a warranty (all cards from Newegg). I will make sure I handle it for you. Not a problem. If you want to go the 72BTC route, it would be shipped to you under your name and you could handle it yourself. Up to you.
|
|
|
|
nolo200
|
|
June 25, 2012, 12:45:58 AM |
|
Problem solved. It seems that the problem was that the BIOS would ONLY allow Sapphire Trixx to overclock the cards. Flashed the BIOS with a "bios overclock" and everything works fine now. Apparently the difference between CCC 11.6 and 12.4 was behind the poclbm vs. phatk2 discrepancy, which leads me to believe poclbm is better with CCC 12.4 while phatk2 is better with 11.6.
Everything is running 100% stable now with 6 5830s running at 1020MHz/360MHz/1.168V/80% fan. Average 6x320MH/s=1.92GH/s per rig. Not sure on power consumption (I left my Kill-a-watt in Missouri and I have free power this summer for my internship so I really don't care), but the fans on the power supplies are pumping some pretty warm air out. Rig is headless on its own 20-amp breaker.
Cards are running at about 70C in an ambient of about 69F.
I'll be sure and donate some BTC to the BAMT guys for this awesome OS.
|
|
|
|
Cray-1
|
|
June 25, 2012, 08:31:16 PM |
|
I am having a lot of problems with the loss of network issue on my rigs. I understand 1) it is not really a BAMT problem and more a Linux Network manager problem, 2) only happens on some hardware and 3) several fixes have been discussed on this thread.
I am a Linux newbie and so my learning curve seems slow. Please be patient. I have 4 nearly identical rigs, all with MSI 890FXA-GD70s, Sapphire 5830s and a Cooler Master 1200W. I lost my network connection with all of them. (I still have internet to my laptop.) I have a Verizon router in my house and a switch box in my garage where the rigs are. The address of one of my rigs was (before the disconnect) 192.168.1.23.
Question 1. I am standing in front of the rig with a monitor, keyboard and mouse. What can I do (what command is entered) to re-establish the connection? How do I force a new lease?
Here is what I have tried.
1) I took BitMinerN8’s suggestion to use an older NIC. No luck, I still lost connection.
2) I entered “ifdown eth0” logged in as root. I got the response “eth0 not configured”. eth0 was the port that had been used.
3) I shut down the rig and then restarted it. This works occasionally (but very rarely).
Question 2. I tried the more permanent fix suggested in the bitcointrading.com link. Here is what I did:
1) I logged in as root.
2) Then “nano /etc/network/interfaces”.
3) The configuration file had:
auto lo
iface lo inet loopback
4) I added the following under the first two lines. (I put a “1” in the 3rd position of the address, netmask, broadcast, and gateway. I think this is right because I go through a router and switch. Is this right?)
iface eth0 inet static
address 192.168.1.23
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1
5) Then: “apt-get remove network-manager*” and “/etc/init.d/networking restart”.
This did not fix the problem. Then I ran “nano /etc/network/interfaces” again and my changes were not there. Instead I had "auto lo / iface lo inet loopback" and "allow-hotplug eth0 / iface eth0 inet dhcp".
How do I know what address to use? I used 192.168.1.23
Please help. There is nothing more frustrating than staring at a rig (or four) that is not mining.
|
|
|
|
BitMinerN8
|
|
June 25, 2012, 09:07:45 PM |
|
Well, you can pick your own addresses; you just need to keep track of what you assign to which rig. Just make sure that after you assign one you try to ping your default gateway, then try something external. Example (and forgive me if you're versed in TCP/IP troubleshooting): if your router IP is 192.168.1.1, then after you assign a static address, try to ping that IP. Then try something external like 4.2.2.2 just to see if routing is working, then ping a name like google.com to see if DNS is working. "ifconfig" will show you what you currently have assigned via DHCP.
It is probably best NOT to use static addresses and instead assign DHCP leases in your router so that each rig will always get the IP you want. That way you're not messing with config files, and there is less to configure if you reload a USB with BAMT.
Not that this is a permanent fix, but just another way to access some network stuff: if you're in the GUI/desktop, have you right-clicked on the icon on the task bar between the CPU graph and the clock? Sometimes disabling/enabling works for me on this other rig I have that does this. It only does it after I reboot, and then after 10 min of working fine. I disable/enable and it's good for weeks on end. Not really an option if your rigs are headless though and not on a switchbox/KVM. I think X Windows has issues with plugging and unplugging displays; it can crash sometimes.
Sorry you're having such a hard time with it. Certain combinations of hardware can be a pain in the ass.
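For anyone who wants to script the three checks described above (gateway, then an external IP, then a name), here is a rough Python sketch. It assumes Python 3 and the Linux `ping` command are available on the rig, which may not be true of a stock BAMT image. The function names (`ping`, `resolves`, `diagnose`) are made up for this example, and the gateway address is just the 192.168.1.1 used in the example above.

```python
import socket
import subprocess


def ping(host, count=1, timeout_s=2):
    """Return True if `host` answers an ICMP echo (Linux iputils ping flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping binary is missing or could not be started.
        return False
    return result.returncode == 0


def resolves(name):
    """Return True if `name` resolves to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def diagnose(gateway_ok, external_ok, dns_ok):
    """Turn the three check results into a hint about where to look first."""
    if not gateway_ok:
        return "no reply from the gateway: check the cable, the interface and its address"
    if not external_ok:
        return "gateway reachable but nothing beyond it: check the default route"
    if not dns_ok:
        return "routing works but names do not resolve: check DNS settings"
    return "gateway, routing and DNS all look fine"


if __name__ == "__main__":
    gateway = "192.168.1.1"  # assumption: router address from the example above
    print(diagnose(ping(gateway), ping("4.2.2.2"), resolves("google.com")))
```

The checks run in the same order as the manual steps, so the first failure points at the nearest broken layer. It only narrows things down; it does not renew a lease or fix anything.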
|
|
|
|
nolo200
|
|
June 25, 2012, 11:19:46 PM |
|
My 890FXA-GD70 worked first thing when I plugged it into the router. I assigned a permanent lease for the IP in my router afterwards and have had no problems since. You might want to check your BIOS version. Upgrade if necessary; it's SUPER easy with this board.
|
|
|
|
Inaba
Legendary
Offline
Activity: 1260
Merit: 1000
|
|
June 26, 2012, 01:18:41 AM |
|
There is a known problem with BAMT and some machines keeping a DHCP address. Assigning a permanent lease on your router is still a dynamic address and won't solve the problem.
A fix is unlikely, since the problem is relegated to a few boards/configurations and it's hard to reproduce apparently.
Just assign a static IP to your machine and be done with it. Worked for me.
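As a sanity check on the static values discussed earlier in the thread, here is a small Python sketch that derives the network and broadcast lines from the address and netmask and prints a stanza for /etc/network/interfaces. It assumes Python 3; `static_stanza` is a made-up helper name, and the leading `auto eth0` line is an addition for this example (not something from the posts above) so the interface is brought up at boot. It says nothing about why edits to the file might disappear on a particular BAMT install.

```python
import ipaddress


def static_stanza(address, netmask, gateway, iface="eth0"):
    """Build an /etc/network/interfaces stanza for a static IPv4 address."""
    net = ipaddress.ip_interface(f"{address}/{netmask}").network
    # A gateway outside the local subnet is almost always a typo.
    if ipaddress.ip_address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is not inside {net}")
    return "\n".join([
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {address}",
        f"    netmask {netmask}",
        f"    network {net.network_address}",
        f"    broadcast {net.broadcast_address}",
        f"    gateway {gateway}",
    ])


if __name__ == "__main__":
    # Values taken from the post above: rig at .23 behind a router at .1
    print(static_stanza("192.168.1.23", "255.255.255.0", "192.168.1.1"))
```

For 192.168.1.23 with netmask 255.255.255.0 this gives network 192.168.1.0 and broadcast 192.168.1.255, which matches the values that were entered by hand.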
|
If you're searching these lines for a point, you've probably missed it. There was never anything there in the first place.
|
|
|
CoinDiner
Newbie
Offline
Activity: 28
Merit: 0
|
|
June 26, 2012, 07:19:18 PM |
|
Is BAMT 0.5 only for GPU rigs?
|
|
|
|
tnkflx
|
|
June 26, 2012, 07:27:32 PM |
|
Is BAMT 0.5 only for GPU rigs?
BAMT can support all devices that cgminer can.
|
| Operating electrum.be & us.electrum.be |
|
|
|
CoinDiner
|
|
June 26, 2012, 07:47:06 PM |
|
nice one thanks!
|
|
|
|
lodcrappo (OP)
|
|
June 26, 2012, 09:23:04 PM |
|
Is BAMT 0.5 only for GPU rigs?
yes
|
|
|
|
tnkflx
|
|
June 26, 2012, 09:30:10 PM |
|
Is BAMT 0.5 only for GPU rigs?
yes
Doesn't the included cgminer support FPGA devices?
|
|
|
|
|