lodcrappo (OP)
|
|
March 06, 2012, 03:52:50 AM |
|
OK, I know this is probably a noob question, but is the configuration file that comes up at startup the cgminer file, or do I have to set it up to use cgminer?
if you are seeing a config file opening at startup, you are using an outdated version that probably doesn't even support cgminer. please use a current (0.5c) image. you will find an example config with all options detailed in /opt/bamt/examples, or on our Wiki in the Examples section. These documents apply only to the current version.
|
|
|
|
boozer
|
|
March 06, 2012, 03:55:29 AM |
|
What does it mean when mother puts noGPUx in the ACTIVE directory, I found out that was the card I was having issues with... I reset it to stock speeds and a short time later it rebooted and put noGPUx in the ACTIVE directory. I assume this is going badly for me and this core.... Especially since it happened at stock speeds.
yeah that is not good. actually never heard of it happening at stock speeds. your fan still turning and whatnot? the presence of that file means phoenix went defunct, which means your GPU went insane as far as I know. when the GPU stops responding, phoenix tends to become very upset. might want to inspect that something isn't burning up on the card. the temp sensor is not all knowing. it only knows temp in one place. if you have another card exhausting onto that one, it could be crazy hot in a place the sensor doesn't really read, stuff like that. or.. could just be borken lol...
well glad I could be a first to get it to happen at stock ... At stock BAMT reboots every 20ish minutes due to this card... i tried multiple times removing it from ACTIVE and it just keeps coming back.... I tried removing from ACTIVE, leaving it at stock and clocking the mem to 300, but then it reboots and sets it back to stock mem..... think I might have got bad card off ebay...
I dont have any cards venting into it... just 3 cards on the board. Fan is still running, although sometimes temps jump all over the place, so maybe the fan is cutting in and out... This card has been a pain in my ass, lol.
Thanks for all your help lodcrappo! I'll go try.... .... something, lol.
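For anyone who wants to check for these marker files from a script, here is a rough Python sketch. It is not part of BAMT. The function name `defunct_gpus` is made up for illustration, the location of the ACTIVE directory is not given in this thread so it is passed in by the caller, and the `noGPU<number>` naming is assumed from the file names mentioned here.

```python
import re
from pathlib import Path

# Assumed naming scheme, based on the "noGPU0" file mentioned in this thread.
MARKER = re.compile(r"^noGPU(\d+)$")


def defunct_gpus(active_dir):
    """Return the sorted GPU numbers that have a noGPU<n> marker file.

    active_dir is whatever directory your install uses as ACTIVE;
    the real path is not stated in this thread, so it is a parameter.
    """
    found = []
    for entry in Path(active_dir).iterdir():
        match = MARKER.match(entry.name)
        if match and entry.is_file():
            found.append(int(match.group(1)))
    return sorted(found)
```

Calling `defunct_gpus(path_to_ACTIVE)` on a healthy rig should give an empty list under these assumptions.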
|
|
|
|
lodcrappo (OP)
|
|
March 06, 2012, 03:58:05 AM |
|
you can put detect_defunct: 0 in settings area of bamt.conf. that will stop mother from rebooting. i'd guess that would just leave the thing hung up and not mining every 20 minutes instead of rebooting, but who knows. if temps are jumping all over the place... well doesn't sound good to me. one possibility: reseat the heatsink/apply thermal paste better (not more, just better, usually)/stuff like that
|
|
|
|
BitMinerN8
|
|
March 06, 2012, 04:02:38 AM |
|
You say you have 3 GPU on this rig. Is the GPU in the middle or on an end? Maybe swap them around and see if the problem follows the card. I had 1 of the 5 on one of my rigs that was acting up. I moved it to the end with nothing blocking the fans and it's been stable now for 5 days.
|
|
|
|
boozer
|
|
March 06, 2012, 04:18:29 AM |
|
you can put
detect_defunct: 0
in settings area of bamt.conf. that will stop mother from rebooting. i'd guess that would just leave the thing hung up and not mining every 20 minutes instead of rebooting, but who knows.
if temps are jumping all over the place... well doesn't sound good to me.
You say you have 3 GPU on this rig. Is the GPU in the middle or on an end? Maybe swap them around and see if the problem follows the card. I had 1 of the 5 on one of my rigs that was acting up. I moved it to the end with nothing blocking the fans and it's been stable now for 5 days.
Mother just put noGPU0 back into the ACTIVE directory, so its not even running now, however, it just rebooted again... now its the second core on that same card that got the OC removed... I had it at 800/300... sigh....
BitMinerN8: Its at the top and currently running about 10C higher than the middle and bottom card with one core disabled and the other at stock. I'll try moving it to the "bottom" slot.. I have an open air rig, so the "bottom" slot runs the coolest as nothing is in its way... I assume there is no way to determine GPU numbers it will be if I move it? I thought I remember reading somewhere that it was just based on the motherboard.
If i still have issues, I'll try resetting the thermal paste/heatsink.. and if still issues... I'll finish beating my head on a wall and return it as i just bought it on ebay, lol.
|
|
|
|
xanadu
Member
Offline
Activity: 63
Merit: 10
|
|
March 06, 2012, 05:00:11 AM |
|
I burned a new USB stick with the .5b image tonight, edited bamt.conf and the pools files, then applied the fixes via fixer. The GPUs won't start mining, I'm getting the "No protocol specified" error.
Mother -v returns:
mother starts (19 seconds since last run)
babysit autoconf client...
autoconf client is ok
look for defunct phoenix...
gathering GPU status...No protocol specified
done
broadcasting status
No protocol specified
checking GPU health...
refreshing desktop bg..
Running /etc/init.d/mine restart returns:
Stopping mining processes...: mine...
Starting mining processes...: minestart_mining: starting mining processes
No protocol specified
..munin.
Running aticonfig --list-adapters returns
* 0. 07:00.0 ATI Radeon HD 5900 Series
  1. 0e:00.0 ATI Radeon HD 5900 Series
  2. 0f:00.0 ATI Radeon HD 5900 Series
  3. 06:00.0 ATI Radeon HD 5900 Series
* - Default adapter
What did I miss, it seems like things should be running fine?
Thanks!
-X
|
|
|
|
lodcrappo (OP)
|
|
March 06, 2012, 05:08:56 AM |
|
looks like xwindows isn't happy. try (logged in as root, never sudo): /opt/bamt/start_mining pre (let it sit there for a bit) and xauth merge /home/user/.Xauthority all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon. the best test for if its fixed is: atitweak -s if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.
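If you want to script the "is it fixed yet" test instead of typing it by hand, something like the Python sketch below could poll until atitweak can list the cards. This is only an illustration and not how BAMT itself does it. The helper names `adl_ready` and `wait_until` are made up, and it assumes (not confirmed in this thread) that `atitweak -s` exits with status 0 when it can list the cards and non-zero when it cannot.

```python
import subprocess
import time


def adl_ready(cmd=("atitweak", "-s")):
    """Run the check command; True if it starts and exits with status 0.

    Assumption: exit status 0 means atitweak could list the cards.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Command missing or not executable: treat as "not ready".
        return False
    return result.returncode == 0


def wait_until(check, attempts=10, delay=5.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Something like `wait_until(adl_ready)` would then give a slow-booting rig about 45 seconds before giving up. The attempt count and delay are arbitrary defaults.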
|
|
|
|
xanadu
|
|
March 06, 2012, 05:21:38 AM |
|
looks like xwindows isn't happy. try (logged in as root, never sudo):
/opt/bamt/start_mining pre
(let it sit there for a bit)
and
xauth merge /home/user/.Xauthority
all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon.
the best test for if its fixed is:
atitweak -s
if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.
That did the trick, just running the /opt/bamt/start_mining pre and everything suddenly appeared on my monitoring machine, then I ran a mine restart to get things hashing. So, how do I make this happen properly during the normal booting sequence? Thank you for the advice. -X
|
|
|
|
Definit
|
|
March 06, 2012, 05:48:39 AM |
|
I'm actually dealing with an issue that is similar... maybe not, but... with only one 5970 installed, no matter if I switch it out with a different 5970:
GPU0 gets 368 MH/s
GPU1 gets 322 MH/s
It seems as if GPU1 will never overclock no matter the settings, even if they're the same as GPU0's, which should work since it's a dual-GPU card... trying to re-write the USB and start fresh... any ideas as to what I might need to do will be tried and tested.
|
|
|
|
BitMinerN8
|
|
March 06, 2012, 06:40:22 AM |
|
you can put
detect_defunct: 0
in settings area of bamt.conf. that will stop mother from rebooting. i'd guess that would just leave the thing hung up and not mining every 20 minutes instead of rebooting, but who knows.
if temps are jumping all over the place... well doesn't sound good to me.
You say you have 3 GPU on this rig. Is the GPU in the middle or on an end? Maybe swap them around and see if the problem follows the card. I had 1 of the 5 on one of my rigs that was acting up. I moved it to the end with nothing blocking the fans and it's been stable now for 5 days.
Mother just put noGPU0 back into the ACTIVE directory, so its not even running now, however, it just rebooted again... now its the second core on that same card that got the OC removed... I had it at 800/300... sigh....
BitMinerN8: Its at the top and currently running about 10C higher than the middle and bottom card with one core disabled and the other at stock. I'll try moving it to the "bottom" slot.. I have an open air rig, so the "bottom" slot runs the coolest as nothing is in its way... I assume there is no way to determine GPU numbers it will be if I move it? I thought I remember reading somewhere that it was just based on the motherboard.
If i still have issues, I'll try resetting the thermal paste/heatsink.. and if still issues... I'll finish beating my head on a wall and return it as i just bought it on ebay, lol.
There is a cool tool built into bamt since fix 13 for helping to determine GPU numbers. Type:
idgpu
More info here: http://bamter.org/redmine/news/11
|
|
|
|
boozer
|
|
March 06, 2012, 07:09:00 AM |
|
Is there any possibility that a failure on "any" GPU gets reported as GPU0 issue, regardless which gpu it was? I removed the card i thought was bad... so only had 4 GPU's, but the new gpu0 again started having the same problem.
So I added the card I thought had issues back into the mix and set every card to stock and it has been running fine. So before, gpu0 running stock would die, but with all other gpus at stock, everything seems to be fine.... its only been 30 minutes, but thats longer than it used to last, lol. Just thought I would ask about the gpu0 theory, but maybe they both had problems.... I'll check in the AM to see if it stayed up the rest of the night.
|
|
|
|
Definit
|
|
March 06, 2012, 07:20:49 AM |
|
as for the gpu thing i mentioned earlier... re-writing bamt onto the usb drive worked...but as i add in a 2nd & or 3rd 5970 a whole new set of problem's occur. urggh.
for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?
i cant even seem to get it to like those settings on 0.5...
even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...
doesnt make sense to me right now
|
|
|
|
boozer
|
|
March 06, 2012, 07:33:41 AM Last edit: March 06, 2012, 10:13:38 AM by boozer |
|
as for the gpu thing i mentioned earlier... re-writing bamt onto the usb drive worked...but as i add in a 2nd & or 3rd 5970 a whole new set of problem's occur. urggh.
for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?
i cant even seem to get it to like those settings on 0.5...
even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...
doesnt make sense to me right now
have you tried running them all at stock 725/300/1.05v? That's what I went back to and have been the most stable on thus far.... I'm headed to bed now.. see how it is in the AM.
|
|
|
|
malevolent
can into space
Legendary
Offline
Activity: 3472
Merit: 1722
|
|
March 06, 2012, 08:25:18 AM Last edit: March 06, 2012, 08:37:03 AM by malevolent |
|
Try mining with the card on windows and open gpu-z which reads temperatures from all sensors. This is how I found why one of my cards was throttling despite <80C shown in bamt and stock voltage.
|
|
|
|
Splirow
|
|
March 06, 2012, 08:45:00 AM |
|
as for the gpu thing i mentioned earlier... re-writing bamt onto the usb drive worked...but as i add in a 2nd & or 3rd 5970 a whole new set of problem's occur. urggh.
for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?
i cant even seem to get it to like those settings on 0.5...
even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...
doesnt make sense to me right now
I have BAMT with 3 5970s. I have them at 825/300/1.05v, getting 382 MH/s. It is perfectly stable for me. Last time I checked, I was pulling 930 watts from the wall.
|
|
|
|
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
|
|
March 06, 2012, 01:24:53 PM |
|
Is there any possibility that a failure on "any" GPU gets reported as GPU0 issue, regardless which gpu it was? I removed the card i thought was bad... so only had 4 GPU's, but the new gpu0 again started having the same problem.
So I added the card I thought had issues back into the mix and set every card to stock and it has been running fine. So before, gpu0 running stock would die, but with all other gpus at stock, everything seems to be fine.... its only been 30 minutes, but thats longer than it used to last, lol. Just thought I would ask about the gpu0 theory, but maybe they both had problems.... I'll check in the AM to see if it stayed up the rest of the night.
If you are having multi-card failures start at stock. Run everything at 725/240 (300 also works but it is actually a MH "valley") with 85% fan for 24 hours to check for stability. From your descriptions it sounds like you have them in a closed case. Likely that isn't going to work w/ 3x5970s. I run 3x5970s in open frame with a Ultra Kaze fan placed at the expansion slot to create negative pressure and aid w/ exhaust. Even then temps are high. 5970s simply run hot and trying to run 3x in a case (no matter how good the case) is simply a recipe for failure.
|
|
|
|
Intention
|
|
March 06, 2012, 01:49:06 PM |
|
turn mem_speed back on for 0 and 1. your card probably has a default speed higher than 300 in profile 1, if not 0. that will prevent you from setting 2 to 300. higher profile cannot have lower values than lower profile, basic rule of overclocking (some cards don't care, some do).
When I ran it with:
gpu0:
  # remove disabled: or set it to 0 to actually use this card..
  disabled: 0
  debug_oc: 1
  #core_speed_0: 800
  #core_speed_1: 850
  core_speed_2: 900
  mem_speed_0: 300
  mem_speed_1: 300
  mem_speed_2: 300
  #core_voltage_0: 1.125
  #core_voltage_1: 1.125
  #core_voltage_2: 1.125
it is still throwing errors. Taking the "you cannot have lower values on higher profiles" rule into account, I also attempted mem_speed_0: 300, mem_speed_1: 350, mem_speed_2: 400 just to see what the card would do.
Right now I'm just running it stock, but if it keeps being stubborn I might just put Windows or something on the USB stick, since the stupid poorly designed mobo has the SATA ports blocked by the fan of a video card... granted, this was back when cards were only 1 slot. I appreciate the suggestions.
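The profile rule quoted above (a higher profile cannot have a lower value than a lower profile) is easy to sanity-check before rebooting. Here is a minimal Python sketch, assuming the settings have already been read into a plain dict keyed like the bamt.conf entries. The function name `profile_violations` is invented for illustration and is not part of BAMT.

```python
import re


def profile_violations(settings, prefix="mem_speed"):
    """Return (lower_profile, higher_profile) pairs where the higher
    profile is set to a smaller value than the profile below it.

    Only keys actually present in `settings` are compared, so
    commented-out entries are simply skipped.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    levels = []
    for key, value in settings.items():
        match = pattern.match(key)
        if match:
            levels.append((int(match.group(1)), value))
    levels.sort()

    bad = []
    for (low_idx, low_val), (high_idx, high_val) in zip(levels, levels[1:]):
        if high_val < low_val:
            bad.append((low_idx, high_idx))
    return bad
```

Note the limitation: a profile left commented out falls back to the card's own default, which this check cannot see. That is exactly the case described in the quoted advice, so an empty result here does not prove the card will accept the values.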
|
|
|
|
jamesg
VIP
Legendary
Offline
Activity: 1358
Merit: 1000
AKA: gigavps
|
|
March 06, 2012, 02:11:29 PM |
|
To all BAMT users:
If you are asking for support on the forums from lodcrappo, please help him out by sending a donation.
It wasn't until very recently that I fully understood just how awesome BAMT is and how much easier it makes MY life. 0.5 has been FLAWLESS for me and I run 89 GPUs and 1 FPGA, so if you are having problems, it is most likely NOT BAMT.
Please show lodcrappo your appreciation for his FREE software!
Best,
gigavps
|
|
|
|
boozer
|
|
March 06, 2012, 04:09:20 PM |
|
If you are having multi-card failures start at stock. Run everything at 725/240 (300 also works but it is actually a MH "valley") with 85% fan for 24 hours to check for stability.
From your descriptions it sounds like you have them in a closed case. Likely that isn't going to work w/ 3x5970s. I run 3x5970s in open frame with a Ultra Kaze fan placed at the expansion slot to create negative pressure and aid w/ exhaust. Even then temps are high. 5970s simply run hot and trying to run 3x in a case (no matter how good the case) is simply a recipe for failure.
It's a wide open rig, similar to the one shown in the "build your own" hardware section of this forum. I think I ran everything at stock for 24 hours, but it's been a while, so I'll go back to that and set mem at 240 and see if stock is stable or not.
|
|
|
|
DeathAndTaxes
|
|
March 06, 2012, 04:24:29 PM |
|
It's a wide open rig, similar to the one shown in the "build your own" hardware section of this forum. I think I ran everything at stock for 24 hours, but it's been a while, so I'll go back to that and set mem at 240 and see if stock is stable or not.
If it is stable at stock and not at higher clocks, then it is simply an excessive overclock. A lower memclock won't make it more stable, but it does use less wattage. 300 is a commonly used number, but it is actually a very bad memclock: it is slower than both 280 and 310.
When you find clocks that are stable for 24 hours, you likely aren't there yet. Eventually the rig will crash. When it does, drop the core clock 5 MHz to 10 MHz on the affected GPU and reboot. One by one you will find the stable clocks. Now the rig may run 15 days or so and crash. You can either accept that or drop clocks another 5 MHz or so. Eventually you will find the speed that runs 24/7 for 90+ days.
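The step-down routine described above can be written out as a few lines of Python, mostly as a way to keep notes on which GPU has been lowered and by how much. This is a sketch under assumptions: the function names are made up, the default 5 MHz step is the low end of the 5 to 10 MHz range mentioned, and the 725 MHz floor is the stock 5970 core clock quoted earlier in this thread. It does not touch the hardware or bamt.conf.

```python
def step_down(core_mhz, step=5, floor=725):
    """Lower a core clock by `step` MHz, never going below `floor`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return max(floor, core_mhz - step)


def record_crash(clocks, gpu, step=5, floor=725):
    """Return a new {gpu_number: core_mhz} mapping with only the
    crashed GPU's clock lowered; the other GPUs are left alone."""
    updated = dict(clocks)
    updated[gpu] = step_down(updated[gpu], step=step, floor=floor)
    return updated
```

For example, starting from `{0: 800, 1: 800}` and calling `record_crash(clocks, 1)` after GPU 1 hangs leaves GPU 0 at 800 and moves GPU 1 to 795. Repeat after each crash until the rig stays up as long as you need.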
|
|
|
|
|