Bitcoin Forum
April 20, 2024, 12:53:37 AM *
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 [14] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 ... 80 »
Topic: BAMT version 0.5 - Easy USB based mining Linux with farm wide management tools (Read 324103 times)
lodcrappo (OP)
Hero Member
Activity: 616
Merit: 506
March 06, 2012, 03:52:50 AM
 #261

OK, I know this is probably a noob question, but is the configuration file that comes up at startup the cgminer file, or do I have to set it up to use cgminer?


if you are seeing a config file opening at startup, you are using an outdated version that probably doesn't even support cgminer.

please use a current (0.5c) image.

you will find an example config with all options detailed in /opt/bamt/examples, or on our Wiki in the Examples section.  These documents apply only to the current version.


boozer
Sr. Member
Activity: 309
Merit: 250
March 06, 2012, 03:55:29 AM
 #262

What does it mean when mother puts noGPUx in the ACTIVE directory? I found out that was the card I was having issues with... I reset it to stock speeds, and a short time later it rebooted and put noGPUx in the ACTIVE directory. I assume this is going badly for me and this core... especially since it happened at stock speeds.

yeah that is not good.  actually never heard of it happening at stock speeds.  your fan still turning and whatnot?

the presence of that file means phoenix went defunct, which means your GPU went insane as far as I know.  when the GPU stops responding, phoenix tends to become very upset.

might want to inspect that something isn't burning up on the card.  the temp sensor is not all-knowing; it only knows the temp in one place.
if you have another card exhausting onto that one, it could be crazy hot in a spot the sensor doesn't really read, stuff like that.

or... could just be broken

lol... well, glad I could be the first to get it to happen at stock... At stock, BAMT reboots every 20ish minutes due to this card. I tried multiple times removing it from ACTIVE and it just keeps coming back... I tried removing it from ACTIVE, leaving it at stock and clocking the mem to 300, but then it reboots and sets it back to stock mem... I think I might have gotten a bad card off eBay. I don't have any cards venting into it, just 3 cards on the board. The fan is still running, although sometimes temps jump all over the place, so maybe the fan is cutting in and out... This card has been a pain in my ass, lol. Thanks for all your help, lodcrappo! I'll go try... something, lol.
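Since mother marks a defunct miner by dropping a noGPUx file into the ACTIVE directory, it can help to list which GPUs are currently flagged before digging into hardware. A minimal Python sketch, assuming the flags are plain files named exactly `noGPU0`, `noGPU1`, and so on; the directory path is passed in because its real location on a BAMT stick isn't given here, and `flagged_gpus` is a hypothetical helper, not part of BAMT.

```python
import re
from pathlib import Path

# Flag files are named like "noGPU0", "noGPU1", ... (naming taken from this
# thread; matching the exact name is an assumption).
FLAG_RE = re.compile(r"^noGPU(\d+)$")


def flagged_gpus(active_dir):
    """Return the sorted GPU indices that have a noGPUx flag file in active_dir.

    active_dir is the ACTIVE control directory. Its real path on a BAMT stick
    is not given in this thread, so the caller has to supply it.
    """
    found = []
    for entry in Path(active_dir).iterdir():
        match = FLAG_RE.match(entry.name)
        if match and entry.is_file():
            found.append(int(match.group(1)))
    return sorted(found)
```

Usage: `flagged_gpus("/path/to/ACTIVE")` returns something like `[0]` when only noGPU0 is present. As the post above shows, deleting the flag doesn't help if the card keeps failing; the list only tells you which card to look at.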
lodcrappo (OP)
Hero Member
Activity: 616
Merit: 506
March 06, 2012, 03:58:05 AM
 #263

you can put

  detect_defunct: 0

in the settings area of bamt.conf.  that will stop mother from rebooting.  my guess is that would just leave the card hung and not mining every 20 minutes instead of rebooting, but who knows.

if temps are jumping all over the place... well doesn't sound good to me.

one possibility: reseat the heatsink/apply thermal paste better (not more, just better, usually)/stuff like that
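To confirm what mother will actually see after editing bamt.conf, you can read the value back out of the settings area. A rough Python sketch, assuming bamt.conf follows the layout of the snippets in this thread (a top-level `settings:` key at column 0 with indented `key: value` lines beneath it). It is a line-based reader, not a real YAML parser, and `read_setting` / `defunct_reboot_enabled` are hypothetical helpers. Treating a missing key as "reboot enabled" is an assumption about BAMT's default.

```python
def read_setting(conf_text, key, section="settings"):
    """Return the raw string value of `key` inside a top-level `section:` block.

    Minimal line-based reader, NOT a YAML parser: assumes the section header
    starts at column 0 and its entries are indented beneath it. Returns None
    if the key is not found in that section.
    """
    in_section = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not line[0].isspace():
            # A new top-level key: are we entering the wanted section?
            in_section = stripped == section + ":"
            continue
        if in_section:
            name, sep, value = stripped.partition(":")
            if sep and name.strip() == key:
                return value.strip()
    return None


def defunct_reboot_enabled(conf_text):
    # Assumption: a missing detect_defunct means mother keeps rebooting.
    value = read_setting(conf_text, "detect_defunct")
    return value is None or value != "0"
```

Keep in mind that, per the advice above, turning this off likely just leaves a hung card idle instead of rebooting the rig.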
BitMinerN8
Hero Member
Activity: 626
Merit: 500

Mining since May 2011.
March 06, 2012, 04:02:38 AM
 #264

You say you have 3 GPUs on this rig. Is the problem GPU in the middle or on an end? Maybe swap them around and see if the problem follows the card. I had 1 of the 5 cards on one of my rigs acting up; I moved it to the end with nothing blocking the fans and it's been stable for 5 days now.
boozer
Sr. Member
Activity: 309
Merit: 250
March 06, 2012, 04:18:29 AM
 #265


Mother just put noGPU0 back into the ACTIVE directory, so it's not even running now. However, it just rebooted again... now it's the second core on that same card that got its OC removed... I had it at 800/300... sigh...


BitMinerN8:
It's at the top and currently running about 10°C hotter than the middle and bottom cards, with one core disabled and the other at stock. I'll try moving it to the "bottom" slot. I have an open-air rig, so the "bottom" slot runs the coolest since nothing is in its way... I assume there is no way to determine which GPU numbers it will get if I move it? I think I remember reading somewhere that it was just based on the motherboard.


If I still have issues, I'll try redoing the thermal paste/reseating the heatsink... and if there are still issues, I'll finish beating my head against a wall and return it, since I just bought it on eBay, lol.
xanadu
Member
Activity: 63
Merit: 10
March 06, 2012, 05:00:11 AM
 #266

I burned a new USB stick with the .5b image tonight, edited bamt.conf and the pools files, then applied the fixes via fixer.

The GPUs won't start mining, I'm getting the "No protocol specified" error.

Mother -v returns:
Quote
mother starts (19 seconds since last run)
babysit autoconf client...
        autoconf client is ok
look for defunct phoenix...
gathering GPU status...No protocol specified
done
broadcasting status
No protocol specified
checking GPU health...
refreshing desktop bg..

Running /etc/init.d/mine restart returns:
Quote
Stopping mining processes...: mine...
Starting mining processes...: minestart_mining: starting mining processes
No protocol specified
..munin.

Running aticonfig --list-adapters returns
Quote
* 0. 07:00.0 ATI Radeon HD 5900 Series
  1. 0e:00.0 ATI Radeon HD 5900 Series
  2. 0f:00.0 ATI Radeon HD 5900 Series
  3. 06:00.0 ATI Radeon HD 5900 Series
* - Default adapter

What did I miss? It seems like things should be running fine.

Thanks!
-X
lodcrappo (OP)
Hero Member
Activity: 616
Merit: 506
March 06, 2012, 05:08:56 AM
 #267

looks like X Windows isn't happy.  try (logged in as root, never via sudo):

/opt/bamt/start_mining pre

(let it sit there for a bit)

and

xauth merge /home/user/.Xauthority


all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon.



the best test for whether it's fixed is:

atitweak -s

if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.
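If this has to be repeated on several rigs, the three commands above can be wrapped in a small script that stops at the first failure. A hedged Python sketch: the command lines are copied from this post, but `run_recovery`, the `dry_run` flag and the `settle_seconds` pause are illustrative additions, and the pause length is a guess at "let it sit there for a bit". It assumes `/opt/bamt/start_mining pre` exits on its own, since `subprocess.run` waits for each command to finish.

```python
import subprocess
import time

# The commands suggested above, in order. Run as root, not via sudo.
RECOVERY_STEPS = [
    ["/opt/bamt/start_mining", "pre"],
    ["xauth", "merge", "/home/user/.Xauthority"],
    ["atitweak", "-s"],  # should list the cards if the ADL libs are working
]


def run_recovery(steps=RECOVERY_STEPS, dry_run=False, settle_seconds=0,
                 runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit status.

    Returns a list of (command, returncode) tuples. With dry_run=True nothing
    is executed and each returncode is None. settle_seconds is an optional
    pause after each successful step; the right value is a guess.
    """
    results = []
    for cmd in steps:
        text = " ".join(cmd)
        if dry_run:
            results.append((text, None))
            continue
        proc = runner(cmd)
        results.append((text, proc.returncode))
        if proc.returncode != 0:
            break
        if settle_seconds:
            time.sleep(settle_seconds)
    return results
```

Calling `run_recovery(dry_run=True)` first shows exactly what would run; the `runner` parameter only exists so the sequence can be exercised without touching the rig.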



xanadu
Member
Activity: 63
Merit: 10
March 06, 2012, 05:21:38 AM
 #268

Quote
looks like xwindows isn't happy.  try (logged in as root, never sudo):
/opt/bamt/start_mining pre
(let it sit there for a bit)

and

xauth merge /home/user/.Xauthority
all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon.

the best test for if its fixed is:
atitweak -s
if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.

That did the trick: just running /opt/bamt/start_mining pre made everything suddenly appear on my monitoring machine, and then I ran a mine restart to get things hashing. So, how do I make this happen properly during the normal boot sequence?

Thank you for the advice.
-X
Definit
Sr. Member
Activity: 357
Merit: 250
March 06, 2012, 05:48:39 AM
 #269

I'm actually dealing with an issue that is similar... maybe not, but...
with only one 5970 installed, no matter if I switch it out for a different 5970:

GPU0 gets 368 MH/s
GPU1 gets 322 MH/s

it seems as if GPU1 will never overclock no matter the settings, even when they are the same as GPU0's, which should work since it's a dual-GPU card...

I'm trying to re-write the USB stick and start fresh... any ideas about what I might need to do will be tried and tested. :)
BitMinerN8
Hero Member
Activity: 626
Merit: 500

Mining since May 2011.
March 06, 2012, 06:40:22 AM
 #270


There is a cool tool built into bamt since fix 13 for helping to determine GPU numbers. Type: idgpu
More info here: http://bamter.org/redmine/news/11
boozer
Sr. Member
Activity: 309
Merit: 250
March 06, 2012, 07:09:00 AM
 #271

Is there any possibility that a failure on "any" GPU gets reported as a GPU0 issue, regardless of which GPU it was? I removed the card I thought was bad, so I only had 4 GPUs, but the new GPU0 started having the same problem again.

So I added the card I thought had issues back into the mix, set every card to stock, and it has been running fine. Before, GPU0 running at stock would die, but with all the other GPUs also at stock, everything seems to be fine... it's only been 30 minutes, but that's longer than it used to last, lol. Just thought I would ask about the GPU0 theory, but maybe both cards had problems... I'll check in the AM to see if it stayed up the rest of the night.
Definit
Sr. Member
Activity: 357
Merit: 250
March 06, 2012, 07:20:49 AM
 #272

As for the GPU thing I mentioned earlier... re-writing BAMT onto the USB drive worked, but as I add a 2nd and/or 3rd 5970, a whole new set of problems occurs. urggh.


For everyone running 6 GPUs / 3 5970s... how many watts are you pushing them with at 800/300/1.05v?

I can't even seem to get it to like those settings on 0.5...

Even with just 2 5970s (4 GPUs) on a single 1300 W Rosewill Lightning PSU, it just seems to lock up... I tried spreading the startup times out to 10 sec between cards, but nothing, it just keeps locking up...

doesn't make sense to me right now
boozer
Sr. Member
Activity: 309
Merit: 250
March 06, 2012, 07:33:41 AM
Last edit: March 06, 2012, 10:13:38 AM by boozer
 #273

Have you tried running them all at stock, 725/300/1.05v? That's what I went back to, and it has been the most stable for me so far... I'm headed to bed now; we'll see how it is in the AM.
malevolent
can into space
Legendary
Activity: 3472
Merit: 1721
March 06, 2012, 08:25:18 AM
Last edit: March 06, 2012, 08:37:03 AM by malevolent
 #274



Try mining with the card on Windows and open GPU-Z, which reads temperatures from all sensors. That is how I found out why one of my cards was throttling despite BAMT showing <80C and the card running at stock voltage.

Splirow
Full Member
Activity: 164
Merit: 100
March 06, 2012, 08:45:00 AM
 #275

I run BAMT with 3 5970s. I have them at 825/300/1.05v, getting 382 MH/s.

It is perfectly stable for me. Last time I checked, I was pulling 930 watts from the wall.
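Those numbers give a rough efficiency figure for anyone sizing a PSU for 3x5970s. A quick Python sketch, assuming the 382 MH/s is per GPU core (the post doesn't say) and that the 930 W wall reading covers the whole rig; `rig_efficiency` is just an illustrative helper.

```python
def rig_efficiency(cards, gpus_per_card, mhs_per_gpu, wall_watts):
    """Return (total MH/s, MH/s per wall watt) for a rig of identical cards."""
    total_mhs = cards * gpus_per_card * mhs_per_gpu
    return total_mhs, total_mhs / wall_watts


# 3x HD 5970 (2 GPUs each) at 825/300/1.05v, 930 W measured at the wall.
total, per_watt = rig_efficiency(cards=3, gpus_per_card=2, mhs_per_gpu=382, wall_watts=930)
print(f"{total} MH/s total, {per_watt:.2f} MH/s per watt")
# prints: 2292 MH/s total, 2.46 MH/s per watt
```

If the 382 MH/s turns out to be for the whole card instead, pass `gpus_per_card=1`.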
DeathAndTaxes
Donator
Legendary
Activity: 1218
Merit: 1079

Gerald Davis
March 06, 2012, 01:24:53 PM
 #276

If you are having multi-card failures, start at stock.
Run everything at 725/240 (300 also works, but it is actually a MH "valley") with 85% fan for 24 hours to check for stability.

From your descriptions, it sounds like you have them in a closed case. That likely isn't going to work with 3x5970s. I run 3x5970s in an open frame with an Ultra Kaze fan placed at the expansion slot to create negative pressure and aid with exhaust. Even then temps are high. 5970s simply run hot, and trying to run 3 of them in a case (no matter how good the case) is a recipe for failure.
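The stock burn-in described above (725 core / 240 mem on every GPU) can be written out as bamt.conf GPU sections instead of being edited by hand for six GPUs. A hedged Python sketch that emits only the `gpuN:`, `disabled`, `core_speed_P` and `mem_speed_P` keys visible in the config posted in this thread. The 85% fan setting is left out because the thread doesn't show which key BAMT uses for it. Using identical values in all three profiles also keeps a higher profile from ever being set below a lower one. `stock_gpu_sections` is a hypothetical helper.

```python
def stock_gpu_sections(num_gpus, core=725, mem=240, profiles=(0, 1, 2)):
    """Return bamt.conf-style text giving every GPU the same clocks in every profile.

    Key names follow the gpuN / core_speed_P / mem_speed_P layout shown in
    this thread; fan settings are deliberately not generated.
    """
    lines = []
    for gpu in range(num_gpus):
        lines.append(f"gpu{gpu}:")
        lines.append("  disabled: 0")
        for p in profiles:
            lines.append(f"  core_speed_{p}: {core}")
        for p in profiles:
            lines.append(f"  mem_speed_{p}: {mem}")
        lines.append("")  # blank line between GPU sections
    return "\n".join(lines)


print(stock_gpu_sections(6))  # 3x5970 = 6 GPUs
```

Compare the output against /opt/bamt/examples before pasting it in, since the example config there is the authoritative list of options for your version.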
Intention
Full Member
Activity: 128
Merit: 100
March 06, 2012, 01:49:06 PM
 #277

Quote
turn mem_speed back on for 0 and 1.   your card probably has a default speed higher than 300 in profile 1, if not 0.  that will prevent you from setting 2 to 300.  a higher profile cannot have lower values than a lower profile, basic rule of overclocking (some cards don't care, some do).

When I ran it with:

gpu0:

  # remove disabled: or set it to 0 to actually use this card.

  disabled: 0
  debug_oc: 1
  #core_speed_0: 800
  #core_speed_1: 850
  core_speed_2: 900

  mem_speed_0: 300
  mem_speed_1: 300
  mem_speed_2: 300

  #core_voltage_0: 1.125
  #core_voltage_1: 1.125
  #core_voltage_2: 1.125

It is still throwing errors... Taking the "you cannot have lower values on higher profiles" rule into account, I also tried mem_speed_0: 300, mem_speed_1: 350, mem_speed_2: 400 just to see what the card would do.
Right now I'm just running it at stock, but if it keeps being stubborn I might just put Windows or something on the USB stick, since the stupid, poorly designed mobo has its SATA ports blocked by a video card's fan... granted, it was designed back when cards were only 1 slot wide.

I appreciate the suggestions.
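The profile rule quoted at the top of this post (a higher profile cannot have lower values than a lower one) is easy to check mechanically, including against the card's own defaults for profiles you left commented out. A minimal Python sketch: `profile_violations` is a hypothetical helper, the config is passed as a plain dict of the `core_speed_P` / `mem_speed_P` keys, and any `defaults` values you supply are assumptions, since the card's real per-profile defaults aren't shown in the thread.

```python
def profile_violations(gpu_conf, defaults=None, prefixes=("core_speed", "mem_speed")):
    """List places where a higher power profile is set lower than a lower one.

    gpu_conf maps keys like "mem_speed_2" to numbers (commented-out keys are
    simply absent). defaults optionally supplies the card's own values for
    profiles you didn't set; explicit settings override them.
    """
    effective = dict(defaults or {})
    effective.update(gpu_conf)
    problems = []
    for prefix in prefixes:
        levels = []
        for key, value in effective.items():
            base, _, idx = key.rpartition("_")
            if base == prefix and idx.isdigit():
                levels.append((int(idx), value))
        levels.sort()  # order by profile number
        for (p_lo, v_lo), (p_hi, v_hi) in zip(levels, levels[1:]):
            if v_hi < v_lo:
                problems.append(f"{prefix}_{p_hi}={v_hi} is below {prefix}_{p_lo}={v_lo}")
    return problems
```

Run against the explicit gpu0 values above (`core_speed_2: 900`, all three `mem_speed` at 300), it reports no violations. With only `mem_speed_2: 300` set and a hypothetical card default of 1000 for profiles 0 and 1, it flags `mem_speed_2=300 is below mem_speed_1=1000`, which is the situation the quoted advice describes.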

jamesg
VIP
Legendary
Activity: 1358
Merit: 1000

AKA: gigavps
March 06, 2012, 02:11:29 PM
 #278

To all BAMT users:

If you are asking for support on the forums from lodcrappo, please help him out by sending a donation. It wasn't until very recently that I fully understood just how awesome BAMT is and how much easier it makes MY life.

0.5 has been FLAWLESS for me, and I run 89 GPUs and 1 FPGA, so if you are having problems, it is most likely NOT BAMT.

Please show lodcrappo your appreciation for his FREE software! :D

Best,
gigavps
boozer
Sr. Member
Activity: 309
Merit: 250
March 06, 2012, 04:09:20 PM
 #279


It's a wide-open rig, similar to the one shown in the "build your own" hardware section of this forum. I think I ran everything at stock for 24 hours, but it's been a while, so I'll go back to that, set mem to 240, and see whether stock is stable.
DeathAndTaxes
Donator
Legendary
Activity: 1218
Merit: 1079

Gerald Davis
March 06, 2012, 04:24:29 PM
 #280

If it is stable at stock and not at higher clocks, then it is simply an excessive overclock. A lower memclock won't make it more stable, but it does use less power. 300 is a commonly used number, but it is actually a very bad memclock: it is slower than both 280 and 310. :)

When you find clocks that are stable for 24 hours, you likely aren't there yet. Eventually the rig will crash. When it does, drop the core clock 5 to 10 MHz on the affected GPU and reboot. One by one you will find the stable clocks. Now the rig may run 15 days or so and then crash. You can either accept that or drop clocks another 5 MHz or so. Eventually you will find the speed that runs 24/7 for 90+ days.
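The step-down procedure above is a simple loop per GPU: lower the core clock by a fixed step until it survives the stability window. A sketch in Python, where `is_stable` stands in for "ran the target period without crashing" (in practice that is you waiting a day or weeks, not a function call), and `tune_clock` is a hypothetical helper. The 725 MHz floor is the 5970 stock clock mentioned in this thread; hitting it without passing suggests the card itself is the problem.

```python
def tune_clock(start_mhz, is_stable, step_mhz=5, floor_mhz=725):
    """Lower the core clock in step_mhz increments until is_stable(clock) passes.

    Returns the first clock that passes, or None if every clock down to
    floor_mhz fails.
    """
    clock = start_mhz
    while clock >= floor_mhz:
        if is_stable(clock):
            return clock
        clock -= step_mhz
    return None
```

For example, if a GPU started at 825 MHz is only actually stable at 812 MHz or below, `tune_clock(825, lambda mhz: mhz <= 812)` settles at 810 with 5 MHz steps, or 805 with `step_mhz=10`. Run it again with a longer stability window when the rig crashes after two weeks.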