Bitcoin Forum
Author Topic: BAMT - Easy persistent USB key based linux for dedicated miners/mining farms  (Read 167434 times)
lodcrappo (OP)
Hero Member
*****
Offline Offline

Activity: 616
Merit: 506


View Profile
February 18, 2012, 07:58:57 AM
Last edit: February 18, 2012, 03:58:06 PM by lodcrappo
 #881

Fix 30 for 0.4

mother is a script that runs every 60 seconds in BAMT.  it sends the status broadcasts, checks GPU health and sends email alerts, makes sure munin is sane, etc.

this fix adds a new capability to mother.  she will look for hung/locked up GPUs.  these happen when you get a bit too crazy with the overclocking (and I know you do, so don't even try to pretend you don't).  they can cause the rig to become unresponsive and can even take out mining on the other GPUs.

if mother finds a problem, she will disable o/c on that GPU and coldreboot the rig.  a bit harsh, but effective.  the rig will restart and the GPU will at least be mining again (and you will be able to get into the rig again).

field testing by several helpful volunteers has shown this to be a very effective way to keep mining while experimenting with increased clock rates, especially for remote machines.

this feature will be enabled by default once fix 30 is installed.  if you do not want mother to do this, add:

  detect_defunct: 0

to the settings section of your bamt.conf.
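
A minimal sketch for checking what a given config file currently says about this option. It assumes the option sits on its own line as `detect_defunct: N` under a `settings:` section, as described above; the helper name `defunct_setting` is made up for illustration, and treating any non-zero value as "enabled" is an assumption, since the post only documents `0`.

```shell
#!/bin/sh
# Report the detect_defunct setting found in a bamt.conf-style file.
# defunct_setting is a hypothetical helper, not part of BAMT.
# Assumed layout in the file:
#   settings:
#     detect_defunct: 0
defunct_setting() {
    conf="$1"
    # Pick up the last "detect_defunct: N" line, ignoring leading whitespace.
    value=$(sed -n 's/^[[:space:]]*detect_defunct:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1)
    if [ -z "$value" ]; then
        # Not present: per the post, fix 30 enables detection by default.
        echo "unset"
    elif [ "$value" = "0" ]; then
        echo "disabled"
    else
        # Assumption: any non-zero value leaves detection on.
        echo "enabled"
    fi
}
```

Call it with the path to your own bamt.conf, for example `defunct_setting /path/to/bamt.conf`.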

When a problematic OC is detected, a file noOCGPU# (where # is the GPU number) is created in /live/image/BAMT/CONTROL/ACTIVE/.

Reduce your clock and delete this file (for GPU 1, for example: rm /live/image/BAMT/CONTROL/ACTIVE/noOCGPU1) to resume overclocking.
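
A small sketch of the cleanup step, assuming the flag files are named exactly as above (noOCGPU followed by the GPU number). The helper names `list_no_oc` and `clear_no_oc` are hypothetical; the directory defaults to the one given in the post and can be overridden.

```shell
#!/bin/sh
# Helpers for the noOCGPU# flag files described above.
# list_no_oc and clear_no_oc are illustrative names, not BAMT commands.

# Print the number of every GPU that currently has a flag file.
list_no_oc() {
    dir="${1:-/live/image/BAMT/CONTROL/ACTIVE}"
    for f in "$dir"/noOCGPU*; do
        [ -e "$f" ] || continue       # the glob matched nothing
        echo "${f##*/noOCGPU}"        # keep only the GPU number
    done
}

# Remove the flag file for one GPU (do this after lowering its clocks).
clear_no_oc() {
    gpu="$1"
    dir="${2:-/live/image/BAMT/CONTROL/ACTIVE}"
    rm -f -- "$dir/noOCGPU$gpu"
}
```

For example, `list_no_oc` shows which GPUs had their overclock dropped, and `clear_no_oc 1` removes the flag for GPU 1 once its clocks in the config have been reduced.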

godofal
Full Member
***
Offline Offline

Activity: 160
Merit: 100

TACNAYN - destroyer of worlds


View Profile
February 18, 2012, 09:05:49 AM
 #882

so i have a pretty big problem with BAMT :D
the hardware: 2x HD5970, 4GB Ballistix ram, C2D E8400, all smacked together on an ASUS P5B Deluxe
and some old 4GB USB stick
(yes, the hardware is overkill, i had it lying around)

i know for a fact that the mobo, ram, CPU and GPUs are OK: the ram went through memtest, the CPU and mobo have both run for ~2 years as my main PC, and the GPUs have both produced solid mining (~3 days straight)

so here's the problem:
the whole thing is unstable and simply refuses to cooperate
when we first tried it, we just failed because we didn't have the know-how (we still don't, but i'm sure it's no longer a rookie mistake)
then we had the whole thing running for a day (without any updates) and then it crashed (giving some nice R.I.P. artwork, and an error saying that the horrible horrible error couldn't be written to any logs whatsoever)
after that, it gave that error a couple more times, before i figured out how to update BAMT
when i did that, it still didn't work, so i tried again, doing nothing different to my knowledge, and somehow it started to mine: 3 days without a single hiccup
so we decided to OC the cards for that little bit of extra hashing power, to 800/300 (core/ram)
this also mined solid, for a day or two
after that, things went south
i've tried doing a clean install, but for some reason whenever it boots now (multiple fresh installs) it does everything fine until i change the configs (no OC)
after that, it doesn't matter if it's connected to the internet or not (to check if it was the mining): it just freezes up completely after ~10 seconds
for the first ~10 seconds i can do whatever i want, and the thing responds like it should: GPUmon opens up, i can edit configs (but not really, since there's no time to change anything) and everything. after that, the screen simply goes black/grey without a pointer or anything
putty and the browser monitor thingie don't respond to anything either

i should note that we tried installing windows on a HDD on the same machine before, but that didn't work (HDDs weren't seen, couldn't be installed to, couldn't be booted from, etc etc)
it's just so erratic and random that i have absolutely no idea what's going on, and since i have no previous experience with linux whatsoever, i also have no idea how to fix whatever might be broken
but something tells me that this isn't normal, that there's something else going on with this thing, since windows failed to work too

is there anyone with ideas on how to proceed? ideas to fix it, something to try that we've forgotten, or a pointer in the right direction would be most appreciated

p.s.
i fiddled around in the bios on several occasions, restored it to defaults, checked for random stuff that might be causing problems or anything
nothing there as far as i can see

lodcrappo (OP)
Hero Member
*****
Offline Offline

Activity: 616
Merit: 506


View Profile
February 18, 2012, 09:09:48 AM
 #883

you do have a problem, but i'm sorry it's not anything to do with bamt.

all i can say is remove/replace hardware until you find which part is causing this.  as you've said, your machine works fine for a day or 3 and then without changing anything stops working.  that's broken hardware.

Transisto
Donator
Legendary
*
Offline Offline

Activity: 1731
Merit: 1008



View Profile WWW
February 18, 2012, 09:13:48 AM
 #884

Quote
so i have a pretty big problem with BAMT :D
the hardware: 2x HD5970, 4GB Ballistix ram, C2D E8400, all smacked together on an ASUS P5B Deluxe
and some old 4GB USB stick
(yes, the hardware is overkill, i had it lying around)

i know for a fact that the mobo, ram, CPU and GPUs are OK: the ram went through memtest, the CPU and mobo have both run for ~2 years as my main PC (WITHOUT HDD?), and the GPUs have both produced solid mining (~3 days straight)

so here's the problem:
the whole thing is unstable and simply refuses to cooperate
when we first tried it, we just failed because we didn't have the know-how (we still don't, but i'm sure it's no longer a rookie mistake)

then we had the whole thing running for a day (without any updates) and then it crashed (giving some nice R.I.P. artwork (??? ARTIFACT ???), and an error saying that the horrible horrible error couldn't be written to any logs whatsoever) = (YOUR USB STICK LOOKS DEAD)

i should note that we tried installing windows on a HDD on the same machine before, but that didn't work (HDDs weren't seen, couldn't be installed to, couldn't be booted from, etc etc)
it's just so erratic and random that i have absolutely no idea what's going on, and since i have no previous experience with linux whatsoever, i also have no idea how to fix whatever might be broken
but something tells me that this isn't normal, that there's something else going on with this thing, since windows failed to work too

is there anyone with ideas on how to proceed? ideas to fix it, something to try that we've forgotten, or a pointer in the right direction would be most appreciated

p.s.
i fiddled around in the bios on several occasions, restored it to defaults, checked for random stuff that might be causing problems or anything
nothing there as far as i can see

This has little to do with BAMT or linux.
If you can't boot windows, you most likely have a hardware failure.  Could be your 5970, ... If you can't even detect the HDD I'd trash this board right away.
max in montreal
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


View Profile
February 18, 2012, 05:45:16 PM
 #885

the first thing i change in the config file is the time it takes to start mining after a reboot. I change it to 120 seconds...this way if i go too far on the numbers, i have time to change it and save the settings before it starts to mine and freezes again.

my 5970s were acting the same way...but it's because i changed the fans, and should have changed the heat pads too...i think that is still a problem, and i'm waiting for the parts to come in...

change the start to mine time to 120 sec...it will make your life easier...
godofal
Full Member
***
Offline Offline

Activity: 160
Merit: 100

TACNAYN - destroyer of worlds


View Profile
February 18, 2012, 08:10:34 PM
 #886

i need to clear up a few things:
there was only 1 HDD it couldn't see; i had 1 other lying around to test, which it did see (windows) but couldn't install to (it just said that as an error, no specifics whatsoever)
the mobo, ram and CPU also ran for years as my main PC, with HDDs and windows
the RIP artwork wasn't artifacts: it was in the command window, after giving the restart mining command. it was lines that made a gravestone with RIP in it; a gravestone by design, not an error
also, the USB stick isn't the problem: i used a brand new one at first, and switched to an older one later when i found that one lying around, which gave the same errors
windows did boot: i tried installing from USB (ripped the iso myself) and it worked until you needed to select a HDD
the whole thing ran for 3 days without a problem, and then i stopped it myself in order to OC it. it didn't crash that time and it might have continued to work without a problem (can't know for sure though)

if it's a hardware problem, then i'd say it's the motherboard, but i'm pretty sure it's good since i've used it for so long. it might have been damaged during transport though, so i'll try using another one

@max
nice tip, but it won't work i'm afraid
like i said, i tried pulling out the internet cable so it couldn't start mining (no work from pool) but it still goes black

BinaryMage
Hero Member
*****
Offline Offline

Activity: 560
Merit: 500


Ad astra.


View Profile
February 18, 2012, 08:32:40 PM
 #887

Download an Ubuntu Live USB image, load it on the USB stick, see if that boots alright. If it does, we know some BAMT configuration is the issue. If not, you have a hardware problem.

-- BinaryMage -- | OTC | PGP
lodcrappo (OP)
Hero Member
*****
Offline Offline

Activity: 616
Merit: 506


View Profile
February 18, 2012, 08:57:16 PM
 #888

actually, this is only a partial test.  ubuntu live isn't going to load the ATI proprietary drivers afaik, so the GPUs won't be doing the same things at all as if they were init by the ATI driver.  they will be in some compatible vesa mode or maybe the oss ati driver, which is a whole different animal.   if it still crashes, just more confirmation that the hardware is wack, but if it runs really doesn't mean much.

the fact is that bamt runs fine on hundreds of machines, and it's just superficial changes from debian, which runs fine on millions of machines.  this is not a software problem.
BinaryMage
Hero Member
*****
Offline Offline

Activity: 560
Merit: 500


Ad astra.


View Profile
February 18, 2012, 11:22:29 PM
 #889

True; I think it runs VESA. I was just trying to convince him that it wasn't a BAMT issue. :D

-- BinaryMage -- | OTC | PGP
godofal
Full Member
***
Offline Offline

Activity: 160
Merit: 100

TACNAYN - destroyer of worlds


View Profile
February 18, 2012, 11:30:03 PM
 #890

well, if you guys say it's not bamt, i believe you ;)
like i said, i've never worked with linux before, and although i know it's really good i've always felt like it takes too much knowledge and configuration. i kinda thought it might just be linux acting up with this mobo or something
that, and me being pretty sure the hardware checks out
i'll check out a different mobo when i've got time, but since it'll be a lot of work i'm going to do that tomorrow or something (gotta take it out of another miner, and the miner we're talking about is watercooled so changing hardware isn't really all that easy)

max in montreal
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


View Profile
February 19, 2012, 02:55:54 AM
 #891

Quote
@max
nice tip, but it won't work i'm afraid
like i said; i tried pulling out the internet cable so it couldnt start mining (no work from pool) but it still goes black

I believe it sets the clocks before it starts mining...

the default clocks are way way too high for a 5970...

I would redo the image.
pull out one card
disable gpu1
set the clock for gpu0 and gpu1 to 700/300
set the start mining time to 120 sec
set up your pool file

on the windows machine find the bamt image, make a copy and rename it to mine
pop the usb back into your windows machine and start up the imaging tool you used to image your usb key...but now in the tool browse to the copy you called mine...with the usb drive selected, click on read...read means copy the usb image to mine.img...now you have your own image ready in case you need to reimage...

next pop it back in the miner, make sure only one 5970 is in there...start it up with the cards disabled. does it boot?

do the bamt fixer, get bamt up to date. reboot

still boots?

enable gpu0, reboot

boots? mines? if it starts mining enable gpu1.

reboot, if it mines add the other card and add the 2 other gpus in the config, disable gpu2, disable gpu3...

reboot...does it boot?

enable gpu2, reboot...do they all mine?...enable the last gpu...reboot...does it all mine?
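
The backup step above uses the "read" function of a Windows imaging tool. If a Linux machine is at hand, roughly the same thing could be sketched with dd. This is a swapped-in technique, not what the post describes: `backup_image` is a made-up helper name and `/dev/sdX` is a placeholder for whatever device the key shows up as.

```shell
#!/bin/sh
# Copy a configured USB key (or any file) to an image file, then verify it.
# backup_image is a hypothetical helper; /dev/sdX below is a placeholder.
backup_image() {
    src="$1"    # source device or file, e.g. /dev/sdX (confirm the device first)
    dst="$2"    # destination image file, e.g. mine.img
    # Read the whole source in 1 MiB blocks; dd's transfer summary is discarded.
    dd if="$src" of="$dst" bs=1048576 2>/dev/null || return 1
    # Byte-for-byte comparison, so a bad read is caught now and not at restore time.
    cmp -s "$src" "$dst"
}
```

Something like `backup_image /dev/sdX mine.img` would normally need root, and the key should not be mounted or booted from while it is read, otherwise the comparison may fail.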

Splirow
Full Member
***
Offline Offline

Activity: 164
Merit: 100


View Profile
February 19, 2012, 05:50:18 AM
 #892

Max,

A few questions.... I have 3 5970s on 1 rig running bamt. It mines normally, however, the system is very slow (mouse drags, etc.)

Are you going through the same thing? Is it normal?

I have them overclocked at 820 with mem 500. Aggression is 13.

What are your settings?
BinaryMage
Hero Member
*****
Offline Offline

Activity: 560
Merit: 500


Ad astra.


View Profile
February 19, 2012, 06:52:05 AM
 #893

That's perfectly normal. ;)

The videocards are just being heavily used by the mining process, which is as it should be. I would suggest connecting and controlling the rig via SSH. If you must control with an attached monitor, turn the aggression down to ~7 on the video-outputting GPU; it'll speed up your desktop.

-- BinaryMage -- | OTC | PGP
Splirow
Full Member
***
Offline Offline

Activity: 164
Merit: 100


View Profile
February 19, 2012, 06:58:55 AM
 #894

Thanks....

What settings do you have on your 5970?
BinaryMage
Hero Member
*****
Offline Offline

Activity: 560
Merit: 500


Ad astra.


View Profile
February 19, 2012, 07:37:21 AM
 #895

I run my 5870s at 1000/300, but you couldn't hit that on a 5970 without phase-change or liquid nitrogen.

820 is fine for core, you might be able to get it to 850. Take your memory down to 300. (Unless that crashes it; it shouldn't)

-- BinaryMage -- | OTC | PGP
malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
February 19, 2012, 10:30:10 AM
 #896

It's normal, especially if your cpu usage and/or aggression level is high.

Signature space available for rent.
max in montreal
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


View Profile
February 19, 2012, 02:19:00 PM
 #897

for my 5970 with the clocks set to 800/300 i am getting about 365 MH/s

the aggression is whatever bamt originally was, i have not changed its default numbers.

i do not ssh; i control the machine by sitting in front of it, i have kvm switches that i use...

I hope i do not sound like an expert, these are just my settings through simple trial and error...and other advice posted on this forum
Splirow
Full Member
***
Offline Offline

Activity: 164
Merit: 100


View Profile
February 19, 2012, 03:13:31 PM
 #898

Which kernel are you using?

Are you using Phoenix?

How many 5970 does your rig have?

Thanks Max
max in montreal
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


View Profile
February 19, 2012, 04:16:40 PM
 #899

i know nothing about kernels...just the latest and greatest bamt updates always installed. I had 2 5970s going at one point, but they are waiting to be repaired, so there is only one for now...
BitMinerN8
Hero Member
*****
Offline Offline

Activity: 626
Merit: 500


Mining since May 2011.


View Profile
February 19, 2012, 07:01:11 PM
 #900

This works very well, I like how it just drops the problematic GPU. This is great for tuning a new multi-GPU rig.