Nancarrow
|
|
September 01, 2012, 07:53:47 PM |
|
My latest problem with BAMT: a rig doesn't mine, I can't PuTTY in, and when I bring up a local terminal and try atitweak -s, it tells me about an 'invalid MIT-MAGIC-COOKIE'. Having waded through pages of forum posts, a temporary fix seems to be to delete ~user/.Xauthority and ~root/.Xauthority and then reboot. But I have to keep going in manually to do this, and it only works until the next hangup, maybe half a day later.
Could somebody check my understanding of this? I *think* that it basically only happens when I have to power off the system at the switch... then the problem appears when I boot it up again. But it doesn't ALWAYS seem to happen. And it doesn't happen if the system was mining successfully and I power it down the way it should be done, with a shutdown or coldreboot command.
I read the following from lodcrappo, responding to another guy whose problem also seemed to lie with Xauthority somehow:
"Here is what happens at boot..
1 - normal linux stuff
2 - x windows config blown away
3 - ati device detection -> generate new x win config
4a - x starts in one thread
4b - mine_start starts in another thread, sleeps for start_delay
5 - X finally gets going
6 - default user logs in automatically to x session
7 - mine_start script runs xauth + to make the X server allow connections
now, in root's .bashrc there is a command: xauth merge /home/user/.Xauthority
this runs when you log in as root. it gives root the ability to talk to the X server.
however, if you log in via ssh before step 6 completes, the .Xauthority file isn't there yet and you get no joy. or if step 7 happens before step 5, you'll get no joy.
some USB keys are much, much slower than others. same with cpu, etc. having more GPUs means step 3 can take much longer. some GPUs seem to make step 4a/5 take a really long time if no monitor is attached. there are many variables here. since the process splits in step 4, there isn't a strong tie between them and things can get out of order, or you can just be in a rush and logging in via ssh too soon."
Okay, I think I understand all that. And I suppose what might be happening is that 'step 7 happens before step 5'. BUT, a quick perusal of the start_mining script suggests that this *cannot* happen, as mine_start won't run 'xauth +' until it sees the X server is running. Also, why the difference in behaviour between the two different ways of rebooting?
Basically, how do I stop this shit happening all the time? Or at least shove some extra perl in to detect it and do something clever?
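As a rough illustration of the "detect it and do something clever" idea, here is a minimal sketch in Python (not the Perl that the post has in mind, and not anything taken from BAMT itself). It only waits until the .Xauthority file exists and is non-empty before anything tries to merge it; the function name, timeout and poll interval are assumptions made up for this example, and it does not detect a cookie that exists but is corrupt.

```python
import os
import time


def wait_for_xauthority(path, timeout=120.0, poll=2.0):
    """Return True once `path` exists and is non-empty, False on timeout.

    Hypothetical helper: the name and default timings are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file does not exist yet (or is not readable)
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A login script could call something like wait_for_xauthority("/home/user/.Xauthority") before the xauth merge step quoted above, and log a warning if it returns False instead of carrying on blindly.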
|
If I've said anything amusing and/or informative and you're feeling generous: 1GNJq39NYtf7cn2QFZZuP5vmC1mTs63rEW
|
|
|
abracadabra
|
|
September 01, 2012, 08:55:10 PM |
|
redo the usb key or get a new one. the usb is flaking out.
|
|
|
|
lodcrappo (OP)
|
|
September 02, 2012, 12:56:32 AM |
|
I think the real question here is "why is your machine locking up in the first place?" BAMT rigs should run for many months without any issue. If yours doesn't, there is a problem. Most likely overzealous overclocking. If you're getting a few days between lockups, you probably only need to drop 5 or 10 MHz on the engine clock to resolve it.
As for the weirdness you are seeing... I suspect the entire machine is locking up in some weird way that's corrupting the USB key. I'm not sure what else would explain why it never happens with a safe reboot, only on lockup/power-off reboots.
Seriously though, once you've got a machine that's locking up regularly, all bets are off and it's going to be really difficult to trace anything down to a sensible solution. I highly recommend you put your efforts into preventing lockups in the first place. Chances are the Xauth thing is just a symptom of that.
|
|
|
|
Nancarrow
|
|
September 02, 2012, 01:56:08 AM |
|
Given the frequent and varied nature of the problems my rigs have been having, I'd be happy to accept 'overclocking' as the issue, but I sure hope it's not that, because for the vast majority of the time I have been UNDERclocking and UNDERvolting my cards. At one point I even had them running 600 engine, 200 memory, and full 1.05V, and they still kept screwing up. And GPU temps have generally been kept below 75C, only occasionally creeping up to 82.
Now I'm running them at 800/300/1.05, and they're not locking up any *more* than they were at much lower speeds. It's incredibly frustrating.
However, I shall try reflashing the USB stick (which is also something I've done two or three times for each rig already!). It's actually just one rig that's been a pain the last few days, so maybe that'll do it; the other two seem to be behaving better.
|
|
|
|
lodcrappo (OP)
|
|
September 02, 2012, 02:00:45 AM |
|
underclocking/undervolting will cause instability issues just as much as overclocking. screwing with your GPU == potential instability.
|
|
|
|
pehoko
|
|
September 02, 2012, 09:32:19 PM |
|
Test your freshly formatted flash drive with software that checks for read/write errors. I had the same problem; I am now on a new drive and expect to see the same problem again on it eventually. In my opinion, 2-4 months may be a normal lifespan for a cheap, heavily used BAMT flash drive.
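For anyone who wants a feel for what such a read/write test does, here is a minimal sketch in Python. It is not a replacement for a dedicated tool: it writes random blocks to a scratch file on the mounted stick, reads them back and counts mismatches. The function name and default sizes are assumptions for illustration, and note that the read-back may be served from the operating system's cache rather than the flash itself, so a clean result here proves little on its own.

```python
import hashlib
import os


def verify_write_read(path, total_bytes=16 * 1024 * 1024, block_size=1024 * 1024):
    """Write random data to `path`, read it back, return the count of bad blocks.

    Hypothetical helper; `path` should be a scratch file on the drive under test.
    """
    digests = []
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            block = os.urandom(min(block_size, remaining))
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device

    bad_blocks = 0
    with open(path, "rb") as f:
        for expected in digests:
            if hashlib.sha256(f.read(block_size)).digest() != expected:
                bad_blocks += 1

    os.remove(path)  # clean up the scratch file
    return bad_blocks
```

A non-zero return value would be a strong hint that the stick is on its way out; zero only means this particular pass found nothing.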
|
|
|
|
Nancarrow
|
|
September 03, 2012, 02:38:10 AM Last edit: September 03, 2012, 02:50:18 AM by Nancarrow |
|
lodcrappo wrote: "underclocking/undervolting will cause instability issues just as much as overclocking. screwing with your GPU == potential instability."
Well, bollocks. Okay, as and when each rig next screws up, I will revert their GPUs back to 725 engine and 1.05 voltage. But can I at least keep memclock at 300? Because the stock 1000 really is a stupid setting for a miner.
Thanks pehoko for recommending testing the USB sticks. That's certainly another avenue of pain.
In unrelated news, one of my other, better-behaved rigs has been down seven hours (why so long? Because I was ASLEEP, a state we humans often need to be in when the big light in the sky goes away) because all of a sudden it just COULDN'T DETECT ANY NETWORKS ANYMORE. Soon I'm just going to take these rigs to the vet and ask her to put them down humanely.
(ETA: and my first draft of this post got bollixed because my router conked out. Not my old crappy router, the shiny new wireless-N one I got to replace it LAST WEEK, the one that cost 60 quid. I swear my whole house is just one big fucking Ancient Indian Burial Ground.)
|
|
|
|
abracadabra
|
|
September 03, 2012, 02:55:02 AM |
|
Nancarrow wrote: "In unrelated news, one of my other, better-behaved rigs has been down seven hours (why so long? Because I was ASLEEP, a state we humans often need to be in when the big light in the sky goes away) because all of a sudden it just COULDN'T DETECT ANY NETWORKS ANYMORE. Soon I'm just going to take these rigs to the vet and ask her to put them down humanely."
That's a well known problem, most likely caused by network-manager problems. I've removed network-manager from the rigs I have that cause me that problem.
|
|
|
|
Nancarrow
|
|
September 03, 2012, 03:26:54 AM |
|
abracadabra wrote: "That's a well known problem, most likely caused by network-manager problems. I've removed network-manager from the rigs I have that cause me that problem."
Nuh-uh. In another thread I was advised to replace network-manager with wicd. I did. My rigs just hate me.
|
|
|
|
lodcrappo (OP)
|
|
September 03, 2012, 03:34:44 PM |
|
Nancarrow wrote: "Well, bollocks. Okay, as and when each rig next screws up, I will revert their gpus back to 725 engine and 1.05 voltage. But can I at least keep memclock at 300? Because the stock 1000 really is a stupid setting for a miner."
It is a matter of degrees, not absolutes. If your GPU locks up in a matter of minutes, you're probably a long way from stable. If it takes hours, maybe drop 10 MHz. If it takes days, maybe another 5 or 10. Some GPUs will run for a couple of weeks or so but eventually lock up at XXX MHz, yet be stable for months and months at XXX-5 MHz.
To make it even more fun, two identical GPUs (same manufacturer/model/etc.) will find happiness at different clock and/or mem speeds. You also have factors such as one GPU affecting the behavior of another in the same rig.
It can take a great deal of experimentation to squeeze that last little 2-3% out of a rig. Rarely worth it. Some people get lucky, a lot of people brag about MH/s speeds without waiting a month to see if they are actually stable, and even more people waste their time and money chasing the dream.
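The rule of thumb above can be written down as a tiny lookup, purely as a sketch in Python. The thresholds (one hour, one day) and the function name are assumptions chosen for illustration; the post only says "minutes", "hours" and "days", and gives no figure at all for the "minutes" case, so that case returns None rather than inventing a number.

```python
def suggested_clock_drop_mhz(hours_until_lockup):
    """Rough engine-clock reduction suggested by how long a rig survives.

    Hypothetical helper: the 1-hour and 24-hour boundaries are assumptions.
    Returns None when the rig is far from stable and no small tweak applies.
    """
    if hours_until_lockup < 1:
        return None  # locks up within minutes: a long way from stable
    if hours_until_lockup < 24:
        return 10    # locks up within hours: maybe 10 MHz
    return 5         # lasts for days: try 5 MHz first, then another 5 if needed
```

Whatever number comes out, the post's own caveat still applies: the only real test is leaving the rig alone for weeks and seeing whether it stays up.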
|
|
|
|
tnkflx
|
|
September 03, 2012, 04:31:03 PM |
|
Lodcrappo,
Once a machine stops responding to cgsnoop, mgpumon 'remembers' the last speed that rig had and still uses it to calculate the total speed of the farm. Is there a specific reasoning behind this?
|
| Operating electrum.be & us.electrum.be |
|
|
|
lodcrappo (OP)
|
|
September 03, 2012, 04:32:14 PM |
|
not really, just lazy programming
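For illustration only, a farm total that ignores rigs which have stopped reporting could look something like the Python sketch below. This is not how mgpumon stores its data; the dictionary layout, the field names and the 60-second cutoff are all assumptions invented for the example.

```python
import time


def farm_total_mhs(rigs, now=None, max_age_s=60.0):
    """Sum the last reported speed of every rig heard from recently.

    `rigs` maps a rig name to {"mhs": float, "last_seen": unix_time}.
    Hypothetical structure; rigs silent for more than `max_age_s` are skipped.
    """
    if now is None:
        now = time.time()
    return sum(
        rig["mhs"]
        for rig in rigs.values()
        if now - rig["last_seen"] <= max_age_s
    )
```

With a cutoff like this, a rig that has gone quiet drops out of the total after a minute instead of being counted at its last known speed forever.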
|
|
|
|
mameise
|
|
September 04, 2012, 07:53:05 PM |
|
Hi all,
I am now using BAMT for my little miner at home. First of all: it's great and really easy to use! Thanks for that. But one little problem: I have four 6870 cards. Until now I had Windows running, and I clocked the cards to 940 MHz core and 340 MHz memory using Afterburner. I have put these settings in bamt.conf. Overclocking works, but the memory clock will not change. Whether I enter 340 or 550 or anything else, it stays at 1050 when I look at the webstats... so what could be the problem?
Sorry if somebody has asked this before, but as you can see, my English is not the best, and there are too many pages to read them all. I do not know whether it's because of that, but until now I had 300 MH/s with each card, the fans didn't run at full speed, and the temps were about 60-78 degrees. Now I have only 290 MH/s, the fans run at full speed, and the cards often reach 80 degrees Celsius or more. So something must be different now...
I hope somebody can help me. Thanks and regards, mameise
|
|
|
|
TheHarbinger
|
|
September 04, 2012, 08:03:54 PM |
|
On 6xxx series cards, the memclock cannot be set more than 100 MHz below the GPU clock unless you flash a hacked BIOS onto the card. The voltage is also locked. Afterburner uses a hack to change the memory speed and the voltage. It's a hack that is only available on Windows. BAMT is a Linux OS. So, you have 3 choices: use Windows, use BAMT with higher memory speeds, or flash a new BIOS to the card.
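To make the limit concrete, here is a small Python sketch of the arithmetic. It only illustrates the rule as stated in this post; it says nothing about what the driver actually does with an out-of-range request (in the question above, the card simply stayed at 1050). The function name is made up, and the gap is a parameter because the exact figure is questioned elsewhere in the thread.

```python
def lowest_allowed_memclock(requested_mhz, core_mhz, max_gap_mhz=100):
    """Return the memclock the stated limit would allow for a given request.

    Hypothetical helper: memclock may not sit more than `max_gap_mhz`
    below the core clock, so lower requests are raised to that floor.
    """
    floor_mhz = core_mhz - max_gap_mhz
    return max(requested_mhz, floor_mhz)
```

For a 6870 at 940 MHz core, a request for 340 MHz would come out as 840 MHz under this rule, which is why "BAMT with higher memory speeds" is one of the three choices.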
|
12Um6jfDE7q6crm1s6tSksMvda8s1hZ3Vj
|
|
|
philips
|
|
September 04, 2012, 08:14:23 PM |
|
lodcrappo wrote: "You also have factors such as one GPU affecting the behavior of another in the same rig"
How?
TheHarbinger wrote: "6xxx series cards are limited to a memclock speed that is 100 MHz less than the GPU clock"
Not 125?
|
|
|
|
lodcrappo (OP)
|
|
September 04, 2012, 08:24:06 PM |
|
lodcrappo wrote: "You also have factors such as one GPU affecting the behavior of another in the same rig"
philips wrote: "How?"
Dozens of ways. Beyond the obvious thermal interaction, all your GPUs share the same bus. GPU 0 might run fine at 900 MHz itself, but in doing so it holds the bus just a tiny bit too long or pushes the PCIe controller chip just a little too far in one way or another, and the next GPU down the line now has less time/a weird state/god knows what.
As a very real example that took me weeks to sort out: for my test lab I have 5830s, 6870s, 6950s and a 7970 (thanks donors). If I run the 5830s by themselves, they will mine stably at 980/300 for weeks. However, if I put in a 6870 and o/c it to 940, *one of the 5830s will lock up*. The 6870 never locks up itself. Yet if I drop the 6870 to 920... stable for weeks again.
When you over/underclock you push the hardware out of its normal tolerance, plain and simple. This can cause that particular hardware to malfunction, or it can affect other things that communicate with that hardware.
|
|
|
|
philips
|
|
September 04, 2012, 08:35:23 PM |
|
Thanks lodcrappo, that is good to know. I have another question: how big will the new BAMT be?
The reason I ask: the old BAMT, which was meant for a stick of at least 2GB, was in fact too big for some 2GB sticks, which were really 1.8GB. I always had to shrink the image before loading it onto my stick. So maybe you could make the image a biiiit smaller? Something like 1.7GB, or 3.7GB, whichever applies.
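One likely reason a "2GB" stick shows up as roughly 1.8GB is that the label counts decimal gigabytes (10**9 bytes) while most tools report binary ones (2**30 bytes). The post does not say this is the cause, so treat it as an assumption; real sticks can also come in somewhat under their label. A short Python sketch of the arithmetic, with made-up function names:

```python
def marketed_gb_to_gib(marketed_gb):
    """Convert a decimal-gigabyte label (10**9 bytes) to gibibytes (2**30 bytes)."""
    return marketed_gb * 10**9 / 2**30


def image_fits(image_bytes, marketed_gb, overhead_fraction=0.0):
    """True if an image fits on a stick with the given label.

    Hypothetical helper: `overhead_fraction` is an assumed allowance for
    sticks that hold less than their label claims.
    """
    usable_bytes = marketed_gb * 10**9 * (1.0 - overhead_fraction)
    return image_bytes <= usable_bytes
```

By this arithmetic a stick labelled 2GB holds about 1.86 binary gigabytes, so an image built to a full 2 * 2**30 bytes will not fit, while one of about 1.7 binary gigabytes will.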
|
|
|
|
mameise
|
|
September 04, 2012, 08:37:22 PM |
|
TheHarbinger wrote: "6xxx series cards are limited to a memclock speed that is 100 MHz less than the GPU clock unless you flash a hacked BIOS onto it. The voltage is also locked. Afterburner uses a hack to change the memory speed, and the voltage. It's a hack that is only available on Windows. BAMT is a Linux OS. So, you have 3 choices: use Windows, use BAMT with higher memory speeds, or flash a new BIOS to the card."
Thank you for the fast answer. Is there a howto for flashing the BIOS? I have never done that before. Windows is not so comfortable to use. The nice thing about BAMT is that it just starts mining when the PC starts, which makes it really simple.
|
|
|
|
lodcrappo (OP)
|
|
September 04, 2012, 08:42:39 PM |
|
BAMT 0.6 has much smaller requirements and dynamically resizes the persistence partition to the size of the key. It may work on a 512MB key; it will definitely work on 1GB keys.
|
|
|
|
philips
|
|
September 04, 2012, 08:50:16 PM |
|
lodcrappo wrote: "...dynamically resizes the persistence partition to the size of the key. It may work on 512MB key, it will definitely work on 1GB keys."
Excellent!
|
|
|
|
|