bulanula
|
|
May 22, 2012, 06:41:38 PM |
|
Might be useful if somebody could set up a mirror for those 2 files for older cards. I understand AMD pulling the files because they don't want to support old products and software, so it might happen soon. Thanks!
|
|
|
|
bulanula
|
|
May 22, 2012, 11:19:34 PM |
|
Also (I cannot edit), I noticed some very strange behaviour with cgminer 2.4.1 on Linux with autofan / autogpu enabled.
When heat builds up around my rigs, the fans go to the max set value (say 80%), and if more heat follows so that the cards go past the hysteresis limit (while still very far from the overheat limit), they throttle down the clocks.
When the source of heat is shut down and the rigs are left to cool, the following does NOT seem to happen, which is quite strange: the cards do not bring the core clock back up, and the fans keep spinning at the maximum set value instead of backing down now that the rigs are cooler.
Has anyone experienced the same and knows how to fix it? I can understand the cards' clocks not going back up after they have throttled once, but the fans staying at high RPM after the heat has passed is the strangest part. If I remember correctly, the fans went back down on 2.3.6 after the heat passed, but on 2.4.1 they seem to be "stuck", so that even after the heat has passed they stay at the maximum RPM they reached during the heat period.
Thanks for the work, ckolivas! I would appreciate any suggestions about what could be causing this behaviour.
|
|
|
|
bulanula
|
|
May 22, 2012, 11:30:20 PM |
|
Upon further investigation, it looks like the fans do throttle down, but only by a small amount.
I intervened manually once to change some settings, so maybe that is the cause?
The clock throttling behaviour still occurs, so maybe it is a feature to let you know the cards can't take the heat?
What still occurs: once the cards throttle after going past the hysteresis limit with the fans maxed out, they don't clock back up.
|
|
|
|
-ck (OP)
Legendary
Offline
Activity: 4298
Merit: 1645
Ruu \o/
|
|
May 22, 2012, 11:58:47 PM |
|
They will only clock back up if the temperature drops enough again.
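As a rough illustration of why clocks stay down until the card has cooled well below the point where it throttled, here is a simplified hysteresis sketch in C. It is not cgminer's actual autotune code: the struct, the function names, the step sizes (5% fan, 10 MHz) and the thresholds are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GPU settings and state; names and units are illustrative only. */
struct gpu_state {
    int temp_target;            /* desired temperature, deg C */
    int hysteresis;             /* allowed band around the target, deg C */
    int fan_min, fan_max;       /* fan range, percent */
    int engine_min, engine_max; /* core clock range, MHz */
    int fan;                    /* current fan percent */
    int engine;                 /* current core clock, MHz */
};

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One control step: raise the fan first, touch the clock only once the fan is maxed out. */
static void autotune_step(struct gpu_state *g, int temp)
{
    assert(g != NULL);
    if (temp > g->temp_target + g->hysteresis) {
        if (g->fan < g->fan_max)
            g->fan = clamp(g->fan + 5, g->fan_min, g->fan_max);
        else
            g->engine = clamp(g->engine - 10, g->engine_min, g->engine_max);
    } else if (temp < g->temp_target - g->hysteresis) {
        /* Only here, below the band, are fan and clock stepped back. */
        g->fan = clamp(g->fan - 5, g->fan_min, g->fan_max);
        g->engine = clamp(g->engine + 10, g->engine_min, g->engine_max);
    }
    /* Inside the band nothing changes, so a throttled card stays throttled. */
}
```

With a target of 75 and a hysteresis of 3 in this sketch, a card throttled at 80 degrees keeps its reduced clock at 76 and only steps back up once it reads below 72.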
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
MrTeal
Legendary
Offline
Activity: 1274
Merit: 1004
|
|
May 23, 2012, 02:38:19 AM |
|
The LW value is almost certainly completely unrelated. If it crashes at approximately 7 days, then you are being hit by the ATI Display Library crashing after a week, a bug that happens only on Windows. Try starting cgminer with --no-restart. You will lose temperature and/or fanspeed monitoring after a week, but at least it won't crash.
I searched this thread but didn't really find any mention of this bug. I have a 5970 that crashes somewhere in the vicinity of once a week, without fully crashing the display driver. Clocks and voltages that often change when a card crashes aren't affected in this case. When it does this cgminer declares it sick and attempts to restart it, at which point cgminer crashes. A quick restart of cgminer fixes the issue, and the system goes about mining again without issue. I recently restarted it with --no-restart and I haven't had an issue yet (7 days will be 12 hours from now). Does anyone have any links to information or a possible solution to this if it is an ATI driver bug? I assumed I had a bad core on my GPU, but it would be great if that's not the issue.
|
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
May 23, 2012, 03:38:10 AM |
|
I searched this thread but didn't really find any mention of this bug. I have a 5970 that crashes somewhere in the vicinity of once a week, without fully crashing the display driver. Clocks and voltages that often change when a card crashes aren't affected in this case. When it does this cgminer declares it sick and attempts to restart it, at which point cgminer crashes. A quick restart of cgminer fixes the issue, and the system goes about mining again without issue. I recently restarted it with --no-restart and I haven't had an issue yet (7 days will be 12 hours from now). Does anyone have any links to information or a possible solution to this if it is an ATI driver bug? I assumed I had a bad core on my GPU, but it would be great if that's not the issue.
Solution:
As you said, use --no-restart (so cgminer doesn't crash when ATI screws up; yes, it is the ATI library's fault on Windows).
Also, with a separate (simple) program using the API, check when any of the GPUs become sick or lose their ADL info, then use an API 'restart'.
Edit: I'll also add that anyone expecting a cgminer code fix for this is misunderstanding the problem. It doesn't happen on Linux like this, only on Windows. If it happened the same way on both Linux and Windows then a cgminer code fix might be possible, but since it is Windows-specific (and not reproducible on Linux) it is extremely likely to be in the ATI Windows library.
Also, the fact that the GPU doesn't stop working (if you use --no-restart) but simply loses its ADL information again means the problem is in the ATI library.
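The "separate (simple) program" idea can be sketched as below. This only covers the checking half: given the text of a status reply, it picks out GPUs that are not reported as alive. The reply layout used here ('|'-separated records such as GPU=0,Enabled=Y,Status=Alive,...) is an assumption for the example, as is the function name; the socket code and the exact API command names are left out and should be taken from the API documentation shipped with cgminer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan an API reply for GPU records whose Status is not "Alive".
 * ASSUMPTION: records are '|'-separated and look like
 *   GPU=0,Enabled=Y,Status=Alive,...
 * Stores up to max GPU ids in ids[] and returns how many were stored.
 */
static size_t find_unhealthy_gpus(const char *reply, int *ids, size_t max)
{
    size_t found = 0;
    const char *rec = reply;

    assert(reply != NULL && ids != NULL);
    while (rec && *rec && found < max) {
        const char *end = strchr(rec, '|');
        size_t len = end ? (size_t)(end - rec) : strlen(rec);

        if (len > 4 && strncmp(rec, "GPU=", 4) == 0) {
            /* Accept a ",Status=" match only if it lies inside this record. */
            const char *st = strstr(rec, ",Status=");
            if (st && st < rec + len && strncmp(st + 8, "Alive", 5) != 0)
                ids[found++] = atoi(rec + 4);
        }
        rec = end ? end + 1 : NULL;
    }
    return found;
}
```

A watcher would poll the API every so often, run the reply through a check like this, and issue the restart for each id returned. Detecting lost ADL info would need an extra check on the temperature or fan fields, which this sketch does not attempt.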
|
|
|
|
MrTeal
Legendary
Offline
Activity: 1274
Merit: 1004
|
|
May 23, 2012, 04:28:01 AM |
|
... Also the fact that the GPU doesn't stop working (if you use --no-restart) but simply loses its ADL information, again means it is in the ATI library that is the problem.
If it keeps hashing after losing its ADL information, I won't even worry about it. I manually set the fan to 85% anyway regardless of how low temps are, so I'm not too concerned with the fan control. I can manually restart it whenever I notice.
Still, if there's a thread on the issue I'd love to read it. A quick search on the forums either turned up not much or way too much.
|
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
May 23, 2012, 05:06:01 AM |
|
... If it keeps hashing after losing its ADL information, I won't even worry about it. I manually set the fan to 85% anyway regardless of how low temps are, so I'm not too concerned with the fan control. I can manually restart it whenever I notice.
Still, if there's a thread on the issue I'd love to read it. A quick search on the forums either turned up not much or way too much.
It's been discussed a few times in this thread.
|
|
|
|
Aseras
|
|
May 23, 2012, 05:59:18 PM |
|
He's saying it's a bug in ATI's drivers (no stretch there), not in cgminer. ATI would have to fix it.
However, if ATI could make a decent driver it would be a miracle. That's their Achilles heel. Their hardware is amazing, their drivers are piss poor.
|
|
|
|
crazyates
Legendary
Offline
Activity: 952
Merit: 1000
|
|
May 23, 2012, 08:03:15 PM |
|
I have run my CGminer for about 7 days on about 11 rigs with 5x 5850 each. This morning I found that some of my CGminer instances had frozen and stopped sending shares, and they showed a funny figure, something like "LW:50" or another high number beside "LW",
while the CGminer instances that are still running fine show "LW:0".
You can see that at the top of the cgminer screen!
Anyone, please help.
I cannot provide a screenshot because the warehouse is far from my house and it is not convenient,
so anyone please help; if it happens again I will get a screenshot.
Sorry for my English!
The LW value is almost certainly completely unrelated. If it crashes at approximately 7 days, then you are being hit by the ATI Display Library crashing after a week, a bug that happens only on Windows. Try starting cgminer with --no-restart. You will lose temperature and/or fanspeed monitoring after a week, but at least it won't crash.
I have my computer set to restart every day at a certain time, and CGMiner starts when Windows starts. Ya lose 3 minutes of hashing a day, but no 7 day bug you're talking about. The temp/fanspeed/dynamic OC is what makes CGMiner great!
|
|
|
|
crazyates
Legendary
Offline
Activity: 952
Merit: 1000
|
|
May 23, 2012, 08:24:04 PM |
|
I have my computer set to restart every day at a certain time, and CGMiner starts when windows starts. Ya lose 3 minutes of hashing a day, but no 7 day bug you're talking about. The temp/fanspeed/dynamic OC is what makes CGMiner great!
Wrong! If cgminer crashes right after your reboot, you can lose a bit more than 3 minutes.
I have a .bat that pings 127.0.0.1 a couple dozen times and then starts CGMiner. This makes sure the User is fully logged on, connected to the wifi, and the HD has stopped before I start mining. It's never crashed on me yet...
|
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
May 23, 2012, 08:55:23 PM Last edit: May 23, 2012, 09:33:42 PM by kano |
|
It doesn't happen on Linux like this, only on Windows. If it happened the same way on both Linux and Windows then a cgminer code fix might be possible, but since it is Windows-specific (and not reproducible on Linux) it is extremely likely to be in the ATI Windows library. Also, the fact that the GPU doesn't stop working (if you use --no-restart) but simply loses its ADL information again means the problem is in the ATI library.
Why do you expect AMD code to work the same way on Linux and Windows? If you unloaded the ADL dll before re-initializing opencl, and then reloaded it, you would not have "this ADL library bug". Just because it happens to work on Linux does not mean it must work on Windows. It is like saying: my cgminer works in my environment so it is not a bug.
The point is that this particular problem doesn't happen on Linux in the first place; it only happens on Windows.
The problem is that the miner keeps hashing but the ADL stops working ... after a long period of time (something like 6 or 7 days).
If you have a workaround to fix this ATI bug (yes, it is a bug) then feel free to discuss it with ckolivas (if he is interested).
There is also the issue of testing a workaround ...
|
|
|
|
smracer
Donator
Legendary
Offline
Activity: 1057
Merit: 1021
|
|
May 23, 2012, 09:39:48 PM |
|
I just set up another copy of Ubuntu running on another thin client.
On all my other thin clients I can see all the temps, speed etc on one screen.
On this new one they are just flying by in one line and it is very hard to see what is going on. I am guessing I changed the view somehow maybe in the conf file?
How can I get back to the regular cgminer screen?
2.4.1
Thanks
|
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
May 23, 2012, 10:40:41 PM |
|
I just setup another copy of Ubuntu running on another thin client.
On all my other thin clients I can see all the temps, speed etc on one screen.
On this new one they are just flying by in one line and it is very hard to see what is going on. I am guessing I changed the view somehow maybe in the conf file?
How can I get back to the regular cgminer screen?
2.4.1
Thanks
'Thin client'? NFI what you mean by that.
Not sure what you did wrong, but it could be one of the following:
1) You (or someone else you got it from) compiled cgminer without curses
Solution - tell the person who compiled it to check the output of ./configure when they run it
2) You used -T and somehow got that into the ~/.cgminer/cgminer.conf file (look for "text-only" : true and remove it)
You might also find "debug" : true and "verbose" : true that you want to also remove
3) Maybe curses failed to load and it went into text mode coz you are using 11.04 and forgot to:
cd /lib64/
sudo ln -s libncurses.so.5 libtinfo.so.5
Not sure what else at the moment.
|
|
|
|
smracer
Donator
Legendary
Offline
Activity: 1057
Merit: 1021
|
|
May 23, 2012, 11:34:31 PM |
|
ty so much.
configure: WARNING: Could not find curses library - if you want a TUI, install libncurses-dev or pdcurses-dev
|
|
|
|
P_Shep
Legendary
Offline
Activity: 1804
Merit: 1230
This is not OK.
|
|
May 24, 2012, 06:24:40 AM |
|
bool hex2bin(unsigned char *p, const char *hexstr, size_t len)
{
    while (*hexstr && len) {
        char hex_byte[3];
        unsigned int v;

        if (!hexstr[1]) {
            applog(LOG_ERR, "hex2bin str truncated");
            return false;
        }

        hex_byte[0] = hexstr[0];
        hex_byte[1] = hexstr[1];
        hex_byte[2] = 0;

        if (sscanf(hex_byte, "%x", &v) != 1) {
            applog(LOG_ERR, "hex2bin sscanf '%s' failed", hex_byte);
            return false;
        }

        *p = (unsigned char) v;

        p++;
        hexstr += 2;
        len--;
    }

    return (len == 0 && *hexstr == 0) ? true : false;
}
Should the return value not be || rather than &&?
|
|
|
|
cengique
Member
Offline
Activity: 64
Merit: 10
|
|
May 24, 2012, 02:17:32 PM |
|
... I did the sudo aticonfig -f --adapter=all --initial and rebooted, I also deleted the .bin file in cgminer's directory.
CGminer gave an error about the number of devices not matching. It saw three devices, but OpenCL was only seeing one? ...
opencl has nothing to do with xorg I'm afraid, but what you need before starting cgminer is:
I'm not running as root. DISPLAY is already set. And I'm not using SSH, I'm on the local console.
I just suffered from the same problem. If your DISPLAY is set to ":0.0", clinfo only finds one card!!! This is dumb. You can correct it by setting it to ":0" as suggested above. I found the solution haphazardly from here: http://devgurus.amd.com/thread/140667
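For anyone wrapping the miner in a small launcher, the ":0.0" to ":0" fix can be sketched in C as below. The helper name strip_screen is made up for this example; it simply drops the ".screen" part of an X display string so that the result can be exported as DISPLAY before the miner starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an X display string into out, dropping a trailing ".screen" part,
 * e.g. ":0.0" becomes ":0". Returns 0 on success, -1 if out is too small.
 */
static int strip_screen(const char *display, char *out, size_t outlen)
{
    const char *colon, *dot;
    size_t n;

    assert(display != NULL && out != NULL);
    colon = strrchr(display, ':');
    /* Only a dot after the last colon separates the screen number;
       dots before it belong to a hostname. */
    dot = colon ? strchr(colon, '.') : NULL;
    n = dot ? (size_t)(dot - display) : strlen(display);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, display, n);
    out[n] = '\0';
    return 0;
}
```

On a POSIX system the result could then be applied with setenv("DISPLAY", out, 1) from stdlib.h before launching cgminer or clinfo. From a shell, export DISPLAY=:0 does the same job.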
|
|
|
|
crazyates
Legendary
Offline
Activity: 952
Merit: 1000
|
|
May 24, 2012, 02:25:41 PM |
|
... I did the sudo aticonfig -f --adapter=all --initial and rebooted, I also deleted the .bin file in cgminer's directory.
CGminer gave an error about the number of devices not matching. It saw three devices, but OpenCL was only seeing one? ...
opencl has nothing to do with xorg I'm afraid, but what you need before starting cgminer is:
I'm not running as root. DISPLAY is already set. And I'm not using SSH, I'm on the local console.
I just suffered from the same problem. If your DISPLAY is set to ":0.0", clinfo only finds one card!!! This is dumb. You can correct it by setting it to ":0" as suggested above. I found the solution haphazardly from here: http://devgurus.amd.com/thread/140667
It's not dumb - it's the way Xorg works. Before I was using CGMiner, I had 3 instances of phoenix on :0.0 :0.1 and :0.2 . That's how you specify cards.
|
|
|
|
P_Shep
Legendary
Offline
Activity: 1804
Merit: 1230
This is not OK.
|
|
May 24, 2012, 03:24:38 PM |
|
bool hex2bin(unsigned char *p, const char *hexstr, size_t len) ... return (len == 0 && *hexstr == 0) ? true : false; }
Should the return value not be || rather than &&?
I think the code is correct. It also checks input parameters. If size of buffer (p) is too small for a given hex str, the caller would know that something is wrong. If the hexStr is truncated to size less than buffer (p) length, the caller would know.
For example: if hexStr points to 32 hex numbers, but the size of buffer pointed by p is 64, the function would return false. If you had || instead of &&, the function would return true and the caller would assume that there are 64 valid (0-255) bytes.
If you had a prototype:
bool hex2bin(unsigned char *buf, int bufSize, const char *hexstr, int * outlen)
it would allow callers to check the size of the converted binary buffer.
Thing is, the BFL uses hex2bin the other way round: the string is longer than p. The BFL returns a string of nonces, hex2bin is used to extract these nonces. There's no null terminator on each nonce, so hex2bin is returning an error, even though everything is fine.
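To make the two positions concrete, here is a small standalone sketch in C. The decoding loop follows the quoted hex2bin with the logging removed; the allow_trailing flag and the name hex2bin_sketch are hypothetical additions for illustration, not part of cgminer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Decode len bytes from hexstr into p.
 * allow_trailing == false: behaves like the quoted code, i.e. succeeds only
 *   when the buffer and the string run out together.
 * allow_trailing == true (HYPOTHETICAL): succeeds once len bytes are decoded,
 *   ignoring whatever follows in the string.
 */
static bool hex2bin_sketch(unsigned char *p, const char *hexstr, size_t len,
                           bool allow_trailing)
{
    assert(p != NULL && hexstr != NULL);
    while (*hexstr && len) {
        char hex_byte[3];
        unsigned int v;

        if (!hexstr[1])
            return false;           /* odd number of hex digits */
        hex_byte[0] = hexstr[0];
        hex_byte[1] = hexstr[1];
        hex_byte[2] = 0;
        if (sscanf(hex_byte, "%x", &v) != 1)
            return false;           /* not a hex pair */
        *p++ = (unsigned char)v;
        hexstr += 2;
        len--;
    }
    if (len != 0)
        return false;               /* string ended before the buffer was filled */
    return allow_trailing || *hexstr == 0;
}
```

With allow_trailing set to false, a 4-byte buffer and the string "0a1b2c3d5e6f" give false, which is the BFL situation described above. The same call with allow_trailing set to true gives true and fills the 4 bytes. A string shorter than the buffer fails in both modes, which is the case the && was protecting against.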
|
|
|
|
P_Shep
Legendary
Offline
Activity: 1804
Merit: 1230
This is not OK.
|
|
May 24, 2012, 04:28:24 PM |
|
Not sure which version of driver-bitforce.c you are looking at but 2.4.1 does not check return code from hex2bin(). Sure, terminating after first 4 hex numbers would be better for BFL and Icarus, but jobj_binary()/work_decode() also uses this and checks the return code.
I'm bug chasing, so I edited the code to use the return value.
|
|
|
|
|