MadHacker
September 07, 2011, 07:25:15 PM
I have a feature request. Every once in a while I have a GPU that can't be restarted, but restarting cgminer works fine.
I was wondering if you could add a command-line switch that makes cgminer exit after a GPU failure, e.g. --gf # (where # is the number of GPUs that must fail before cgminer.exe exits).
This would save me from constantly having to check on the miner to make sure that all GPUs are still mining.
Thanks.

PLaci1982
Full Member
Activity: 168
Merit: 100
Live long and prosper. \\//,
September 07, 2011, 08:14:34 PM (last edit: September 07, 2011, 08:24:38 PM by PLaci1982)
> I have a feature request. Every once in a while I have a GPU that can't be restarted, but restarting cgminer works fine.
> I was wondering if you could add a command-line switch that makes cgminer exit after a GPU failure, e.g. --gf # (where # is the number of GPUs that must fail before cgminer.exe exits).
> This would save me from constantly having to check on the miner to make sure that all GPUs are still mining.
> Thanks.

Makes sense... The default should be 0, where 0 = function disabled (no exit after a GPU failure).
On Windows, this batch file would then restart it every time it exits:
    :cgminer
    cgminer.exe -blah -blah -blah -theargumentCkolivaschoosetouse 2
    GOTO cgminer
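A minimal sketch of the exit rule being proposed, written in Python for illustration rather than cgminer's C. It assumes a hypothetical `--gf N` option (not an existing cgminer flag) where N is the number of distinct GPUs that must fail before the miner exits, with 0, the suggested default, disabling the feature. `GpuFailureMonitor` is a made-up name.

```python
class GpuFailureMonitor:
    """Track failed GPUs for a hypothetical --gf N exit option.

    N is the number of distinct GPUs that must fail before the miner exits;
    N = 0 (the suggested default) disables the feature.
    """

    def __init__(self, gf: int = 0):
        self.gf = gf
        self.failed = set()  # ids of GPUs that have failed at least once

    def record_failure(self, gpu_id: int) -> bool:
        """Note a failure and report whether the miner should now exit."""
        self.failed.add(gpu_id)
        return self.should_exit()

    def should_exit(self) -> bool:
        return self.gf > 0 and len(self.failed) >= self.gf


if __name__ == "__main__":
    monitor = GpuFailureMonitor(gf=2)
    print(monitor.record_failure(1))  # one dead GPU: keep mining
    print(monitor.record_failure(1))  # same GPU again: still only one
    print(monitor.record_failure(3))  # second distinct GPU: time to exit
```

Counting distinct GPU ids is one reading of "number of GPUs to fail"; once the miner exits, the batch loop starts it again, which is the restart that MadHacker reports brings the stuck GPU back.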
Hardware Expert / WinXP, Win7 Expert
1J5oPkyGVdb4mv44KGZQYsHS2ch6e1t4rc

Endeavour79
September 07, 2011, 08:32:28 PM
>> I have a feature request. Every once in a while I have a GPU that can't be restarted, but restarting cgminer works fine.
>> I was wondering if you could add a command-line switch that makes cgminer exit after a GPU failure, e.g. --gf # (where # is the number of GPUs that must fail before cgminer.exe exits).
>> This would save me from constantly having to check on the miner to make sure that all GPUs are still mining.
>> Thanks.
> Makes sense... The default should be 0, where 0 = function disabled (no exit after a GPU failure).
> On Windows, this batch file would then restart it every time it exits:
>     :cgminer
>     cgminer.exe -blah -blah -blah -theargumentCkolivaschoosetouse 2
>     GOTO cgminer

+1
NSW, Australia - Rigs, Mining, Pools - Local help needed? Send me a message!

cablepair
September 07, 2011, 08:52:11 PM
+1

P.S. os2sam: do you really still run OS/2? I remember getting OS/2 Warp for Christmas one year when I was 12 or 13; too bad for IBM that Windows 95 came out that summer ;/

os2sam
Legendary
Activity: 3586
Merit: 1098
Think for yourself
September 07, 2011, 09:03:43 PM
> +1
> P.S. os2sam: do you really still run OS/2? I remember getting OS/2 Warp for Christmas one year when I was 12 or 13; too bad for IBM that Windows 95 came out that summer ;/

Well, of course. Doesn't everyone? I'm still running OS/2 Warp 4 on a really old ThinkPad. The current versions are now called eComStation, and I have that on my personal laptop and in a VPC on my company laptop. Having trouble finding a bitcoin miner, though.
Thanks for asking,
Sam
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?

Sekioh
September 07, 2011, 09:10:41 PM
> 1) Dual GPU cards don't get adjusted/disabled correctly. For example, a 5970 will only disable the one GPU that has the temp sensor:
>     GPU 3: [80.5 C] [DISABLED /78.5 Mh/s] [Q:2 A:11 R:1 HW:0 E:550% U:1.81/m]
>     GPU 4: [327.3/324.5 Mh/s] [Q:25 A:23 R:1 HW:0 E:92% U:3.78/m]

Strange; for me each core on my 5970s can be disabled separately and correctly ([g] [d] 0, [g] [d] 1).

> 2) It'd be awesome if cgminer would record the current clocks on startup, and restore them on exit if the auto tuning changed them at all.

I believe I read that it's supposed to already. Could it be a bug, or an older version?

> 3) Pressing "G" when you have more than a couple of cards makes it impossible to read the output, because the window the output is displayed in is too small unless you have a HUGE screen.

Windows? Make a shortcut and size the font down, or increase the window height from 25 to 50 lines.

> 4) I'd love a way to specify different temperature thresholds per GPU on the command line. If I have different model cards in there, they have different points where they're happy. 5770s get crashy above 90-95, where 5970 and 6990 cards idle near there at times.

You can, with commas; you just have to know which card is enumerated as which. For example: --gpu-clock 1000,440,750,950

> 5) My ideal dream would be a way of somehow saying "Any 5970 cards you see, set the temperature thresholds to X/Y, the voltage to Z, etc. Any 5770 cards, the temperature threshold is..." so that I don't have to look up which cards are in which system, just to pass that along to cgminer.

Card-based settings would be nice to see, but there'd have to be a way to check by manufacturer and such too. I know some 5970s, e.g. MSI's, have different heatsink/fan combinations than, say, PowerCooler's, and they might not be tolerant in the same way but still show up as a 5970 by their ID. I don't know how you'd be that specific; maybe with grouping (--gpu-group {5970@950+300+1.5})?

> 7) Specifying an overclock/underclock range that cgminer is allowed to adjust the clock in would be handy.

A range with a dash in it would be cool (--gpu-memory 200-700)!

> One step further, having it attempt to determine (maybe even saved into a local file) how high the clock was able to go without problems, and self-tuning the max clock rate while under the threshold temperature.

Well, if he's already parsing the config file in, maybe when you specify a config (--config myconfig.json), whatever you change could be exported back out to it, with these extra settings tacked on? It could get a line keyed on card serial numbers or something else unique ({"_Safe_CARDID": "CLOCK,MEMORY,FAN"}).
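A sketch of how a dash-separated range like the suggested `--gpu-memory 200-700` could be parsed and then used to keep an auto-tuner inside it. The option is Sekioh's proposal, not an existing cgminer flag, and `parse_range`/`clamp` are hypothetical helpers written in Python for illustration.

```python
from __future__ import annotations


def parse_range(spec: str) -> tuple[int, int]:
    """Parse "200-700" into (200, 700); a bare "700" becomes (700, 700)."""
    lo_text, sep, hi_text = spec.partition("-")
    lo = int(lo_text)
    hi = int(hi_text) if sep else lo
    if lo > hi:
        raise ValueError(f"range is backwards: {spec!r}")
    return lo, hi


def clamp(value: int, bounds: tuple[int, int]) -> int:
    """Keep a clock chosen by an auto-tuner inside the user's range."""
    lo, hi = bounds
    return max(lo, min(hi, value))


if __name__ == "__main__":
    limits = parse_range("200-700")
    for proposed in (150, 450, 750):
        print(proposed, "->", clamp(proposed, limits))
```

Splitting on the first dash keeps the parser trivial but rules out negative values, which is acceptable for clock speeds in MHz.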
Sekioh
September 07, 2011, 09:15:01 PM
>> In a slightly off-topic but related issue: I can't clock my card. I have the drivers installed and can mine, but EVERY clocking app {cgminer, OverDrive, CCC (won't even start), clock tool} either doesn't run or has all its sliders grayed out... nobody's helping in the technical forums D:> I tried different drivers and CCC versions 11.5 through 11.8, and it can't be messed-up installs: I got frustrated, even reformatted, reinstalled Windows, and did fresh driver installs for two of the versions, .6 and .
> Sorry if I'm stating the overly obvious, but in the ATI CCC did you unlock the overclocking page? When I first started messing with this stuff I looked at the overclock page a bunch of times before I realized that the lock was actually a button. I was really irritated about it being grayed out too.
> Sam

That could be the locking issue, but I summarized here instead of repeating the full text of the thread I made... I can't even get CCC to start. I did a search, and a lot of people have had this launch issue for years (articles dated 2005-2011) across all the versions; some report random successes from running driver cleaners, reinstalling and so on... but that can't possibly help if I reformatted and installed directly onto a fresh copy of Windows! So I'm pretty !@#%ed if I can't do any tweaking in third-party apps because their own native code is messed up with their panel :|

-ck (OP)
Legendary
Activity: 4284
Merit: 1645
Ruu \o/
September 07, 2011, 09:50:51 PM
About changing GPU settings: cgminer already reports the so-called "safe" range of whatever you are modifying when you ask to modify it on the fly. However, you can change settings to values outside this range. Even so, the card can easily refuse to accept your changes, or worse, accept them and then silently ignore them. So there is absolutely no way for me to know how far, or from where to where, things can be set safely, and nothing stops you from at least trying values outside this range. I'm very conscious of these possible failures, which is why cgminer reports back the current values so you can check exactly how the card has responded. Even within the range of values the card reports as acceptable, it is very easy to crash just about any card, so I cannot use those values to decide what range to set. You have to find meaningful values through experimentation and supply them to cgminer manually.
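The point above is that a write to the card proves nothing until it is read back. A minimal Python sketch of that read-back check, using a hypothetical `FakeCard` class (not a cgminer or ADL API) that silently drops any clock above a limit it never reports. A matching read-back only shows the value was accepted, not that the card is stable at it.

```python
from __future__ import annotations


class FakeCard:
    """Illustrative stand-in for a GPU that may silently ignore a request.

    Real AMD cards are controlled through AMD's ADL library; this class
    only mimics the behaviour described in the post.
    """

    def __init__(self, clock: int, hidden_max: int):
        self._clock = clock
        self._hidden_max = hidden_max  # enforced by the card, never reported

    def set_clock(self, mhz: int) -> None:
        if mhz <= self._hidden_max:
            self._clock = mhz
        # Otherwise no error is raised: the request is simply dropped.

    def get_clock(self) -> int:
        return self._clock


def set_and_verify(card: FakeCard, mhz: int) -> tuple[bool, int]:
    """Request a clock, then read it back instead of trusting the write."""
    card.set_clock(mhz)
    actual = card.get_clock()
    return actual == mhz, actual


if __name__ == "__main__":
    card = FakeCard(clock=725, hidden_max=900)
    print(set_and_verify(card, 850))   # (True, 850)
    print(set_and_verify(card, 1000))  # (False, 850): silently ignored
```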
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck

toasty
Member
Activity: 90
Merit: 12
September 07, 2011, 09:59:44 PM
>> 1) Dual GPU cards don't get adjusted/disabled correctly. For example, a 5970 will only disable the one GPU that has the temp sensor:
>>     GPU 3: [80.5 C] [DISABLED /78.5 Mh/s] [Q:2 A:11 R:1 HW:0 E:550% U:1.81/m]
>>     GPU 4: [327.3/324.5 Mh/s] [Q:25 A:23 R:1 HW:0 E:92% U:3.78/m]
> Strange; for me each core on my 5970s can be disabled separately and correctly ([g] [d] 0, [g] [d] 1).

Mine can too; I mean the auto-disable that now seems to happen when the temperature goes too high. The second GPU doesn't have a sensor, and I don't think cgminer realizes it's sharing a sensor with the first GPU.

>> 3) Pressing "G" when you have more than a couple of cards makes it impossible to read the output, because the window the output is displayed in is too small unless you have a HUGE screen.
> Windows? Make a shortcut and size the font down, or increase the window height from 25 to 50 lines.

I've got boxes with 8 GPUs in them; even on a 30" monitor with pretty small print, there still isn't enough room to display them all.

>> 4) I'd love a way to specify different temperature thresholds per GPU on the command line. If I have different model cards in there, they have different points where they're happy. 5770s get crashy above 90-95, where 5970 and 6990 cards idle near there at times.
> You can, with commas; you just have to know which card is enumerated as which. For example: --gpu-clock 1000,440,750,950

You can for --gpu-clock, but not for --temp-overheat, for example:
    --temp-overheat: '90,90,90,95,95' is not a number
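The error shows --temp-overheat accepting only a single integer. One way a per-GPU version could work, sketched in Python, is to split on commas as in the --gpu-clock example. `parse_per_gpu` is a hypothetical helper, and the fallback rules in its docstring are assumptions, not cgminer's documented behaviour; the default of 85 in the demo is an arbitrary placeholder.

```python
from __future__ import annotations


def parse_per_gpu(arg: str, n_gpus: int, default: int) -> list[int]:
    """Expand "90,90,90,95,95" (or a single "90") into one value per GPU.

    Assumed rules: a single value applies to every GPU, a shorter list
    leaves the remaining GPUs at `default`, and a longer list is an error.
    """
    values = [int(part) for part in arg.split(",")]
    if len(values) == 1:
        return values * n_gpus
    if len(values) > n_gpus:
        raise ValueError(f"{len(values)} values given for {n_gpus} GPUs")
    return values + [default] * (n_gpus - len(values))


if __name__ == "__main__":
    print(parse_per_gpu("90,90,90,95,95", n_gpus=5, default=85))
    print(parse_per_gpu("90", n_gpus=3, default=85))
```

Nothing here caps values at 100, so thresholds such as 105 pass through unchanged.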
Sekioh
September 07, 2011, 10:15:16 PM
I read your post, walked away, came back, and kind of skipped over the 'temperature' part of that line for each card. But yes, I hope it could easily be added as an array like the individual clock speeds; that would be a nice option.

ancow
September 07, 2011, 11:29:53 PM
After running 2.0.0 (and whatever the previous version was) for a while, I keep getting these (cgminer was started at 2011-09-07 06:20:55, and I probably missed the first few occurrences because I was asleep):
    [2011-09-08 00:36:19] LONGPOLL received after new block already detected
    [2011-09-08 00:36:19] New block detected on network before longpoll, waiting on fresh work
    [2011-09-08 00:36:57] Accepted 85c06c40 GPU 0 thread 0 pool 0
    [2011-09-08 00:40:46] Accepted d0d8e92b GPU 0 thread 1 pool 2
    [2011-09-08 00:41:26] Accepted 5a5db10b GPU 0 thread 0 pool 2
    [2011-09-08 00:41:40] Accepted 700a7d23 GPU 0 thread 0 pool 2
    [2011-09-08 00:42:54] Accepted 38ed493b GPU 0 thread 0 pool 2
    [2011-09-08 00:44:28] LONGPOLL received after new block already detected
    [2011-09-08 00:44:28] New block detected on network before longpoll, waiting on fresh work
    [2011-09-08 00:45:16] Accepted 787f3450 GPU 0 thread 1 pool 0
    [2011-09-08 00:45:38] Accepted 1e16a70c GPU 0 thread 0 pool 1
    [2011-09-08 00:45:55] Accepted 4e0bab28 GPU 0 thread 1 pool 2
    [2011-09-08 00:49:07] Accepted ae7b9c3a GPU 0 thread 0 pool 2
    [2011-09-08 00:54:14] Accepted e2be9c38 GPU 0 thread 0 pool 2
    [2011-09-08 00:54:32] LONGPOLL received after new block already detected
    [2011-09-08 00:54:32] New block detected on network before longpoll, waiting on fresh work
Another instance, with no backup pools set and started a little earlier, isn't exhibiting this behaviour. The last time this happened, a crash followed soon after.
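To see whether the late-longpoll message starts appearing on every block change from some point on, it helps to pull its timestamps out of a saved log. A minimal Python sketch, assuming the `[YYYY-MM-DD HH:MM:SS] message` line format and the exact message text in the excerpt above; `late_longpolls` is a hypothetical helper, not part of cgminer. The timestamps only have one-second resolution, so they cannot show which of two messages stamped with the same second really came first.

```python
import re
from datetime import datetime

# Assumed log format: "[YYYY-MM-DD HH:MM:SS] message", as in the excerpt.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")
LATE_LONGPOLL = "LONGPOLL received after new block already detected"


def late_longpolls(lines):
    """Return the timestamp of every 'longpoll arrived late' message."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == LATE_LONGPOLL:
            hits.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return hits


if __name__ == "__main__":
    excerpt = [
        "[2011-09-08 00:36:19] LONGPOLL received after new block already detected",
        "[2011-09-08 00:36:57] Accepted 85c06c40 GPU 0 thread 0 pool 0",
        "[2011-09-08 00:44:28] LONGPOLL received after new block already detected",
        "[2011-09-08 00:54:32] LONGPOLL received after new block already detected",
    ]
    stamps = late_longpolls(excerpt)
    for earlier, later in zip(stamps, stamps[1:]):
        print(later, "gap since previous:", later - earlier)
```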
BTC: 1GAHTMdBN4Yw3PU66sAmUBKSXy2qaq2SF4

toasty
September 07, 2011, 11:39:18 PM
I'm also seeing something where the size of the queue on each GPU keeps growing over time, until there are hundreds of queued work items per card. It may have something to do with cgminer reporting that the pool isn't providing work quickly enough (which I think is incorrect: the pushpoold it's talking to is a few feet away on a totally unloaded box, and older versions of cgminer don't complain about it). After that, the size of the queue keeps growing like crazy. This also seems to make some of the work so old that it's rejected by the time it actually gets a chance to run. Example from a box running for a few hours:
    [(5s):845.3 (avg):833.2 Mh/s] [Q:1336 A:1820 R:84 HW:0 E:136% U:10.99/m]
    TQ: 8 ST: 8 SS: 0 DW: 11 NB: 18 LW: 2745 GF: 4 RF: 0 I: 9
    Connected to Block: 00000938f98c268dcf86a8cb4efa000a...
    Started: [18:28:21]
    --------------------------------------------------------------------------------
    [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
    GPU 0: [95.0 C] [367.3/373.5 Mh/s] [Q:673 A:810 R:43 HW:0 E:120% U:4.89/m]
    GPU 1: [88.0 C] [187.7/186.9 Mh/s] [Q:251 A:400 R:11 HW:0 E:159% U:2.41/m]
    GPU 2: [99.0 C] [106.1/100.8 Mh/s] [Q:137 A:235 R:9 HW:0 E:172% U:1.42/m]
    GPU 3: [88.0 C] [179.8/172.1 Mh/s] [Q:225 A:375 R:21 HW:0 E:167% U:2.26/m]

-ck (OP)
September 07, 2011, 11:40:36 PM
> I'm also seeing something where the size of the queue on each GPU keeps growing over time, until there are hundreds of queued work items per card. It may have something to do with cgminer reporting that the pool isn't providing work quickly enough (which I think is incorrect: the pushpoold it's talking to is a few feet away on a totally unloaded box, and older versions of cgminer don't complain about it). After that, the size of the queue keeps growing like crazy. This also seems to make some of the work so old that it's rejected by the time it actually gets a chance to run. Example from a box running for a few hours:
>     [(5s):845.3 (avg):833.2 Mh/s] [Q:1336 A:1820 R:84 HW:0 E:136% U:10.99/m]
>     TQ: 8 ST: 8 SS: 0 DW: 11 NB: 18 LW: 2745 GF: 4 RF: 0 I: 9
>     Connected to Block: 00000938f98c268dcf86a8cb4efa000a...
>     Started: [18:28:21]
>     --------------------------------------------------------------------------------
>     [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
>     GPU 0: [95.0 C] [367.3/373.5 Mh/s] [Q:673 A:810 R:43 HW:0 E:120% U:4.89/m]
>     GPU 1: [88.0 C] [187.7/186.9 Mh/s] [Q:251 A:400 R:11 HW:0 E:159% U:2.41/m]
>     GPU 2: [99.0 C] [106.1/100.8 Mh/s] [Q:137 A:235 R:9 HW:0 E:172% U:1.42/m]
>     GPU 3: [88.0 C] [179.8/172.1 Mh/s] [Q:225 A:375 R:21 HW:0 E:167% U:2.26/m]

That's total queued to date, not current queued.
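Since Q is a running total, the E column can be checked against it: the numbers in the display are consistent with E being accepted shares divided by total work items queued (1820 / 1336 is about 136%, 235 / 137 about 172%), which is also how E can exceed 100%, as one work item can yield more than one share. A small Python sketch that does this check; the regex is written only for this display format and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Counters in the status lines look like "Q:1336 A:1820 E:136% U:10.99/m".
COUNTER_RE = re.compile(r"\b(HW|[QAREU]):([\d.]+)")


def parse_counters(status: str) -> dict[str, float]:
    """Pull the Q/A/R/HW/E/U counters out of one status line."""
    return {key: float(value) for key, value in COUNTER_RE.findall(status)}


def efficiency(counters: dict[str, float]) -> float:
    """Accepted shares per work item queued so far, as a percentage."""
    return 100.0 * counters["A"] / counters["Q"]


if __name__ == "__main__":
    line = "GPU 2: [99.0 C] [106.1/100.8 Mh/s] [Q:137 A:235 R:9 HW:0 E:172% U:1.42/m]"
    counters = parse_counters(line)
    print(round(efficiency(counters)), counters["E"])
```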
toasty
September 07, 2011, 11:45:07 PM
> That's total queued to date, not current queued.

Ack, ignore me. I had my own patch, which exports some variables to an external program, and it didn't merge correctly with 2.0.0, so it was reporting the wrong queue size to my program. I normally don't look at cgminer's own UI, and when I finally did, I misread it. Sorry.

ancow
September 07, 2011, 11:52:55 PM
I just tried to debug the problem I'm seeing, but attaching gdb to the cgminer process, or starting cgminer from gdb, froze the whole display (except the mouse pointer) after a short while and maxed out all 4 CPU cores. X went back to working normally after I logged in remotely and killed cgminer and gdb.

-ck (OP)
September 07, 2011, 11:54:57 PM
> I just tried to debug the problem I'm seeing, but attaching gdb to the cgminer process, or starting cgminer from gdb, froze the whole display (except the mouse pointer) after a short while and maxed out all 4 CPU cores. X went back to working normally after I logged in remotely and killed cgminer and gdb.

Indeed, you cannot use gdb with OpenCL code running. Everything hangs and you have to kill both the app and gdb. Makes debugging tons of fun at my end.

-ck (OP)
September 07, 2011, 11:58:16 PM
> 1) Dual GPU cards don't get adjusted/disabled correctly. For example, a 5970 will only disable the one GPU that has the temp sensor:
>     GPU 3: [80.5 C] [DISABLED /78.5 Mh/s] [Q:2 A:11 R:1 HW:0 E:550% U:1.81/m]
>     GPU 4: [327.3/324.5 Mh/s] [Q:25 A:23 R:1 HW:0 E:92% U:3.78/m]
> 2) It'd be awesome if cgminer would record the current clocks on startup, and restore them on exit if the auto tuning changed them at all.
> 3) Pressing "G" when you have more than a couple of cards makes it impossible to read the output, because the window the output is displayed in is too small unless you have a HUGE screen.
> 4) I'd love a way to specify different temperature thresholds per GPU on the command line. If I have different model cards in there, they have different points where they're happy. 5770s get crashy above 90-95, where 5970 and 6990 cards idle near there at times.
> 5) My ideal dream would be a way of somehow saying "Any 5970 cards you see, set the temperature thresholds to X/Y, the voltage to Z, etc. Any 5770 cards, the temperature threshold is..." so that I don't have to look up which cards are in which system, just to pass that along to cgminer.
> 6) Temperatures >100C should be allowed, no matter how bad of an idea that sounds. We have some cards that go up to 105-107C without issue.
> 7) Specifying an overclock/underclock range that cgminer is allowed to adjust the clock in would be handy. One step further, having it attempt to determine (maybe even saved into a local file) how high the clock was able to go without problems, and self-tuning the max clock rate while under the threshold temperature.

1: Not sure how to fix that, since they don't return a different adapter ID. I'll poke around some more.
2: It already does.
3: I'll consider trimming it somehow.
4: Doable.
5: *cough*
6: Doable.
7: Doable.
8: *cough
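For point 2, the answer is that cgminer already does this. The general pattern is to snapshot the clocks at startup and restore them on the way out; here is a minimal Python sketch using a hypothetical `FakeGpu` class with arbitrary clock values, not cgminer's actual code. The `finally` block covers normal exits and exceptions, but it cannot run if the process is killed outright (for example with SIGKILL) or the machine locks up, so in those cases the card keeps whatever clocks the tuner left.

```python
from contextlib import contextmanager


class FakeGpu:
    """Hypothetical stand-in for a GPU whose clocks an auto-tuner may change."""

    def __init__(self, engine: int, memory: int):
        self.engine = engine
        self.memory = memory


@contextmanager
def restore_clocks_on_exit(gpus):
    """Snapshot every GPU's clocks, run the body, then put them back."""
    saved = [(gpu.engine, gpu.memory) for gpu in gpus]
    try:
        yield gpus
    finally:
        # Runs on a normal exit and when an exception escapes the body.
        for gpu, (engine, memory) in zip(gpus, saved):
            gpu.engine, gpu.memory = engine, memory


if __name__ == "__main__":
    rig = [FakeGpu(engine=725, memory=1000), FakeGpu(engine=850, memory=300)]
    with restore_clocks_on_exit(rig):
        rig[0].engine = 900  # the auto-tuner raises one clock...
        rig[1].memory = 200  # ...and lowers another
    print([(gpu.engine, gpu.memory) for gpu in rig])
```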
ancow
September 08, 2011, 12:02:02 AM
>> I just tried to debug the problem I'm seeing, but attaching gdb to the cgminer process, or starting cgminer from gdb, froze the whole display (except the mouse pointer) after a short while and maxed out all 4 CPU cores. X went back to working normally after I logged in remotely and killed cgminer and gdb.
> Indeed, you cannot use gdb with OpenCL code running. Everything hangs and you have to kill both the app and gdb. Makes debugging tons of fun at my end.

Great, and here I was hoping I had missed something... I'm running it with -T -D now to see whether the debug output sheds any light on this, but it may be a while, as this issue doesn't show up very soon.

-ck (OP)
September 08, 2011, 12:07:09 AM
>>> I just tried to debug the problem I'm seeing, but attaching gdb to the cgminer process, or starting cgminer from gdb, froze the whole display (except the mouse pointer) after a short while and maxed out all 4 CPU cores. X went back to working normally after I logged in remotely and killed cgminer and gdb.
>> Indeed, you cannot use gdb with OpenCL code running. Everything hangs and you have to kill both the app and gdb. Makes debugging tons of fun at my end.
> Great, and here I was hoping I had missed something... I'm running it with -T -D now to see whether the debug output sheds any light on this, but it may be a while, as this issue doesn't show up very soon.

Did it actually crash this time? Sometimes the two events (longpoll and block detection) are literally milliseconds apart, and their messages just happen to get posted to the output in the wrong order (you'll note they're always stamped with the same second).

ancow
September 08, 2011, 12:13:21 AM
>>>> I just tried to debug the problem I'm seeing, but attaching gdb to the cgminer process, or starting cgminer from gdb, froze the whole display (except the mouse pointer) after a short while and maxed out all 4 CPU cores. X went back to working normally after I logged in remotely and killed cgminer and gdb.
>>> Indeed, you cannot use gdb with OpenCL code running. Everything hangs and you have to kill both the app and gdb. Makes debugging tons of fun at my end.
>> Great, and here I was hoping I had missed something... I'm running it with -T -D now to see whether the debug output sheds any light on this, but it may be a while, as this issue doesn't show up very soon.
> Did it actually crash this time? Sometimes the two events (longpoll and block detection) are literally milliseconds apart, and their messages just happen to get posted to the output in the wrong order (you'll note they're always stamped with the same second).

I had to kill it before it had a chance to crash. I know that this can happen. However, from a certain point on, according to cgminer's output it *always* happens, so this is likely a bug.