FalconFour
|
|
April 17, 2013, 08:13:23 PM |
|
I'm curious about the "S" option though (and drilling through 17 past pages is, well... wtf). It's said to be beneficial for older cards, but the oldest card you could run it on is an 8000-series card which seems to not get any performance gain/difference by using it - other than that I can't squeeze as many 'warps' into the configuration without entering "bad result mode". And it seems to actually be slower anyway. So what's up with "S"? Do you have anything to test its development on? I've got a whole pile of testers
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 08:17:09 PM Last edit: April 17, 2013, 08:30:01 PM by cbuchner1 |
|
I'm curious about the "S" option though (and drilling through 17 past pages is, well... wtf). It's said to be beneficial for older cards, but the oldest card you could run it on is an 8000-series card which seems to not get any performance gain/difference by using it
My GTX 260 loved S27x3 the most. Got 44 kHash. The other kernels would achieve 10 kHash less or so. According to pure shader count (216) it should be beating your 9800 cards by nearly a factor of 2. I should have tested with Aero off, but that information came too late. I cannot really test now, as I have ripped the cards out already, and I am waiting for a 560Ti 448 core edition to go into the last available slot (out of 4).
Here's my GPU park - all in the same desktop PC. Can only use 2 out of the 3 high power cards simultaneously.
GTX 560Ti (EVGA)
GTX 460
GTX 560Ti 448core edition (Palit)
GTX 640 4GB (Club3D)
and in some other PCs:
GTX 660Ti (EVGA)
GTX 640 4GB (Club3D)
9600M GT (in a Laptop)
already end of life'd (anyone want 'em?)
4 9600GSO cards
1 8800GT
2 GTX 260
total hashing power would be in the 700 kHash range I guess.
Christian
|
|
|
|
KnowBuddy
|
|
April 17, 2013, 09:22:11 PM |
|
I have released an April-17th version. Source code on github will be updated tomorrow.
It reduces the CPU usage greatly when run in interactive mode on Windows (in fact I think Windows is not measuring it correctly now, because it seems to hover around 0-1% for me). But it might be hashing a bit slower than before when running interactively. ATTENTION: I just remembered that I default to interactive mode now when not otherwise specified. This is done for all CUDA devices that have the watchdog timer active (i.e. they are driving a display and no registry fix was put in place to disable the watchdog for that card).
The texture cache feature now works but is detrimental to performance on all but fast Kepler cards, where it makes things neither better nor worse. Note the -C option now takes "1" or "2" as input, corresponding to 1D or 2D texture layouts. Consider this experimental, still.
Interesting, testing the tex-cache on a GTX 670MX I get a 2-2.5 kH/s increase (to 70 kH/s stable) over the 3/14 build using -i 0 -C 2 -l 320x2. Autotune tries to go to 10x8, however, which is much lower. Using -C 1 gets an additional 0.5 kH/s, but causes some of the hashes to fail validation by the CPU (up to 50% in short testing). -C 0 gives results consistent with the 3/14 build (66-67.5 kH/s). I'll do a longer run with 2D tex-cache to evaluate stability.
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 09:50:48 PM Last edit: April 17, 2013, 10:07:08 PM by cbuchner1 |
|
I think with -C 1 you need to watch out that you don't throw too large work units at the device. -l 320x2 is really large. 1D textures have a size limit, and if that is exceeded I think I do print a warning on the console - but the program will continue (and some hashes may fail silently). Autotune will automatically make sure not to exceed the texture size limit.
About those texture size limits, if I remember correctly:
-C 1 has a size limit of 511 for regular kernels (block x warps), e.g. 320x2 results in 640 and that is too big.
-C 1 has a size limit of 255 for "S" kernels.
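The limits above can be checked before launching. This is a minimal sketch, not cudaminer code: the 511 and 255 figures are the ones recalled in the post (so treat them as approximate), and the function names and the handling of an "S" prefix on the launch configuration (as in S27x3) are assumptions made for illustration.

```python
# Limits as recalled in the post above ("if I remember correctly"); approximate.
TEXTURE_1D_LIMIT = {"regular": 511, "S": 255}


def parse_launch_config(spec):
    """Split a launch configuration such as '320x2' or 'S27x3'.

    Hypothetical helper: a leading 'S' is assumed to select the "S" kernel.
    """
    kernel = "regular"
    if spec[:1].upper() == "S":
        kernel = "S"
        spec = spec[1:]
    blocks_text, warps_text = spec.lower().split("x")
    return kernel, int(blocks_text), int(warps_text)


def fits_1d_texture(spec):
    """Return True if blocks x warps stays within the recalled -C 1 limit."""
    kernel, blocks, warps = parse_launch_config(spec)
    return blocks * warps <= TEXTURE_1D_LIMIT[kernel]


for spec in ("320x2", "30x8", "S27x3"):
    print(spec, fits_1d_texture(spec))
```

With these numbers 320x2 (640) is rejected, while 30x8 (240) and S27x3 (81) pass.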
|
|
|
|
nst6563
|
|
April 17, 2013, 10:04:40 PM |
|
Tried autotune with -C option set to anything other than 0 and got a program close and the gpu clocks went into the recovery state (half speed). Needed to reset to fix it.
Setting -C to 0 and letting autotune run came up with nearly the same results as the 4/14 build. Adding the -C 1 or -C 2 variable had almost no effect (maybe .5kh/s).
2 gtx560se (gpu's 0 and 2) and a gt430 (gpu 1) with the following commandline generate about 277kh/s
cudaminer.exe -i 0,0,0 -d 0,1,2 -C 2,2,2 -l 12x8,20x8,12x8
Will let it run for a while and check back.
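For readers puzzling over the comma-separated lists in that command line, here is a small sketch of how they appear to line up. It assumes each list is matched to the devices in -d by position, which fits the description (12x8 for the two 560se cards, 20x8 for the gt430) but is not confirmed by the thread. The function name is hypothetical and this is not how cudaminer itself parses its arguments.

```python
def per_device_settings(command_line):
    """Pair each comma-separated option value with the device at the same position."""
    tokens = command_line.split()[1:]  # drop the executable name
    # The remaining tokens are assumed to alternate: flag, value, flag, value, ...
    options = dict(zip(tokens[0::2], tokens[1::2]))
    devices = options["-d"].split(",")
    settings = {}
    for index, device in enumerate(devices):
        settings[int(device)] = {
            flag: values.split(",")[index]
            for flag, values in options.items()
            if flag != "-d"
        }
    return settings


command = "cudaminer.exe -i 0,0,0 -d 0,1,2 -C 2,2,2 -l 12x8,20x8,12x8"
for device, flags in sorted(per_device_settings(command).items()):
    print(device, flags)
```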
|
|
|
|
Lacan82
|
|
April 17, 2013, 10:12:32 PM |
|
Hash rate went up about 5 KH for both cards. -C 1 or -C 2 both show the 5 KH increase.
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 10:19:31 PM |
|
I have a first LOOKUP_GAP implementation, and I am getting some problems here. I am using way too many registers in these kernels (96). And I am having a warp divergence problem (I know, that is something Captain Picard or Geordi La Forge might say).
GTX 560Ti, for -l 32x4 (no autotune yet, just a random choice):
LOOKUP_GAP of 2: abysmal performance of 70 kHash/s
LOOKUP_GAP of 4: about 60 kHash/s with texture cache enabled, 50 kHash/s without texture cache
original performance: ~150 kHash. DUH!
The upside: it only needs 1/2 or 1/4th of the memory.
The warp divergence (also known as branch divergence) problem is really the main issue, and I fear that it is not fixable. I will throw it at my Kepler card tomorrow to see if it barfs.
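For anyone unfamiliar with the LOOKUP_GAP idea, the trade-off can be sketched in a few lines. This is a toy model and not the cudaminer kernel: `mix` is a SHA-256 stand-in for scrypt's real block mixing (Salsa20/8), and the names `romix`, `gap` and `seed` are made up for illustration. Only every gap-th scratchpad entry is stored; a missing entry is rebuilt by re-running the mix from the nearest stored one.

```python
import hashlib


def mix(block):
    # Stand-in for scrypt's block mixing; NOT the real Salsa20/8 core.
    return hashlib.sha256(block).digest()


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def romix(seed, n, gap=1):
    """Toy scrypt-style ROMix that stores only every gap-th scratchpad entry."""
    x = seed
    stored = []
    for i in range(n):  # phase 1: fill the (thinned-out) scratchpad
        if i % gap == 0:
            stored.append(x)
        x = mix(x)
    for _ in range(n):  # phase 2: data-dependent lookups
        j = int.from_bytes(x[:4], "little") % n
        v = stored[j // gap]
        for _ in range(j % gap):  # recompute the entries that were skipped
            v = mix(v)
        x = mix(xor(x, v))
    return x, len(stored)


seed = hashlib.sha256(b"toy input").digest()
for gap in (1, 2, 4):
    digest, entries = romix(seed, 1024, gap)
    print(gap, entries, digest.hex()[:16])
```

The result is the same for every gap while the number of stored entries drops to 1/2 or 1/4. Note that the recompute loop runs `j % gap` times, which differs from lookup to lookup; on a GPU that kind of per-thread variation is a plausible source of the divergence described above, though the post does not say so explicitly.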
|
|
|
|
mg27341
|
|
April 17, 2013, 10:30:06 PM Last edit: April 17, 2013, 10:53:01 PM by mg27341 |
|
Here's what I am getting with an NVidia 570 with the 4/17 version compared to 4/13:
C0, a boost of about 3-4 kH (from 218 to 221/222)
C1, a boost of about 13 (from 218 to 231)
C2, crashes the display driver and fails to validate on CPU
An NVidia 670 I also have running went from 141 to 149 with C1 and this version (vs 4/13).
I'll stick with C1 and THANKS!!! This version + C1 covers my pool costs and then some... Hats off to you and I just sent you a token 0.1 LTC for all your hard work (and this version, specifically). I hope everyone else using your program donates at least that much to you to reimburse you for your effort.
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 10:33:14 PM |
|
Here's what I am getting with the 4/17 version compared to 4/13:
C0, a boost of about 3-4 kH (from 218-221/222)
C1, a boost of about 13 (from 218-231)
C2, crashes the display driver and fails to validate on CPU
I'll stick with C1 and THANKS!!!
About the texture cache related crashes with autotune: I think I am doing my memory allocations too aggressively when the cache is enabled. This does not leave enough breathing room for the nVidia WDDM graphics driver and results in a crash. You can try -C 2, specifying the same launch configuration manually that works for -C 1.
I envy all of you who are getting a boost - because I don't.
0.1 LTC arrived, thanks
Christian
|
|
|
|
mg27341
|
|
April 17, 2013, 10:36:25 PM Last edit: April 17, 2013, 10:57:30 PM by mg27341 |
|
about the texture cache related crashes with autotune: I think I am doing my memory allocations too aggressively when the cache is enabled. This does not leave enough breathing room for the nVidia WDDM graphics driver and results in a crash. You can try -C 2 specifying the same launch configuration manually that works for -C 1
Christian
Yep, exactly what I tried: Same settings for C1 as C2, both gotten from an original autotune with C0. Maybe I should try re-autotuning with C1 to see what happens. Let me do that...
Update: Retest results for 670 card:
C1: Attempt 1. Did an autotune, it crashed the driver. Attempt 2. Autotune succeeded and tuned to the usual (i.e., the ones I get from C0) settings of 30x8, so those seem to be good settings regardless of the C switch.
C2: Attempt 1: Surprisingly, no crash this time, also tuned to the usual C0 settings of 30x8 and getting same peak performance as C1 (about 231-232).
So, in the end C2==C1 if it doesn't crash the driver on startup. I think I am going to revert back to C1 for now as my normal operating mode with a fixed -l 30x8, until you grace us with the next iteration of your wonderful mods.
Michael
P.S. A special thanks in this version for making Ctrl-C / Ctrl-Break work properly. It used to have about an 80% chance of crashing the display driver with prior versions of your app.
P.P.S. An amusing unrelated note: I've got an old AMD 5450 also running cgminer 2.11.4 and generating 15 kH/s, but using only 20 (additional) watts to do it according to my kilowatt meter. My gains from this version change alone (i.e., 4/13 -> 4/17) are more than that AMD card is producing.
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 11:04:18 PM |
|
I don't see how you guys can get any gains with GT5xx cards. If I enable any -C option with my 560Ti, I lose a third in kHash throughput. MEH!
|
|
|
|
Lacan82
|
|
April 17, 2013, 11:06:22 PM |
|
I don't see how you guys can get any gains with GT5xx cards. If I enable any -C option with my 560Ti, I lose a third in kHash throughput. MEH!
I'm using -C 1 -i 0
|
|
|
|
nst6563
|
|
April 17, 2013, 11:16:40 PM |
|
I tried my gtx560se's with -C 1 and got a very slight drop in performance. Hardly measurable in my opinion.
I used to average 276kh/s with the previous version, with this new one and the -C 2 variable I'm averaging 278.5kh/s.
edit: sorry...the 278 is a total combined hashrate of all three cards
|
|
|
|
mg27341
|
|
April 17, 2013, 11:17:11 PM |
|
I don't see how you guys can get any gains with GT5xx cards. If I enable any -C option with my 560Ti, I lose a third in kHash throughput. MEH!
I should mention that I am using -i 1, because there's only one card in this host and I need to use it for other things. In any case, I am ever hopeful that whatever -C tweaks you make going forward to increase your rate won't hamper ours (so perhaps some backwards compatibility on the -C front just in case?).
|
|
|
|
mg27341
|
|
April 17, 2013, 11:18:54 PM |
|
I tried my gtx560se's with -C 1 and got a very slight drop in performance. Hardly measurable in my opinion.
I used to average 276kh/s with the previous version, with this new one and the -C 2 variable I'm averaging 278.5kh/s.
edit: sorry...the 278 is a total combined hashrate of all three cards
I'm glad you put that edit in, I was thinking you were getting 278 on a single 560se and trumping my 570.
|
|
|
|
cbuchner1 (OP)
|
|
April 17, 2013, 11:19:47 PM |
|
edit: sorry...the 278 is a total combined hashrate of all three cards
Yeah, the number looked a little high for just one or two SEs ;-) Did you run these in a triple-SLI configuration, or why do you have three of em? Christian
|
|
|
|
mg27341
|
|
April 17, 2013, 11:25:21 PM |
|
BTW: Overclocking my 570 from the stock 770 MHz Core Clock to 900 MHz (a 17% increase in the Core Clock and a corresponding increase in the linked Shader Clock), which my card can do when slightly overvolted (1.0 v -> 1.038 v), raises the rate from about 230 kH/s to 266 kH/s (a 14% increase in hash rate). It does not appear that we are very memory speed bound. I am leaving my card at stock frequency though, because I want desktop stability and do not want to overheat my card any more than necessary with the torture it is already getting from mining. 670s don't grow on trees you know.
Michael
|
|
|
|
nst6563
|
|
April 17, 2013, 11:26:53 PM |
|
edit: sorry...the 278 is a total combined hashrate of all three cards
Yeah, the number looked a little high for just one or two SEs ;-) Did you run these in a triple-SLI configuration, or why do you have three of em? Christian
I have 2 gtx560se and 1 gt430. I got 1 560se for free (I asked for a 560 but they weren't that computer literate so whatever lol) and figured I may as well sli it with another for $120 (still not sure why...other than I always thought it would be cool to have sli. I'm not a big gamer). The 430 was a leftover from an htpc build.
The 560se I must say I'm impressed with. It overclocks like a dream. Stock clocks are 776/1915mem/1552shader. I have it running at 971core/2121mem/1938shader at 1.062v. Stock cooling. Never gets above 80c when cranking away mining.
|
|
|
|
mg27341
|
|
April 17, 2013, 11:29:26 PM |
|
The 560se I must say I'm impressed with. It overclocks like a dream. Stock clocks are 776/1915mem/1552shader. I have it running at 971core/2121mem/1938shader at 1.062v. Stock cooling. Never gets above 80c when cranking away mining.
Yeah, a 6.2% overvolt is more than I am willing to stomach to push my card. I think a 3.8% overvolt to 1.038 v is about the most I am comfortable with for short gaming bursts (and that nets me the 770 MHz -> 900 MHz Core Clock increase I mentioned in my previous post) on the 570. I'm very careful with the 570, because it was one of those uber-expensive Sparkle jobs with the tricked out silent fans and stuff that I paid an extra $50-$75 for. Here are some pics that may bring back memories, I'm sure you've all seen them: http://www.calibrestyle.com.tw/productDetail.asp?id=43
Michael
|
|
|
|
Lacan82
|
|
April 17, 2013, 11:30:40 PM |
|
BTW: Overclocking my 570 from the stock 770 MHz Core Clock to 900 MHz (a 17% increase in the Core Clock and a corresponding increase in the linked Shader Clock), which my card can do when slightly overvolted (1.0 v -> 1.038 v), raises the rate from about 230 kH/s to 266 kH/s (a 14% increase in hash rate). It does not appear that we are very memory speed bound. I am leaving my card at stock frequency though, because I want desktop stability and do not want to overheat my card any more than necessary with the torture it is already getting from mining. 670s don't grow on trees you know. Michael
It is probably safe to overclock to 822. My card is standard at 822 MHz factory OC.
|
|
|
|
|