cbuchner1 (OP)
|
|
January 22, 2014, 09:11:23 AM |
|
thanks for trying to help me. I am very grateful
Passing -C 2 might help a bit. Also try -i 0 if you can accept some sluggish video output. Also remember the strongest configurations that autotune found for you and pass them with the -l flag. That saves some time the next time you start it, and it will always deliver the same performance.
Christian
|
|
|
|
Morgahl
Member
Offline
Activity: 70
Merit: 10
|
|
January 22, 2014, 09:27:30 AM |
|
Many thanks for this. Using Patoberli's build of commit 111 I was able to play around a bit.
The T kernel in Windows on my Titan is very unstable during autotune. Unfortunately, anything that allocates more than 3 GB of VRAM just crashes Cudaminer outright. I'm not sure what direct limitation is causing this, but it is a consistent observation over several hours of manual configurations.
The Titan kernel also heavily favors multiples of the old T16x1, such as T64x1 -L 1, T64x2 -L 2, etc. Not sure why, but it makes picking out optimal settings easy.
On my Titan I was able to test and get 5.6-5.8 kh/s (varies, but a fairly even spread) using -i 0 -H 1 -l T32x8 -L 4 -a scrypt-jane:YAC with a mild core OC of +250. I will submit this and full details to the spreadsheet after a full night of stable submissions.
Edit: I have broken 6 kh/s, but only about 80% were validated. Nice to have a high range, but 80% of 6 is 4.8, so no real benefit lol.
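The arithmetic behind that last remark, as a rough sketch: the rate that counts is the raw hash rate times the fraction of results that validate. The function name is made up for illustration, and treating the 5.6-5.8 kh/s run as fully validated is an assumption, not something measured in this thread.

```python
def effective_khs(raw_khs: float, validated_fraction: float) -> float:
    """Hash rate that actually counts: raw rate scaled by the validated share."""
    if not 0.0 <= validated_fraction <= 1.0:
        raise ValueError("validated_fraction must be between 0 and 1")
    return raw_khs * validated_fraction


# Figures from the post: about 6 kh/s raw with about 80% validated.
overclocked = effective_khs(6.0, 0.80)

# Assumption: the stable 5.6-5.8 kh/s configuration validates everything.
stable_low = effective_khs(5.6, 1.0)

print(f"pushed: {overclocked:.1f} kh/s, stable: at least {stable_low:.1f} kh/s")
```

If the stable run also rejects some results, the gap narrows, so the comparison is only as good as that assumption.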
|
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 09:57:44 AM |
|
Edit: I have broken 6 kh/s, but only about 80% were validated. Nice to have a high range, but 80% of 6 is 4.8, so no real benefit lol.
I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.
Christian
|
|
|
|
justafool76
Member
Offline
Activity: 85
Merit: 10
|
|
January 22, 2014, 10:01:48 AM |
|
Two new experimental kernels added to github - currently for Linux only. The Visual C++ project has not yet been updated. You will want to run ./autogen.sh and configure after doing a git pull.
"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.
I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels from the current github code.
Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail. Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it. Best config for "Y" is (guessing) No. of SMX x 32 - or just autotune.
The Z kernel is best run with -C 0 (it supports -C 1 and -C 2, but that is mostly pointless).
When you make kHash/s benchmarks, compare with the best scrypt values achieved with the 2013-12-18 release.
I got 86 kHash/s on GTX 750M with the -C 2 flag and -l Y4x32 in some quick tests, which might be slightly faster than what the 2013-12-18 release delivered.
Christian
thanks for the help but you have lost me. I don't understand what you mean, I am not as smart as you and others here. Here is my .bat file below. What else should I be putting in the bat file?
cudaminer -o stratum+tcp://asia.middlecoin.com:3333 -u 1MU4EAB6p5xcRPhZ8gFKZSq9znchJpt2iE -p 123
What else do I need to put to try to get a better hash rate? My second lappy has an NVIDIA GTX 670M 3 GB GPU and it's getting about 75 khash/s and has some thingy F56x2, and I use the same bat file. I know it's a different card, so I know I will have to put some extra in it. What do I do? Please can someone help me, please.
|
|
|
|
justafool76
Member
Offline
Activity: 85
Merit: 10
|
|
January 22, 2014, 10:09:44 AM |
|
So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no different, but thanks for all of your very hard work.
Very, very grateful to everybody for your help.
|
|
|
|
justafool76
Member
Offline
Activity: 85
Merit: 10
|
|
January 22, 2014, 10:18:34 AM |
|
Oops, I did it again: forgot to thank patoberli. You rock!
|
|
|
|
ktf
Newbie
Offline
Activity: 24
Merit: 0
|
|
January 22, 2014, 10:25:18 AM |
|
Hi Christian,
Any idea why cudaminer fails when I run it with the -l parameter? If I let it autotune with -L 2, see what value it selects, and try to start it again manually using that value, I get loads of errors:
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaEventRecord(context_serialize[stream][thr_id], context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 820)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaMemcpyAsync(X, context_odata[stream][thr_id], mem_size, cudaMemcpyDeviceToHost, context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 852)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamQuery(context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 826)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[0][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 163)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[1][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 164)
I used :
cudaminer.exe --algo=scrypt-jane -d 1 -l K59x2 -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd
With :
cudaminer.exe --algo=scrypt-jane -d 1 -l K59x1 -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd
it works, but ofc it is way too slow.
And with :
cudaminer.exe --algo=scrypt-jane -d 1 -L 2 -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd
it works, but sometimes it doesn't select the best performance, plus it takes quite a long time to autotune.
|
|
|
|
EndymioN666
Newbie
Offline
Activity: 11
Merit: 0
|
|
January 22, 2014, 10:25:50 AM |
|
At first glance, 111 is much slower than the previous one... but I'm off to work and don't really have much time to test it right now. From 0.53 it went to 0.18/0.20.
|
|
|
|
bathrobehero
Legendary
Offline
Activity: 2002
Merit: 1051
ICO? Not even once.
|
|
January 22, 2014, 10:57:16 AM |
|
Edit: I have broken 6 kh/s, but only about 80% were validated. Nice to have a high range, but 80% of 6 is 4.8, so no real benefit lol.
I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.
Christian
I haven't tested this, but I suspect it's caused by too much overclock.
|
Not your keys, not your coins!
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 10:58:16 AM |
|
cudaminer.exe --algo=scrypt-jane -d 1 -l K59x2 -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd
You forgot an -L 2 there. The -L is not yet rolled into the kernel launch configurations. This is intended, but not done yet. Later on the launch config might look like this instead -K59x2/2. Then the only use for passing -L would be to tell autotune about the intended Lookup gap.
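A small sketch of that advice: since the lookup gap is not yet part of the launch configuration, whatever builds the command line should pass -l and -L together. The helper below is hypothetical (it is not part of cudaminer); it only assembles the same flags ktf used, plus the missing -L. The check that the gap is at least 1 is an assumption.

```python
def cudaminer_args(launch_config, lookup_gap, device=1,
                   algo="scrypt-jane",
                   url="stratum+tcp://yac.coinmine.pl:9088",
                   userpass="user:pwd"):
    """Build the argument list so that -l and -L always travel together."""
    if lookup_gap < 1:  # assumption: a gap below 1 makes no sense
        raise ValueError("lookup gap must be at least 1")
    return [
        "cudaminer.exe", f"--algo={algo}",
        "-d", str(device),
        "-l", launch_config,      # launch configuration found by autotune
        "-L", str(lookup_gap),    # the lookup gap that autotune ran with
        "-H", "0",
        "-o", url, "-O", userpass,
    ]


# K59x2 was autotuned with -L 2, so it is started with -L 2 again.
args = cudaminer_args("K59x2", 2)
print(" ".join(args))
```

Keeping both values in one call makes it harder to reuse a configuration with a different lookup gap than the one it was tuned for.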
|
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 10:58:54 AM |
|
I haven't tested this, but I suspect it's caused by too much overclock.
Except that I am barely overclocking them. It must be some kind of code bug.
Christian
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
January 22, 2014, 11:21:23 AM |
|
Two new experimental kernels added to github - currently for Linux only. The Visual C++ project has not yet been updated. You will want to run ./autogen.sh and configure after doing a git pull.
"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.
I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels from the current github code.
Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail. Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it. Best config for "Y" is (guessing) No. of SMX x 32 - or just autotune.
The Z kernel is best run with -C 0 (it supports -C 1 and -C 2, but that is mostly pointless).
When you make kHash/s benchmarks, compare with the best scrypt values achieved with the 2013-12-18 release.
I got 86 kHash/s on GTX 750M with the -C 2 flag and -l Y4x32 in some quick tests, which might be slightly faster than what the 2013-12-18 release delivered.
Christian
I did a rapid test of the Z kernel on the GTX 780 Ti on Windows (I just added the line to vcxproj and vcxproj.user in the same way it was done for the other kernels, using the compiler option given in the nv_kernel). It is slightly faster: I was able to get to 724 khash/s (against the 700~705 khash/s I usually got with the post lookup_gap files). However, the core clock runs a bit higher, which may be the reason why I get that extra 20 khash/s. I also have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...
|
djm34 facebook pageBTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
Flo354
Newbie
Offline
Activity: 12
Merit: 0
|
|
January 22, 2014, 11:33:39 AM |
|
Thank you for the commit! However, I'm experiencing some problems with it. First, the hashrate is slower than with the other unofficial commit. And sometimes my NVIDIA driver crashes.
|
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 11:33:55 AM Last edit: January 22, 2014, 12:07:42 PM by cbuchner1 |
|
However, I have a problem, each time I give it a config (the one it just found in the previous run), it says it does not validate and start a new autotune...
fixed already
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:32:31] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-22 12:32:31] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s
[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s
[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:45:40] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 12:45:40] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:45:40] GPU #0: using launch configuration Y14x32
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti, 153.06 khash/s
[2014-01-22 12:46:00] GPU #0: GeForce GTX 660 Ti, 304.43 khash/s
[2014-01-22 12:46:00] accepted: 1/1 (100.00%), 304.43 khash/s (yay!!!)
A GTX 660Ti (Asus Direct CU II OC) breaking 300 kHash/s on Linux. Nice.
In comparison, here is the 29ae4821fc31e8e55060f8aed7f8ae13e33b1827 revision from github (the one before I started committing anything scrypt-jane related). This one already supports the texture cache in David Andersen's kernels.
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 13:02:08] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 13:02:08] GPU #0: using launch configuration K14x32
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti, 103.61 khash/s
[2014-01-22 13:02:32] GPU #0: GeForce GTX 660 Ti, 269.33 khash/s
So that's a 13% improvement then?
NOTE: The nVidia submitted kernels are now also in the Windows project files.
We're now in the strange situation that scrypt-jane and scrypt require completely different kernel implementations to run at best efficiency. I need to think about how I can come up with a good auto-selection of kernels based on whether scrypt or scrypt-jane is used.
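The 13% figure can be checked from the logs in this post: the Y14x32 run settles at 304.43 khash/s and the older K14x32 run at 269.33 khash/s. A minimal sketch of the calculation (the function name is only illustrative):

```python
def percent_gain(new_khs: float, old_khs: float) -> float:
    """Relative speed-up of new_khs over old_khs, in percent."""
    return (new_khs - old_khs) / old_khs * 100.0


# Steady-state rates taken from the logs: Y14x32 versus the older K14x32.
gain = percent_gain(304.43, 269.33)
print(f"{gain:.1f}% faster")
```

The same function applied to the Y21x28 run (299.98 khash/s) gives a somewhat smaller gain, so the answer depends on which of the two Y configurations is taken as the reference.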
|
|
|
|
CaptainBeck
|
|
January 22, 2014, 11:47:57 AM |
|
However, I have a problem, each time I give it a config (the one it just found in the previous run), it says it does not validate and start a new autotune...
fixed already
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:32:31] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-22 12:32:31] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s
[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s
[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:45:40] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 12:45:40] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:45:40] GPU #0: using launch configuration Y14x32
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti, 153.06 khash/s
[2014-01-22 12:46:00] GPU #0: GeForce GTX 660 Ti, 304.43 khash/s
[2014-01-22 12:46:00] accepted: 1/1 (100.00%), 304.43 khash/s (yay!!!)
A GTX 660Ti breaking 300 kHash/s. Nice.
Not with my miners right now, but my 660 Ti breaks 300 kh/s all the time on the 18/12/13 code.
|
|
|
|
patoberli
Member
Offline
Activity: 106
Merit: 10
|
|
January 22, 2014, 11:54:23 AM |
|
So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no different, but thanks for all of your very hard work.
Very, very grateful to everybody for your help.
For normal scrypt, use the official release 2013-12-18. It's on the first page of this thread. At least for me that is the fastest for normal scrypt. Also, there is not much to be done for tuning in that release. For middlecoin.com and the 2013-12-18 release I use this start line:
cudaminer.exe -i 1 -H 2 -C 1 -l F8x16 -o stratum+tcp://middlecoin.com:3333 -O bitcoinaddress:password
Please note that this line is for a Fermi based card (Quadro 4000) with 2 GB of VRAM on a desktop that I also work on (hence -i 1 and not 0).
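For anyone puzzled by strings like F8x16, here is a rough sketch of how such a launch configuration can be taken apart. It assumes the usual reading of the format, a kernel letter followed by blocks x warps per block; the parser itself is hypothetical and is not cudaminer code.

```python
import re

# Assumed format: one kernel letter, then <blocks>x<warps>, e.g. F8x16.
_CONFIG_RE = re.compile(r"^([A-Za-z])(\d+)x(\d+)$")


def parse_launch_config(text: str):
    """Split a launch configuration into (kernel letter, blocks, warps)."""
    match = _CONFIG_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a launch configuration: {text!r}")
    kernel, blocks, warps = match.groups()
    return kernel.upper(), int(blocks), int(warps)


# Configurations that appear in this thread.
for cfg in ("F8x16", "K59x2", "Y14x32", "T32x8"):
    kernel, blocks, warps = parse_launch_config(cfg)
    print(f"{cfg}: kernel {kernel}, {blocks} blocks x {warps} warps "
          f"= {blocks * warps} warps in total")
```

A configuration that works on one card will not necessarily suit another, which is why the F8x16 line above is tied to that specific Quadro 4000.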
|
YAC: YA86YiWSvWEGSSSerPTMy4kwndabRUNftf BTC: 16NqvkYbKMnonVEf7jHbuWURFsLeuTRidX LTC: LTKCoiDwqEjaRCoNXfFhDm9EeWbGWouZjE
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 11:54:47 AM |
|
Not with my miners right now, but my 660ti break 300kh all the time, on the 18/12/13 code.
I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.
|
|
|
|
CaptainBeck
|
|
January 22, 2014, 12:01:36 PM |
|
Not with my miners right now, but my 660ti break 300kh all the time, on the 18/12/13 code.
I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.
Ohhhh, yeah, I'm applying default 1 for the overclock because it works the best. Is this the new code then, I take it?
|
|
|
|
ther0x
Newbie
Offline
Activity: 1
Merit: 0
|
|
January 22, 2014, 12:05:31 PM |
|
Hi guys,
I have a GTX 460 and with this setting
cudaminer.exe -H 1 -d 0 -i 1 -l F28x4 -C 1 -m 0 -o stratum+tcp://coinotron.com:3334 -O USER:PWD
I have reached 122 khash/s. With the new v111 cudaminer and the same setting I reached just half of that. What could be the reason? Sorry, I'm a newbie. Thanks!
|
|
|
|
cbuchner1 (OP)
|
|
January 22, 2014, 12:06:56 PM |
|
With the new v111 cudaminer and the same setting I reached just half of that. What could be the reason? Sorry, I'm a newbie.
Don't do scrypt mining with the current code from github. You want to stick with the official 2013-12-18 release, unless you are really into adventure and experiments.
|
|
|
|
|