Bitcoin Forum
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426930 times)
cbuchner1 (OP)
January 22, 2014, 09:11:23 AM
 #2821


Thanks for trying to help me. I am very grateful.

passing -C 2 might help a bit.

Also -i 0 if you can accept some sluggish video output.

Also remember the strongest configurations that autotune found for you and pass them with the -l flag.
Saves some time the next time you start it and it will always deliver the same performance.

Christian
Morgahl
January 22, 2014, 09:27:30 AM
 #2822

I built the latest commit (111) for you.
Please note that this comes without any warranties or anything. Donations please go to cbuchner!
Thanks @cbuchner for your continued work!
64-bit: https://www.dropbox.com/s/7qp3cwgufivu5jt/cudaminer_commit_111_x64.rar
32-bit: https://www.dropbox.com/s/z6aenjphoew7xs1/cudaminer_commit_111_x86.rar

Many Thanks for this.

Using Patoberli's build of commit 111, I was able to play around a bit. The T kernel on Windows is very unstable on my Titan during autotune; unfortunately, anything that allocates more than 3 GB of VRAM crashes cudaminer outright. I am not sure what limitation is causing this, but it has been a consistent observation over several hours of manual configuration. The Titan kernel also heavily favors multiples of the old T16x1, such as T64x1 with -L 1, T64x2 with -L 2, etc. I am not sure why, but it makes picking out optimal settings easy :)

On my Titan I was able to test and get 5.6-5.8 kh/s (varies but fairly even spread) using -i 0 -H 1 -l T32x8 -L 4 -a scrypt-jane:YAC with a mild Core OC of +250.

I will submit this and full details to the spreadsheet after a full night of stable submissions :)

Edit: I have broken 6 kh/s, but only about 80% of the results were validated :( It is nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.
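
To make the arithmetic in that edit concrete, here is a minimal Python sketch. It assumes that results which fail validation contribute nothing at all, and `effective_hashrate` is a made-up helper name for illustration, not anything from cudaminer.

```python
def effective_hashrate(raw_khash_per_s: float, valid_fraction: float) -> float:
    """Return the part of the raw hash rate that actually validates.

    Assumption: invalid results are worth nothing, so the useful rate is
    simply raw rate times the validated fraction.
    """
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    return raw_khash_per_s * valid_fraction


# 6 kh/s with 80% validated versus a stable 5.8 kh/s with everything validated.
print(round(effective_hashrate(6.0, 0.80), 2))  # 4.8
print(round(effective_hashrate(5.8, 1.00), 2))  # 5.8
```

Under that assumption the stable 5.6-5.8 kh/s setting still comes out ahead of the faster but partly invalid one.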
cbuchner1 (OP)
January 22, 2014, 09:57:44 AM
 #2823

Edit: I have broken 6 kh/s, but only about 80% of the results were validated :( It is nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.

I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.

Christian
justafool76
January 22, 2014, 10:01:48 AM
 #2824

Two new experimental kernels added to github - currently for Linux only. The Visual C++
project has not yet been updated. You will want to run ./autogen.sh and configure after
doing a git pull.

"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.

I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels
from the current github code.

Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail.
Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it.
Best config for "Y" is (guessing) No. of SMX x 32   - or just autotune.

The Z kernel is best run with -C 0 (it supports C 1 and C2, but that is mostly pointless).
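
As a rough illustration of the "No. of SMX x 24" and "No. of SMX x 32" rules of thumb above, here is a small Python sketch. It assumes that a launch configuration string is the kernel letter followed by blocks x warps (inferred from strings such as Y4x32 in this thread); the helper name and the SMX count used in the example are made up for illustration.

```python
# Warps per block suggested in the post for the two nVidia-submitted kernels.
SUGGESTED_WARPS = {"Z": 24, "Y": 32}


def suggested_launch_config(kernel: str, smx_count: int) -> str:
    """Build a starting-point string for the -l flag from the SMX count.

    Assumption: the format is <kernel letter><blocks>x<warps>, with one
    block per SMX as in the "No. of SMX x N" rule of thumb.
    """
    if kernel not in SUGGESTED_WARPS:
        raise ValueError(f"no rule of thumb known for kernel {kernel!r}")
    if smx_count < 1:
        raise ValueError("smx_count must be at least 1")
    return f"{kernel}{smx_count}x{SUGGESTED_WARPS[kernel]}"


# Example with an illustrative SMX count of 12; check your own card's value.
print(suggested_launch_config("Z", 12))  # Z12x24
print(suggested_launch_config("Y", 12))  # Y12x32
```

This only gives a starting point; autotune may still find something better.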

When you make kHash/s benchmarks compare with the best scrypt values achieved with the
2013-12-18 release.

I got 86 kHash/s on GTX 750M with the -C2 flag and -l Y4x32 in some quick tests, which
might be slightly faster than what the 2013-12-18 release delivered.

Christian


Thanks for the help, but you have lost me. I don't understand what you mean.

I am not as smart as you and others here. Here is my .bat file below; what else should I be putting in it?

cudaminer -o stratum+tcp://asia.middlecoin.com:3333 -u 1MU4EAB6p5xcRPhZ8gFKZSq9znchJpt2iE -p 123

What else do I need to add to try to get a better hash rate?

My second laptop has an NVIDIA GTX 670M GPU with 3 GB and is getting about 75 khash/s; it shows something like F56x2. I use the same .bat file on both. I know it's a different card, so I know I will have to add something extra. What do I do? Can someone please help me?



justafool76
January 22, 2014, 10:09:44 AM
 #2825

So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no differently, but thanks for all of your very hard work.

Very, very grateful to everybody for your help.

justafool76
January 22, 2014, 10:18:34 AM
 #2826

Oops, I did it again: I forgot to thank patoberli. You rock!
ktf
January 22, 2014, 10:25:18 AM
 #2827

Hi Christian,

Any idea why cudaminer fails when I run it with the -l parameter? If I let it autotune with -L 2, note the value it selects, and then start it again manually using that value, I get loads of errors:

[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaEventRecord(context_serialize[stream][thr_id], context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 820)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaMemcpyAsync(X, context_odata[stream][thr_id], mem_size, cudaMemcpyDeviceToHost, context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 852)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamQuery(context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 826)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[0][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 163)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[1][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 164)

 I used :

cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

With :

cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x1  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

it works, but of course it is way too slow.

 And with :

cudaminer.exe  --algo=scrypt-jane -d 1 -L 2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

it works, but sometimes it doesn't select the best-performing configuration, and it takes quite a long time to autotune.
EndymioN666
January 22, 2014, 10:25:50 AM
 #2828

At first glance, 111 is much slower than the previous one. I'm off to work and don't really have much time to test it right now, but it went from 0.53 to 0.18/0.20.
bathrobehero
January 22, 2014, 10:57:16 AM
 #2829

Edit: I have broken 6 kh/s, but only about 80% of the results were validated :( It is nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.

I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.

Christian


I haven't tested this, but I suspect it's caused by too much overclock.

Not your keys, not your coins!
cbuchner1 (OP)
January 22, 2014, 10:58:16 AM
 #2830


cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd


You forgot an -L 2 there.

The -L is not yet rolled into the kernel launch configurations. This is intended, but not done yet.

Later on the launch config might look like this instead -K59x2/2. Then the only use for passing -L would be to tell autotune about the intended Lookup gap.
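
To illustrate the two forms mentioned above, here is a small Python sketch that reads both the current style (K59x2, with the lookup gap passed separately via -L) and the possible future style (K59x2/2). The field names `blocks` and `warps` are my own assumption about what the two numbers mean, and the parser is purely illustrative; it is not how cudaminer itself reads the string.

```python
import re
from typing import NamedTuple, Optional


class LaunchConfig(NamedTuple):
    kernel: str                 # kernel letter, e.g. "K"
    blocks: int                 # first number (assumed meaning)
    warps: int                  # second number (assumed meaning)
    lookup_gap: Optional[int]   # only present in the "/N" form


_PATTERN = re.compile(r"^([A-Za-z])(\d+)x(\d+)(?:/(\d+))?$")


def parse_launch_config(text: str) -> LaunchConfig:
    """Parse 'K59x2' or 'K59x2/2' into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a launch configuration: {text!r}")
    kernel, blocks, warps, gap = match.groups()
    return LaunchConfig(kernel, int(blocks), int(warps),
                        int(gap) if gap is not None else None)


print(parse_launch_config("K59x2"))    # lookup_gap is None: -L still needed
print(parse_launch_config("K59x2/2"))  # lookup_gap is 2
```

With the current code, the first form has to be combined with a separate -L 2 on the command line, which is what was missing in the failing run.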
cbuchner1 (OP)
January 22, 2014, 10:58:54 AM
 #2831


I haven't tested this, but I suspect it's caused by too much overclock.

except that I am barely overclocking them. It must be some kind of code bug.

Christian
djm34
January 22, 2014, 11:21:23 AM
 #2832

I did a quick test of the Z kernel on the GTX 780 Ti on Windows (I just added the line to vcxproj and vcxproj.user in the same way it was done for the other kernels, using the compiler option given in the nv_kernel).
It is slightly faster: I was able to get to 724 khash/s (against the 700-705 khash/s I usually got with the post-lookup_gap files).
However, the core clock runs a bit higher, which may be the reason why I get that extra 20 khash/s.

However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
Flo354
January 22, 2014, 11:33:39 AM
 #2833

I built the latest commit (111) for you.
Please note that this comes without any warranties or anything. Donations please go to cbuchner!
Thanks @cbuchner for your continued work!
64-bit: https://www.dropbox.com/s/7qp3cwgufivu5jt/cudaminer_commit_111_x64.rar
32-bit: https://www.dropbox.com/s/z6aenjphoew7xs1/cudaminer_commit_111_x86.rar

Thank you for the commit!

However, I'm experiencing some problems with this commit.
First: the hash rate is lower than with the other unofficial commit.
And sometimes my NVIDIA driver crashes.
cbuchner1 (OP)
January 22, 2014, 11:33:55 AM
Last edit: January 22, 2014, 12:07:42 PM by cbuchner1
 #2834


However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...

Fixed already ;)

Code:
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:32:31] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-22 12:32:31] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s
[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s
[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)

[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:45:40] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 12:45:40] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:45:40] GPU #0: using launch configuration Y14x32
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti, 153.06 khash/s
[2014-01-22 12:46:00] GPU #0: GeForce GTX 660 Ti, 304.43 khash/s
[2014-01-22 12:46:00] accepted: 1/1 (100.00%), 304.43 khash/s (yay!!!)

A GTX 660Ti (Asus Direct CU II OC) breaking 300 kHash/s on Linux. Nice.

In comparison, here is the 29ae4821fc31e8e55060f8aed7f8ae13e33b1827 revision from github (the one before I started committing anything scrypt-jane related). This one already supports the texture cache in David Andersen's kernels.

Code:
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 13:02:08] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 13:02:08] GPU #0: using launch configuration K14x32
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti, 103.61 khash/s
[2014-01-22 13:02:32] GPU #0: GeForce GTX 660 Ti, 269.33 khash/s

So that's a 13% improvement then?
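
For anyone who wants to check the figure, here is a minimal Python sketch of the comparison. It uses the settled khash/s values from the two logs above; `percent_improvement` is just an illustrative helper name.

```python
def percent_improvement(old_rate: float, new_rate: float) -> float:
    """Relative speed-up of new_rate over old_rate, in percent."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (new_rate - old_rate) / old_rate * 100.0


# Old K14x32 run: 269.33 khash/s. New Y14x32 run: 304.43 khash/s.
print(f"{percent_improvement(269.33, 304.43):.1f}%")  # 13.0%
# New Y21x28 run without the texture cache: 299.98 khash/s.
print(f"{percent_improvement(269.33, 299.98):.1f}%")  # 11.4%
```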

NOTE: The nVidia submitted kernels are now also in the Windows project files.

We're now in the strange situation that scrypt-jane and scrypt require completely different kernel implementations to run at best efficiency.  I need to think about how I can come up with a good auto-selection of kernels based on whether scrypt or scrypt-jane is used.
CaptainBeck
January 22, 2014, 11:47:57 AM
 #2835


However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...

Fixed already ;)

A GTX 660Ti breaking 300 kHash/s. Nice.


Not with my miners right now, but my 660 Ti cards break 300 khash/s all the time on the 18/12/13 code.
patoberli
January 22, 2014, 11:54:23 AM
 #2836

So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no differently, but thanks for all of your very hard work.

Very, very grateful to everybody for your help.


For normal scrypt use the official release 2013-12-18. It's on the first page of this thread. At least for me that is the fastest for normal scrypt.
Also there is not much to be done for tuning in that release.
For middlecoin.com and the 2013-12-18 release I use this start line:
cudaminer.exe -i 1 -H 2 -C 1 -l F8x16 -o stratum+tcp://middlecoin.com:3333 -O bitcoinaddress:password
Please note that this line is for a Fermi-based card (Quadro 4000) with 2 GB of VRAM in a desktop that I also work on (hence -i 1 and not 0).
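
If you prefer to keep the settings in one place instead of editing a long line by hand, here is a minimal Python sketch that assembles the same start line from a table of flags. The flags and values are exactly the ones from my line above; the helper `build_command` is made up for illustration and is not part of cudaminer.

```python
def build_command(executable: str, flags: dict) -> list:
    """Turn an ordered flag table into an argument list."""
    args = [executable]
    for flag, value in flags.items():  # dicts keep insertion order
        args += [flag, str(value)]
    return args


# Settings for my Fermi card; swap in the -l value autotune found for yours.
settings = {
    "-i": 1,
    "-H": 2,
    "-C": 1,
    "-l": "F8x16",
    "-o": "stratum+tcp://middlecoin.com:3333",
    "-O": "bitcoinaddress:password",
}

print(" ".join(build_command("cudaminer.exe", settings)))
```

The printed line can be pasted into a .bat file, or the list can be handed to `subprocess.run` if you want to launch the miner from Python.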

YAC: YA86YiWSvWEGSSSerPTMy4kwndabRUNftf
BTC: 16NqvkYbKMnonVEf7jHbuWURFsLeuTRidX
LTC: LTKCoiDwqEjaRCoNXfFhDm9EeWbGWouZjE
cbuchner1 (OP)
January 22, 2014, 11:54:47 AM
 #2837

Not with my miners right now, but my 660 Ti cards break 300 khash/s all the time on the 18/12/13 code.

I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.
CaptainBeck
January 22, 2014, 12:01:36 PM
 #2838

Not with my miners right now, but my 660 Ti cards break 300 khash/s all the time on the 18/12/13 code.

I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.



Ohhhh, yeah, I'm applying default 1 for the overclock because it works the best.

Is this the new code then, I take it?
ther0x
January 22, 2014, 12:05:31 PM
 #2839

Hi Guys

I have a GTX 460, and with this setting (cudaminer.exe -H 1 -d 0 -i 1 -l F28x4 -C 1 -m 0 -o stratum+tcp://coinotron.com:3334 -O USER:PWD)
I have reached 122 khash/s.

With the new v111 cudaminer and the same settings I reach just half of that.
What could be the reason? Sorry, I'm a newbie :)

Thanks!
cbuchner1 (OP)
January 22, 2014, 12:06:56 PM
 #2840

With the new v111 cudaminer and the same settings I reach just half of that.
What could be the reason? Sorry, I'm a newbie :)

Don't do scrypt mining with the current code from github. You want to stick with the official 2013-12-18 release, unless you are really into adventure and experiments.

