Bitcoin Forum
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426930 times)
ktf
January 19, 2014, 03:51:41 PM
 #2681

It seems that when running with -L 2 the launch config was set to K59x2, which was netting almost 3 khash/s.

However, if I specify -l K59x2 myself, I get errors:

[2014-01-19 17:49:25] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[1][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 164)

I tried different values and got the same error. It only works if I don't use the -l flag.
Silverwolf_Ru
January 19, 2014, 03:59:48 PM
 #2682

OP, how about autotune crashing on Fermi kernels? I think they need some love as well, any news on their progress?

cbuchner1 (OP)
January 19, 2014, 04:26:57 PM
 #2683

The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian
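
For anyone scripting around these settings, here is a rough sketch of how a launch config string such as K59x2 or 12x32 could be split into its parts. It is not taken from the cudaMiner source: the struct and function names are made up for illustration, and the format (optional kernel letter, blocks, "x", warps per block) is only inferred from the values posted in this thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative types and names; none of this is taken from cudaMiner.
struct LaunchConfig {
    char kernel;  // optional kernel prefix letter (e.g. 'K' or 'T'), '\0' if absent
    int blocks;   // number of thread blocks
    int warps;    // warps per block
};

// Reads a run of decimal digits starting at `pos`; fails on an empty run
// or on values too large to be a sensible block or warp count.
static bool read_number(const std::string& text, std::size_t& pos, int& out) {
    std::size_t start = pos;
    int value = 0;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
        value = value * 10 + (text[pos] - '0');
        if (value > 100000) return false;
        ++pos;
    }
    out = value;
    return pos > start;
}

// Parses strings of the form [letter]<blocks>x<warps>.
bool parse_launch_config(const std::string& text, LaunchConfig& cfg) {
    std::size_t pos = 0;
    cfg.kernel = '\0';
    if (!text.empty() && std::isalpha(static_cast<unsigned char>(text[0])))
        cfg.kernel = text[pos++];
    if (!read_number(text, pos, cfg.blocks)) return false;
    if (pos >= text.size() || text[pos] != 'x') return false;
    ++pos;
    if (!read_number(text, pos, cfg.warps)) return false;
    return pos == text.size() && cfg.blocks > 0 && cfg.warps > 0;
}

// Total warps launched: blocks times warps per block.
int total_warps(const LaunchConfig& cfg) {
    assert(cfg.blocks > 0 && cfg.warps > 0);
    return cfg.blocks * cfg.warps;
}
```

With this, 12x32 gives 384 warps in total and 15x32 gives 480, which makes it easier to compare configs with different shapes.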

ManIkWeet
January 19, 2014, 04:35:23 PM
 #2684

The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian


I am sure you can squeeze more out of your GTX 780, I get 3.87-3.90 khash/s with -l T64x2 -b 8192 -L 2 -i 0 --algo=scrypt-jane.

djm34
January 19, 2014, 04:56:00 PM
 #2685

The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian


Here is what I got with my 780 Ti:
L3   29x7   => 4.78 khash/s
L4   137x2  => 5.09 khash/s
L5   169x2  => 5.1 khash/s
L6   60x8   => 5.22 khash/s
In principle there should be somewhat better values. For scrypt the best ones are multiples of the CUDA core count, and I see no reason it would not work the same way for scrypt-jane.
I can't monitor the power usage on Linux. I use a self-modded BIOS that allows up to 150% of the TDP, but I don't think it has any impact, since I can't change the power limit.

primeomega
January 19, 2014, 06:30:32 PM
 #2686

Call me stupid, but why did YAC become a thing all of a sudden? I have used cudaMiner for a while to mine alt coins, and I check this thread once in a while, but it's all about YAC now. Is it the most profitable coin to mine with an Nvidia card now? I did not see it traded on Cryptsy at all, so I'm not sure what it's all about.

bathrobehero
January 19, 2014, 06:35:43 PM
 #2687

CudaMiner is currently at its strongest around an N factor of 14 (compared to ATI/AMD GPUs and CPUs), and YaC is the only coin around that N factor, which makes it the most profitable.
YaC has some issues though, so I'm waiting for other coins to get close to N factor 14.


On another note, if anyone wants to speed up the autotuning process at the cost of some accuracy, you could decrease the number of measurements in salsa_kernel.cu (line 538):
Code:
while (repeat < 3)  // average up to 3 measurements for better exactness


Also, you can interrupt autotuning with CTRL+C on Windows at any time. Although this closes cudaMiner, it shows you the best kernel launch config it has found up to that point (handy for skipping the last part in some cases).
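
To make the speed versus accuracy trade-off concrete, here is a minimal sketch of an averaging loop of this kind. It is not the actual code from salsa_kernel.cu: the function name, the callback, and the rule that a negative reading ends the loop early are all assumptions for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the autotune measurement loop: calls `measure`
// up to `max_repeats` times and returns the mean of the readings.
// A negative reading is treated as a failed run and stops the loop early
// (an assumption made for this sketch).
double average_hashrate(const std::function<double()>& measure, int max_repeats) {
    assert(max_repeats > 0);
    double sum = 0.0;
    int repeat = 0;
    while (repeat < max_repeats) {
        double khash = measure();
        if (khash < 0.0) break;
        sum += khash;
        ++repeat;
    }
    return repeat > 0 ? sum / repeat : 0.0;
}
```

Lowering max_repeats from 3 to 1 cuts the measuring time per config to roughly a third, but a single noisy reading then decides which config wins.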


relm9
January 19, 2014, 06:42:16 PM
 #2688

Really? Dude drop the entitlement...

Excuse me, but you need to drop something yourself. That being the assumption that you know my motives or what type of person I am. You don't, so knock it off.

It was a sort of tongue-in-cheek comment, but I can see how the humor doesn't come across very well without knowing the intent of the post. If it were intended as you framed it, why would I follow up the comment with a polite request for updated binaries? Anyway I'm getting the prerequisites together as we speak so I can compile it myself. I was not aware that a trial of VS2010 could be used to compile, but now I know.

Thanks for the snap judgment, though. Makes my day when some snooty know-it-all gets something totally wrong. Next time drop the egoistic notion that you've got everything figured out, and you'll be less likely to make the same mistake again.

Thanks cbuchner1 for your continued effort.

Ok - I just don't find posts like that constructive when you could have just asked for help instead (I compiled a version of this for a guy that asked). You're right I shouldn't have judged what type of person you are from that post. I apologize, let's move on.

On-topic: I tried the new build today, getting up to 4.5kh/s with T68x4 and -L4 on a GTX780. It usually hovers more around 4.3.

bathrobehero
January 19, 2014, 07:48:10 PM
 #2689

On-topic: I tried the new build today, getting up to 4.5kh/s with T68x4 and -L4 on a GTX780. It usually hovers more around 4.3.

For me, hovering or jittering occurs when too much memory is being used, or at least when usage is borderline.
For example, for me N factor 14 with -L 3 results in 181 warps.
Autotune comes up with K59x3 (= 177), which results in a very stable hashrate, using 1931 MB of VRAM (with the default 3 measurements).
K10x18 (= 180) jitters a bit but is better on average, even though the VRAM usage keeps jumping between 1942 and 1963 MB, which, if I have to guess, is what causes the jittering.
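
If you want to hunt for such borderline configs by hand, a small sketch like the following lists every blocks x warps combination whose total lands just under a warp budget. The function name and the per-block warp limit are assumptions for this example, not values from cudaMiner.

```cpp
#include <cassert>
#include <vector>

struct Config {
    int blocks;  // number of thread blocks
    int warps;   // warps per block
};

// Lists every blocks x warps pair whose product (total warps) lies in
// [max_total - slack, max_total]. `max_warps_per_block` is an assumed limit
// for this sketch, not a value taken from cudaMiner.
std::vector<Config> configs_near_budget(int max_total, int slack, int max_warps_per_block) {
    assert(max_total > 0 && slack >= 0 && max_warps_per_block > 0);
    std::vector<Config> result;
    for (int warps = 1; warps <= max_warps_per_block; ++warps) {
        for (int blocks = 1; blocks * warps <= max_total; ++blocks) {
            if (blocks * warps >= max_total - slack)
                result.push_back({blocks, warps});
        }
    }
    return result;
}
```

Called as configs_near_budget(181, 4, 32), the result includes both 59x3 (177 warps) and 10x18 (180 warps), so the two configs above come out of the same search.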

Here's a screenshot (with minimum/average/maximum hashrates added in the brackets).

So, in addition to my previous post: you can find these borderline kernel configs if you leave the number of autotune measurements alone, or maybe even increase it. But if your card is used as the primary one (has a monitor attached), you will be fine with a less accurate autotune, since VRAM usage is not static (desktop, background apps, etc.).


Also, I guess most of us have our cards overclocked at this point, but as the new lookup gap puts more pressure on the cards, our pre-lookup-gap overclocks are not that stable anymore, causing crashes.

manofcolombia
January 19, 2014, 08:08:21 PM
 #2690

When I go to compile to get lookup_gap I end up with this error

C:\Users\Zak Lantz\Desktop\cudaminer_vc2010_prerequisites\CudaMiner-master\cudaminer.vcxproj : error  : Unable to read the project file "cudaminer.vcxproj".
C:\Users\Zak Lantz\Desktop\cudaminer_vc2010_prerequisites\CudaMiner-master\cudaminer.vcxproj(50,5): The imported project "C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 5.5.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.

I understand what the error means: Visual Studio can't find the CUDA build files, because I have CUDA installed on my H: drive (my C: drive is a 120 GB SSD). How would I point Visual Studio to where CUDA is actually installed?

ktf
January 19, 2014, 08:13:31 PM
 #2691

Anyone having issues with the YAC wallet? Mine crashes as soon as I start it on Windows 7 64-bit...
cbuchner1 (OP)
January 19, 2014, 10:12:58 PM
Last edit: January 19, 2014, 11:03:21 PM by cbuchner1
 #2692

Should we roll the Lookup-Gap into kernel launch configurations?

How does T12x32/6 look to you? ;-)

No issues with the YAC wallet on Windows here, but mine does start horribly slowly on Linux (takes up to an hour). I pulled it from the official PPA repository for stable builds.

The reason for autotune crashes on Windows with lookup gap seems to be rising memory usage during the autotune process. For example, on my 780 Ti, as soon as the "Memory Used" value shown in GPU-Z hits 3072 MB, the driver crashes. I could fix it by adding a configurable "backoff" parameter in percent. The default value on Windows should be higher than on Linux, probably around 10% on Windows and 2% on Linux. Alternatively, I could also allow giving the backoff in MB.

For a very quick fix in the current source code, raise the loop bound 2 in this for loop in salsa_kernel.cu to something higher, e.g. 2*LOOKUP_GAP. It should fix auto-tuning when single-memory allocation is not enabled.

Code:
                // back off: free up to 2 of the most recently allocated warp buffers
                for (int i=0; warp > 0 && i < 2; ++i) {
                    warp--;
                    checkCudaErrors(cudaFree(h_V[thr_id][warp]-h_V_extra[thr_id][warp]));
                    h_V[thr_id][warp] = NULL; h_V_extra[thr_id][warp] = 0;
                }

UPDATE: I also find that CUDA sometimes kills the autotuning process with the error message "the launch timed out and was terminated". This might be fixed by auto-tuning with a smaller batchsize (-b) parameter, e.g. 1024. CUDA has a watchdog timer that will kill kernel calls that take longer than 5 seconds. This is to avoid a permanent display freeze when some computation gets stuck.
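
As a rough illustration of picking a smaller batch size, the sketch below halves the batch until an estimated kernel run time fits under the watchdog limit. The helper name is made up, and the assumption that run time scales linearly with batch size is a simplification; real timings should be measured on the card.

```cpp
#include <cassert>

// Hypothetical helper: halves the batch size until the estimated kernel run
// time no longer exceeds the watchdog limit. Assumes run time scales linearly
// with batch size, which is a simplification made for this sketch.
int fit_batch_to_watchdog(int batch, double seconds_at_batch, double limit_seconds) {
    assert(batch > 0 && seconds_at_batch >= 0.0 && limit_seconds > 0.0);
    while (batch > 1 && seconds_at_batch > limit_seconds) {
        batch /= 2;
        seconds_at_batch /= 2.0;
    }
    return batch;
}
```

For example, a batch of 32768 that would take 8 seconds is reduced to 16384 under a 5 second limit, while a batch that already runs in 1 second is left alone.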

I am also considering allowing devices to be specified by name, as in the following example, because whenever I swap cards around on my mainboards, all the device IDs get shuffled by CUDA, which is annoying. The strings, however, would keep working as is, unless you remove the card with the given name.

-d "GT 640, GTX 780 Ti, GTX 660 Ti, GTX 660 Ti#2"
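
A minimal sketch of how such name strings could be mapped back to device indices is shown below. It is only a guess at the behaviour: the function names are made up, "#2" is read as "the second card with this name", and the list of device names is passed in directly instead of being queried from CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Strips leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Returns the index of the nth (1-based) device called `name`, or -1.
static int find_nth_device(const std::vector<std::string>& names,
                           const std::string& name, int nth) {
    assert(nth > 0);
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == name && --nth == 0) return static_cast<int>(i);
    return -1;
}

// Hypothetical resolver for an argument like "GT 640, GTX 660 Ti#2".
// Each entry yields a device index, or -1 if no matching card is present.
std::vector<int> resolve_devices(const std::string& arg,
                                 const std::vector<std::string>& names) {
    std::vector<int> ids;
    std::size_t start = 0;
    while (start <= arg.size()) {
        std::size_t comma = arg.find(',', start);
        if (comma == std::string::npos) comma = arg.size();
        std::string spec = trim(arg.substr(start, comma - start));
        int nth = 1;
        std::size_t hash = spec.rfind('#');
        if (hash != std::string::npos && hash + 1 < spec.size()) {
            nth = std::atoi(spec.c_str() + hash + 1);
            spec = trim(spec.substr(0, hash));
        }
        ids.push_back(nth > 0 ? find_nth_device(names, spec, nth) : -1);
        start = comma + 1;
    }
    return ids;
}
```

Because the lookup goes by name, the same argument keeps selecting the same cards after the device IDs have been reshuffled.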

Christian
orrett3
January 19, 2014, 10:22:00 PM
 #2693

Anyone having issues with the YAC wallet? Mine crashes as soon as I start it on Windows 7 64-bit...

What is the error you're getting if there is one?

I was getting a "not able to load block index" error, but was able to fix it.
Magister1
January 20, 2014, 12:01:01 AM
 #2694

Should we roll the Lookup-Gap into kernel launch configurations?

I am also considering allowing devices to be specified by name, as in the following example, because whenever I swap cards around on my mainboards, all the device IDs get shuffled by CUDA, which is annoying. The strings, however, would keep working as is, unless you remove the card with the given name.

-d "GT 640, GTX 780 Ti, GTX 660 Ti, GTX 660 Ti#2"

Christian


This is your baby, but those sound like good ideas, in addition to the idea about setting warp ranges for auto tuning.

I would suggest clarifying/cleaning up the display and help pages for new people. You are beginning to make a real dent in the struggle for viable NVidia mining and getting attention across the web. Your baby ought to look its best, right? Maybe once you do a new release even open a new thread (with a link to this one obviously) so people aren't overwhelmed by 130+ pages of old comments pertaining mainly to old versions.

Keep up the good work!

PS. Do you take Yacoin donations?
cbuchner1 (OP)
January 20, 2014, 12:36:09 AM
 #2695

Keep up the good work!

PS. Do you take Yacoin donations?

Yeah, you can donate to YBQ4hrUQqEb2EDip1NFwMAgZbvK8hJx5Tn

Good idea about starting a new thread for the scrypt-jane enabled cudaminer, once it is released.

I have made some changes to autotune reliability and speed. It will not assign fewer blocks than half the multiprocessor count of your card. For example, on a GTX 780 it will start autotuning at 6 blocks now (the card has 12 SMX).

Also I made changes to how memory is allocated. The backoff value on Windows is currently 12% of the largest allocation it was able to make. On Linux it is a mere 2%. If I don't back off, autotune will crash pretty badly. It can still occasionally crash with launch timeouts though.
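
The two rules above come down to simple arithmetic. The sketch below is not the cudaMiner code: the function names are made up, and rounding up for odd multiprocessor counts is an assumption, since only the 12 SMX case is stated.

```cpp
#include <cassert>
#include <cstddef>

// Smallest block count autotune would try: half the multiprocessor count.
// Rounding up for odd counts is an assumption of this sketch.
int autotune_min_blocks(int multiprocessors) {
    assert(multiprocessors > 0);
    return (multiprocessors + 1) / 2;
}

// Memory left to use after backing off by `backoff_percent` from the
// largest allocation that succeeded (12 on Windows, 2 on Linux above).
std::size_t usable_after_backoff(std::size_t largest_alloc, unsigned backoff_percent) {
    assert(backoff_percent <= 100);
    return largest_alloc - largest_alloc / 100 * backoff_percent;
}
```

For a largest allocation of 3000 MB this leaves 2640 MB with the Windows value and 2940 MB with the Linux value.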

I find that my GTX 660Ti is a better investment than my new GTX 780 card (3 GB each, but 7 vs 12 SMX). At -L 2 the 660Ti totally beats my 780. Meh.

My GTX 660 Ti uses -L 2 -l K64x2 -C 1 -b 32768 -i 0 and gets 3.7 kHash/s

Christian
ozie
January 20, 2014, 12:42:54 AM
 #2696

No issues with the YAC wallet on Windows here, but mine does start horribly slowly on Linux (takes up to an hour). I pulled it from the official PPA repository for stable builds.

There is a new stable release on github which speeds up the time it takes to open the wallet on Linux. Not sure if it is in PPA already.
Magister1
January 20, 2014, 12:49:58 AM
 #2697

Keep up the good work!

PS. Do you take Yacoin donations?

Yeah, you can donate to YBQ4hrUQqEb2EDip1NFwMAgZbvK8hJx5Tn

Good idea about starting a new thread for the scrypt-jane enabled cudaminer, once it is released.

I have made some changes to autotune reliability and speed. It will not assign fewer blocks than half the multiprocessor count of your card. For example, on a GTX 780 it will start autotuning at 6 blocks now (the card has 12 SMX).

Also I made changes to how memory is allocated. The backoff value on Windows is currently 12% of the largest allocation it was able to make. On Linux it is a mere 2%. If I don't back off, autotune will crash pretty badly. It can still occasionally crash with launch timeouts though.

I find that my GTX 660Ti is a better investment than my new GTX 780 card (3 GB each, but 7 vs 12 SMX). At -L 2 the 660Ti totally beats my 780. Meh.

My GTX 660 Ti uses -L 2 -l K64x2 -C 1 -b 32768 -i 0 and gets 3.7 kHash/s

Christian


Donation sent.

In case you guys didn't know they just released an update to the Yacoin wallet 0.42.
djm34
January 20, 2014, 01:27:09 AM
 #2698

I just tried to run the latest version on Windows on scrypt with my newest config from yesterday, without -L, and it seems I lost 100 khash/s (it was running at 700 (OC...) and now it barely makes 600...).
Do I need to retune? Or has something changed more drastically?

muliukov
January 20, 2014, 01:36:29 AM
 #2699

Sorry for the question, but can you help create a cudaMiner for Microcoin? As far as I can see it should work like the one for YAC, so it shouldn't be difficult, but I have never done it before and have no skills.
orrett3
January 20, 2014, 01:37:21 AM
 #2700

I just tried to run the latest version on Windows on scrypt with my newest config from yesterday, without -L, and it seems I lost 100 khash/s (it was running at 700 (OC...) and now it barely makes 600...).
Do I need to retune? Or has something changed more drastically?

I would try using autotune to get another config and see what happens.