Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426867 times)
djm34
Legendary
Activity: 1400
Merit: 1050
February 04, 2014, 01:34:18 PM
#3721

A brave tester with 8 Fermi cards (Tesla M2090; thanks, Choseh) just helped track down the performance regression between the 2013-12-18 and 2014-02-02 releases.

If you change the #if 0 in fermi_kernel.cu to #if 1 (thereby enabling the previous version of the Salsa20/8 round function), you should see the previous performance figures again. Those who can compile the code themselves and want to mine on Fermi are welcome to make this change.
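To illustrate the kind of switch being flipped, here is a generic sketch of an #if guard choosing between two formulations of a Salsa20 quarter-round; it is only an illustration of the pattern, not the actual contents of fermi_kernel.cu.

Code:
#include <cstdint>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

// Two equivalent ways to write the Salsa20 quarter-round; the preprocessor
// guard selects one of them, just like the #if 0 / #if 1 toggle mentioned above.
static inline void salsa_quarter(uint32_t &a, uint32_t &b, uint32_t &c, uint32_t &d)
{
#if 1   // was "#if 0": flip to 1 to compile the other formulation
    b ^= ROTL32(a + d, 7);
    c ^= ROTL32(b + a, 9);
    d ^= ROTL32(c + b, 13);
    a ^= ROTL32(d + c, 18);
#else
    uint32_t t;
    t = a + d; b ^= ROTL32(t, 7);
    t = b + a; c ^= ROTL32(t, 9);
    t = c + b; d ^= ROTL32(t, 13);
    t = d + c; a ^= ROTL32(t, 18);
#endif
}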

There also seems to be a bug in the autotuning code in salsa_kernel.cu:

Code:
hash_sec = (double)WU_PER_LAUNCH / tdelta;

which should very likely be

Code:
hash_sec = (double)WU_PER_LAUNCH * repeat / tdelta;

to factor in the number of repetitions in the measurement (we want to measure for at least 50 ms for better timer accuracy). So autotune was drunk after all!
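For what it's worth, here is a self-contained sketch of why the missing * repeat skews the measurement; the loop structure, the WU_PER_LAUNCH value and the dummy kernel call are assumptions for illustration only, not the actual salsa_kernel.cu code.

Code:
#include <chrono>
#include <cstdio>

static const double WU_PER_LAUNCH = 4096.0;   // hypothetical work units hashed per launch
static void run_kernel_once() { /* kernel launch + synchronize would go here */ }

int main()
{
    int repeat = 0;
    double tdelta = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    do {
        run_kernel_once();
        ++repeat;
        tdelta = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    } while (tdelta < 0.050);   // keep repeating until at least 50 ms have been measured

    // every repetition hashed WU_PER_LAUNCH work units, hence the factor of repeat
    double hash_sec = WU_PER_LAUNCH * repeat / tdelta;
    std::printf("%.1f hashes per second\n", hash_sec);
    return 0;
}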

So, it seems I should release fixes (new binary release) for these problems tonight.

Christian

Yes, it works better this way. However, there is still the problem of power draw varying between configs, though it is less apparent.
(Strangely, I don't have that problem with the GTX 660; its power stays at 100% and doesn't fluctuate.)



morbooo
Newbie
Activity: 4
Merit: 0
February 04, 2014, 01:43:01 PM
#3722

I think your bug report is the one that made my mind go http://www.digitalsherpa.com/wp-content/uploads/2012/11/lightbulb1.gif

The CUDA constant memory (the c_N loop trip count, etc.) of most CUDA kernels is only initialized properly for the first GPU, because a single static variable is used to mark initialization instead of a thread-specific one. That explains the majority of the crashes people are seeing with multi-GPU setups. Thank you. Fermi owners use a kernel that doesn't yet make use of such constants, hence multi-GPU support is working fine for them.

So this is also on the FIXME list for tonight.
Awesome, looking forward to the fix. Thanks for the support Smiley

However I think that in your case where you run two cudaminer instances this cannot be the root cause. So we will have to keep looking.
Oh no, I don't run two instances; I meant that one of the GPUs within the same cudaMiner instance produced invalid results, which is in line with your explanation above. Running two instances of cudaMiner (one for each GPU) actually works perfectly, so this also confirms your hypothesis.
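A minimal sketch of the initialization pattern described above; the c_N name follows the post, but the structure is an assumption for illustration, not the actual cudaMiner code.

Code:
#include <cuda_runtime.h>
#include <cstdint>

__constant__ uint32_t c_N;          // per-device constant memory

// Buggy pattern: a single static flag shared by all GPU worker threads means only
// the first device ever receives the cudaMemcpyToSymbol(); the others get garbage.
//   static bool init_done = false;

// Fixed pattern: a per-thread flag (each worker thread drives one device),
// so every device gets its own copy of the constants.
static thread_local bool init_done = false;

void prepare_kernel(uint32_t N)
{
    if (!init_done) {
        cudaMemcpyToSymbol(c_N, &N, sizeof(N));   // copies to the currently selected device only
        init_done = true;
    }
}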
cbuchner1 (OP)
Hero Member
Activity: 756
Merit: 502
February 04, 2014, 01:54:05 PM
#3723

Mate, any idea why, although I have 2 GTX 780s (and two cudaMiner instances), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780 Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI Express connectivity.

But a 100 kHash/s difference, ouch. Have you played with the -H options yet?
trell0z
Newbie
Activity: 43
Merit: 0
February 04, 2014, 02:11:09 PM
#3724

Mate, any idea why, although I have 2 GTX 780s (and two cudaMiner instances), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780 Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI Express connectivity.

But a 100 kHash/s difference, ouch. Have you played with the -H options yet?

Have you guys monitored your cards in Afterburner? The topmost card might be throttling more than the bottom one, or they might simply boost to different clocks. A custom BIOS with boost disabled is great in general.
bathrobehero
Legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
February 04, 2014, 02:24:11 PM
#3725

Mate, any idea why, although I have 2 GTX 780s (and two cudaMiner instances), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

The primary card is always going to perform worse, as it is stressed by the OS, your browser, background apps and so on. Also, the -H flag could cause it, so try -H 2 to take the CPU out of the equation. If you're not using risers, chances are one of your cards runs hotter than the other, or at least needs a higher fan speed to keep its temperature down; the fans then draw more power on that card, which on Kepler usually means lower core clocks. Those are not the only possible explanations, but my break is over...
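For reference, this is roughly what that suggestion looks like on the command line, reusing the flags already posted in this thread; the pool URL and worker credentials are placeholders only.

Code:
cudaminer.exe -d 0,1 -i 0,0 -H 2,2 -o stratum+tcp://pool.example.com:3333 -O worker.name:password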

xblackdemonx
Newbie
Activity: 6
Merit: 0
February 04, 2014, 02:41:34 PM
Last edit: February 04, 2014, 03:06:01 PM by xblackdemonx
#3726

Hi, I'm using 2x GTX 560 here

with 2013-12-10 version I get about 290kh/s
with 2013-12-18 version I get about 310kh/s but it freezes often
with 2014-02-02 version I get about 270kh/s  Huh

I'm using: cudaminer.exe -d 0,1 -i 0,0 -l F7x16,F7x16 -H 1,1 -C 1,1
lordaccess
Member
Activity: 69
Merit: 10
February 04, 2014, 02:42:36 PM
#3727

Mate, any idea why, although I have 2 GTX 780s (and two cudaMiner instances), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780 Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI Express connectivity.

But a 100 kHash/s difference, ouch. Have you played with the -H options yet?

Since they are the same card, I use the same configuration. My problem is that if I start either one alone, it does reach 605. If I start them together, they both reach 605, but after 2-3 minutes the GPU clock on one drops, and the voltage and the hash rate drop with it (down to 520). It's the upper card, the one the monitor is plugged into, i.e. the one in the primary PCIe slot. That card also runs hotter (85C) than the one that keeps the max hash rate (75C), but that is probably due to the limited space it has to breathe.

EDIT: I also saw the two answers below. Thanks, I'll try playing with -H (although I doubt I'll see any difference).

tron666
Member
Activity: 112
Merit: 10
February 04, 2014, 02:46:34 PM
Last edit: February 04, 2014, 03:02:51 PM by tron666
#3728

When recompiling this, is there anything wrong with doing a git pull and then running autogen, configure and make? Or is it better to just delete everything and start from scratch?
Doesn't git make sure you're not mixing any files?

By the way, the new commits are definitely improving the performance on Fermi, but it's still below what it was with 2014-01-20.

Lacan82
Sr. Member
Activity: 247
Merit: 250
February 04, 2014, 02:51:51 PM
#3729

Well I just got ripped off of a YACoin block, it said the Yay!!! thing but the damn client never actually showed the block.
The client even said it found a block but it never appeared in my wallet, so sad...
Yacoin takes ~520 confirms. That usually takes a few hours after a found block.

The same happened to me with an UltraCoin block. My wallet lists 3 transactions in total, but only 2 incoming transactions from mining are actually displayed.

If you find a way to recover that missing transaction, please let me know.


Command-line option:
Code:
-rescan                Rescan the block chain for missing wallet transactions


Have you tried this?
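For example, you can start the wallet once with that flag and let it rescan; the executable name here is just a placeholder for whichever coin's wallet you use.

Code:
yacoin-qt.exe -rescan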

trell0z
Newbie
Activity: 43
Merit: 0
February 04, 2014, 02:53:15 PM
#3730

Mate, any idea why, although I have 2 GTX 780s (and two cudaMiner instances), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780 Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI Express connectivity.

But a 100 kHash/s difference, ouch. Have you played with the -H options yet?

Since they are the same card, I use the same configuration. My problem is that if I start either one alone, it does reach 605. If I start them together, they both reach 605, but after 2-3 minutes the GPU clock on one drops, and the voltage and the hash rate drop with it (down to 520). It's the upper card, the one the monitor is plugged into, i.e. the one in the primary PCIe slot. That card also runs hotter (85C) than the one that keeps the max hash rate (75C), but that is probably due to the limited space it has to breathe.

EDIT: I also saw the two answers below. Thanks, I'll try playing with -H (although I doubt I'll see any difference).

Have you tried undervolting your cards? With the new kernel I can undervolt massively and still keep a high overclock (GTX 780). I'm currently running +310 on the core, which gives me 1254 MHz, no memory OC, and -50 mV on the voltage, which puts it at 1.100 V under load.
The lower heat might help your cards stay at higher clocks; you should also use Afterburner to prioritize the power target rather than the temperature target.
djm34
Legendary
Activity: 1400
Merit: 1050
February 04, 2014, 04:25:19 PM
#3731

Something strange (or not, that's the question...).
For most coins, the kernel formerly known as Z is the fastest, especially with scrypt coins.
However, for Vertcoin (scrypt, N=2048) it is much slower (a difference of more than 50 kH/s) than the kernel formerly known as T.
Is there any reason for this?

cbuchner1 (OP)
Hero Member
Activity: 756
Merit: 502
February 04, 2014, 04:31:39 PM
Last edit: February 04, 2014, 05:01:41 PM by cbuchner1
#3732

Something strange (or not, that's the question...).
For most coins, the kernel formerly known as Z is the fastest, especially with scrypt coins.
However, for Vertcoin (scrypt, N=2048) it is much slower (a difference of more than 50 kH/s) than the kernel formerly known as T.
Is there any reason for this?

So you're saying the current "T" (alias name Z) kernel is slower than the current "t" kernel (formerly known as T) for VertCoin?

Or are you comparing current cudaMiner performance with some older prerelease version?

either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...
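To put rough numbers on the memory side of that argument, here is a generic back-of-the-envelope calculation (plain scrypt math with r = 1; the lookup-gap column is the usual time/memory trade-off, not cudaMiner's exact implementation):

Code:
#include <cstdio>
#include <initializer_list>

int main()
{
    const int r = 1;                        // scrypt r parameter used by these coins
    for (int N : {1024, 2048}) {
        int full_kib = 128 * r * N / 1024;  // scrypt scratchpad is 128 * r * N bytes per hash
        for (int gap : {1, 2}) {            // a lookup gap of g keeps only every g-th V[] entry
            std::printf("N=%4d  lookup_gap=%d  ->  about %3d KiB per hash in flight\n",
                        N, gap, full_kib / gap);
        }
    }
    return 0;
}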

Christian
djm34
Legendary
Activity: 1400
Merit: 1050
February 04, 2014, 04:37:04 PM
#3733

Yes, the Z kernel is the slowest for Vertcoin (that has always been the case since it was introduced).

bathrobehero
Legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
February 04, 2014, 05:44:47 PM
#3734

either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.

sin242
Sr. Member
Activity: 280
Merit: 250
February 04, 2014, 06:03:03 PM
#3735

either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.


Hi there, long time lurker.  Reg'd to post up for this.


I'm seeing the same trend on GTX 670s.
 
1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

cbuchner1 (OP)
Hero Member
Activity: 756
Merit: 502
February 04, 2014, 06:04:21 PM
#3736


1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink
sin242
Sr. Member
Activity: 280
Merit: 250
February 04, 2014, 06:13:14 PM
#3737


1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink


Actually, it's funny you mention that. If I try to run Y7x32, it fails horribly and crashes the driver.

cbuchner1 (OP)
Hero Member
Activity: 756
Merit: 502
February 04, 2014, 06:21:25 PM
#3738


1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink


Actually, it's funny you mention that. If I try to run Y7x32, it fails horribly and crashes the driver.

According to the code, it shouldn't crash... K and Y really do the same thing.

Code:
            switch (kernelid)
            {
                case 'T': case 'Z': *kernel = new NV2Kernel(); break;
                case 't':           *kernel = new TitanKernel(); break;
                case 'K': case 'Y': *kernel = new NVKernel(); break;
                case 'k':           *kernel = new KeplerKernel(); break;
                case 'F': case 'L': *kernel = new FermiKernel(); break;
                case 'f': case 'X': *kernel = new TestKernel(); break;
                case ' ': // choose based on device architecture
                    *kernel = Best_Kernel_Heuristics(props);
                break;
djm34
Legendary
Activity: 1400
Merit: 1050
February 04, 2014, 06:22:00 PM
#3739

either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.
Same here with my GTX 660 OEM 1.5 GB as far as the kernel goes, although at N=2048 it is rather Y6x20.
But it is well known that the GTX 660 OEM is not really a GTX 660  Grin

sin242
Sr. Member
Activity: 280
Merit: 250
February 04, 2014, 06:28:09 PM
#3740


1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink


Actually, it's funny you mention that. If I try to run Y7x32, it fails horribly and crashes the driver.

According to the code, it shouldn't crash... K and Y really do the same thing.

Code:
            switch (kernelid)
            {
                case 'T': case 'Z': *kernel = new NV2Kernel(); break;
                case 't':           *kernel = new TitanKernel(); break;
                case 'K': case 'Y': *kernel = new NVKernel(); break;
                case 'k':           *kernel = new KeplerKernel(); break;
                case 'F': case 'L': *kernel = new FermiKernel(); break;
                case 'f': case 'X': *kernel = new TestKernel(); break;
                case ' ': // choose based on device architecture
                    *kernel = Best_Kernel_Heuristics(props);
                break;




After tinkering some more, K7x32 isn't working. It has to be something on my end. There's a separate 670 in another machine that's happily hashing away with K7x32, but now these 670s won't. Going to reinstall drivers/CUDA and see if anything changes.
