Bitcoin Forum
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426932 times)
afixane
April 24, 2013, 05:35:29 AM
 #501

For anyone who has an NVIDIA Optimus card and is using Linux: executing ./cudaminer will return
Code:
Floating point exception (core dumped)

Here's my solution:

1. Install Bumblebee.
2. Run cudaminer with optirun (yes, you can use primusrun instead, but it had no effect).
Code:
optirun cudaminer blablabla
cbuchner1 (OP)
April 24, 2013, 11:55:50 AM
Last edit: April 24, 2013, 12:15:55 PM by cbuchner1
 #502

Quote
is there any way to determine where I can (safely) set my texture-cache variable?

Try -l 48x5 -C 2 then.

The worst that could happen is a temporary driver crash (which it should recover from in 99% of all cases).

I had an idea for how to further cut shared memory use, so that Kepler-based cards would see full occupancy on their SMXes, maybe gaining 10% performance on these devices.

Christian
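
To make the occupancy argument above concrete, here is a rough host-side sketch (plain C++, no GPU needed) of how per-block shared memory can cap the number of resident warps on a multiprocessor. Everything in it is an assumption for illustration: the helper names are hypothetical, registers and shared-memory allocation granularity are ignored, and the block sizes in the note below are made-up values, not cudaMiner's real numbers.

```cpp
#include <algorithm>

// Per-multiprocessor limits. Hypothetical helper type for illustration;
// not part of cudaMiner or the CUDA API.
struct SmLimits {
    int max_warps;     // maximum resident warps
    int max_blocks;    // maximum resident thread blocks
    int shared_bytes;  // shared memory available to resident blocks
};

// Resident warps when only shared memory and the warp/block caps matter.
// Register pressure and allocation granularity are deliberately ignored.
inline int resident_warps(const SmLimits& sm, int warps_per_block,
                          int shared_bytes_per_block) {
    int by_shared = shared_bytes_per_block > 0
                        ? sm.shared_bytes / shared_bytes_per_block
                        : sm.max_blocks;
    int by_warps = sm.max_warps / warps_per_block;
    int blocks = std::min({by_shared, by_warps, sm.max_blocks});
    return blocks * warps_per_block;
}

// Occupancy as a fraction of the warp limit.
inline double occupancy(const SmLimits& sm, int warps_per_block,
                        int shared_bytes_per_block) {
    return static_cast<double>(
               resident_warps(sm, warps_per_block, shared_bytes_per_block)) /
           sm.max_warps;
}
```

Assuming a Kepler SMX with 64 resident warps, 16 resident blocks and 48 KB of shared memory, i.e. SmLimits kepler{64, 16, 48 * 1024}, a block of 7 warps that needs 7168 bytes fits 6 times (42 of 64 warps). Cutting that to 4096 bytes lets the warp limit take over at 9 blocks (63 warps), which is the kind of gain shrinking shared memory use is after.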

Aggrophobia
April 24, 2013, 03:00:55 PM
 #503

new nvidia beta driver increases my khashes/s :o 140x2

cudaMiner 22-04 third
cbuchner1 (OP)
April 24, 2013, 04:08:03 PM
 #504

Quote from: Aggrophobia
new nvidia beta driver increases my khashes/s :o 140x2

cudaMiner 22-04 third

The cudaminer directory says 2013-04-17, though.

zeroedout
April 24, 2013, 06:26:01 PM
 #505

Thank you *very* much for this. I only have an Nvidia card, and before your project, mining anything was just stupid. Cgminer would give me ~28 MH/s for BTC, but for LTC or FC it was only ~2.0 kH/s :< Cudaminer gives me ~32 kH/s!!!!! This is on a 9800GT.

tl;dr You are freakin' awesome and have my gratitude! I have never been able to mine a full unit of any cryptocurrency, but IMHO my best shot is Feather Coin. As soon as I'm able to mine some, I will be more than happy to donate a portion to you. Thanks again!!
Cheshyr
April 24, 2013, 06:50:25 PM
 #506

Quote from: cbuchner1
Quote from: Aggrophobia
new nvidia beta driver increases my khashes/s :o 140x2

cudaMiner 22-04 third

The cudaminer directory says 2013-04-17, though.

nVidia released a new driver today. He was talking about that, not the new version of cudaMiner.

I'd like to echo the other thoughts here... this tool has made mining on nVidia cards enjoyable again. Not epicuberdoomminer, but effective when it wasn't previously. Thanks.
Lacan82
April 24, 2013, 07:03:38 PM
 #507

Quote from: Cheshyr
Quote from: cbuchner1
Quote from: Aggrophobia
new nvidia beta driver increases my khashes/s :o 140x2

cudaMiner 22-04 third

The cudaminer directory says 2013-04-17, though.

nVidia released a new driver today. He was talking about that, not the new version of cudaMiner.

I'd like to echo the other thoughts here... this tool has made mining on nVidia cards enjoyable again. Not epicuberdoomminer, but effective when it wasn't previously. Thanks.

But the user said he was using cudaMiner 22-04 with the new drivers, yet the screenshots show it is running 2013-04-17, unless he extracted to the same directory every time.

Check the file path in the menu bar.

cbuchner1 (OP)
April 24, 2013, 07:09:14 PM
 #508


So, lol. I touch scrypt_core_kernelA() a bit, rearrange shared memory and yay! The hash rate drops from 220 kHash/s to 32 kHash/s on my fastest card, a GTX 560 Ti 448-core edition.

Well, lol. Results are still correct though ;)

Christian
Lacan82
April 24, 2013, 07:11:35 PM
 #509


Quote from: cbuchner1
So, lol. I touch scrypt_core_kernelA() a bit, rearrange shared memory and yay! The hash rate drops from 220 kHash/s to 32 kHash/s on my fastest card, a GTX 560 Ti 448-core edition.

Well, lol. Results are still correct though ;)

Christian

Sounds like you're on course to program for Apple! :D

coalescent
April 24, 2013, 07:56:52 PM
 #510

My 680 was mining at 205 kH/s yesterday afternoon on the older WHQL drivers. I'll be interested to see if the betas help performance at all.

@cbuchner1: if you need anything, don't hesitate to PM me (w/ email since I don't have PM privs yet).
cbuchner1 (OP)
April 24, 2013, 08:18:09 PM
 #511

So I eliminate shared memory altogether from salsa kernel A, like this (pure elegance). Note the switch to uint4.

Code:
template <int WARPS_PER_BLOCK> __global__ void
scrypt_core_kernelA(uint4 *g_idata)
{
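    int warpIdx        = threadIdx.x / warpSize;  // warp index within the block
    int warpThread     = threadIdx.x % warpSize;  // lane index within the warp

    // each thread starts at its own offset (SCRATCH*warpThread) in its warp's scratchpad buffer
    uint4 *V = (uint4*)(c_V[blockIdx.x * WARPS_PER_BLOCK + warpIdx] + SCRATCH*warpThread);
    // each work unit occupies 8 uint4 of input
    g_idata += 8 * (blockIdx.x * WU_PER_BLOCK + warpIdx * WU_PER_WARP + warpThread);

    uint4 B[4], C[4]; // registers to store an entire work unit

#define idxloop __pragma(unroll 4) for (int idx=0; idx < 4; idx++)

    // load the work unit into registers and store it as the first scratchpad entry
    idxloop { *V++ = B[idx] = *g_idata++; }
    idxloop { *V++ = C[idx] = *g_idata++; }

    // fill the remaining 1023 scratchpad entries
    for (int i = 1; i < 1024; i++) {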

        xor_salsa8_uint4(B, C); xor_salsa8_uint4(C, B);

        idxloop { *V++ = B[idx]; }
        idxloop { *V++ = C[idx]; }
    }
}

And I get 108 kHash/sec on the 560 Ti 448 core.

I'm like "WTF dude?!"  I expected 250 from this. PTX is more compact than ever, global memory loads and stores are all .v4.u32 (vectorized). So what's going wrong here?
Cheshyr
April 24, 2013, 08:25:00 PM
 #512

Quote from: Lacan82
But the user said he was using cudaMiner 22-04 with the new drivers, yet the screenshots show it is running 2013-04-17, unless he extracted to the same directory every time.

Check the file path in the menu bar.

I saw the file path bar. Maybe he did extract it that way. We have no way of knowing.

Quote from: cbuchner1
So I eliminate shared memory altogether from salsa kernel A, like this (pure elegance). Note the switch to uint4.

Code:
template <int WARPS_PER_BLOCK> __global__ void
scrypt_core_kernelA(uint4 *g_idata)
{
    int warpIdx        = threadIdx.x / warpSize;
    int warpThread     = threadIdx.x % warpSize;

    uint4 *V = (uint4*)(c_V[blockIdx.x * WARPS_PER_BLOCK + warpIdx] + SCRATCH*warpThread);
    g_idata += 8 * (blockIdx.x * WU_PER_BLOCK + warpIdx * WU_PER_WARP + warpThread);

    uint4 B[4], C[4]; // registers to store an entire work unit

#define idxloop __pragma(unroll 4) for (int idx=0; idx < 4; idx++)

    idxloop { *V++ = B[idx] = *g_idata++; }
    idxloop { *V++ = C[idx] = *g_idata++; }

    for (int i = 1; i < 1024; i++) {

        xor_salsa8_uint4(B, C); xor_salsa8_uint4(C, B);

        idxloop { *V++ = B[idx]; }
        idxloop { *V++ = C[idx]; }
    }
}

And I get 108 kHash/sec on the 560 Ti 448 core.

I'm like "WTF dude?!"  I expected 250 from this.

Is this like the old sorting memory allocation problem? Next Fit vs. First Fit or whatever? Sorting is expensive, so is forcing organization actually costing performance?
cbuchner1 (OP)
April 24, 2013, 08:30:35 PM
 #513

My bet is that this new code now violates the memory coalescing rules for CUDA devices, resulting in low throughput.

Back to the drawing board... Let's see if I can keep the elegance while maintaining memory coalescing.
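
The coalescing theory can be sanity-checked on the host without touching the GPU. The sketch below is only a model under stated assumptions: it counts how many memory segments one warp touches for a single uint4 store, once with every thread writing into its own contiguous scratchpad (the access pattern of the posted kernel, assuming SCRATCH spans one thread's whole scratchpad) and once with the 32 lanes writing to adjacent slots. The segment size, the 128 KB scratchpad size and all function names are assumptions chosen for illustration, not values taken from cudaMiner.

```cpp
#include <cstdint>
#include <set>

// Hypothetical model parameters (assumptions, not values from cudaMiner).
constexpr int           kWarpSize     = 32;          // threads per warp
constexpr std::uint64_t kElemBytes    = 16;          // sizeof(uint4)
constexpr std::uint64_t kSegmentBytes = 128;         // assumed transaction granularity
constexpr std::uint64_t kScratchBytes = 1024 * 128;  // assumed per-thread scratchpad size

// Layout A: every thread owns one contiguous scratchpad.
inline std::uint64_t addr_per_thread(int lane, int step) {
    return static_cast<std::uint64_t>(lane) * kScratchBytes +
           static_cast<std::uint64_t>(step) * kElemBytes;
}

// Layout B: the 32 lanes of a warp store to adjacent uint4 slots.
inline std::uint64_t addr_interleaved(int lane, int step) {
    return (static_cast<std::uint64_t>(step) * kWarpSize + lane) * kElemBytes;
}

// Number of distinct segments one warp touches for a single store instruction.
template <typename AddrFn>
int segments_touched(AddrFn addr, int step,
                     std::uint64_t segment_bytes = kSegmentBytes) {
    std::set<std::uint64_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane)
        segments.insert(addr(lane, step) / segment_bytes);
    return static_cast<int>(segments.size());
}
```

Under the 128-byte assumption, segments_touched(addr_per_thread, 0) comes out at 32 while segments_touched(addr_interleaved, 0) comes out at 4, so the per-thread layout needs eight times as many transactions for the same amount of data. With 32-byte segments the ratio shrinks to 32 versus 16, but the per-thread layout still loses.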
falconae
April 24, 2013, 09:02:44 PM
 #514

Quote from: zeroedout
Thank you *very* much for this. I only have an Nvidia card, and before your project, mining anything was just stupid. Cgminer would give me ~28 MH/s for BTC, but for LTC or FC it was only ~2.0 kH/s :< Cudaminer gives me ~32 kH/s!!!!! This is on a 9800GT.

tl;dr You are freakin' awesome and have my gratitude! I have never been able to mine a full unit of any cryptocurrency, but IMHO my best shot is Feather Coin. As soon as I'm able to mine some, I will be more than happy to donate a portion to you. Thanks again!!

Can you post your settings? I'm only pulling 15 kH/s. That was up from 2, but still, seeing your 32 makes me curious.
cbuchner1 (OP)
April 24, 2013, 09:05:56 PM
 #515

Quote from: falconae
Can you post your settings? I'm only pulling 15 kH/s. That was up from 2, but still, seeing your 32 makes me curious.

On these old cards it makes a difference whether you run Windows XP or Linux, or Windows Vista/7/8. It's not a settings issue; I believe the WDDM driver model has issues with these old cards.
Aggrophobia
April 24, 2013, 09:57:57 PM
 #516

Quote from: Lacan82
But the user said he was using cudaMiner 22-04 with the new drivers, yet the screenshots show it is running 2013-04-17, unless he extracted to the same directory every time.

Check the file path in the menu bar.

This! I extracted into the same directory, that's why I've edited my post.
FalconFour
April 25, 2013, 02:08:59 AM
 #517

FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...

feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
Lacan82
April 25, 2013, 02:25:12 AM
 #518

Quote from: FalconFour
FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...

I fixed the sorted sheet. sort2 :)

FalconFour
April 25, 2013, 02:27:03 AM
 #519

Quote from: Lacan82
Quote from: FalconFour
FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...

I fixed the sorted sheet. sort2 :)

Right, but the problem is trying to find the link to these docs in this complete clusterfuck of a 27-page thread. Forum layout isn't always conducive to information archiving and retrieval... posts between the first two and last two pages are pretty much lost in history. :/

gork68
April 25, 2013, 03:00:54 AM
 #520

I have an MSI GTX 660 Ti PE running at stock settings.

I am using the 4/22 build with the following flags: -i 0 -C 2 -l 42x7

           *** CudaMiner for nVidia GPUs by Christian Buchner ***
                     This is version 2013-04-22 (alpha)
        based on pooler-cpuminer 2.2.3 (c) 2010 Jeff Garzik, 2012 pooler
               Cuda additions Copyright 2013 Christian Buchner
           My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm

[2013-04-24 21:56:45] 1 miner threads started, using 'scrypt' algorithm.
[2013-04-24 21:56:45] Long-polling activated for http://127.0.0.1:8332/lp
[2013-04-24 21:56:45] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2013-04-24 21:56:45] GPU #0: interactive: 0, tex-cache: 2D, single-alloc: 1
[2013-04-24 21:56:45] GPU #0: using launch configuration  42x7
[2013-04-24 21:56:45] GPU #0: GeForce GTX 660 Ti, 9408 hashes, 67.01 khash/s
[2013-04-24 21:56:48] GPU #0: GeForce GTX 660 Ti, 573888 hashes, 176.02 khash/s
[2013-04-24 21:56:48] accepted: 1/1 (100.00%), 176.02 khash/s (yay!!!)
[2013-04-24 21:57:03] GPU #0: GeForce GTX 660 Ti, 2671872 hashes, 179.91 khash/s
[2013-04-24 21:57:04] accepted: 2/2 (100.00%), 179.91 khash/s (yay!!!)
[2013-04-24 21:57:11] GPU #0: GeForce GTX 660 Ti, 1430016 hashes, 179.74 khash/s
[2013-04-24 21:57:12] accepted: 3/3 (100.00%), 179.74 khash/s (yay!!!)
[2013-04-24 21:57:45] GPU #0: GeForce GTX 660 Ti, 6115200 hashes, 180.31 khash/s
[2013-04-24 21:58:03] GPU #0: GeForce GTX 660 Ti, 3226944 hashes, 180.03 khash/s
[2013-04-24 21:58:03] accepted: 4/4 (100.00%), 180.03 khash/s (yay!!!)


After it stabilizes I maintain 180+ kH/s, although the pool stats show higher for some reason...
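
The numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes that a launch configuration of 42x7 means 42 blocks of 7 warps with one hash per thread; that reading is an interpretation of the log, not something the log states, and both helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: hashes covered by one kernel launch, assuming
// "-l BxW" means B blocks of W warps and one hash per thread.
constexpr std::int64_t hashes_per_launch(int blocks, int warps_per_block,
                                         int warp_size = 32) {
    return static_cast<std::int64_t>(blocks) * warps_per_block * warp_size;
}

// Seconds needed to compute a batch of hashes at a given rate in khash/s.
constexpr double batch_seconds(std::int64_t hashes, double khash_per_s) {
    return static_cast<double>(hashes) / (khash_per_s * 1000.0);
}
```

hashes_per_launch(42, 7) gives 9408, which is exactly the hash count on the first GPU line, and the later counts (573888, 2671872, 1430016, 6115200, 3226944) are all whole multiples of it. batch_seconds(573888, 176.02) is about 3.26 s, in line with the jump from 21:56:45 to 21:56:48 in the timestamps.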