[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]

peacefulmind

Full Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 05:34:53 AM
Last edit: April 22, 2013, 06:33:43 AM by peacefulmind

#441

Christian,

Success,

Copied the settings from above, but it seems to be only 260 kH/s per Titan.

Quote from: FrictionlessCoin

"I think you are too hung up on this notion about 'pre-mining' being a No-No."

- from journeys into the dark depths of the alt coin forum....

peacefulmind

Full Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 07:54:06 AM

#442

Quote from: cbuchner1 on April 21, 2013, 09:53:26 PM

A 570 though would be significantly faster (but also run significantly hotter). I am still trying to understand why the Kepler architecture has such a performance disadvantage with my current code.

I did try some inline PTX assembly (looks horrid, check it out)

Code:

__device__ void ROTL7(uint32_t &A0, const uint32_t &A1, const uint32_t &A2,
                      uint32_t &B0, const uint32_t &B1, const uint32_t &B2,
                      uint32_t &C0, const uint32_t &C1, const uint32_t &C2,
                      uint32_t &D0, const uint32_t &D1, const uint32_t &D2)
{
    asm("{\n\t"
    "  .reg .u32 tA1, tA2;\n\t"
    "  .reg .u32 tB1, tB2;\n\t"
    "  .reg .u32 tC1, tC2;\n\t"
    "  .reg .u32 tD1, tD2;\n\t"
    "  add.u32 tA1, %4, %5;\n\t"
    "  add.u32 tB1, %6, %7;\n\t"
    "  add.u32 tC1, %8, %9;\n\t"
    "  add.u32 tD1, %10, %11;\n\t"
    "  shl.b32 tA2, tA1, 7;\n\t"
    "  shl.b32 tB2, tB1, 7;\n\t"
    "  shl.b32 tC2, tC1, 7;\n\t"
    "  shl.b32 tD2, tD1, 7;\n\t"
    "  shr.b32 tA1, tA1, 25;\n\t"
    "  shr.b32 tB1, tB1, 25;\n\t"
    "  shr.b32 tC1, tC1, 25;\n\t"
    "  shr.b32 tD1, tD1, 25;\n\t"
    "  or.b32 tA1, tA1, tA2;\n\t"
    "  or.b32 tB1, tB1, tB2;\n\t"
    "  or.b32 tC1, tC1, tC2;\n\t"
    "  or.b32 tD1, tD1, tD2;\n\t"
    "  xor.b32 %0, %0, tA1;\n\t"
    "  xor.b32 %1, %1, tB1;\n\t"
    "  xor.b32 %2, %2, tC1;\n\t"
    "  xor.b32 %3, %3, tD1;\n\t"
    "}"
    : "+r"(A0), "+r"(B0), "+r"(C0), "+r"(D0) : "r" (A1), "r" (A2), "r" (B1), "r" (B2), "r" (C1), "r" (C2), "r" (D1), "r" (D2));
}

as well as adding instruction-level parallelism by formulating the CUDA code like this:

Code:

#define ROTL7(A0, A1, A2, B0, B1, B2, C0, C1, C2, D0, D1, D2)  \
{\
    volatile uint32_t tA1 = A1 + A2, tB1 = B1 + B2, tC1 = C1 + C2, tD1 = D1 + D2;\
    volatile uint32_t tA2 = tA1<< 7, tB2 = tB1<< 7, tC2 = tC1<< 7, tD2 = tD1<< 7;\
                      tA1 = tA1>>25; tB1 = tB1>>25; tC1 = tC1>>25; tD1 = tD1>>25;\
                      tA2|= tA1    ; tB2|= tB1    ; tC2|= tC1    ; tD2|= tD1    ;\
                      A0 ^= tA2    ; B0 ^= tB2    ; C0 ^= tC2    ; D0 ^= tD2    ;\
}

but I couldn't get performance above what is already achieved. So in case you're wondering why there haven't been any updates: my experiments aimed at getting more speed haven't been fruitful yet.
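Both quoted variants compute the same Salsa20 step, x ^= rotl32(y + z, 7), on four independent lanes; they differ only in how the work is ordered for the instruction scheduler. A minimal host-side C sketch for checking that equivalence (the names `rotl32`, `rotl7_scalar` and `rotl7_x4` are hypothetical, not taken from cudaMiner) could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32u - n));
}

/* Reference form for a single lane: x ^= rotl(y + z, 7). */
static uint32_t rotl7_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return x ^ rotl32(y + z, 7);
}

/* Four independent lanes, staged in the same order as the quoted
   PTX and macro: all adds, then all shifts, then the ors and xors.
   That ordering is what the quoted code uses to expose
   instruction-level parallelism; the results are unchanged. */
static void rotl7_x4(uint32_t x[4], const uint32_t y[4], const uint32_t z[4])
{
    uint32_t t1[4], t2[4];
    for (int i = 0; i < 4; i++) t1[i] = y[i] + z[i];   /* add.u32 */
    for (int i = 0; i < 4; i++) t2[i] = t1[i] << 7;    /* shl.b32 */
    for (int i = 0; i < 4; i++) t1[i] = t1[i] >> 25;   /* shr.b32 */
    for (int i = 0; i < 4; i++) t1[i] |= t2[i];        /* or.b32  */
    for (int i = 0; i < 4; i++) x[i] ^= t1[i];         /* xor.b32 */
}
```

Calling `rotl7_x4` on four lanes and `rotl7_scalar` on each lane separately should give identical words; on the GPU the staged ordering can only affect latency hiding, not the result.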

TITAN Update -

Both Titans: 980 MHz core, 6286 MHz effective memory, 106% power target.

Both together do ~500-530 kH/s. It seems a Titan/K20 ought to do more in theory. Would love to see EACH at 500 kH/s!

I really appreciate that this NVIDIA development may help increase LTC interest and adoption!

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:49:34 AM
Last edit: April 22, 2013, 11:06:30 AM by cbuchner1

#443

I've just run into the same compiler issue that borked the Titan kernels when I tried to compile salsa_kernel.cu for sm_30. The kernel will just crash.

Maybe using the NSight debugger I can figure out why this occurs.

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 11:07:46 AM

#444

I've seen reports of a single overclocked Titan doing 290 kHash/s, using a somewhat earlier code version.

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 11:10:28 AM
Last edit: April 22, 2013, 11:43:39 AM by cbuchner1

#445

Quote from: InqBit on April 21, 2013, 11:05:39 PM

I assume you've seen this Kepler thread?

https://bitcointalk.org/index.php?topic=163750.0;topicseen

Seen this.

The challenges with scrypt hashing are a bit greater than just using the funnel shifter for rotation. One issue is the speed and efficiency of memory access; the other is getting enough occupancy on Kepler's SMX (multiprocessor) units, where shared memory and register limits are an issue. This mainly affects the GTX 660 Ti, GTX 670, GTX 680 and Titan devices, which currently perform rather poorly in comparison to the 5xx series.
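To make the occupancy point concrete, here is a simplified C sketch of how the number of resident blocks per multiprocessor is capped by warp slots, registers and shared memory. The limits used are the published per-SM figures for compute capability 2.x (Fermi) and 3.x (Kepler SMX) with the 48 KB shared-memory configuration; allocation granularity is ignored, so real occupancy can come out lower. The struct and function names and the example kernel numbers are hypothetical.

```c
#include <assert.h>

/* Per-multiprocessor limits (simplified: allocation granularity ignored). */
struct sm_limits {
    int max_warps;      /* resident warps per SM */
    int max_blocks;     /* resident blocks per SM */
    int registers;      /* 32-bit registers per SM */
    int shared_bytes;   /* shared memory per SM, 48 KB configuration */
};

static const struct sm_limits FERMI_SM   = { 48,  8, 32768, 48 * 1024 };
static const struct sm_limits KEPLER_SMX = { 64, 16, 65536, 48 * 1024 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Resident blocks per SM for a kernel with warps_per_block warps,
   regs_per_thread registers per thread and smem_per_block bytes of
   shared memory per block: the tightest of the four limits wins. */
static int blocks_per_sm(const struct sm_limits *sm, int warps_per_block,
                         int regs_per_thread, int smem_per_block)
{
    assert(warps_per_block > 0 && regs_per_thread > 0);
    int by_warps = sm->max_warps / warps_per_block;
    int by_regs  = sm->registers / (regs_per_thread * 32 * warps_per_block);
    int by_smem  = smem_per_block > 0 ? sm->shared_bytes / smem_per_block
                                      : sm->max_blocks;
    return min_int(min_int(by_warps, by_regs), min_int(by_smem, sm->max_blocks));
}

/* Fraction of the SM's warp slots that end up occupied. */
static double occupancy(const struct sm_limits *sm, int warps_per_block,
                        int regs_per_thread, int smem_per_block)
{
    int blocks = blocks_per_sm(sm, warps_per_block, regs_per_thread, smem_per_block);
    return (double)(blocks * warps_per_block) / sm->max_warps;
}
```

For a made-up kernel with 4 warps per block, 63 registers per thread and 8 KB of shared memory per block, `blocks_per_sm` gives 4 on Fermi (register-bound) and 6 on Kepler (shared-memory-bound): the SMX doubles the warp slots and registers but not the shared memory, so only 24 of its 64 warp slots get filled.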

K1773R

Legendary

/dev/null

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 11:47:10 AM

#446

how much are you guys getting with a 580?

[GPG Public Key]
BTC/DVC/TRC/FRC: 1K1773RbXRZVRQSSXe9N6N2MUFERvrdu6y ANC/XPM AK1773RTmRKtvbKBCrUu95UQg5iegrqyeA NMC: NK1773Rzv8b4ugmCgX789PbjewA9fL9Dy1 LTC: LKi773RBuPepQH8E6Zb1ponoCvgbU7hHmd EMC: EK1773RxUes1HX1YAGMZ1xVYBBRUCqfDoF BQC: bK1773R1APJz4yTgRkmdKQhjhiMyQpJgfN

SubNoize

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 12:13:30 PM

#447

Quote from: K1773R on April 22, 2013, 11:47:10 AM

how much are you guys getting with a 580?

240KH/s give or take 10KH/s

K1773R

Legendary

/dev/null

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 12:34:44 PM

#448

Quote from: SubNoize on April 22, 2013, 12:13:30 PM

Quote from: K1773R on April 22, 2013, 11:47:10 AM

how much are you guys getting with a 580?

240KH/s give or take 10KH/s

Sweet, I got ~257 kH/s (slightly OC).
As soon as I've mined some coins I'll send a donation for sure ;)

Misiolap

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 12:42:05 PM

#449

Quote from: cbuchner1 on April 22, 2013, 08:49:34 AM

I've just run into the same compiler issue that borked the Titan kernels when I tried to compile salsa_kernel.cu for sm_30. The kernel will just crash.

Maybe using the NSight debugger I can figure out why this occurs.

Does the crash produce CUDA_EXCEPTION_6, "Warp Misaligned Address"?

I've been able to compile and run salsa_kernel for sm_21, without the texture cache, when accesses to the X variable are 128-bit aligned,

i.e. when it's declared like this:

Code:

__shared__ uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][16+4];
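A host-side C sketch of the alignment point: 128-bit (uint4) accesses need every row start to sit on a 16-byte boundary. With 16+4 = 20 words a row is 80 bytes, a multiple of 16, whereas a row padded by a single word (17 words, 68 bytes) would leave most rows misaligned. The values of WARPS_PER_BLOCK and WU_PER_WARP below are placeholders, C11 `_Alignas(16)` stands in for aligning the shared array's base, and the helper `rows_are_16b_aligned` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WARPS_PER_BLOCK 2   /* placeholder value */
#define WU_PER_WARP     32  /* placeholder value */

/* Returns 1 if every row of an array shaped
   [WARPS_PER_BLOCK][WU_PER_WARP][row_words] of uint32_t starts on a
   16-byte boundary, assuming the array base is 16-byte aligned. */
static int rows_are_16b_aligned(size_t row_words)
{
    assert(row_words > 0);
    for (size_t w = 0; w < WARPS_PER_BLOCK; w++)
        for (size_t u = 0; u < WU_PER_WARP; u++) {
            size_t offset = (w * WU_PER_WARP + u) * row_words * sizeof(uint32_t);
            if (offset % 16 != 0)
                return 0;
        }
    return 1;
}

/* Host stand-in for the shared array, with its base aligned to 16 bytes. */
static _Alignas(16) uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][16 + 4];
```

With the 16+4 layout every `&X[w][u][0]` can be reinterpreted as a 16-byte vector pointer; with 16+1 only every fourth row would qualify.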

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 01:28:48 PM

#450

Quote from: Misiolap on April 22, 2013, 12:42:05 PM

__shared__ uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][16+4];

Thanks! This helped. I did not know about the newly added alignment restrictions on shared memory when targeting SM 2.0 and higher. I guess that's because they now use a unified pointer and addressing scheme, so if there's an alignment requirement, it applies to everything.

Finally the Titan kernel will get my large memory transaction fixes, which should boost performance notably.

Christian

Aggrophobia

Legendary

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 02:22:47 PM
Last edit: April 22, 2013, 02:44:06 PM by Aggrophobia

#451

Autoadjust does not find the best values for my Titan; I had to find them myself :(

Edit: now I checked the -D option.
It works with 70x4 at 280 kH/s.

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 02:50:06 PM

#452

Quote from: Aggrophobia on April 22, 2013, 02:22:47 PM

Autoadjust does not find the best values for my Titan; I had to find them myself :(

It works with 70x4 at 280 kH/s.

it's autotune (TM) (R).

How's 35x8?

Christian
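The two configurations launch the same total number of warps (70 × 4 = 35 × 8 = 280), so they differ only in how warps are grouped into blocks. A rough C sketch of the scrypt scratchpad such a launch needs, using Litecoin's scrypt parameters N = 1024, r = 1 (128·r·N = 128 KiB per hash). Reading "70x4" as blocks x warps per block, and the number of hashes each warp works on, are assumptions here; the names `launch_cfg`, `total_warps` and `scratchpad_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Litecoin's scrypt uses N = 1024, r = 1; each hash needs a
   scratchpad of 128 * r * N bytes (= 128 KiB). */
enum { SCRYPT_N = 1024, SCRYPT_R = 1 };
#define SCRYPT_SCRATCH_BYTES ((uint64_t)128 * SCRYPT_R * SCRYPT_N)

/* Hypothetical launch description, read as "blocks x warps_per_block". */
struct launch_cfg {
    unsigned blocks;
    unsigned warps_per_block;
};

static uint64_t total_warps(struct launch_cfg c)
{
    return (uint64_t)c.blocks * c.warps_per_block;
}

/* Scratchpad bytes for one launch, given how many hashes each warp
   works on (an assumption, e.g. 32 if every thread owns one hash). */
static uint64_t scratchpad_bytes(struct launch_cfg c, unsigned hashes_per_warp)
{
    assert(hashes_per_warp > 0);
    return total_warps(c) * hashes_per_warp * SCRYPT_SCRATCH_BYTES;
}
```

With 32 hashes per warp, 70x4 works out to 8960 hashes and 1120 MiB of scratchpad, well inside a Titan's 6 GB, so any speed difference between 70x4 and 35x8 presumably comes from occupancy and scheduling rather than from memory size.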

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 07:55:22 PM

#453

Posted an April 22nd release.

Please let me know how it compiles on Linux 64 bit, and how it performs on Titan now.

The patch posted earlier wasn't really doing things right. CUDA textures should have stayed ulong2 and ulong4 type, but the uint32_t type needed to be moved over to unsigned long (from unsigned int previously) because otherwise there would be a mismatch with the texture types.
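The type juggling above depends on the platform's data model: on 64-bit Linux (LP64) `unsigned long` is 64 bits wide, on 64-bit Windows (LLP64) it stays 32 bits, and `uint32_t` is 32 bits everywhere. A small C sketch (the helpers `data_model` and `ulong_matches_uint32` are hypothetical) that reports which model a build uses, which may help when comparing Windows and Linux 64-bit builds:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classify the host C data model by the widths of long and pointers. */
static const char *data_model(void)
{
    if (sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* e.g. 64-bit Linux, Mac OS X */
    if (sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";  /* e.g. 64-bit Windows */
    if (sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";  /* 32-bit builds */
    return "other";
}

/* uint32_t is exactly 32 bits on every platform; unsigned long is not. */
static int ulong_matches_uint32(void)
{
    return sizeof(unsigned long) == sizeof(uint32_t);
}
```

As far as I know, nvcc keeps the sizes of fundamental types such as `long` identical between host and device code, so the same split should carry over to kernels and to vector types built on `unsigned long`, such as ulong2 and ulong4.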

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:20:09 PM

#454

Hmm, the patch posted earlier suggests the following configure line for 64-bit builds:

./configure "CFLAGS=-O3" "CXXFLAGS=-O3" "LDFLAGS=-Wl,-O1" --with-cuda=/usr/local/cuda

Not sure what the -Wl,-O1 linker flag is supposed to do.

Misiolap

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:26:04 PM

#455

My mistake, it shouldn't be there - at the moment -O1 for ld only turns on some optimizations for shared libraries, not the program binary.

dbabo

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:26:29 PM

#456

Quote from: cbuchner1 on April 22, 2013, 07:55:22 PM

Posted an April 22nd release.

Please let me know how it compiles on Linux 64 bit, and how it performs on Titan now.

The patch posted earlier wasn't really doing things right. CUDA textures should have stayed ulong2 and ulong4 type, but the uint32_t type needed to be moved over to unsigned long (from unsigned int previously) because otherwise there would be a mismatch with the texture types.

Christian,
configure works fine if I run:
./configure -with-cuda=/usr/local/cuda

instead of ./configure.sh

And it almost compiles - http://pastebin.com/raw.php?i=JZb62Jtd

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:29:08 PM

#457

That doesn't qualify as almost! ;)

dbabo

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:36:08 PM

#458

Quote from: cbuchner1 on April 22, 2013, 08:29:08 PM

That doesn't qualify as almost! ;)

Ha-ha, close enough. I think I observed the same errors before the patch, so it's (hopefully) something simple.

dbabo

Newbie

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:37:49 PM

#459

Quote from: Misiolap on April 22, 2013, 08:26:04 PM

My mistake, it shouldn't be there - at the moment -O1 for ld only turns on some optimizations for shared libraries, not the program binary.

-O3 takes a whopping 5 kH/s out of my super fast GT460

cbuchner1 (OP)

Hero Member

Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]

April 22, 2013, 08:41:31 PM

#460

I am like so close -----> <----- to throwing out the texture cache support in 64 bit builds.