Bitcoin Forum
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 [22] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 ... 1135 »
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426930 times)
420
Hero Member
Activity: 756
Merit: 500
April 21, 2013, 12:27:34 AM
 #421

What speed should I get with a GTX 460M or a GTX 680?

Donations: 1JVhKjUKSjBd7fPXQJsBs5P3Yphk38AqPr - TIPS
the hacks, the hacks, secure your bits!
nst6563
Sr. Member
Activity: 252
Merit: 254
April 21, 2013, 12:50:19 AM
 #422

Check here:
https://docs.google.com/spreadsheet/ccc?key=0AjMqJzI7_dCvdG9fZFN1Vjd0WkFOZmtlejltd0JXbmc#gid=1

That should give you an idea.
Misiolap
Newbie
Activity: 14
Merit: 0
April 21, 2013, 12:51:51 AM
 #423

@K1773R: check out the patch for 64-bit systems a few posts above yours.
K1773R
Legendary
Activity: 1792
Merit: 1008
/dev/null
April 21, 2013, 02:28:21 AM
Last edit: April 21, 2013, 02:49:33 AM by K1773R
 #424

ERR: never mind, I had the old CUDA 4.0...
Works with 5.0 :)

[GPG Public Key]
BTC/DVC/TRC/FRC: 1K1773RbXRZVRQSSXe9N6N2MUFERvrdu6y ANC/XPM AK1773RTmRKtvbKBCrUu95UQg5iegrqyeA NMC: NK1773Rzv8b4ugmCgX789PbjewA9fL9Dy1 LTC: LKi773RBuPepQH8E6Zb1ponoCvgbU7hHmd EMC: EK1773RxUes1HX1YAGMZ1xVYBBRUCqfDoF BQC: bK1773R1APJz4yTgRkmdKQhjhiMyQpJgfN
hammz
Member
Activity: 143
Merit: 10
April 21, 2013, 05:35:47 AM
 #425

Is there a way I can periodically terminate the cudaminer program without it crashing my video card driver?

I want to run a script to restart cudaminer and my stratum proxy connection on a timer, but task-killing the cudaminer process in Windows also crashes the video card driver, so I lose my overclock settings.
Lacan82
Sr. Member
Activity: 247
Merit: 250
April 21, 2013, 05:56:38 AM
 #426

Do you have the latest drivers? And which version? This isn't an issue in the newer one.

borgopio
Newbie
Activity: 8
Merit: 0
April 21, 2013, 06:53:30 AM
 #427

Cudaminer 2013-04-17 compiling errors

Cudaminer 2013-04-14 compiled fine on my system. I'm running it now and getting about 65 khash/s.
My system is:
AMD Phenom II
Linux 12.04
GeForce GTS 450

When I attempt to compile 2013-04-17, I get several errors like this:
"./salsa_kernel.cu(164): Warning: Cannot tell what pointer points to, assuming global memory space"
and this one:
"titan_kernel.cu(377): error: identifier "usleep" is undefined"

I assume they are coding errors and thought this would be helpful info.

Thanks for your work Christian. As soon as I get some coins I can send you something.
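The `usleep` error means the function was not declared when titan_kernel.cu was compiled; on POSIX systems `usleep()` is declared in `<unistd.h>`. A minimal sketch of one possible workaround, assuming a missing include is the cause (not confirmed against the actual source), is a small host-side helper. `sleep_us` is a hypothetical name, not part of cudaMiner. The "Cannot tell what pointer points to" lines are labeled as warnings by nvcc itself.

```cuda
// Hypothetical host-side helper for a .cu file. This is plain host C++,
// so it needs no device code. Returns true if the sleep call succeeded.
#ifdef _WIN32
#include <windows.h>   // Sleep(): takes milliseconds, returns nothing
static bool sleep_us(unsigned int us)
{
    Sleep((us + 999) / 1000);  // round up to whole milliseconds
    return true;
}
#else
#include <unistd.h>    // usleep(): POSIX, takes microseconds, returns 0 on success
static bool sleep_us(unsigned int us)
{
    return usleep(us) == 0;
}
#endif
```

On Linux, simply adding `#include <unistd.h>` near the top of the failing file may be enough; the wrapper only matters if the same file must also build on Windows.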
hammz
Member
Activity: 143
Merit: 10
April 21, 2013, 11:07:44 AM
 #428

04-17, the version from a few days ago.

I'm not talking about using Ctrl-C from within the program; I'm task-killing it via an automated script.
dbabo
Newbie
Activity: 41
Merit: 0
April 21, 2013, 02:04:56 PM
 #429

Some dependencies are missing; check the original post. Looks like you are on Ubuntu, right?
borgopio
Newbie
Activity: 8
Merit: 0
April 21, 2013, 03:01:41 PM
 #430

No dependencies are listed for Linux in the original post. I am running CUDA 5.0, and my graphics driver meets the minimum release requirements. 2013-04-14 compiles fine.
dbabo
Newbie
Activity: 41
Merit: 0
April 21, 2013, 03:10:46 PM
 #431

g++-multilib and ia32-libs, possibly also libcurl4-dev.

Are you on 32-bit?
borgopio
Newbie
Activity: 8
Merit: 0
April 21, 2013, 03:33:26 PM
 #432

Yes, 32-bit.
I installed g++-multilib and ia32-libs and recompiled. Same errors. There is more than one libcurl4 to choose from.
KnowBuddy
Member
Activity: 69
Merit: 10
April 21, 2013, 03:59:34 PM
 #433

Isn't it possible to make the cudaminer.exe window active and enter ctrl-c using an automated script?
bitg
Newbie
Activity: 19
Merit: 0
April 21, 2013, 07:01:42 PM
 #434

I am seeing a small improvement with the 04-17 version too, about 3-5%.

I did notice that Ctrl-C is ignored when the connection to the pool isn't successful, forcing me to close the cmd window directly.

Anyway, great job with the CUDA implementation.
logdog16
Newbie
Activity: 19
Merit: 0
April 21, 2013, 07:51:54 PM
 #435

On my GTX 670 with the 1D texture cache I am getting about 175 kHash/s.

Great work!
cbuchner1 (OP)
Hero Member
Activity: 756
Merit: 502
April 21, 2013, 09:53:26 PM
Last edit: April 21, 2013, 10:04:09 PM by cbuchner1
 #436

A GTX 570, though, would be significantly faster (but would also run significantly hotter). I am still trying to understand why the Kepler architecture is at such a performance disadvantage with my current code.

I did try some inline PTX assembly (looks horrid, check it out)

Code:
__device__ void ROTL7(uint32_t &A0, const uint32_t &A1, const uint32_t &A2,
                      uint32_t &B0, const uint32_t &B1, const uint32_t &B2,
                      uint32_t &C0, const uint32_t &C1, const uint32_t &C2,
                      uint32_t &D0, const uint32_t &D1, const uint32_t &D2)
{
    asm("{\n\t"
    "  .reg .u32 tA1, tA2;\n\t"
    "  .reg .u32 tB1, tB2;\n\t"
    "  .reg .u32 tC1, tC2;\n\t"
    "  .reg .u32 tD1, tD2;\n\t"
    "  add.u32 tA1, %4, %5;\n\t"
    "  add.u32 tB1, %6, %7;\n\t"
    "  add.u32 tC1, %8, %9;\n\t"
    "  add.u32 tD1, %10, %11;\n\t"
    "  shl.b32 tA2, tA1, 7;\n\t"
    "  shl.b32 tB2, tB1, 7;\n\t"
    "  shl.b32 tC2, tC1, 7;\n\t"
    "  shl.b32 tD2, tD1, 7;\n\t"
    "  shr.b32 tA1, tA1, 25;\n\t"
    "  shr.b32 tB1, tB1, 25;\n\t"
    "  shr.b32 tC1, tC1, 25;\n\t"
    "  shr.b32 tD1, tD1, 25;\n\t"
    "  or.b32 tA1, tA1, tA2;\n\t"
    "  or.b32 tB1, tB1, tB2;\n\t"
    "  or.b32 tC1, tC1, tC2;\n\t"
    "  or.b32 tD1, tD1, tD2;\n\t"
    "  xor.b32 %0, %0, tA1;\n\t"
    "  xor.b32 %1, %1, tB1;\n\t"
    "  xor.b32 %2, %2, tC1;\n\t"
    "  xor.b32 %3, %3, tD1;\n\t"
    "}"
    : "+r"(A0), "+r"(B0), "+r"(C0), "+r"(D0) : "r" (A1), "r" (A2), "r" (B1), "r" (B2), "r" (C1), "r" (C2), "r" (D1), "r" (D2));
}

as well as adding instruction-level parallelism by formulating the CUDA code like this:

Code:
#define ROTL7(A0, A1, A2, B0, B1, B2, C0, C1, C2, D0, D1, D2)  \
{\
    volatile uint32_t tA1 = A1 + A2, tB1 = B1 + B2, tC1 = C1 + C2, tD1 = D1 + D2;\
    volatile uint32_t tA2 = tA1<< 7, tB2 = tB1<< 7, tC2 = tC1<< 7, tD2 = tD1<< 7;\
                      tA1 = tA1>>25; tB1 = tB1>>25; tC1 = tC1>>25; tD1 = tD1>>25;\
                      tA2|= tA1    ; tB2|= tB1    ; tC2|= tC1    ; tD2|= tD1    ;\
                      A0 ^= tA2    ; B0 ^= tB2    ; C0 ^= tC2    ; D0 ^= tD2    ;\
}

but I couldn't get performance above what the current code already achieves. So, in case you're wondering why there haven't been any updates: my experiments in getting more speed haven't been fruitful yet.
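For comparison, the same step (X0 ^= rotl(X1 + X2, 7) on four independent lanes, which is what both versions above compute) can be written with an ordinary rotate expression. This sketch is not cudaMiner's code: `rotl32` and `rotl7_x4` are hypothetical names, and the `__funnelshift_l` branch only applies to compute capability 3.2 and higher (e.g. sm_35), not to the sm_30 chips in the GTX 670/680, so it is not expected to change speed on those cards. The fallback macros let the same file build as plain C++ for a host-side correctness check.

```cuda
#include <stdint.h>

#ifndef __CUDACC__            // host-only C++ build: make CUDA qualifiers no-ops
#define __host__
#define __device__
#define __forceinline__ inline
#endif

// Rotate a 32-bit word left by n bits (0 < n < 32).
__host__ __device__ __forceinline__ uint32_t rotl32(uint32_t x, unsigned int n)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 320)
    // Compute capability 3.2+: hardware funnel shift of the pair x:x is a rotate.
    return __funnelshift_l(x, x, n);
#else
    // Portable rotate idiom (used on sm_30 and in host builds).
    return (x << n) | (x >> (32 - n));
#endif
}

// Same semantics as the ROTL7 macro/PTX above: X0 ^= rotl(X1 + X2, 7) for
// X in {A, B, C, D}. The four lanes are independent, so the compiler is free
// to interleave their instructions (instruction-level parallelism).
__host__ __device__ __forceinline__ void rotl7_x4(
    uint32_t &A0, uint32_t A1, uint32_t A2,
    uint32_t &B0, uint32_t B1, uint32_t B2,
    uint32_t &C0, uint32_t C1, uint32_t C2,
    uint32_t &D0, uint32_t D1, uint32_t D2)
{
    A0 ^= rotl32(A1 + A2, 7);
    B0 ^= rotl32(B1 + B2, 7);
    C0 ^= rotl32(C1 + C2, 7);
    D0 ^= rotl32(D1 + D2, 7);
}
```

On a host build, `rotl32(0x80000001u, 7)` gives `0xC0`, which is an easy way to check a kernel variant against a CPU reference before benchmarking it.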

Bakemono
Member
Activity: 85
Merit: 10
April 21, 2013, 09:58:02 PM
 #437

13 kh/s with a GeForce GT 330M :P :P

LAPTOP POWAh 8)

BTC : 1Ct9opEdmq4ZuZmNQmhGBDcurrePFykTRt
LTC : LLqGtKpAdx6Ci8ZaSrvWG6WXfFF3mPK4V9
InqBit
Newbie
Activity: 27
Merit: 0
April 21, 2013, 11:05:39 PM
 #438

I assume you've seen this Kepler thread?

https://bitcointalk.org/index.php?topic=163750.0;topicseen
jasonharty24
Newbie
Activity: 27
Merit: 0
April 22, 2013, 04:12:35 AM
 #439

This is my GTX 670 (GIGABYTE GV-N670OC-2GD) doing over 200 khash/s:
http://s23.postimg.org/m7lni155j/cuda_miner.jpg



Code:
cudaminer.exe --url http://notroll.in:6332/ --userpass jasonharty24.4:12345 -i 0 -m 1 -C 2 -l 70x4
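A rough estimate of the GPU memory that `-l 70x4` implies, under assumptions not confirmed from the cudaMiner documentation: that `70x4` means 70 blocks of 4 warps, that every thread of a 32-thread warp computes one scrypt hash, and that each hash keeps its full scratchpad. Litecoin's scrypt uses N = 1024 and r = 1, and a scrypt scratchpad takes 128 · r · N bytes:

```latex
\begin{aligned}
\text{hashes in flight}    &= 70 \times 4 \times 32 = 8960 \\
\text{scratchpad per hash} &= 128 \cdot r \cdot N = 128 \cdot 1 \cdot 1024\ \text{B} = 128\ \text{KiB} \\
\text{total}               &= 8960 \times 128\ \text{KiB} = 1\,146\,880\ \text{KiB} = 1120\ \text{MiB}
\end{aligned}
```

If those assumptions hold, this configuration fits within the 2 GB on this card.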
termhn
Full Member
Activity: 126
Merit: 100
April 22, 2013, 05:31:37 AM
 #440

JESUS CHRIST that is a great OC!