Bitcoin Forum
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426869 times)
ak84
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
December 22, 2013, 07:07:24 PM
 #1721

Ponn that is excellent. Thank you.

BTW, please specify that -i 0 will fully use the GPU and slow down your computer's display a little,
while -i 1 will leave a little power for your display needs.

In my case, using -i 1 reduces hash speed by ~20 kHash/s, but when switching browser tabs or typing I notice no lag. With -i 0 I have some display lag.
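To make the flags concrete, the two modes look like this on the command line (the pool URL and worker credentials below are placeholders, not a real pool):

```shell
# -i 0: full GPU utilization; highest hash rate, but a laggy desktop
cudaminer -i 0 -o stratum+tcp://pool.example.com:3333 -O worker.1:password

# -i 1: reserve a little GPU time for the display; costs ~20 kHash/s here
cudaminer -i 1 -o stratum+tcp://pool.example.com:3333 -O worker.1:password
```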

cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 07:23:41 PM
 #1722

2 x EVGA 780 Classified overclocked to 1300MHz
cudaminer settings: -i 0 -H 1 -m 1 -l T12x16



yeah, keep making me sorry for buying 3x 780 Ti ...

The extra power draw of the 3 additional SMX in the Ti prevents any meaningful overclocking, it seems.

Christian

Antivanity
Newbie
*
Offline Offline

Activity: 26
Merit: 0


View Profile
December 22, 2013, 07:25:38 PM
 #1723

2 x EVGA 780 Classified overclocked to 1300MHz
cudaminer settings: -i 0 -H 1 -m 1 -l T12x16

https://i.imgur.com/O2dXdTf.jpg

yeah, keep making me sorry for buying 3x 780 Ti ...

The extra power draw of the 3 additional SMX in the Ti prevents any meaningful overclocking, it seems.

Christian



I'm actually in line for EVGA's Step-Up program to upgrade to Tis for free. Are you saying it's not worth it?
Ponn
Newbie
*
Offline Offline

Activity: 5
Merit: 0


View Profile
December 22, 2013, 07:29:32 PM
 #1724

2 x EVGA 780 Classified overclocked to 1300MHz
cudaminer settings: -i 0 -H 1 -m 1 -l T12x16

https://i.imgur.com/O2dXdTf.jpg

yeah, keep making me sorry for buying 3x 780 Ti ...

The extra power draw of the 3 additional SMX in the Ti prevents any meaningful overclocking, it seems.

Christian



Just want to say thanks a bunch for CUDA Miner. I'm learning something for once rather than twiddling my thumbs on the internet! Wish I could tip ya, but I don't have anything to tip with atm.
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 07:34:54 PM
 #1725

I'm actually in line for EVGA's Step-Up program to upgrade to Tis for free. Are you saying it's not worth it?

I am peaking out at 590 kHash/s per device. Meh. Limited to 106% TDP, by the way.

Antivanity
Newbie
*
Offline Offline

Activity: 26
Merit: 0


View Profile
December 22, 2013, 07:36:49 PM
 #1726

I'm actually in line for EVGA's Step-Up program to upgrade to Tis for free. Are you saying it's not worth it?

I am peaking out at 590 kHash/s per device. Meh. Limited to 106% TDP, by the way.



Oh, well that makes me sad. My 2 cards have 2 different TDP limits, 110% and 115%, which I find odd.

Anyway, thanks for cudaminer, especially the December 18th update. Amazing speed boosts!
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 07:37:10 PM
 #1727

The GitHub repo now contains some more optimized SHA-256 code that gives a little extra speed with the -H 2 option. Less time spent on the SHA-256 part means more time spent on the scrypt core, which means a higher hash rate.

Christian
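The "less time on SHA-256, more time on the scrypt core" argument can be put in Amdahl's-law form. The fractions below are made-up illustrative numbers, not measurements of cudaminer:

```python
def overall_speedup(sha_frac, sha_speedup):
    """Amdahl's-law estimate of whole-kernel speedup when only the
    SHA-256 portion (sha_frac of total kernel time) gets faster."""
    return 1.0 / ((1.0 - sha_frac) + sha_frac / sha_speedup)

# Hypothetical: if SHA-256 were 10% of kernel time and -H 2 made it 2x faster,
# the whole kernel would gain about 5%.
print(round(overall_speedup(0.10, 2.0), 3))  # 1.053
```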
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 07:39:11 PM
 #1728

Oh, well that makes me sad. My 2 cards have 2 different TDP limits, 110% and 115%, which I find odd.

Anyway, thanks for cudaminer, especially the December 18th update. Amazing speed boosts!

If these are identical cards and hardware revisions, you could try flashing the VGA BIOS of the 115% card over the BIOS of the slower card. Credit for the giant speed boost goes to David Andersen. He implemented the code in a way I couldn't really wrap my mind around until I actually saw his code (I had tried, but couldn't picture it).


cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 08:47:04 PM
Last edit: December 22, 2013, 09:02:41 PM by cbuchner1
 #1729

Semi-idle question:  Is there community interest in sponsoring some more optimization of the cudaminer code for Kepler GK104 and GK110-based cards?

In honesty, I think I threw my best optimization ideas at the version most of you are already running.  *grins*  But there's probably another 5% here and there, which could translate into, say, at least a 10-20kh/sec boost on faster cards.

  -Dave

You could also optimize for the low-end GT 640 (GDDR5 version). It has Compute 3.5, does around 105 kHash/s with OC, and I generally found its results to scale up pretty linearly with the number of SMX, i.e. scaling up to 12 SMX (like the GTX 780) yields some 630 kHash/s, which people actually seem to be hitting when overclocking their devices.

EDIT: I did some profiling with the CUDA 5.5 Visual Profiler on a Compute 3.0 device recently (also 2 SMX, a laptop part). I found that the arithmetic units were pretty maxed out, and it also showed 80% efficiency in the instruction scheduler, meaning that the dual-issue feature in each SMX's four warp schedulers was pretty nicely utilized. The occupancy on each SMX was 100%, which is perfect. Memory accesses were fully coalesced 128-byte transactions. Can't get any better than this.

My recent optimizations in the SHA-256 code were aimed at lower register use and higher occupancy. There is still some extra efficiency to be had in coalescing the memory accesses, I guess.
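The linear-scaling estimate works out like this (numbers taken from the post; a rough extrapolation, not a guarantee):

```python
def projected_khash(smx_count, base_khash=105.0, base_smx=2):
    """Linear extrapolation of scrypt hash rate with SMX count,
    anchored at the GT 640's ~105 kHash/s on 2 SMX."""
    return base_khash * smx_count / base_smx

# A 12-SMX part like the GTX 780 projects to the ~630 kHash/s
# that overclocked cards are reportedly hitting.
print(projected_khash(12))  # 630.0
```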




dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
December 22, 2013, 08:59:51 PM
 #1730

Semi-idle question:  Is there community interest in sponsoring some more optimization of the cudaminer code for Kepler GK104 and GK110-based cards?

In honesty, I think I threw my best optimization ideas at the version most of you are already running.  *grins*  But there's probably another 5% here and there, which could translate into, say, at least a 10-20kh/sec boost on faster cards.

  -Dave

You could also optimize for the low-end GT 640 (GDDR5 version). It has Compute 3.5, does around 105 kHash/s with OC, and I generally found its results to scale up pretty linearly with the number of SMX, i.e. scaling up to 12 SMX (like the GTX 780) yields some 630 kHash/s, which people actually seem to be hitting when overclocking their devices.

EDIT: I did some profiling with the CUDA 5.5 Visual Profiler on a Compute 3.0 device recently (also 2 SMX, a laptop part). I found that the arithmetic units were pretty maxed out, and it also showed 80% efficiency in the instruction scheduler, meaning that the dual-issue feature in each SMX's four warp schedulers was pretty nicely utilized. The occupancy on each SMX was 100%, which is perfect. Memory accesses were fully coalesced 128-byte transactions. Can't get any better than this.


Ooh.  Good idea for the cheaper device - thank you. 

Re "can't get any better" - that's the other reason I was thinking about grubbing for help.  I'm guessing that the remainder of the optimization is going to be ugly.  I've been staring at, e.g., the cuobjdump assembly output and the instruction throughput tables and trying to figure out if there are ways to improve it (nothing obvious).  And, as you note, 80% instruction scheduling is already quite high.  Doubling up keys in a clever way might get that to 90%, but at the cost of probably unacceptable register pressure.  I tried it once and threw away the code, but there are a few other ways to imagine doing it.

It's really hard to beat the raw number of ALUs those AMD devices have when the code is as trivially parallel as brute-force hashing.

polarbear7217008
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
December 22, 2013, 09:04:57 PM
 #1731

So I am just mining a bit for the experience and it's mostly going well, but there's this weird thing: after a couple of accepted shares my hash rate goes out of control, and it never works again until I reboot my computer. If this is a known issue or someone else has seen this, could you let me know? Thanks, and sorry if this is newbish.



Code:
[2013-12-21 14:15:56] GPU #0: using launch configuration K10x14
[2013-12-21 14:15:56] GPU #0: GeForce GT 750M, 4480 hashes, 9.72 khash/s
[2013-12-21 14:16:04] GPU #0: GeForce GT 750M, 586880 hashes, 76.56 khash/s
[2013-12-21 14:16:33] GPU #0: GeForce GT 750M, 2298240 hashes, 77.26 khash/s
[2013-12-21 14:16:33] accepted: 1/1 (100.00%), 77.26 khash/s (yay!!!)
[2013-12-21 14:16:40] GPU #0: GeForce GT 750M, 546560 hashes, 76.78 khash/s
[2013-12-21 14:16:41] accepted: 2/2 (100.00%), 76.78 khash/s (yay!!!)
[2013-12-21 14:17:06] GPU #0: GeForce GT 750M, 1944320 hashes, 77.23 khash/s
[2013-12-21 14:17:06] accepted: 3/3 (100.00%), 77.23 khash/s (yay!!!)
[2013-12-21 14:17:53] GPU #0: GeForce GT 750M, 4636800 hashes, 97.35 khash/s
[2013-12-21 14:17:53] GPU #0: GeForce GT 750M, 5841920 hashes, 364482 khash/s
[2013-12-21 14:18:48] Stratum detected new block
[2013-12-21 14:18:48] GPU #0: GeForce GT 750M, 1978523776 hashes, 35912 khash/s
[2013-12-21 14:18:55] GPU #0: GeForce GT 750M, 2154705280 hashes, 347175 khash/s
[2013-12-21 14:23:55] Stratum detected new block
[2013-12-21 14:23:55] GPU #0: GeForce GT 750M, 605833216 hashes, 2014 khash/s
[2013-12-21 14:23:56] GPU #0: GeForce GT 750M, 120843520 hashes, 340147 khash/s
[2013-12-21 14:26:11] Stratum detected new block
[2013-12-21 14:26:11] GPU #0: GeForce GT 750M, 3904416000 hashes, 28782 khash/s
[2013-12-21 14:26:16] GPU #0: GeForce GT 750M, 1726941440 hashes, 345834 khash/s
[2013-12-21 14:33:27] GPU #0: GeForce GT 750M, 2568025856 hashes, 5971 khash/s
[2013-12-21 14:33:28] GPU #0: GeForce GT 750M, 358234240 hashes, 346545 khash/s
[2013-12-21 14:33:29] Stratum detected new block
[2013-12-21 14:33:29] GPU #0: GeForce GT 750M, 661507840 hashes, 347550 khash/s
[2013-12-21 14:36:42] Stratum detected new block
[2013-12-21 14:36:42] GPU #0: GeForce GT 750M, 2588029440 hashes, 13409 khash/s
[2013-12-21 14:36:45] GPU #0: GeForce GT 750M, 804558720 hashes, 345060 khash/s
[2013-12-21 14:39:03] Stratum detected new block
[2013-12-21 14:39:03] GPU #0: GeForce GT 750M, 955258624 hashes, 6888 khash/s
[2013-12-21 14:39:05] GPU #0: GeForce GT 750M, 413306880 hashes, 343030 khash/s
[2013-12-21 14:40:08] Stratum detected new block
Spiffy_1
Full Member
***
Offline Offline

Activity: 235
Merit: 100


View Profile
December 22, 2013, 09:09:54 PM
 #1732

If you're mining on a pool with vardiff, you're probably being assigned a difficulty your card can't process in the short time between blocks. It looks like everything is working right on your end. The difficulty should scale back down, but it can take up to 24 hours to dial in; I tend to use pools where I can set my difficulty manually so I don't run into this issue.
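Conceptually, a vardiff retarget is just a proportional rule on share rate. This is a generic sketch, not any particular pool's implementation; the target rate and clamp values are invented:

```python
def retarget(current_diff, shares_per_min, target_per_min=4.0,
             min_diff=8, max_diff=65536):
    """Scale difficulty so the miner lands near target_per_min
    shares/min; clamp to the pool's allowed difficulty range."""
    new_diff = current_diff * shares_per_min / target_per_min
    return max(min_diff, min(max_diff, new_diff))

# A slow card submitting 1 share/min at diff 512 gets eased down:
print(retarget(512, 1.0))  # 128.0
```

The 24-hour dial-in Spiffy_1 mentions comes from pools applying this kind of rule only every few minutes, so convergence from a badly mismatched starting difficulty takes many rounds.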

cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 09:10:18 PM
 #1733

So I am just mining a bit for the experience and it's mostly going well, but there's this weird thing: after a couple of accepted shares my hash rate goes out of control, and it never works again until I reboot my computer. If this is a known issue or someone else has seen this, could you let me know? Thanks, and sorry if this is newbish.

probably an unstable card due to excessive overclocking ...
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
December 22, 2013, 09:12:59 PM
 #1734

Doubling up keys in a clever way might get that to 90%, but at the cost of probably unacceptable register pressure.  I tried it once and threw away the code, but there are a few other ways to imagine doing it.

It's really hard to beat the raw number of ALUs those AMD devices have when the code is as trivially parallel as brute-force hashing.

Have you heard of the lookup_gap feature of the OpenCL-based miner? It reduces the scratchpad size by a factor of 2 or 3 and replaces the lookups with some extra computation. Not sure if nVidia cards have the computational reserves, but we could try... the funnel shifter seems to help a bit in creating some breathing room. Compute 3.5 devices are definitely memory-limited when hashing.
dga
Hero Member
*****
Offline Offline

Activity: 737
Merit: 511


View Profile WWW
December 22, 2013, 09:26:42 PM
 #1735

Doubling up keys in a clever way might get that to 90%, but at the cost of probably unacceptable register pressure.  I tried it once and threw away the code, but there are a few other ways to imagine doing it.

It's really hard to beat the raw number of ALUs those AMD devices have when the code is as trivially parallel as brute-force hashing.

Have you heard of the lookup_gap feature of the OpenCL-based miner? It reduces the scratchpad size by a factor of 2 or 3 and replaces the lookups with some extra computation. Not sure if nVidia cards have the computational reserves, but we could try... the funnel shifter seems to help a bit in creating some breathing room. Compute 3.5 devices are definitely memory-limited when hashing.

Haven't seen it, but I'm guessing from your description that it stores only every other (or every third) scratchpad entry and dynamically recomputes when it needs to access an odd-numbered entry?

I've thought about it, but the nvidia kernels are so compute-bound that I never took it seriously.  Do you know any numbers for how much it speeds up the OpenCL miner?  It's not that hard to implement if it really seems worthwhile, but I'm skeptical for nvidia.

(It is, however, the obvious route to go for FPGA or ASIC.)
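The every-Nth-entry-plus-recompute reading can be sketched with a toy stand-in for the mixing function. This is not scrypt's Salsa20/8 core, just the shape of the time/memory trade-off:

```python
def mix(x):
    # Cheap stand-in for scrypt's Salsa20/8 mixing step (illustration only)
    return (x * 6364136223846793005 + 1442695040888963407) % 2**64

def build_scratchpad(seed, n, gap):
    """Store only every gap-th scratchpad entry: ~n/gap memory."""
    pad = {}
    x = seed
    for i in range(n):
        if i % gap == 0:
            pad[i] = x
        x = mix(x)
    return pad

def lookup(pad, i, gap):
    """Fetch entry i: start from the nearest stored entry at or below i
    and recompute the skipped steps (trading computation for memory)."""
    x = pad[i - i % gap]
    for _ in range(i % gap):
        x = mix(x)
    return x
```

With gap=2 or gap=3 the scratchpad shrinks by that factor, at the cost of up to gap-1 extra mix calls per random access, which is why it only pays off when the kernel is memory-bound rather than compute-bound.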

polarbear7217008
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
December 22, 2013, 09:28:39 PM
 #1736

So I am just mining a bit for the experience and it's mostly going well, but there's this weird thing: after a couple of accepted shares my hash rate goes out of control, and it never works again until I reboot my computer. If this is a known issue or someone else has seen this, could you let me know? Thanks, and sorry if this is newbish.

probably an unstable card due to excessive overclocking ...


My card is not overclocked at all, but could the same result come from just overheating?
Spiffy_1
Full Member
***
Offline Offline

Activity: 235
Merit: 100


View Profile
December 22, 2013, 09:43:50 PM
 #1737

Doubtful. Try a pool that doesn't use vardiff, set it to its lowest difficulty, and see if you get solid returns. Multipool.us has set difficulty levels. You have the same hash rate as I do on my GTX 560M, and I get steady returns with it all the time.

madjules007
Sr. Member
****
Offline Offline

Activity: 400
Merit: 250



View Profile
December 22, 2013, 09:55:15 PM
 #1738

Hi everyone, I wrote a small Treatise on Cuda Miner, mind helping me check it over? (Much updated! wow!)
http://www.reddit.com/r/Dogecoinmining/comments/1tguse/a_treatise_on_cuda_miner/

Thanks for this. Great explanation. Got about 60kh/s extra on my GTX 780!

Ponn
Newbie
*
Offline Offline

Activity: 5
Merit: 0


View Profile
December 22, 2013, 10:07:22 PM
 #1739

Hi everyone, I wrote a small Treatise on Cuda Miner, mind helping me check it over? (Much updated! wow!)
http://www.reddit.com/r/Dogecoinmining/comments/1tguse/a_treatise_on_cuda_miner/

Thanks for this. Great explanation. Got about 60kh/s extra on my GTX 780!

Sweet! Glad I could help, this has been a great learning experience for me.
daddywarbucks
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
December 23, 2013, 01:47:48 AM
 #1740

Hey everyone,

I decided to start mining some altcoins and thought I'd fire up a device I had lying around. I'm using a Tesla S870 system, but only 1 of the 2 connections for now (2 cards). I'm having trouble finding a stable mining configuration: the system appears to be busy and accepting work, but no coins are ever mined. I've done a lot of searching and trial & error with the -l command-line option. As I understand it, the -l option should be the multiprocessors x CUDA cores, correct? I.e. (from deviceQuery):

Code:
  (16) Multiprocessors x (  8) CUDA Cores/MP:    128 CUDA Cores

Therefore -l L16x8 ?
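For what it's worth, in the cudaminer builds I've seen, the -l string reads as <optional kernel letter><blocks>x<warps per block> rather than multiprocessors x CUDA cores. A toy parser makes that reading explicit; treat the exact semantics as version-dependent and check against your build's README:

```python
def parse_launch(cfg):
    """Split a cudaminer-style launch config, e.g. "L16x8" -> ("L", 16, 8).
    The kernel-letter prefix (K/T/L/...) is optional."""
    prefix = cfg[0] if cfg[0].isalpha() else ""
    blocks, warps = (int(v) for v in cfg[len(prefix):].split("x"))
    return prefix, blocks, warps

print(parse_launch("L16x8"))   # ('L', 16, 8)
print(parse_launch("T12x16"))  # ('T', 12, 16)
```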

If I pump up the -l setting (e.g. L128x64, but it can be much lower), it will often say it's getting 300+ kh/s, which I believe is completely off. I noticed this in another post in this thread, and it seemed to be a one-off. I know this is not rockstar hardware, but I would like to use it for some light mining. My questions are:

(1) What's the _right_ way to determine the -l setting? I have tried many options as well as 'auto' with -D and even -P (I am a web guy, after all ;) ), which often leads to L0x0 and crashes.

(2) Is there anything I can do to help with support for this hardware?

Here's my configuration:

OS:

Code:
$ uname -a
Linux hypercoil 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

NVIDIA Driver:

Code:
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  304.54  Sat Sep 29 00:05:49 PDT 2012
GCC version:  gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

NVCC/Cuda Tools:

Code:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221

CudaMiner:

Code:
$ ./cudaminer
   *** CudaMiner for nVidia GPUs by Christian Buchner ***
             This is version 2013-12-10 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
       Cuda additions Copyright 2013 Christian Buchner
   My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm

Kind regards,

DW