Bitcoin Forum
November 12, 2024, 11:13:05 AM *
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426930 times)
Edvin512
Full Member
***
Offline Offline

Activity: 167
Merit: 100


View Profile
April 17, 2013, 12:35:00 AM
 #321

gtx 460 815/2000 - 28x4 - 116 k/h  with 14.4 release

I am like a lennisters
FalconFour
Full Member
***
Offline Offline

Activity: 176
Merit: 100



View Profile WWW
April 17, 2013, 01:03:42 AM
Last edit: April 17, 2013, 01:19:10 AM by FalconFour
 #322

Might have the 8800gtx's crash figured out:

Disable Desktop Window Manager, start cudaMiner -> CRASH
Enable DWM, start cudaMiner -> Runs (~30khash/sec = shit, as I've seen 40+ from this thing)
Enable DWM, start cudaMiner, disable DWM just as the autotune starts -> Runs - autotune figures improve as I disable DWM, but still finds 41.03khash/sec with 30x2 and starts mining at 27.31khash/sec.

edit: Yeah, autotune is definitely buggy. No autotune (plug my own numbers in) -> works fine without DWM.

In fact, I plugged "32x2" in just on a whim, and it initially said it was getting 9.38khash/sec with just 4096 hashes. Then, moments later, it started accepting results - and it was pumping 31.06! This definitely smells like an autotune bug to me...

Maybe you could autotune while the miner runs. For example, each time a result is accepted, adjust hashes/warps (not sure of the implications of each) up/down by small or large jumps. Once you find a good hot-spot that "N" adjustments haven't been able to beat, keep it - say after 100 adjustments ("14x2", "15x2", "8x2", "16x3", etc.) nothing has improved on "16x2" or whatever it found best. Then save it as a .conf file and read that on startup :)
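A minimal sketch of that idea in C, assuming a hill-climbing search with a "patience" counter. Nothing here is cudaMiner code: `launch_cfg`, `tune_online`, `save_cfg` and the saved file format are hypothetical names for illustration, and `fake_bench` stands in for a real hash rate measurement taken while mining.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical launch configuration, e.g. "16x2" = 16 blocks x 2 warps. */
typedef struct { int blocks; int warps; } launch_cfg;

/* Hypothetical callback: measured khash/s for one configuration. */
typedef double (*bench_fn)(launch_cfg cfg);

/* Fake benchmark for demonstration only: peaks at 16x2. */
static double fake_bench(launch_cfg c)
{
    return 100.0 - abs(c.blocks - 16) - 10.0 * abs(c.warps - 2);
}

/* Try small and large jumps around the best known configuration and
 * stop once `patience` consecutive tries have failed to beat it. */
static launch_cfg tune_online(launch_cfg start, bench_fn bench,
                              int patience, int max_blocks, int max_warps)
{
    static const int dblocks[6] = { 1, -1, 0,  0, 4, -4 };
    static const int dwarps[6]  = { 0,  0, 1, -1, 0,  0 };
    launch_cfg best = start;
    double best_rate = bench(best);
    int misses = 0, step = 0;

    while (misses < patience) {
        launch_cfg c = best;
        c.blocks += dblocks[step % 6];
        c.warps  += dwarps[step % 6];
        step++;
        if (c.blocks < 1 || c.blocks > max_blocks ||
            c.warps  < 1 || c.warps  > max_warps) {
            misses++;               /* out of range counts as a miss */
            continue;
        }
        double rate = bench(c);
        if (rate > best_rate) {     /* strictly better: keep it */
            best = c;
            best_rate = rate;
            misses = 0;
        } else {
            misses++;
        }
    }
    return best;
}

/* Store the winner as "BLOCKSxWARPS" so it can be read on startup. */
static int save_cfg(const char *path, launch_cfg cfg)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "%dx%d\n", cfg.blocks, cfg.warps);
    return fclose(f);
}
```

A real measurement is noisy, so in practice each candidate would probably need to be averaged over several accepted results before it is compared against the current best.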

feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
rimasb
Newbie
*
Offline Offline

Activity: 43
Merit: 0


View Profile
April 17, 2013, 07:03:54 AM
 #323

From 8khash to 28khash. Interesting. Now, wonder what the 8800GTX will do...

That's more like what I would have expected from these cards.


My 9800 GT looks better:
https://i.imgur.com/LdfDesv.jpg
CudaMiner 13/04
GeForce 9800 GT OC
Core Clock 780
Memory Clock 1008
Windows XP 32bit
314.22 Driver
-l 14x4
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 07:42:12 AM
 #324


Beats my GTX 260 - maybe I should have turned off Aero too ;)
FalconFour
Full Member
***
Offline Offline

Activity: 176
Merit: 100



View Profile WWW
April 17, 2013, 09:04:07 AM
 #325

From 8khash to 28khash. Interesting. Now, wonder what the 8800GTX will do...

That's more like what I would have expected from these cards.


My 9800 GT looks better:

CudaMiner 13/04
GeForce 9800 GT OC
Core Clock 780
Memory Clock 1008
Windows XP 32bit
314.22 Driver
-l 14x4
I'm totally copying off your page tomorrow morning at the shop. Many thanks. <3 I didn't know XP could GPU-mine at all. Thought that was a new WinNT 6.x architecture thing...

feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
FalconFour
Full Member
***
Offline Offline

Activity: 176
Merit: 100



View Profile WWW
April 17, 2013, 09:06:04 AM
 #326


Beats my GTX 260 - maybe I should have turned off Aero too ;)
Aero is generally a good thing that actually improves system performance - and I generally recommend everyone leave it on/turn it back on if disabled (even for Bitcoin mining)... but here, the way the program works probably nets a performance gain by cutting any excess GPU activity. Weird, but it could be alpha blues ;)

feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
SubNoize
Newbie
*
Offline Offline

Activity: 47
Merit: 0


View Profile
April 17, 2013, 11:01:11 AM
 #327


Beats my GTX 260 - maybe I should have turned off Aero too ;)
Aero is generally a good thing that actually improves system performance - and I generally recommend everyone leave it on/turn it back on if disabled (even for Bitcoin mining)... but here, the way the program works probably nets a performance gain by cutting any excess GPU activity. Weird, but it could be alpha blues ;)

Can you expand on that a little please? I would have assumed that by disabling it you're freeing up more GPU power for other tasks, e.g. mining?
theowalpott
Member
**
Offline Offline

Activity: 80
Merit: 10


View Profile
April 17, 2013, 11:04:52 AM
 #328

Getting about 48 KH/s out of a GeForce GT 240, although CPU usage seems pretty high: ~300% (4 CPUs total).

Thanks for your efforts - shall keep an eye on it :D

1FwGATm6eU5dSiTp2rpazV5u3qwbx1fuDn
termhn
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
April 17, 2013, 02:18:49 PM
 #329


Beats my GTX 260 - maybe I should have turned off Aero too ;)
Aero is generally a good thing that actually improves system performance - and I generally recommend everyone leave it on/turn it back on if disabled (even for Bitcoin mining)... but here, the way the program works probably nets a performance gain by cutting any excess GPU activity. Weird, but it could be alpha blues ;)

Can you expand on that a little please? I would have assumed that by disabling it you're freeing up more GPU power for other tasks, e.g. mining?
When it's off that work gets put back on the CPU.
MiWBitCoin
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
April 17, 2013, 02:42:46 PM
 #330

Let's play spot the difference:

cgminer 2.11.3
GPU 0:                | 88.38K/88.28Kh/s | A:9 R:0 HW:0 U:16.03/m I:12

CUDAminer 2013-04-13
[2013-04-18 00:09:04] accepted: 277/280 (98.93%), 164.48 khash/s (yay!!!)
[2013-04-18 00:09:12] GPU #0: GeForce GTX 680, 1284224 hashes, 167.86 khash/s
[2013-04-18 00:09:23] accepted: 284/287 (98.95%), 162.91 khash/s (yay!!!)

That is what we call a significant improvement in hashing power.
Thank you very much Christian, this is very impressive.
termhn
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
April 17, 2013, 02:53:12 PM
 #331

The thing on my wishlist is an option like cgminer's -expiry 1 or -E 1 for coins that are getting new blocks reallllly fast.
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 03:00:56 PM
Last edit: April 17, 2013, 05:33:59 PM by cbuchner1
 #332

The thing on my wishlist is an option like cgminer's -expiry 1 or -E 1 for coins that are getting new blocks reallllly fast.

Can you explain how that option works, or point me to a README with an explanation? What other coins would you be mining with the scrypt algorithm?

cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 03:01:49 PM
 #333

Lets play spot the difference:
CUDAminer 2013-04-13: [2013-04-18 00:09:23] accepted: 284/287 (98.95%), 162.91 khash/s (yay!!!)

yay, and it's good hash, too. *puff*, *puff*, *smoke*
mcarturr
Newbie
*
Offline Offline

Activity: 38
Merit: 0



View Profile
April 17, 2013, 05:08:52 PM
 #334

i got 150 khs/s with a 670 gtx
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 05:58:41 PM
Last edit: April 17, 2013, 08:54:34 PM by cbuchner1
 #335

i got 150 khs/s with a 670 gtx

I've quickly tested the 1D and 2D texture cache with a GTX 660 Ti. The hash rates remain in the range of 153-155 kHash/s. In other words: almost identical to operation without the cache, which gives 154 kHash/s. All generated shares are valid.

So while the cache feature is not immediately useful, it may become useful as soon as we start shrinking the scrypt scratchpad.

Here is what a LOOKUP_GAP implementation does: it only saves every N'th value out of 1024 writes to the scratchpad, thereby reducing the bandwidth needed for writes by a factor of N. One "value" here is a vector of 32 uints (128 bytes in total). 1024 * 128 bytes is your typical 128 kbyte scrypt scratchpad. Here's the plain scrypt core (no LOOKUP_GAP yet) as actual C code for the programmers among you.

Code:
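    /* Assumes: uint32_t *V is the 128 kbyte scratchpad (1024 * 32 words),
       input and output are arrays of 32 uint32_t, and xor_salsa8() is
       the miner's Salsa20/8 mixing function. */
    uint32_t X[32]; int i, j, k;

    for (k = 0; k < 32; k++) X[k] = input[k];

    // write phase to scratchpad: store the state, then advance it
    for (i = 0; i < 1024; i++) {
        memcpy(&V[i * 32], X, 128);
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }

    // read phase from scratchpad: the index depends on the current state
    for (i = 0; i < 1024; i++) {
        j = 32 * (X[16] & 1023);
        for (k = 0; k < 32; k++) X[k] ^= V[j + k];
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }

    for (k = 0; k < 32; k++) output[k] = X[k];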

Unfortunately we still have to do 1024 reads from the scratchpad and perform an increased amount of computation to re-synthesize the values that we omitted during writing. For a LOOKUP_GAP of 4 we would (on average) have to run xor-salsa 2 extra times per lookup. But here's where the cache may come in handy. Say you have reduced the total number of scratchpad values from 1024 to 256 using a LOOKUP_GAP of 4. That may increase your cache hit ratio because you are now going to read each element 4 times on average, therefore boosting your effective memory bandwidth for the reads. Unfortunately the lookups are done in a completely random order, which causes a lot of cache lines to get replaced quickly.
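For comparison, here is a rough sketch of how the same loop could look with a LOOKUP_GAP of 4. This is an illustration under stated assumptions, not the code that will ship: `mix_stub` is a made-up stand-in for `xor_salsa8` (it is not Salsa20/8) so that the example compiles on its own, and `block_mix`, `core_full` and `core_gap` are hypothetical names. A missing entry at index idx is rebuilt from the stored entry below it by re-running the mixing step idx % LOOKUP_GAP times, which for uniformly random indices averages (LOOKUP_GAP - 1) / 2 steps per lookup.

```c
#include <stdint.h>
#include <string.h>

#define SCRYPT_N   1024   /* scratchpad entries in the plain algorithm */
#define LOOKUP_GAP 4      /* keep only every 4th entry */

/* Stand-in for xor_salsa8(): NOT Salsa20/8, just a deterministic
 * 16-word mixer so that this sketch is self-contained. */
static void mix_stub(uint32_t *B, const uint32_t *Bx)
{
    for (int i = 0; i < 16; i++) {
        uint32_t t = (B[i] ^ Bx[i]) * 2654435761u + (uint32_t)i;
        B[i] = (t << 7) | (t >> 25);
    }
    for (int i = 0; i < 16; i++)
        B[i] ^= B[(i + 1) & 15] >> 3;
}

/* One step on the 32-word state, same call pattern as the code above. */
static void block_mix(uint32_t *X)
{
    mix_stub(&X[0], &X[16]);
    mix_stub(&X[16], &X[0]);
}

/* Reference: full scratchpad, V holds SCRYPT_N * 32 words. */
static void core_full(uint32_t *output, const uint32_t *input, uint32_t *V)
{
    uint32_t X[32];
    memcpy(X, input, 128);
    for (int i = 0; i < SCRYPT_N; i++) {
        memcpy(&V[i * 32], X, 128);
        block_mix(X);
    }
    for (int i = 0; i < SCRYPT_N; i++) {
        uint32_t j = 32 * (X[16] & (SCRYPT_N - 1));
        for (int k = 0; k < 32; k++) X[k] ^= V[j + k];
        block_mix(X);
    }
    memcpy(output, X, 128);
}

/* LOOKUP_GAP variant: V holds only (SCRYPT_N / LOOKUP_GAP) * 32 words. */
static void core_gap(uint32_t *output, const uint32_t *input, uint32_t *V)
{
    uint32_t X[32], T[32];
    memcpy(X, input, 128);
    for (int i = 0; i < SCRYPT_N; i++) {
        if (i % LOOKUP_GAP == 0)            /* store every 4th state only */
            memcpy(&V[(i / LOOKUP_GAP) * 32], X, 128);
        block_mix(X);
    }
    for (int i = 0; i < SCRYPT_N; i++) {
        uint32_t idx = X[16] & (SCRYPT_N - 1);
        memcpy(T, &V[(idx / LOOKUP_GAP) * 32], 128);
        for (uint32_t r = idx % LOOKUP_GAP; r > 0; r--)
            block_mix(T);                   /* re-synthesize the skipped entry */
        for (int k = 0; k < 32; k++) X[k] ^= T[k];
        block_mix(X);
    }
    memcpy(output, X, 128);
}
```

Both functions should produce the same output for the same input; the gap variant trades a scratchpad of 32 kbytes instead of 128 kbytes for the extra mixing work in the read phase.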

The problem is that the numbers quoted above are per hash, and your typical GPU does a few dozen to a few hundred hashes in parallel on its various multiprocessors. Pre-Kepler, each multiprocessor only has 6-8 KB of texture cache. On Kepler, one SMX has 48 KB of texture cache. How are the numbers going to work out? I don't really know yet. The cache seems awfully small for what it needs to cover.
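To put rough numbers on that, here is a small back-of-the-envelope helper in C. The function names are hypothetical, and the 64 concurrent hashes per SMX used in the usage note is an assumed figure for illustration only; the real count depends on the launch configuration.

```c
#include <stddef.h>

#define SCRYPT_N    1024  /* scratchpad entries without LOOKUP_GAP */
#define ENTRY_BYTES 128   /* 32 uints per entry */

/* Scratchpad size of one hash for a given LOOKUP_GAP (1 = no gap). */
static size_t scratchpad_bytes(unsigned lookup_gap)
{
    size_t entries = (SCRYPT_N + lookup_gap - 1) / lookup_gap;
    return entries * ENTRY_BYTES;
}

/* Fraction (0..1) of the combined scratchpads of all hashes running
 * on one multiprocessor that fits into its texture cache. */
static double cache_coverage(size_t cache_bytes, unsigned hashes_per_sm,
                             unsigned lookup_gap)
{
    double working_set = (double)scratchpad_bytes(lookup_gap) * hashes_per_sm;
    double fraction = (double)cache_bytes / working_set;
    return fraction > 1.0 ? 1.0 : fraction;
}
```

With a LOOKUP_GAP of 4 one hash needs 32 kbytes, so a 48 KB cache holds one and a half scratchpads. At an assumed 64 hashes per SMX, `cache_coverage(48 * 1024, 64, 4)` comes out at roughly 2.3%, which matches the impression that the cache is very small for the job.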

Christian

termhn
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
April 17, 2013, 07:21:20 PM
 #336

The thing on my wishlist is an option like cgminer's -expiry 1 or -E 1 for coins that are getting new blocks reallllly fast.

Can you explain how that option works, or point me to a README with an explanation? what other coins would you be mining for using with the scrypt algorithm?


Sure I can in a few minutes. I am using it with feathercoin.
FalconFour
Full Member
***
Offline Offline

Activity: 176
Merit: 100



View Profile WWW
April 17, 2013, 07:29:15 PM
 #337

Lets play spot the difference:
CUDAminer 2013-04-13: [2013-04-18 00:09:23] accepted: 284/287 (98.95%), 162.91 khash/s (yay!!!)
yay, and it's good hash, too. *puff*, *puff*, *smoke*
No camping, muffukka! Pass that shit :3 hehe


Beats my GTX 260 - maybe I should have turned off Aero too ;)
Aero is generally a good thing that actually improves system performance - and I generally recommend everyone leave it on/turn it back on if disabled (even for Bitcoin mining)... but here, the way the program works probably nets a performance gain by cutting any excess GPU activity. Weird, but it could be alpha blues ;)
Can you expand on that a little please? I would have assumed that by disabling it you're freeing up more GPU power for other tasks, e.g. mining?
Aero doesn't really use much GPU power at all. Because the new display architecture of WinNT 6.x (Vista/7/8) is built around Aero at the core, non-Aero (with "desktop composition" disabled) actually runs in an emulation mode. Lots of time is spent slowly re-drawing bitmaps on the screen, from what I've seen. Windows XP used a hardware-accelerated window manager (GDI) that communicated window draw/move/animation commands to the video card. The new architecture is built around being accelerated by the GPU. So when you disable Aero, Windows now emulates everything - taking up *more* CPU and GPU time than if you had Aero enabled so the GPU could idly manage the native mode of Windows' desktop window manager.

That's why if a computer can't handle Aero, I put XP on it (for a refurbished system, that is). If it CAN handle Aero, I put 7 on it and make sure Aero works. If I ever see a computer come in the shop with Aero disabled (classic theme, etc.), I enable Aero and it significantly improves the responsiveness of the system. As for Bitcoin mining, I've tried with and without Aero - I get a lower hashrate without Aero. That's all the proof I needed. :P

feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 08:01:04 PM
Last edit: April 17, 2013, 08:50:16 PM by cbuchner1
 #338


I have released an April-17th version. Source code on github will be updated tomorrow.

It reduces the CPU usage greatly when run in interactive mode on Windows (in fact I think Windows is not measuring it correctly now, because it seems to hover around 0-1% for me). But it might be hashing a bit slower than before when running interactively. ATTENTION: I just remembered that I default to interactive mode now when not otherwise specified. This is done for all CUDA devices that have the watchdog timer active (i.e. they are driving a display and no registry fix was put in place to disable the watchdog for that card).

The texture cache feature now works but is detrimental to performance on all but fast Kepler cards, where it makes things neither better nor worse. Note the -C option now takes "1" or "2" as input, corresponding to 1D or 2D texture layouts. Consider this experimental, still.

On Titan I have split the CUDA kernel into two parts labelled "A" and "B", like it was already done for the non-Titan kernels. The "B" part should now automatically read through the 48 KB/SMX texture caches by use of const __restrict__ pointers (this nice feature is Titan and Tesla K20 specific). No need to use the -C option there. Let me know if the Titan still works, and if it's running any slower or faster.

CTRL-C works better than before. Aborts autotuning, and hitting it a second time will ALWAYS abort the tool.

Bye for now! I still take donations! ;)

I will be experimenting with LOOKUP_GAP next, like it is known from OpenCL miners.

Christian
termhn
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
April 17, 2013, 08:09:53 PM
 #339

Check end of the first post here for brief explanation of -expiry
https://bitcointalk.org/index.php?topic=178286.0;topicseen
cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 17, 2013, 08:12:11 PM
 #340

Check end of the first post here for brief explanation of -expiry
https://bitcointalk.org/index.php?topic=178286.0;topicseen

Are you sure that there is no equivalent expiry option in pooler's CPUminer? After all, cudaminer is a carbon copy of pooler's code, extended with a bit of CUDA magic. Also, wouldn't LONGPOLL do the trick of switching to new work as soon as a new block arrives?
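For readers unfamiliar with the option, a minimal sketch of what an expiry check amounts to, assuming it means "treat work older than N seconds as stale and fetch new work instead of submitting shares from it". `work_item` and `work_expired` are hypothetical names, not taken from cgminer, cpuminer or cudaMiner.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record for one unit of work fetched from the pool. */
typedef struct {
    time_t received;   /* when this work was fetched */
} work_item;

/* True once the work is older than expiry_secs; the caller would
 * then drop it and request fresh work. */
static bool work_expired(const work_item *w, time_t now, int expiry_secs)
{
    return difftime(now, w->received) > (double)expiry_secs;
}
```

Long polling attacks the same problem from the other side: instead of guessing an age limit, the pool notifies the miner when a new block arrives. An expiry of 1 second is mostly useful where that notification is missing or slow.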
