cbuchner1 (OP)
|
 |
January 29, 2014, 02:09:16 PM |
|
I have Visual Studio 2012, but I can't load the solution file. So I probably have to wait a few more days.
Is CUDA 5.5 installed? VS 2012 should be able to upgrade the solution file automatically. But then you will have to make sure that all the other dependencies are available (OpenSSL, pthreads, libcurl...).
|
|
|
|
ManIkWeet
|
 |
January 29, 2014, 02:10:38 PM |
|
So my 780 getting over 5 isn't too bad then
but 6 or 7 would be nicer. I have one optimization in mind that swaps the state of threads within the lookup_gap loop. The intention is to order threads by their loop trip count (some have to run for 0 loops, others for a couple more, up to the specified lookup_gap). By ordering them, some of the warps will terminate much earlier and not consume any computational resources. This would (in theory) reduce the workload by nearly a factor of 2, but it introduces some overhead for sorting the threads and for shuffling the state around. Whether a net speed gain remains is yet to be seen. I will save that optimization for February (it would delay this release...)

Christian

I shall happily beta test this when it is out in February
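A minimal sketch of how such a reordering could look, purely to illustrate the idea above; this is not the actual cudaminer code, and STATE_WORDS, the shared-memory layout and the naive O(n^2) ranking are placeholder assumptions:

Code:
#include <stdint.h>

#define STATE_WORDS 32   // placeholder size of the per-hash state kept in registers

// Reorder the per-thread work inside a block so that threads with the fewest
// remaining lookup_gap iterations end up in the lowest-numbered warps, which
// can then retire early instead of idling alongside long-running lanes.
__device__ void reorder_by_trip_count(uint32_t *s_trips,   // blockDim.x entries (shared)
                                      uint32_t *s_state,   // blockDim.x * STATE_WORDS words (shared)
                                      uint32_t &my_trips,
                                      uint32_t  my_state[STATE_WORDS])
{
    int tid = threadIdx.x;

    // publish trip counts so every thread can rank itself
    s_trips[tid] = my_trips;
    __syncthreads();

    // naive rank: number of threads with fewer trips (ties broken by thread id);
    // a real kernel would use a proper block-wide sort or prefix sum instead
    int rank = 0;
    for (int j = 0; j < blockDim.x; ++j)
        if (s_trips[j] < my_trips || (s_trips[j] == my_trips && j < tid))
            ++rank;
    __syncthreads();

    // scatter this thread's work to its sorted slot...
    s_trips[rank] = my_trips;
    for (int i = 0; i < STATE_WORDS; ++i)
        s_state[rank * STATE_WORDS + i] = my_state[i];
    __syncthreads();

    // ...and pick up whatever work now lives at our own slot
    my_trips = s_trips[tid];
    for (int i = 0; i < STATE_WORDS; ++i)
        my_state[i] = s_state[tid * STATE_WORDS + i];
}

Whether the extra __syncthreads() barriers and the shared-memory traffic eat up the savings is exactly the open question mentioned above.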
|
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
|
|
|
cbuchner1 (OP)
|
 |
January 29, 2014, 02:11:35 PM |
|
To be more specific, I was using 4 streams on nv_scrypt_core_kernelA<ALGO_SCRYPT_JANE> and nv_scrypt_core_kernelB<ALGO_SCRYPT_JANE> inside the NVKernel::run_kernel. So those are the kernels I was referring to. Too bad the code in these kernels looks like witchcraft to me at the moment. LOL
What you have to know is that the "A" kernels write to the scratchpad (yes, the ENTIRE scratchpad) and the kernels labeled "B" read from random positions in the scratchpad. So there is an A->B dependency: A has to complete before B can run.

If you still want to try running multiple streams, divide the hashing (nonce) range into 4 equally sized regions; then you can run A -> B for region 1, A -> B for region 2, A -> B for region 3 and A -> B for region 4 simultaneously on 4 streams, as their scratchpad areas do not overlap.

But somehow I do not see the advantage of this. Typically the launch configurations are chosen so that a single stream already fully loads the GPU's multiprocessors. I currently use two streams for different hashing (nonce) ranges, but fully concurrent execution would only be allowed if you allocated several scratchpads (one per stream). Considering that video card memory is a scarce resource, this is probably not the best idea. Especially with scrypt-jane coins this is a problem.

Christian
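For what it's worth, a rough sketch of that 4-region layout. The kernel names, signatures and the per-hash scratchpad sizing are assumptions for illustration; the real templated kernels take different arguments:

Code:
#include <cuda_runtime.h>
#include <stdint.h>

// assumed stand-ins for the real templated kernels
__global__ void scrypt_core_kernelA(uint32_t *pad, uint32_t first_nonce, uint32_t n);
__global__ void scrypt_core_kernelB(uint32_t *pad, uint32_t first_nonce, uint32_t n);

void run_four_regions(uint32_t *d_scratchpad,     // one big device allocation
                      size_t    words_per_hash,   // scratchpad words per nonce (assumed)
                      uint32_t  first_nonce,
                      uint32_t  hashes_total,     // assumed divisible by 4
                      dim3 grid, dim3 block)
{
    cudaStream_t streams[4];
    uint32_t per_region = hashes_total / 4;

    for (int r = 0; r < 4; ++r) {
        cudaStreamCreate(&streams[r]);

        // each region gets its own non-overlapping slice of the scratchpad
        uint32_t *region_pad   = d_scratchpad + (size_t)r * per_region * words_per_hash;
        uint32_t  region_nonce = first_nonce + r * per_region;

        // A fills the region's scratchpad, B reads it back; putting both into the
        // same stream preserves the A -> B ordering, while the four streams may
        // overlap with each other
        scrypt_core_kernelA<<<grid, block, 0, streams[r]>>>(region_pad, region_nonce, per_region);
        scrypt_core_kernelB<<<grid, block, 0, streams[r]>>>(region_pad, region_nonce, per_region);
    }

    for (int r = 0; r < 4; ++r) {
        cudaStreamSynchronize(streams[r]);
        cudaStreamDestroy(streams[r]);
    }
}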
|
|
|
|
patoberli
Member

Offline
Activity: 106
Merit: 10
|
 |
January 29, 2014, 02:15:16 PM |
|
Getting a stable 1.35 - 1.45 kh/s on my GT-640 with default clocks and mining YaCoin. There are a few "does not validate on CPU" results though, and they also don't seem to count:
[2014-01-29 15:11:26] GPU #1: GeForce GT 640 result does not validate on CPU (i=23, s=0)!
[2014-01-29 15:11:27] GPU #1: GeForce GT 640, 1.45 khash/s
[2014-01-29 15:11:27] accepted: 51/51 (100.00%), 1.45 khash/s (yay!!!)
Otherwise it's running smoothly on Windows. The build is from today. Start parameters: cudaminer.exe -a scrypt-jane -i 0 -l K27x2 -o http://yac.coinmine.pl:8882 -O ...:... -H 2 -d 1
|
YAC: YA86YiWSvWEGSSSerPTMy4kwndabRUNftf BTC: 16NqvkYbKMnonVEf7jHbuWURFsLeuTRidX LTC: LTKCoiDwqEjaRCoNXfFhDm9EeWbGWouZjE
|
|
|
cbuchner1 (OP)
|
 |
January 29, 2014, 02:16:38 PM |
|
Getting a stable 1.35 - 1.45 kh/s on my GT-640 with default clocks and mining YaCoin. There are a few "does not validate on CPU" results though, and they also don't seem to count:
I wish I knew what is causing these... Try passing -b 8192 for a bit more speed.
|
|
|
|
patoberli
Member

Offline
Activity: 106
Merit: 10
|
 |
January 29, 2014, 03:39:38 PM |
|
Thanks. Does the -b parameter have any influence when running with -i 0? In any case, it didn't change the speed visibly.
|
YAC: YA86YiWSvWEGSSSerPTMy4kwndabRUNftf BTC: 16NqvkYbKMnonVEf7jHbuWURFsLeuTRidX LTC: LTKCoiDwqEjaRCoNXfFhDm9EeWbGWouZjE
|
|
|
cbuchner1 (OP)
|
 |
January 29, 2014, 04:21:19 PM |
|
Thanks. Does the -b parameter have any influence when running with -i 0? In any case, it didn't change the speed visibly.
-b still has an influence, as it reduces the overhead of CUDA kernel calls. Bigger chunks of data to work with mean less overhead.
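A back-of-the-envelope way to see it; the function and its parameter names are illustrative assumptions, not measurements of cudaminer itself:

Code:
// Rough cost model: each kernel call pays a fixed launch overhead, so for a
// fixed total amount of work, fewer but larger batches (-b) waste less time.
double estimated_runtime(double hashes_total, double batch_size,
                         double launch_overhead_s, double seconds_per_hash)
{
    double batches = hashes_total / batch_size;  // larger -b  ->  fewer batches
    return batches * launch_overhead_s           // fixed cost paid per kernel call
         + hashes_total * seconds_per_hash;      // the actual hashing work
}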
|
|
|
|
apluscarp
Newbie
Offline
Activity: 12
Merit: 0
|
 |
January 29, 2014, 04:55:17 PM |
|
Where can I find the 2014-01-17 version?
|
|
|
|
bigjme
|
 |
January 29, 2014, 05:29:55 PM Last edit: January 29, 2014, 05:49:19 PM by bigjme |
|
Results from my latest build. Here is the launch config I stuck with and the results for -L2 to -L6:
./cudaminer -a scrypt-jane -H 0 -i 0 -d 0 -l T138x2 -o http://127.0.0.1:3339 -u user -p pass -D -L4
-L2 T68x2 - 4.30-4.46
-L3 T68x3 - 4.61-4.87
-L4 T138x2 - 4.89-5.29 - avg. 4.98
-L5 T69x4 - 4.90-5.3 - avg. 5.05
-L6 T108x4 - 4.64-5.2 - avg. 4.65
Running with -L4, I have noticed that it has now settled down at 4.96 - 5.05 after being on for 4 hours, and I still have a lot of system resources left over. So I'm wondering how much it is actually using. Memory-wise it is using 2412 MiB / 3071 MiB; I'm not sure about the actual GPU usage. I wonder if I could get more memory usage than I do now; it might get me more out of it.
|
Owner of: cudamining.co.uk
|
|
|
whitesand77
|
 |
January 29, 2014, 05:38:28 PM |
|
But somehow I do not see the advantage of this. Typically the launch configurations are chosen so that a single stream already fully loads the GPU's multiprocessors.
If this were true, my stream test would have given me a lower hash rate due to overhead, but it doubled. Just because MSI Afterburner or another program is reporting 90-something % GPU usage doesn't mean streams won't help. When I ran the two kernels, I ran all the A's first, synced, then the B's. So as long as the batch size for the NFactor is small enough to spawn off enough kernels, they'll run concurrently. Again, it's another optimization that won't work so well for lower NFactors. But I was actually seeing 99-100% GPU usage with the doubled hash rate.

I had the same thought as you before I discovered streams, but the NVIDIA Visual Profiler and the sample code convinced me otherwise. I had another raster compression routine that, when streamed, gave me a 40% increase when I thought it was already maxed out.

I'm just talking this out here. I know that in the current state the results won't be valid, due to the kernels tripping all over each other's memory space. I just wanted to see the potential. I'm going to take your suggestion, since you know the code, and see if I can understand it well enough to break it up into 4 regions. So far, I can tell this will be a steep learning curve.

Thanks
Joe
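A short sketch of that "all A's, sync, all B's" pattern, again with placeholder kernel names and arguments rather than the real test code:

Code:
#include <cuda_runtime.h>
#include <stdint.h>

// assumed stand-ins for the real kernels, as in the earlier sketch
__global__ void scrypt_core_kernelA(uint32_t *pad, uint32_t first_nonce, uint32_t n);
__global__ void scrypt_core_kernelB(uint32_t *pad, uint32_t first_nonce, uint32_t n);

void run_a_then_b(uint32_t *d_pad, uint32_t first_nonce,
                  uint32_t hashes_per_launch, int num_streams,
                  dim3 grid, dim3 block, cudaStream_t *streams)
{
    // queue every A launch into its own stream so the small launches can overlap
    for (int s = 0; s < num_streams; ++s)
        scrypt_core_kernelA<<<grid, block, 0, streams[s]>>>(
            d_pad, first_nonce + s * hashes_per_launch, hashes_per_launch);

    // device-wide barrier: no B may start before every A has finished
    cudaDeviceSynchronize();

    for (int s = 0; s < num_streams; ++s)
        scrypt_core_kernelB<<<grid, block, 0, streams[s]>>>(
            d_pad, first_nonce + s * hashes_per_launch, hashes_per_launch);

    cudaDeviceSynchronize();
}

Note that all launches here share a single scratchpad, which is exactly why the results don't validate; the per-region split sketched earlier avoids that at the cost of extra memory.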
|
|
|
|
13G
Newbie
Offline
Activity: 17
Merit: 0
|
 |
January 29, 2014, 09:03:10 PM |
|
Results from my latest build. Here is the launch config I stuck with and the results for -L2 to -L6:
./cudaminer -a scrypt-jane -H 0 -i 0 -d 0 -l T138x2 -o http://127.0.0.1:3339 -u user -p pass -D -L4
-L2 T68x2 - 4.30-4.46
-L3 T68x3 - 4.61-4.87
-L4 T138x2 - 4.89-5.29 - avg. 4.98
-L5 T69x4 - 4.90-5.3 - avg. 5.05
-L6 T108x4 - 4.64-5.2 - avg. 4.65
Running with -L4, I have noticed that it has now settled down at 4.96 - 5.05 after being on for 4 hours, and I still have a lot of system resources left over.

Great improvement! Thank you! My GTX TITAN with "-a scrypt-jane -d 0 -i 0 -H 2 -C 0 -m 0 -b 32768 -L 5 -l T69x4 -s 120" now does 4.7 khash/s!
|
|
|
|
bigjme
|
 |
January 29, 2014, 09:05:10 PM |
|
No problem. I am hoping to find a way to allocate some more memory and GPU power, and with the improvement Christian has in store it should jump up a lot more.
|
Owner of: cudamining.co.uk
|
|
|
cbuchner1 (OP)
|
 |
January 29, 2014, 09:12:06 PM |
|
No problem. I am hoping to find a way to allocate some more memory and GPU power, and with the improvement Christian has in store it should jump up a lot more.
make that a "might" jump up a lot more. I've had my fair share of optimization failures...
|
|
|
|
bigjme
|
 |
January 29, 2014, 09:13:28 PM |
|
Even getting it to use more memory should give me an increase. I say should
|
Owner of: cudamining.co.uk
|
|
|
bathrobehero
Legendary
Offline
Activity: 2002
Merit: 1051
ICO? Not even once.
|
 |
January 29, 2014, 09:24:17 PM |
|
What's the reason behind failing to allocate more than 3 GB of VRAM on Titans? It seems that wherever you look (games, applications, whatever), there are always problems on that front.
|
Not your keys, not your coins!
|
|
|
bigjme
|
 |
January 29, 2014, 09:26:51 PM |
|
I believe it's to do with the memory bus speed limiting the amount of memory it can use.
|
Owner of: cudamining.co.uk
|
|
|
ManIkWeet
|
 |
January 29, 2014, 09:53:04 PM |
|
I believe it's to do with the memory bus speed limiting the amount of memory it can use.
Do you have any idea how logical that sounds? /sarcasm off. It probably has to do with the whole 32/64-bit thing; running an x64 build doesn't necessarily fix that either.
|
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
|
|
|
bigjme
|
 |
January 29, 2014, 10:02:57 PM |
|
Repeating what someone else said lmao. Sarcasm not needed
|
Owner of: cudamining.co.uk
|
|
|
cbuchner1 (OP)
|
 |
January 29, 2014, 10:04:03 PM Last edit: January 29, 2014, 11:15:47 PM by cbuchner1 |
|
cbuchner1, did you note my earlier post about autotune problems and K kernel performance regression?
Okay, I have just replaced the ailing PSU in my main development PC, which allows me to put more stress on the GPUs again without it turning off unexpectedly.

So that regression really is bad: 254 kHash/s down to 204 kHash/s with the same kernel launch parameters between 2013-12-18 and the current github. That's a 20% drop in performance. I might play around a bit to see what I can find. I did not find the same problem with the T kernel, even though it underwent very similar changes!

EDIT1: the majority of the discrepancy stems from my redefinition of what "warp" means in Dave's Kepler kernel (to be more in line with the CUDA definition of a warp). Hence the equivalent launch config for the current github release has to use four times the number of blocks to be comparable. So I have to go from -l K7x32 to -l K28x32. Then I end up with a drop from 254 kHash/s to only 220 kHash/s. Still bad, but not quite that much.

EDIT2: I find my "simplifications" in read_keys_direct and write_keys_direct to be the culprit. Turns out this has a huge performance impact, despite requiring far fewer instructions.

Christian
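If the factor of four is read as the old kernel treating one "warp" as a group of four CUDA warps (an assumption for illustration, not taken from the code), the equivalence of the two configs is just this bookkeeping:

Code:
// old convention (assumed): one "warp" = 4 CUDA warps = 128 threads
//   K7x32  ->  7 blocks * 32 "warps" * 128 threads = 28672 threads
// new convention: a warp is the CUDA warp of 32 threads
//   K28x32 -> 28 blocks * 32 warps   *  32 threads = 28672 threads
// same total thread count, hence four times the number of blocks
static_assert(7 * 32 * 128 == 28 * 32 * 32, "equivalent launch configs");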
|
|
|
|
ollyweg
Newbie
Offline
Activity: 19
Merit: 0
|
 |
January 29, 2014, 10:40:11 PM |
|
Hi, I was wondering if you guys are doing only simple overclocking with Afterburner, or forcing P-states. On my EVGA 660 Ti I use NVIDIA Inspector to force the P2 state, which gets me up to 1215 MHz (300 MHz over stock). My hashrate also got a bit more stable with this, and autotuning has given much more precise results since then.
This gets me 340 khash/s for scrypt and 3.5 khash/s for scrypt-jane. This is with the 2014-01-22 version. So I'm wondering why I'm actually still below a value from the scrypt-jane spreadsheet which apparently uses almost no OC. I've tested lots of kernel configs but I can't seem to get any higher.
Any ideas?
My exact config:
scrypt: --interactive=0 --hash-parallel=2 --launch-config=Y112x2 --texture-cache=1 --single-memory=0
scrypt-jane: --interactive=0 --hash-parallel=1 --launch-config=K7x23 --texture-cache=0 --single-memory=0 --lookup-gap=3
OC: P0: clock offset +160, mem offset +300, power target 153% (actually uses about 142% TDP); P2: clock 1215 MHz, forced
Driver: 332.21
|
|
|
|
|