Bitcoin Forum
April 25, 2019, 03:35:47 AM
News: Latest Bitcoin Core release: 0.17.1 [Torrent]
 
Pages: « 1 ... 163 [164] 165 ... 1136 »
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3408493 times)
apluscarp
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
January 29, 2014, 04:55:17 PM
 #3261

Where can I find the 2014-01-17 version?
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 29, 2014, 05:29:55 PM
Last edit: January 29, 2014, 05:49:19 PM by bigjme
 #3262

Results from my latest build. Here is the launch config I stuck with, and the results for -L2 through -L6:

./cudaminer -a scrypt-jane -H 0 -i 0 -d 0 -l T138x2 -o http://127.0.0.1:3339 -u user -p pass -D -L4

-L2 T68x2 - 4.30-4.46
-L3 T68x3 - 4.61-4.87
-L4 T138x2 - 4.89-5.29 - avg. 4.98
-L5 T69x4 - 4.90-5.3 - avg. 5.05
-L6 T108x4 - 4.64-5.2 - avg. 4.65

Running with -L4, I've noticed it has now settled at 4.96-5.05 after being on for 4 hours, and I still have a lot of system headroom left, so I'm wondering how much it is actually using.

Memory-wise it is using 2412 MiB / 3071 MiB.
Not sure about actual GPU usage. I wonder if I could push memory usage higher than it is now; that might get me more out of it.
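
A sweep like the one above is easy to script. Here is a minimal sketch (same pool URL and credentials as my command line above; the per-level launch configs are the ones autotune settled on). It only prints one invocation per -L level as a dry run; pipe the output to sh to actually run them:

```shell
#!/bin/sh
# Print one cudaminer invocation per lookup-gap level, using the
# launch config that autotune settled on for that level.
# Dry run: pipe the output to `sh` to actually execute each line.
for pair in 2:T68x2 3:T68x3 4:T138x2 5:T69x4 6:T108x4; do
  L=${pair%%:*}          # lookup-gap level (before the colon)
  cfg=${pair#*:}         # launch config (after the colon)
  echo "./cudaminer -a scrypt-jane -H 0 -i 0 -d 0 -l $cfg" \
       "-o http://127.0.0.1:3339 -u user -p pass -D -L$L"
done
```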

Owner of: cudamining.co.uk
whitesand77
Full Member
***
Offline Offline

Activity: 125
Merit: 100


View Profile
January 29, 2014, 05:38:28 PM
 #3263

Quote from: cbuchner1
But somehow I do not see the advantage of this. Typically launch configurations are determined such that a single stream already fully loads the GPU's multiprocessors.

If this were true, my stream test would have given me a lower hash rate due to overhead, but it doubled. Just because MSI Afterburner or another program reports 90-something % GPU usage doesn't mean streams won't help. When I ran the two kernels, I ran all the A's first, synced, then the B's. So as long as the batch size for the NFactor is small enough to spawn enough kernels, they'll run concurrently. Again, this is another optimization that won't work so well for lower NFactors. But I was actually seeing 99-100% GPU usage with the doubled hash rate. I had the same thought as you before I discovered streams, but the NVIDIA Visual Profiler and the sample code convinced me otherwise. Another raster compression routine of mine gave a 40% increase when streamed, when I thought it was already maxed out.

I'm just talking this out here. I know that in the current state the results won't be valid, due to the kernels tripping all over each other's memory space. I just wanted to see the potential.

I'm going to take your suggestion, since you know the code, and see if I can understand it well enough to break it up into 4 regions.  So far, I can tell this will be a steep learning curve.

Thanks

Joe
13G
Newbie
*
Offline Offline

Activity: 17
Merit: 0


View Profile
January 29, 2014, 09:03:10 PM
 #3264

Quote from: bigjme on January 29, 2014, 05:29:55 PM
Results from my latest build. Here is the launch config I stuck with, and the results for -L2 through -L6:

./cudaminer -a scrypt-jane -H 0 -i 0 -d 0 -l T138x2 -o http://127.0.0.1:3339 -u user -p pass -D -L4

-L2 T68x2 - 4.30-4.46
-L3 T68x3 - 4.61-4.87
-L4 T138x2 - 4.89-5.29 - avg. 4.98
-L5 T69x4 - 4.90-5.3 - avg. 5.05
-L6 T108x4 - 4.64-5.2 - avg. 4.65

Running with -L4, I've noticed it has now settled at 4.96-5.05 after being on for 4 hours, and I still have a lot of system headroom left, so I'm wondering how much it is actually using.

Memory-wise it is using 2412 MiB / 3071 MiB.
Not sure about actual GPU usage. I wonder if I could push memory usage higher than it is now; that might get me more out of it.


Great improvement! Thank you!
GTX TITAN with "-a scrypt-jane -d 0 -i 0 -H 2 -C 0 -m 0 -b 32768 -L 5 -l T69x4 -s 120" now does 4.7 khash/s!
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 29, 2014, 09:05:10 PM
 #3265

No problem.
I am hoping to find a way to allocate some more memory and GPU power. And with the improvement Christian has in store, it should jump up a lot more.

Owner of: cudamining.co.uk
cbuchner1
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500


View Profile
January 29, 2014, 09:12:06 PM
 #3266

Quote from: bigjme on January 29, 2014, 09:05:10 PM
No problem.
I am hoping to find a way to allocate some more memory and GPU power. And with the improvement Christian has in store, it should jump up a lot more.

make that a "might" jump up a lot more.

I've had my fair share of optimization failures...
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 29, 2014, 09:13:28 PM
 #3267

Even getting it to use more memory should give me an increase. I say "should".

Owner of: cudamining.co.uk
bathrobehero
Legendary
*
Offline Offline

Activity: 1680
Merit: 1027


ICO? Not even once.


View Profile
January 29, 2014, 09:24:17 PM
 #3268

What's the reason behind failing to allocate more than 3GB of VRAM on Titans?
It seems that wherever you look (games, applications, whatever), there are always problems on that front.

RIP Bittrex
RIP Poloniex
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 29, 2014, 09:26:51 PM
 #3269

I believe it's to do with the memory bus speed limiting the amount of memory it can use.

Owner of: cudamining.co.uk
ManIkWeet
Full Member
***
Offline Offline

Activity: 182
Merit: 100


View Profile
January 29, 2014, 09:53:04 PM
 #3270

Quote from: bigjme on January 29, 2014, 09:26:51 PM
I believe it's to do with the memory bus speed limiting the amount of memory it can use.

You have any idea how logical that sounds?
/sarcasm off
It probably has to do with the whole 32/64-bit thing; running an x64 build doesn't necessarily fix that either.

BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 29, 2014, 10:02:57 PM
 #3271

Repeating what someone else said, lmao. Sarcasm not needed.

Owner of: cudamining.co.uk
cbuchner1
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500


View Profile
January 29, 2014, 10:04:03 PM
Last edit: January 29, 2014, 11:15:47 PM by cbuchner1
 #3272

Quote:
cbuchner1, did you note my earlier post about autotune problems and K kernel performance regression?

okay, I have just replaced the ailing PSU in my main development PC, which allows me to put more stress on the GPUs again without it turning off unexpectedly.

So that regression really is bad. 254 kHash/s to 204 kHash/s with same kernel launch parameters between 2013-12-18 and current github.
That's a 20% drop in performance. I might play around a bit to see what I can find.

I did not find the same problem with the T kernel, even though it underwent very similar changes!

EDIT1: The majority of the discrepancy stems from my redefinition of what "warp" means in Dave's Kepler kernel (to be more in line with the CUDA definition of a warp). Hence the equivalent launch config for the current github release has to use four times the number of blocks to be comparable. So I have to go from -l K7x32 to -l K28x32. Then I end up with a drop from 254 kHash/s to 220 kHash/s only. Still bad, but not quite as much.
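
To be explicit, the conversion is purely mechanical: keep the warps-per-block figure and multiply the block count by four (the 4x factor above is the only assumption here). A throwaway one-liner:

```shell
# Convert an old K-kernel launch config to the new warp definition:
# same warps per block, four times the blocks (e.g. K7x32 -> K28x32).
old_blocks=7
warps=32
echo "K${old_blocks}x${warps} -> K$((old_blocks * 4))x${warps}"
```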

EDIT2: I find my "simplifications" in read_keys_direct and write_keys_direct to be the culprit. It turns out this has a huge performance impact, despite requiring far fewer instructions.

Christian

ollyweg
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
January 29, 2014, 10:40:11 PM
 #3273

Hi, I was wondering if you guys are doing only simple overclocking with Afterburner, or forced P-states.
On my EVGA 660 Ti I use nvidia inspector to force the P2 state, which gets me up to 1215 MHz (300 MHz over stock).
My hashrate also got a bit more stable with this, and autotuning gives much more precise results since then.

This gets me 340 khash/s for scrypt and 3.5 khash/s for scrypt-jane.
This is with the 2014-01-22 version.
So I'm wondering why I'm actually still below a value from the scrypt-jane spreadsheet which apparently uses almost no OC.
I've tested lots of kernel configs but I can't seem to get any higher.

Any ideas?

My exact config:
scrypt: --interactive=0 --hash-parallel=2 --launch-config=Y112x2 --texture-cache=1 --single-memory=0
scrypt-jane: --interactive=0 --hash-parallel=1 --launch-config=K7x23 --texture-cache=0 --single-memory=0 --lookup-gap=3

OC:
P0: clock-offset +160; mem-offset +300; power-target 153% (actually uses about 142% TDP)
P2: clock 1215MHz; forced
Driver: 332.21
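
For anyone who wants to check whether forced clocks actually hold under load: on Linux, nvidia-smi can poll clocks and power draw while mining (nvidia inspector itself is Windows-only). A sketch that just prints the polling command as a dry run:

```shell
# Poll SM/memory clocks, power draw and GPU load every 5 s while mining
# (Linux; assumes nvidia-smi from the driver package). Useful to verify
# that a forced P-state actually holds under load.
cmd="nvidia-smi --query-gpu=clocks.sm,clocks.mem,power.draw,utilization.gpu --format=csv -l 5"
echo "$cmd"   # dry run: run $cmd instead of echoing it to start polling
```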
humanitybg
Newbie
*
Offline Offline

Activity: 3
Merit: 0


View Profile
January 29, 2014, 11:33:44 PM
 #3274

Guys, can someone upload a newer compiled version (I read that you guys are using newer ones)? I have zero idea how to compile on Windows. I want to test how my GTX 770 will do compared to the 12-18 release. Smiley
bathrobehero
Legendary
*
Offline Offline

Activity: 1680
Merit: 1027


ICO? Not even once.


View Profile
January 30, 2014, 12:20:30 AM
 #3275

Since commit ~125, benchmark mode has gone rogue. The previous commit I compiled that was fine was 114.



It's just a minor issue, but I thought it was worth mentioning.
I use solo mining as my benchmark now (no shares skewing the numbers).

RIP Bittrex
RIP Poloniex
Morgahl
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
January 30, 2014, 12:21:31 AM
 #3276

Quote from: ollyweg on January 29, 2014, 10:40:11 PM
Hi, I was wondering if you guys are doing only simple overclocking with Afterburner, or forced P-states.
On my EVGA 660 Ti I use nvidia inspector to force the P2 state, which gets me up to 1215 MHz (300 MHz over stock).
My hashrate also got a bit more stable with this, and autotuning gives much more precise results since then.

This gets me 340 khash/s for scrypt and 3.5 khash/s for scrypt-jane.
This is with the 2014-01-22 version.
So I'm wondering why I'm actually still below a value from the scrypt-jane spreadsheet which apparently uses almost no OC.
I've tested lots of kernel configs but I can't seem to get any higher.

Any ideas?

My exact config:
scrypt: --interactive=0 --hash-parallel=2 --launch-config=Y112x2 --texture-cache=1 --single-memory=0
scrypt-jane: --interactive=0 --hash-parallel=1 --launch-config=K7x23 --texture-cache=0 --single-memory=0 --lookup-gap=3

OC:
P0: clock-offset +160; mem-offset +300; power-target 153% (actually uses about 142% TDP)
P2: clock 1215MHz; forced
Driver: 332.21

Your mem offset might be going in the wrong direction. There is zero bottleneck for scrypt-jane (at -L 1), and almost none for scrypt or scrypt-jane (at -L 2+), in terms of memory throughput most of the time.

This may just be a Titan-specific observation, but I see literally zero change in hash rate when swinging from -502 to +300 in steps of 100; none. All a positive memclock did was raise the card's TDP utilization for zero gain. You might find that lowering the memclock allows a more stable feed to the GPU, and a higher core clock as a result. If not, at least you will see a reduction in TDP %, saving you power costs in the long run.
cbuchner1
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500


View Profile
January 30, 2014, 12:39:05 AM
 #3277

Quote from: cbuchner1 on January 29, 2014, 10:04:03 PM
Quote:
cbuchner1, did you note my earlier post about autotune problems and K kernel performance regression?

So that regression really is bad. 254 kHash/s to 204 kHash/s with the same kernel launch parameters between 2013-12-18 and current github.
That's a 20% drop in performance. I might play around a bit to see what I can find.


It's fixed! A radical simplification I made some time ago just turned out to have been a radical speed brake Wink

The K, T, X kernels are 10-15% faster now. Definitely for scrypt - but also slightly faster for scrypt-jane.

Too bad that for scrypt you would usually use the new Y and Z kernels (or the good old F kernel for Fermi), so the overall benefit isn't that great.

Christian
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 30, 2014, 12:42:47 AM
Last edit: January 30, 2014, 12:59:25 AM by bigjme
 #3278

I will download and give it a test :-)

Edit: OK, so it took 7 attempts and 3 downloads to get the latest build to compile. Bear in mind I have compiled every other release in one attempt. It kept giving different errors, and sometimes did nothing at all.

Finally got it to compile, and found a drop in performance:
4.77 khash/s for the latest
5.04 khash/s on the one I built around 12 hours ago

That's on YACoin with the same configuration I posted earlier. It was not left to autotune; I just ran the settings from earlier.

Owner of: cudamining.co.uk
cbuchner1
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500


View Profile
January 30, 2014, 01:18:37 AM
 #3279

Quote from: bigjme on January 30, 2014, 12:42:47 AM
I will download and give it a test :-)

Edit: OK, so it took 7 attempts and 3 downloads to get the latest build to compile. Bear in mind I have compiled every other release in one attempt. It kept giving different errors, and sometimes did nothing at all.

Finally got it to compile, and found a drop in performance:
4.77 khash/s for the latest
5.04 khash/s on the one I built around 12 hours ago

That's on YACoin with the same configuration I posted earlier. It was not left to autotune; I just ran the settings from earlier.

For scrypt-jane I am seeing mixed results. The GT 640 (Compute 3.5) seems a little slower; GTX 780 Tis on Linux are slightly faster.
Maybe I can later offer an option to switch between these two (totally different) memory access schemes for the scratchpad.

I am also wondering if you guys still see non-validating shares in conjunction with the lookup gap after this change?
I can't see much activity when solo-mining, but the pool miners will notice.

Christian
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
January 30, 2014, 01:24:36 AM
 #3280

Solo mining myself, sorry.
Again, I will gladly test anything new you release. For anyone wondering, my GPU is an EVGA GTX 780 Hydro Copper.

It seems that between the 660, 780, and 780 Ti there are very mixed results. I wish YACoin hadn't dropped in price so much; 1800 coins are only worth £27 now. The return on them seems very low.

Owner of: cudamining.co.uk