Bitcoin Forum
April 24, 2024, 03:03:42 PM
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
Pages: « 1 ... 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 [162] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 ... 1135 »
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] (Read 3426868 times)
bigjme
Sr. Member
January 29, 2014, 11:22:21 AM
 #3221

Is that Windows or Linux? My 780 gets 3.77 khash/s, sometimes higher, with only 3 GB of memory. That is on Linux.

I found it best to run a stock config first and check how much memory is used. Then increase the launch config until memory is close to full, and compare the rates. I believe my YACoin config is 20x1 right now.

Owner of: cudamining.co.uk
ktf
Newbie
January 29, 2014, 11:25:48 AM
 #3222

3.7 kH/s seems a bit low for a 780, seeing that my GTX 660 gets 3.3 or so at 1200 MHz.
bigjme
Sr. Member
January 29, 2014, 11:27:54 AM
 #3223

My 780 is at stock, with a build from the start of January, as I haven't bothered to compile a new one yet.

It is using 2.89 GB of memory, which is the main limiter.

ManIkWeet
Full Member
January 29, 2014, 11:36:41 AM
 #3224

Quote from: bigjme
> My 780 is at stock, with a build from the start of January, as I haven't bothered to compile a new one yet.
>
> It is using 2.89 GB of memory, which is the main limiter.

My GTX 780 hits 4.1 khash/s in interactive mode and 4.4 khash/s when not interactive, though it has a factory overclock (the ASUS one).

BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
13G
Newbie
January 29, 2014, 11:37:19 AM
 #3225

Quote from: bigjme
> Is that Windows or Linux? My 780 gets 3.77 khash/s, sometimes higher, with only 3 GB of memory. That is on Linux.
>
> I found it best to run a stock config first and check how much memory is used. Then increase the launch config until memory is close to full, and compare the rates. I believe my YACoin config is 20x1 right now.

Win7 x64, driver 332.21, memory usage 2360 MB of 6144 MB.

A GTX 780 hitting 4.4 khash/s???

Please post your parameters, GPU MHz and build... incredible :-)
bigjme
Sr. Member
January 29, 2014, 11:39:13 AM
 #3226

What build are you using, and what OS?
I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76 khash/s:

./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D

I may be missing some parameters, e.g. I don't have -c set.

cbuchner1 (OP)
Hero Member
January 29, 2014, 11:52:15 AM
Last edit: January 29, 2014, 01:46:31 PM by cbuchner1
 #3227

Quote from: bigjme
> What build are you using, and what OS?
> I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76 khash/s:
>
> ./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D
>
> I may be missing some parameters, e.g. I don't have -c set.

You want a lookup gap of up to 6 on GTX 780 cards; specify it with the -L parameter.

Try -L 2 first, let it autotune, then increase the lookup gap one step at a time.
WARNING: autotuning may take a long time with a lookup gap enabled.

Stop when you find a balance of power consumption vs. kHash/s that suits you.
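For readers wondering why -L trades memory for speed: assuming cudaMiner's -L works like the lookup gap in other GPU scrypt miners, the scratchpad keeps only every gap-th entry and recomputes the missing ones on demand. Below is a toy C++ sketch of that idea, not cudaMiner's kernel. `mix` and `romix` are hypothetical names, `mix` is a 64-bit stand-in for scrypt's Salsa20/8-based BlockMix, and the state is a single 64-bit word instead of a 128*r-byte block.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy stand-in for scrypt's BlockMix (real scrypt uses Salsa20/8 on
// 128*r-byte blocks). A splitmix64-style finalizer keeps the sketch short.
std::uint64_t mix(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// ROMix with a lookup gap: only every gap-th scratchpad entry is stored, so the
// scratchpad needs ceil(N / gap) entries instead of N.
std::uint64_t romix(std::uint64_t x, std::uint32_t N, std::uint32_t gap) {
    std::vector<std::uint64_t> V((N + gap - 1) / gap);

    // First loop: entry i would be the state before the i-th mix; keep only multiples of gap.
    for (std::uint32_t i = 0; i < N; ++i) {
        if (i % gap == 0) V[i / gap] = x;
        x = mix(x);
    }

    // Second loop: data-dependent reads of entry j.
    for (std::uint32_t i = 0; i < N; ++i) {
        std::uint32_t j = static_cast<std::uint32_t>(x % N);   // index derived from the state
        std::uint64_t v = V[j / gap];                            // nearest stored entry <= j
        for (std::uint32_t k = 0; k < j % gap; ++k) v = mix(v);  // recompute the skipped steps
        x = mix(x ^ v);
    }
    return x;
}
```

`romix(x, N, 1)` and `romix(x, N, 6)` return the same value, but the second stores about N/6 entries and spends on average (6 - 1) / 2 = 2.5 extra mix steps per lookup. That is presumably why a larger gap frees memory for bigger launch configs at the cost of more computation per hash.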

I find that my 660Ti makes my GTX 780 look poor in comparison. I'm not sure exactly why that is, since the 780 has Compute 3.5 and far more SMXes to work with.

Christian
bigjme
Sr. Member
January 29, 2014, 12:20:23 PM
Last edit: January 29, 2014, 12:38:59 PM by bigjme
 #3228

Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each.

How much is your 660Ti getting, Christian?

Espie
Newbie
January 29, 2014, 12:51:44 PM
 #3229

Hi Christian,

I noticed some nice developments on the NVIDIA Developer Zone. When will you make this version available?
https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/

Dennis

bigjme
Sr. Member
January 29, 2014, 01:08:46 PM
 #3230

OK, so just a quick drop of numbers before I post the full results tonight: the latest cudaminer on my 780 is now getting 5.03 khash/s.

I can still use my desktop with it running, and that is with T69x4 and -L 5. My CPU is also slightly less loaded now, so it does a constant 0.72 khash/s. So I've gone from 4.3 khash/s to 5.8 khash/s in total.

Not a bad jump.

ghur
Full Member
January 29, 2014, 01:27:15 PM
 #3231

<snip>

Alright, thank you.

Sounds reasonable enough to me :)

doge: D8q8dR6tEAcaJ7U65jP6AAkiiL2CFJaHah
Automated faucet, pays daily: Qoinpro
whitesand77
Full Member
January 29, 2014, 01:30:34 PM
 #3232

I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3 khash/s on my 660 Ti. Too bad the results don't validate on the CPU, though. The kernel must not be concurrency-safe =).
cbuchner1 (OP)
Hero Member
January 29, 2014, 01:43:53 PM
 #3233

Quote from: whitesand77
> I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3 khash/s on my 660 Ti. Too bad the results don't validate on the CPU, though. The kernel must not be concurrency-safe =).

Yes, right. There is one scratchpad but two streams. The scrypt_core kernels have to be serialized, or they would destroy each other's scratchpad. This is why I am using CUDA events.

Some overlap of memcpy and kernel execution would be desirable (it is not happening now because of the order in which commands are issued), and possibly the SHA256/Keccak kernels of one stream could execute concurrently with the scrypt_core kernels of the other stream. This is also not happening now, because my CUDA events currently serialize these as well (I need to change when events are recorded and synchronized on).
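The scratchpad conflict can be illustrated with a CPU analogy in C++ (this is not CUDA code): two std::thread workers stand in for the two streams, and a std::mutex stands in for the CUDA events that keep the scrypt_core kernels from running at the same time. All function names are hypothetical, and the toy mixers are not SHA256, Keccak or scrypt. Note that a mutex only provides mutual exclusion, whereas CUDA events also fix the order of execution.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// ONE scratchpad shared by both "streams".
std::vector<std::uint64_t> g_scratchpad(4096);
// Stand-in for the CUDA events: only one "stream" may be inside core() at a time.
std::mutex g_scratchpad_mutex;

// Toy per-stream stage (stand-in for the SHA256/Keccak kernels). It touches no
// shared state, so it may overlap with the other stream's core() call.
std::uint64_t pre_hash(std::uint64_t nonce) {
    return (nonce * 0x9E3779B97F4A7C15ULL) ^ 0x5851F42D4C957F2DULL;
}

// Toy core stage (stand-in for scrypt_core): fills the shared scratchpad, then
// reads it back at state-dependent positions.
std::uint64_t core(std::uint64_t x) {
    std::lock_guard<std::mutex> lock(g_scratchpad_mutex);  // serialize scratchpad use
    const std::size_t n = g_scratchpad.size();
    for (std::size_t i = 0; i < n; ++i) {
        g_scratchpad[i] = x;
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  // LCG step as a toy mixer
    }
    for (std::size_t i = 0; i < n; ++i)
        x = (x ^ g_scratchpad[x % n]) * 0x9E3779B97F4A7C15ULL + i;
    return x;
}

// One "stream": hashes `count` consecutive nonces into its own output buffer.
void run_stream(std::uint64_t first, std::size_t count, std::vector<std::uint64_t>& out) {
    for (std::size_t k = 0; k < count; ++k) out[k] = core(pre_hash(first + k));
}

// Runs two "streams" concurrently and returns results for nonces 0 .. 2*per_stream-1.
std::vector<std::uint64_t> hash_two_streams(std::size_t per_stream) {
    std::vector<std::uint64_t> a(per_stream), b(per_stream);
    std::thread s0(run_stream, std::uint64_t{0}, per_stream, std::ref(a));
    std::thread s1(run_stream, static_cast<std::uint64_t>(per_stream), per_stream, std::ref(b));
    s0.join();
    s1.join();
    a.insert(a.end(), b.begin(), b.end());
    return a;
}
```

Only `core` takes the lock, so one worker's `pre_hash` may run while the other is inside `core`, which corresponds to the SHA256/Keccak vs. scrypt_core overlap described above. Removing the lock lets the two workers overwrite each other's scratchpad entries (a data race), the kind of corruption that makes results fail CPU validation.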

I intend to get rid of the memcpy altogether by checking hashes on the GPU instead, which makes the memcpy/kernel overlap issue moot.
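Checking hashes on the GPU means each thread compares its own hash with the target and only a winning nonce is reported back, instead of copying every hash to the host. The C++ below models that per-thread check on the host. The word order (8 little-endian 32-bit words, word 7 most significant) is an assumption in the style of cpuminer-derived code, and `meets_target` / `find_winning_nonce` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Hash256 = std::array<std::uint32_t, 8>;

// Assumed layout: 8 little-endian 32-bit words, word 7 most significant.
// Returns true when hash <= target (equality counts as a valid share).
bool meets_target(const std::uint32_t* hash, const std::uint32_t* target) {
    for (int i = 7; i >= 0; --i) {
        if (hash[i] > target[i]) return false;
        if (hash[i] < target[i]) return true;
    }
    return true;
}

// Host-side model of the GPU check: loop iteration t plays the role of thread t.
// Only the smallest passing nonce is reported (max uint32 if none), so the host
// would copy back one word instead of every hash.
std::uint32_t find_winning_nonce(const std::vector<Hash256>& hashes,
                                 std::uint32_t first_nonce,
                                 const std::uint32_t* target) {
    std::uint32_t result = std::numeric_limits<std::uint32_t>::max();
    for (std::size_t t = 0; t < hashes.size(); ++t) {
        const std::uint32_t nonce = first_nonce + static_cast<std::uint32_t>(t);
        if (meets_target(hashes[t].data(), target) && nonce < result)
            result = nonce;  // on the GPU this update would need an atomic
    }
    return result;
}
```

On an actual GPU, the `result` update would need an atomic such as CUDA's `atomicMin` so that concurrent threads cannot lose a winner; the host then only copies back a single 32-bit value.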

Christian
cbuchner1 (OP)
Hero Member
January 29, 2014, 01:45:38 PM
 #3234

Quote from: Espie
> Hi Christian,
>
> I noticed some nice developments on the NVIDIA Developer Zone. When will you make this version available?
> https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/
>
> Dennis


Get it from GitHub, or wait for the next official release (only a few more days...).
cbuchner1 (OP)
Hero Member
January 29, 2014, 01:45:55 PM
 #3235

Quote from: bigjme
> Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each.
>
> How much is your 660Ti getting, Christian?

3.7 kHash/s give or take.
bigjme
Sr. Member
January 29, 2014, 01:47:35 PM
 #3236

So my 780 getting over 5 isn't too bad, then.

cbuchner1 (OP)
Hero Member
January 29, 2014, 01:50:34 PM
 #3237

Quote from: bigjme
> So my 780 getting over 5 isn't too bad, then.

But 6 or 7 would be nicer.

I have one optimization in mind that swaps the state of threads within the lookup_gap loop. The intention is to order threads by their loop trip count (some have to run 0 iterations, others a couple more, up to the specified lookup_gap). With that ordering, some of the warps will terminate much earlier and not consume any computational resources.

In theory this would reduce the workload by nearly a factor of 2, but it introduces some overhead for sorting the threads and for shuffling the state around. Whether a net speed gain remains is yet to be seen.
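The effect of ordering threads by trip count can be estimated with a simple C++ cost model. It assumes each warp of 32 threads runs the recompute loop as long as its slowest thread and that a thread's trip count is j % gap for a pseudo-random scratchpad index j. It deliberately ignores the sorting and state-shuffling overhead mentioned above, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Cost model (assumption): a warp executes the recompute loop as long as its
// slowest thread, so each group of warp_size consecutive threads costs the
// maximum trip count inside it.
std::uint64_t warp_cost(const std::vector<std::uint32_t>& trips, std::size_t warp_size = 32) {
    std::uint64_t cost = 0;
    for (std::size_t w = 0; w < trips.size(); w += warp_size) {
        const std::size_t end = std::min(trips.size(), w + warp_size);
        cost += *std::max_element(trips.begin() + static_cast<std::ptrdiff_t>(w),
                                  trips.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return cost;
}

// Trip count per thread: j % gap recompute steps for a pseudo-random index j in [0, N).
std::vector<std::uint32_t> make_trip_counts(std::size_t threads, std::uint32_t N,
                                            std::uint32_t gap, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> trips(threads);
    for (auto& t : trips) t = static_cast<std::uint32_t>(rng() % N) % gap;
    return trips;
}

// Same cost after reordering threads by trip count, so that threads with similar
// loop lengths share a warp (sorting/shuffling overhead is not counted).
std::uint64_t sorted_warp_cost(std::vector<std::uint32_t> trips, std::size_t warp_size = 32) {
    std::sort(trips.begin(), trips.end());
    return warp_cost(trips, warp_size);
}
```

With 8192 threads, N = 1024 and a gap of 6, almost every unsorted warp contains a thread with trip count 5, while sorted warps average roughly 2.5, so in this model the sorted cost comes out at a little over half the unsorted cost, in line with the "nearly a factor of 2" estimate before overhead.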

I will save that optimization for February (it would delay this release...)

Christian

bigjme
Sr. Member
January 29, 2014, 01:53:16 PM
Last edit: January 29, 2014, 02:09:33 PM by bigjme
 #3238

That would be nice to see.
I will gladly test it, Christian, if you want to send it through while you're working on it.

Espie
Newbie
January 29, 2014, 02:04:00 PM
 #3239

Quote from: cbuchner1
> Quote from: Espie
> > Hi Christian,
> >
> > I noticed some nice developments on the NVIDIA Developer Zone. When will you make this version available?
> > https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/
> >
> > Dennis
>
> Get it from GitHub, or wait for the next official release (only a few more days...).

I have Visual Studio 2012, but I can't load the solution file, so I'll probably have to wait a few more days.
whitesand77
Full Member
January 29, 2014, 02:08:43 PM
 #3240

Quote from: cbuchner1
> Quote from: whitesand77
> > I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3 khash/s on my 660 Ti. Too bad the results don't validate on the CPU, though. The kernel must not be concurrency-safe =).
>
> Yes, right. There is one scratchpad but two streams. The scrypt_core kernels have to be serialized, or they would destroy each other's scratchpad. This is why I am using CUDA events.
>
> Some overlap of memcpy and kernel execution would be desirable (it is not happening now because of the order in which commands are issued), and possibly the SHA256/Keccak kernels of one stream could execute concurrently with the scrypt_core kernels of the other stream. This is also not happening now, because my CUDA events currently serialize these as well (I need to change when events are recorded and synchronized on).
>
> I intend to get rid of the memcpy altogether by checking hashes on the GPU instead, which makes the memcpy/kernel overlap issue moot.
>
> Christian


To be more specific, I was using 4 streams on nv_scrypt_core_kernelA<ALGO_SCRYPT_JANE> and nv_scrypt_core_kernelB<ALGO_SCRYPT_JANE> inside NVKernel::run_kernel, so those are the kernels I was referring to. Too bad the code in these kernels looks like witchcraft to me at the moment. LOL