bigjme
January 29, 2014, 11:22:21 AM
Is that Windows or Linux? My 780 gets 3.77khash/s and sometimes higher with only 3GB memory. On Linux, that is.
I found it best to run a stock config and check how much memory was used, then up it till it's close to full and compare the rates. I believe my YACoin config is 20x1 right now.
Owner of: cudamining.co.uk
ktf
Newbie
Offline
Activity: 24
Merit: 0
January 29, 2014, 11:25:48 AM
3.7khash/s seems a bit low on a 780, seeing how my GTX 660 gets 3.3 or so at 1200MHz.
bigjme
January 29, 2014, 11:27:54 AM
My 780 is at stock with a build from the start of January, as I haven't bothered to compile a new one yet.
It is using 2.89GB of memory, which is the main limiter.
ManIkWeet
January 29, 2014, 11:36:41 AM
Quote from: bigjme
My 780 is at stock with a build from the start of January, as I haven't bothered to compile a new one yet. It is using 2.89GB of memory, which is the main limiter.

My GTX 780 hits 4.1khash/s in interactive mode and 4.4khash/s when not interactive; it has a factory overclock though... (the ASUS one)
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
13G
Newbie
Offline
Activity: 17
Merit: 0
January 29, 2014, 11:37:19 AM
Quote from: bigjme
Is that Windows or Linux? My 780 gets 3.77khash/s and sometimes higher with only 3GB memory. On Linux, that is. I found it best to run a stock config and check how much memory was used, then up it till it's close to full and compare the rates. I believe my YACoin config is 20x1 right now.

Win7 x64, driver 332.21, memory usage 2360MB of 6144MB. A GTX 780 hitting 4.4khash/s.. please post your parameters, GPU MHz, and build. Incredible :-)
bigjme
January 29, 2014, 11:39:13 AM
What build are you using, and what OS? I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76khash/s:

./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D

I may be missing some parameters; for instance, I don't have -c set.
cbuchner1 (OP)
January 29, 2014, 11:52:15 AM Last edit: January 29, 2014, 01:46:31 PM by cbuchner1
Quote from: bigjme
What build are you using, and what OS? I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76khash/s: ./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D. I may be missing some parameters; I don't have -c set.

You want a lookup gap of up to 6 on GTX 780 cards; specify it with the -L parameter. Try -L 2 first, let it autotune, and increase the lookup gap one by one. WARNING: autotuning may take a long time with the gap enabled. Stop when you find a power consumption vs. khash/s rate that suits you.

I find that my 660 Ti makes my GTX 780 look poor in comparison. Not sure why that is, exactly (the 780 has Compute 3.5 and way more SMXes to work with).

Christian
bigjme
January 29, 2014, 12:20:23 PM Last edit: January 29, 2014, 12:38:59 PM by bigjme
Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each.
How much is your 660 Ti getting, Christian?
Espie
Newbie
Offline
Activity: 5
Merit: 0
January 29, 2014, 12:51:44 PM
bigjme
January 29, 2014, 01:08:46 PM
OK, so just a quick drop of numbers before I post the full results tonight. The latest cudaminer on my 780 is now getting 5.03khash/s, and I still have use of my desktop with it. That is with T69x4 and -L 5. My CPU is also slightly more free now, so it does a constant 0.72khash/s. So I've gone from 4.3khash/s to 5.8khash/s total.
Not a bad jump.
ghur
January 29, 2014, 01:27:15 PM
<snip>
Alright, thank you. Sounds reasonable enough to me.
doge: D8q8dR6tEAcaJ7U65jP6AAkiiL2CFJaHah Automated faucet, pays daily: Qoinpro
whitesand77
January 29, 2014, 01:30:34 PM
I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3khash/s on my 660 Ti. Too bad it doesn't validate on the CPU though. The kernel must not be concurrency-safe, =).
cbuchner1 (OP)
January 29, 2014, 01:43:53 PM
Quote from: whitesand77
I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3khash/s on my 660 Ti. Too bad it doesn't validate on the CPU though. The kernel must not be concurrency-safe, =).

Yes, right. There is one scratchpad but two streams. The scrypt_core kernels have to be serialized, or they would destroy each other's scratchpad. This is why I am using CUDA events.

Some overlap of memcpy and kernels would be desirable (not happening now due to the issue order of commands), and possibly the SHA256/Keccak kernels of one stream could be executed concurrently with the scrypt_core kernels of the other stream. This is also not happening now, because my CUDA events currently serialize these as well (I need to change when events are generated and synchronized upon). I intend to get rid of the memcpy altogether by checking hashes on the GPU instead, so the memcpy/kernel overlap issue is moot.

Christian
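The event-based serialization described above can be sketched roughly like this. This is a hypothetical illustration, not cudaminer's actual code; the kernel signature and function names are made up:

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Stand-in declaration for the real scrypt_core kernel.
__global__ void scrypt_core(uint32_t *scratchpad);

// One shared scratchpad, two streams: stream s1's scrypt_core must wait for
// stream s0's to finish, or the two kernels would clobber each other's data.
void launch_pair(uint32_t *scratchpad, dim3 grid, dim3 block,
                 cudaStream_t s0, cudaStream_t s1)
{
    cudaEvent_t core_done;
    cudaEventCreateWithFlags(&core_done, cudaEventDisableTiming);

    scrypt_core<<<grid, block, 0, s0>>>(scratchpad);
    cudaEventRecord(core_done, s0);   // marks the end of stream 0's core kernel

    // Enqueue a wait in stream 1: its scrypt_core will not start until the
    // event recorded in stream 0 completes. Independent work (e.g. the
    // SHA256/Keccak kernels) enqueued in s1 before this wait could still
    // overlap with stream 0, which is the overlap Christian describes as
    // not happening yet due to where the events are placed.
    cudaStreamWaitEvent(s1, core_done, 0);
    scrypt_core<<<grid, block, 0, s1>>>(scratchpad);

    cudaEventDestroy(core_done);      // destruction is deferred until the event completes
}
```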
cbuchner1 (OP)
January 29, 2014, 01:45:38 PM
Get it from GitHub, or wait for the next official release (only a few more days...)
cbuchner1 (OP)
January 29, 2014, 01:45:55 PM
Quote from: bigjme
Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each. How much is your 660 Ti getting, Christian?

3.7khash/s, give or take.
bigjme
January 29, 2014, 01:47:35 PM
So my 780 getting over 5 isn't too bad then.
cbuchner1 (OP)
January 29, 2014, 01:50:34 PM
Quote from: bigjme
So my 780 getting over 5 isn't too bad then.

But 6 or 7 would be nicer. I have one optimization in mind that swaps the state of threads within the lookup_gap loop. The intention is to order threads by their loop trip count (some have to run for 0 loops, others a couple more, up to the specified lookup gap). By ordering them, some of the warps will terminate much earlier and not consume any computational resources. This would (in theory) reduce the workload by nearly a factor of 2, but it introduces some overhead for sorting the threads and for shuffling the state around. Whether a net speed gain remains is yet to be seen. I will save that optimization for February (it would delay this release...).

Christian
bigjme
January 29, 2014, 01:53:16 PM Last edit: January 29, 2014, 02:09:33 PM by bigjme
That would be nice to see. I will gladly test it, Christian, if you want to send it through while you're working on it.
Espie
Newbie
Offline
Activity: 5
Merit: 0
January 29, 2014, 02:04:00 PM
Quote from: cbuchner1
Get it from GitHub, or wait for the next official release (only a few more days...)

I have Visual Studio 2012, but I can't load the solution file. So I probably have to wait a few more days.
whitesand77
January 29, 2014, 02:08:43 PM
Quote from: cbuchner1
<snip>

To be more specific, I was using 4 streams on nv_scrypt_core_kernelA<ALGO_SCRYPT_JANE> and nv_scrypt_core_kernelB<ALGO_SCRYPT_JANE> inside NVKernel::run_kernel, so those are the kernels I was referring to. Too bad the code in these kernels looks like witchcraft to me at the moment. LOL