bathrobehero
Legendary
Offline
Activity: 2002
Merit: 1051
ICO? Not even once.
|
|
January 18, 2014, 08:31:23 PM |
|
Also some of you might want to check if it works for you to specify --algo=scrypt:2048 (or whatever "N" value it is currently at) to mine VertCoin. You can now directly give the N parameter if needed (not the N-factor like with scrypt-jane).
It starts hashing, but as soon as it would find/check a share it crashes, even with different scrypt arguments:
|
Not your keys, not your coins!
|
|
|
orrett3
Newbie
Offline
Activity: 33
Merit: 0
|
|
January 18, 2014, 10:47:36 PM |
|
Hey guys, I've been running cudaminer for a very long time now and would like to say thanks to everyone who contributed, and also to cbuchner1. I've compiled the latest source, and this is what I've been getting mining YaCoin with a GTX 770 2GB card. Autotune originally picked 37x1, up from 9x1 on the last official release, but I was able to configure it manually to 40x1, so autotune is a little off. If I go to 41, I get an error message that is spammed on the screen.
EDIT: the error message is
[2014-01-18 17:51:00] GPU #0: cudaError 4 (unspecified launch failure) calling 'cudaEventRecord(context_serialize[stream][thr_id], context_streams[stream][thr_id])' (C:/Users/Orrett3/Desktop/Build CudaMiner/source/salsa_kernel.cu line 820)
config: -i 1 -b 32768 -C 1 -l K40x1
Mem usage: 1561 MB
Utilization: 99%
Core offset: +160
Mem offset: -502
As you can see, I have some error messages, but I don't think they are affecting the hashrate too much. Other than this, do you see anything wrong with what I'm getting? Is there any way to get more?
http://i1081.photobucket.com/albums/j348/Orrett3/MiningYacoinMax.png
|
|
|
|
bathrobehero
|
|
January 18, 2014, 11:13:31 PM |
|
Hi, try autotune with -C 0.
|
|
|
|
cbuchner1 (OP)
|
|
January 18, 2014, 11:47:45 PM |
|
Try the lookup-gap now on Compute 3.0 devices (Kepler kernel). The Titan kernel will follow soon... always autotune for different gap numbers, as configurations will differ wildly
NOTE: a gap value of 1 actually means no gap. ;-) A gap value of 2 specifies that only every 2nd value is stored in the scratchpad (the intermediate values are recomputed on the fly), cutting memory use in half. Values of up to 4 may make sense IMHO. Start with 2 and work your way up...
the more SMX your card has and the less memory there is, the more benefit you may see.. power consumption may also rise... Users of 1GB and 2GB cards may finally see some better hash rates now.
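To make the trade-off concrete, here is a minimal Python sketch of a scrypt-style ROMix loop with a lookup gap. It is only an illustration, not cudaminer's kernel: `mix` is a SHA-256 stand-in for the real Salsa20/8 or ChaCha20/8 BlockMix, and the names `mix`, `integerify` and `romix` are made up for this example. The point it shows is that storing only every `gap`-th scratchpad entry and recomputing the skipped ones gives the same result with a smaller scratchpad.

```python
import hashlib


def mix(block: bytes) -> bytes:
    # Stand-in for scrypt's BlockMix. NOT the real Salsa20/8 function,
    # just a deterministic mixing step for illustration.
    return hashlib.sha256(block).digest()


def integerify(block: bytes, n: int) -> int:
    # Toy version: first 4 bytes as a little-endian integer, reduced mod N.
    return int.from_bytes(block[:4], "little") % n


def romix(x: bytes, n: int, gap: int = 1):
    """Return (result, number of scratchpad entries actually stored)."""
    # Phase 1: walk the chain V[0..N-1], keeping only every gap-th value.
    pad = []
    for i in range(n):
        if i % gap == 0:
            pad.append(x)
        x = mix(x)
    # Phase 2: N data-dependent lookups into the (sparse) scratchpad.
    for _ in range(n):
        j = integerify(x, n)
        v = pad[j // gap]           # nearest stored value at or below j
        for _ in range(j % gap):    # recompute the skipped values on the fly
            v = mix(v)
        x = mix(bytes(a ^ b for a, b in zip(x, v)))
    return x, len(pad)


seed = hashlib.sha256(b"example block header").digest()
for gap in (1, 2, 3, 4):
    result, stored = romix(seed, 1024, gap)
    print(gap, stored, result.hex()[:16])
```

A gap of 1 stores all N entries; a gap of 2 stores half of them and pays for it with up to one extra `mix` call per lookup, which is why the benefit depends on whether the card is limited by memory or by compute.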
|
|
|
|
ManIkWeet
|
|
January 18, 2014, 11:51:31 PM |
|
Does the lookup-gap decrease the "value" of a hash or is it only positive effects?
|
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
|
|
|
cbuchner1 (OP)
|
|
January 18, 2014, 11:53:30 PM |
|
Does the lookup-gap decrease the "value" of a hash or is it only positive effects?
depends entirely on the card. cannot generalize here, sorry. Also I do not recommend to use a lookup-gap with scrypt mining. I think it only has benefits with scrypt-jane.
|
|
|
|
ManIkWeet
|
|
January 19, 2014, 12:02:19 AM |
|
Does the lookup-gap decrease the "value" of a hash or is it only positive effects?
depends entirely on the card. cannot generalize here, sorry.
I mean, let's say I solo mine YAC with a GTX 780 and increase the lookup-gap, would I find blocks more often? I don't exactly know how the hashrate works...
|
|
|
|
orrett3
|
|
January 19, 2014, 12:03:15 AM |
|
Try the lookup-gap now on Compute 3.0 devices (Kepler kernel). The Titan kernel will follow soon... always autotune for different gap numbers, as configurations will differ wildly
Would I be adding that as a flag on the shortcut or bat file?
|
|
|
|
cbuchner1 (OP)
|
|
January 19, 2014, 12:04:11 AM |
|
I mean, let's say I solo mine YAC with GTX 780, and increase lookup-gap, would I find blocks more often? I don't exactly know how the hashrate works...
if you get a higher kHash/s then yes...
GTX 780 is a Compute 3.5 part. I haven't finished the lookup-gap for that kernel yet.
I expect the higher end devices like 660Ti, 760, 770, 780, 780Ti, Geforce Titan to benefit from the lookup gap.
Also the lower end cards with 1GB (e.g. GT 640 GK208 with 1 GB DDR5 memory)
|
|
|
|
cbuchner1 (OP)
|
|
January 19, 2014, 12:05:10 AM |
|
Would I be adding that as a flag on the shortcut or bat file?
one of -L 2, -L 3 or -L 4 added to your bat file or shortcut (on Windows). Everything else stays the same.
|
|
|
|
ManIkWeet
|
|
January 19, 2014, 12:06:52 AM |
|
if you get a higher kHash/s then yes...
GTX 780 is a Compute 3.5 part. I haven't finished the lookup-gap for that kernel yet.
I expect the higher end devices like 660Ti, 760, 770, 780, 780Ti to benefit from the lookup gap.
Also the lower end cards with 1GB (e.g. GT 640 GK208 with 1 GB DDR5 memory)
Very nice! I will patiently wait for you to implement it for the T kernel
|
|
|
|
cbuchner1 (OP)
|
|
January 19, 2014, 12:09:35 AM |
|
Right now i've compiled the latest source and this is what I've been getting, mining yacoin with a gtx 770 2GB card.
Other than this do you see anything wrong with what im getting? Is there any way to get more?
The values aren't stellar - but your card does not have enough RAM to make use of all its compute power. So try my new lookup gap. I have a GTX 760 with 4 GB RAM and it helped a bit. It should help quite a bit more on your 2 GB card. Pass -L 2 and autotune (preferably with the -D flag also given so you see autotune results printed). Afterwards maybe also check -L 3
|
|
|
|
dereinehalt
Newbie
Offline
Activity: 9
Merit: 0
|
|
January 19, 2014, 12:21:05 AM Last edit: January 19, 2014, 12:35:28 AM by dereinehalt |
|
I reach 570-600 kHash/s with a GTX 780 Gainward Phantom GLH using -H 1 -D -i 0 -l T24x26 -C1. Maybe someone has a use for it, or any suggestions for me? ^^
|
|
|
|
col_oddball
Newbie
Offline
Activity: 4
Merit: 0
|
|
January 19, 2014, 01:47:24 AM Last edit: January 19, 2014, 02:02:52 AM by col_oddball |
|
Does the lookup-gap decrease the "value" of a hash or is it only positive effects?
depends entirely on the card. cannot generalize here, sorry. Also I do not recommend to use a lookup-gap with scrypt mining. I think it only has benefits with scrypt-jane.
That's interesting: in cgminer the default lookup-gap is 2 and you get an increased hash rate. I guess it comes down to how efficient the salsa20/8 implementation is. The fewer cycles salsa20/8 takes to complete, the more worthwhile the lookup gap becomes.
FYI: I've been writing an FPGA implementation, and the lookup-gap helps increase the hash rate since you can increase the number of scrypt cores. The table below shows what can be achieved on a Virtex 6 running @ 150MHz.
Total FPGA block RAM: 1,024 kbytes. You need 128 kbytes per core for lookup-gap=1, therefore 1024/128 = 8 cores.
lookup_gap            1    2    4    8
total cores           8   16   32   64
total FPGA kh/s      29   49   73   98
cbuchner1: do you plan to implement lookup gap for scrypt???
cheers
oddball
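The core counts in the table follow directly from the block RAM budget, and dividing the reported hash rate by the core count shows how much each core slows down as the gap grows. The sketch below just redoes that arithmetic with the figures quoted in the post; the function name `cores_for_gap` is made up for the example, and it assumes the per-core scratchpad shrinks exactly in proportion to the gap.

```python
BLOCK_RAM_KB = 1024        # total FPGA block RAM, from the post
SCRATCHPAD_KB = 128        # per core at lookup_gap=1, from the post

# Hash rates reported in the post (kh/s for the whole FPGA).
reported_khs = {1: 29, 2: 49, 4: 73, 8: 98}


def cores_for_gap(gap: int) -> int:
    # Assumption: each core needs SCRATCHPAD_KB / gap of block RAM.
    return BLOCK_RAM_KB * gap // SCRATCHPAD_KB


for gap, khs in reported_khs.items():
    cores = cores_for_gap(gap)
    print(f"gap={gap}: {cores} cores, {khs / cores:.2f} kh/s per core")
```

Total throughput keeps rising because the core count doubles each step, while the per-core rate falls as more values have to be recomputed. That is the same diminishing-returns pattern reported for the GPU later in the thread.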
|
|
|
|
bathrobehero
|
|
January 19, 2014, 04:39:35 AM |
|
Also some of you might want to check if it works for you to specify --algo=scrypt:2048 (or whatever "N" value it is currently at) to mine VertCoin. You can now directly give the N parameter if needed (not the N-factor like with scrypt-jane).
It starts hashing, but as soon as it would find/check a share it crashes, even with different scrypt arguments: VertCoin schedule:
|
|
|
|
coercion
Newbie
Offline
Activity: 34
Merit: 0
|
|
January 19, 2014, 06:14:49 AM |
|
I've been playing with the lookup gap. My GTX 780 went from 3.7 kH/s to 5.0 kH/s on Yacoin with -L 5. (-L 4 produces 4.948, -L 3 4.535, and -L 2 was almost no improvement)
My GT 640s received no benefit at N Factor 14. At N Factor 15 they produce 0.684 kH/s. If I recall they maxed out at 0.6 previously.
I've been mining an N Factor 13 coin as of late, and my 780 went from 10.7 to 16.0 kH/s with -L 3.
Cudaminer does not fare well with NF=13 when I try to autotune with -L 3 if I don't also specify -lT.
Masochist:CudaMiner mark$ ./cudaminer --algo=scrypt-jane:13 -d0 -m1 -i0 -L3 --benchmark -D
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2013-12-18 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm
[2014-01-18 22:09:13] 1 miner threads started, using 'scrypt-jane' algorithm.
[2014-01-18 22:09:13] DEBUG: got new work in 1 ms
[2014-01-18 22:09:13] Given scrypt-jane parameters: 13
[2014-01-18 22:09:13] Nfactor is 13 (N=16384)!
[2014-01-18 22:09:20] GPU #0: GeForce GTX 780 with compute capability 3.5
[2014-01-18 22:09:20] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-18 22:09:20] GPU #0: 8 hashes / 5.3 MB per warp.
[2014-01-18 22:09:22] GPU #0: Performing auto-tuning (Patience...)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_idata, mem_size)' (salsa_kernel.cu line 499)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_odata, mem_size)' (salsa_kernel.cu line 501)
[2014-01-18 22:09:22] GPU #0: cudaError 11 (invalid argument) calling 'cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice)' (salsa_kernel.cu line 506)
[2014-01-18 22:09:22] GPU #0: maximum warps: 527
[2014-01-18 22:09:22] GPU #0: cudaError 4 (unspecified launch failure) calling 'cudaDeviceSynchronize()' (salsa_kernel.cu line 534)
It proceeds to repeat the last line for each warp in every block. More than a little spammy, though I guess I'm asking for it with the debug flag on, but I'm not really interested in waiting several hours for autotune to complete to find out what my optimal config is, particularly when I'm testing multiple lookup gaps with multiple N factors.
It seems to autotune fine with -L3 if I specify -lT. It will not autotune with -L4 in any case, and will fail in the same fashion as above, although if I just give it a config, -L4 seems to work fine. I'm not too worried about it; looking at the -L2 and -L3 charts makes it pretty clear I was experiencing diminishing returns.
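The "Nfactor is 13 (N=16384)" and "8 hashes / 5.3 MB per warp" lines in the log can be reproduced with a little arithmetic. The sketch below rests on a few assumptions that the log does not state: that N = 2^(Nfactor+1), that each scratchpad entry is 128 bytes (scrypt-jane with r=1), that the gap divides the entry count rounding up, and that "MB" in the log means 2^20 bytes. The function names are made up for the example.

```python
def n_from_nfactor(nfactor: int) -> int:
    # Assumption: scrypt-jane defines N = 2^(Nfactor + 1).
    return 1 << (nfactor + 1)


def scratchpad_bytes(n: int, gap: int = 1, entry_bytes: int = 128) -> int:
    # Assumption: 128-byte entries (r=1); only every gap-th entry is stored,
    # rounded up so the last partial group still gets a slot.
    stored_entries = -(-n // gap)
    return stored_entries * entry_bytes


def warp_mib(nfactor: int, gap: int, hashes_per_warp: int = 8) -> float:
    # Memory for one warp's worth of hashes, in MiB (2^20 bytes).
    n = n_from_nfactor(nfactor)
    return hashes_per_warp * scratchpad_bytes(n, gap) / 2**20


print(n_from_nfactor(13))
print(round(warp_mib(13, gap=3), 1))
print(round(warp_mib(13, gap=1), 1))
```

Under those assumptions, Nfactor 13 with -L3 comes out at about 5.3 MiB per warp, which agrees with the log, and the same warp without a lookup gap would need 16 MiB.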
|
|
|
|
bathrobehero
|
|
January 19, 2014, 06:51:33 AM |
|
I've been mining an N Factor 13 coin as of late, and my 780 went from 10.7 to 16.0 kH/s with -L 3.
I'm doing the same (zcc) and lookup gap had no effect for me on my 660 (2GB); it stayed at 10.0 kH/s.
Here's my oversimplified take on lookup gap: an increased lookup gap effectively gives the GPU more VRAM to play with, but that only helps if the GPU was bottlenecked by the amount of VRAM in the first place. It doesn't help a thing if the GPU was already sweating to get the job done.
So an increased lookup gap with my mediocre GPU and mediocre VRAM amount:
N 13 had no effect (GPU wasn't bottlenecked by VRAM);
N 14 had a 30% performance increase;
N 15 had a 100% performance increase, because the memory bottleneck is the worst here.
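A rough way to see why the memory bottleneck gets worse with each N-factor step is to count how many hashes a card can keep in flight at once. The sketch below is a back-of-the-envelope model only: it assumes N = 2^(Nfactor+1), 128 bytes per scratchpad entry, and that the whole 2 GB is available for scratchpads, which a real card never quite manages. The function name `hashes_that_fit` is made up for the example.

```python
def hashes_that_fit(vram_mib: int, nfactor: int, gap: int = 1) -> int:
    # Assumptions: N = 2^(Nfactor + 1), 128-byte entries, all VRAM usable.
    n = 1 << (nfactor + 1)
    stored_entries = -(-n // gap)          # every gap-th entry, rounded up
    per_hash_bytes = stored_entries * 128
    return (vram_mib * 2**20) // per_hash_bytes


for nfactor in (13, 14, 15):
    row = [hashes_that_fit(2048, nfactor, gap) for gap in (1, 2, 3, 4)]
    print(f"Nfactor {nfactor}: concurrent hashes for gap 1..4 = {row}")
```

Each N-factor step halves the number of hashes that fit, so at some point there are too few in flight to keep all the compute units busy. A lookup gap buys that parallelism back at the cost of extra recomputation, which matches the pattern of no gain at N 13 and a large gain at N 15.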
|
|
|
|
cbuchner1 (OP)
|
|
January 19, 2014, 08:24:22 AM |
|
cbuchner1: do you plan to implement lookup gap for scrypt???
it's implemented but Salsa20/8 (N=1024) is mostly compute bound on nVidia and there is no benefit seen from this feature.
|
|
|
|
cbuchner1 (OP)
|
|
January 19, 2014, 08:41:03 AM Last edit: January 19, 2014, 09:51:52 AM by cbuchner1 |
|
I've been playing with the lookup gap. My GTX 780 went from 3.7 kH/s to 5.0 kH/s on Yacoin with -L 5. (-L 4 produces 4.948, -L 3 4.535, and -L 2 was almost no improvement)
[2014-01-18 22:09:22] GPU #0: Performing auto-tuning (Patience...)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_idata, mem_size)' (salsa_kernel.cu line 499)
can't wait to try -L on my three 780Ti cards at home - hoping for 5-6 kHash/s per device. Right now I am at a meeting of computer geeks demo'ing one of my mining rigs... I will have to improve the memory management a lot, both on Windows and on Linux. This out of memory problem is annoying.
|
|
|
|
Ultimist
|
|
January 19, 2014, 08:46:32 AM |
|
It's really unfortunate that this seems to have become an exclusive club for those who can compile the source code for all these new features. The rest of us are left out in the cold, having to wait months I guess while our cards become increasingly useless over time. I was really hoping to take advantage of the scrypt-jane for yac/qqcoin but by the time binaries are released with the latest features, the amount of yac/qqcoin I'll be able to earn daily with my GTX 670 won't be worth it anymore.
I'd like to throw in a request to perhaps update the main binaries a little more often than has been the case for the last month. New versions could always be labeled as incomplete/use at your own risk, etc...
|
|
|
|
|