cbuchner1 (OP)
|
|
April 16, 2013, 08:54:26 AM |
|
Thank you for your work! I think you did a great job! What I miss is a variable to control the system/GPU load. The --interactive flag does not really work for me, I even experienced greater desktop lags with "interactive 1"...
For interactive mode you need to let autotune choose a smaller workload. Manually specifying the same -l parameter as for non-interactive mode is not a good idea. Interactive mode aims for around 60 individual CUDA kernel launches per second, with a millisecond of CPU+GPU sleep time in between. That way 60 frame updates per second on the display should be possible, so you can watch movies or porn or whatever while mining.
Christian
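To put a number on the interactive-mode timing described above, here is a rough sketch in Python. It only does the arithmetic: the function name and its parameters are made up for illustration and are not taken from cudaMiner's source. It assumes the launches are evenly spaced and that the sleep happens once per launch.

```python
def kernel_time_budget_ms(launches_per_second=60.0, sleep_ms=1.0):
    """Return how many milliseconds one kernel launch may run.

    Assumption (not from the cudaMiner source): launches are evenly
    spaced and each one is followed by a fixed sleep of sleep_ms.
    """
    period_ms = 1000.0 / launches_per_second  # time slot per launch
    budget_ms = period_ms - sleep_ms          # what is left for the kernel
    if budget_ms <= 0:
        raise ValueError("sleep time uses up the whole launch period")
    return budget_ms


if __name__ == "__main__":
    # With 60 launches per second and 1 ms of sleep, each launch gets
    # roughly 15.67 ms, so autotune has to pick a workload that fits.
    print(round(kernel_time_budget_ms(), 2))
```

This is why reusing a large -l setting from non-interactive mode tends to backfire: a launch that runs much longer than this budget blocks the display for several frames.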
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 09:13:28 AM Last edit: April 16, 2013, 10:37:08 AM by cbuchner1 |
|
Christian, could you just post the source to git and host the binaries there?
My only prior experience is with sourceforge, but I will see how I can get started on github. UPDATE: I think they've removed the feature to serve binary distributions as separate downloads.
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 09:15:21 AM |
|
When compiling 04-14 in Linux (Ubuntu 12.04), I'm getting the following message not seen in 04-09:
It's a known problem. Try targeting a 32-bit executable, as shown in configure.sh. g++-multilib, ia32-libs and libcurl4-dev:i386 should be installed prior to that.
|
|
|
|
SubNoize
Newbie
Offline
Activity: 47
Merit: 0
|
|
April 16, 2013, 12:50:18 PM |
|
Out of curiosity, how much further do you think you can push nvidia cards? Do you see any improvements coming any time soon, or if we see another large improvement, will it be due to an unusual find?
|
|
|
|
portosTCM
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 16, 2013, 01:34:53 PM |
|
Will you improve the sha256 version? I see that your miner can achieve a good khash rate, so I can't wait for a fully working GPU version.
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 01:54:07 PM |
|
Out of curiosity, how much further do you think you can push nvidia cards? Do you see any improvements coming any time soon, or if we see another large improvement, will it be due to an unusual find?
My crystal ball is currently malfunctioning. I advise that you consult a fortune teller of your choosing
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 01:54:56 PM |
|
Will you improve the sha256 version? I see that your miner can achieve a good khash rate, so I can't wait for a fully working GPU version.
No motivation to do so, as Bitcoin mining is so unprofitable.
|
|
|
|
Schleicher
|
|
April 16, 2013, 02:08:31 PM |
|
My only prior experience is with sourceforge, but I will see how I can get started on github. UPDATE: I think they've removed the feature to serve binary distributions as separate downloads.
Sourceforge is ok I think. You could use that for the binaries.
|
|
|
|
datguyian
|
|
April 16, 2013, 03:02:18 PM |
|
Thanks for this! I've updated the sheet with the Quadro cards I've been messing with (600, 4000 and 4600). I haven't really had enough time to mess with the settings much, so I pretty much let autotune do its thing, then used whatever kernel it decided on after that (and noted it in the spreadsheet).
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 03:39:14 PM Last edit: April 16, 2013, 04:30:35 PM by cbuchner1 |
|
A few days ago I ordered a used 560Ti 448 core edition (~130 Euros) because of the stellar performance figures.
I believe the high memory bandwidth of 500 series cards is mainly responsible for their performance. And it seems the core count vs. memory throughput is rather balanced for this type of application.
The Kepler series (6xx) seems to have too many CUDA cores and a memory interface that isn't any better than the 500 series. In other words: too much compute power in relation to bandwidth.
About future optimization possibilities:
I do believe that adding a LOOKUP_GAP implementation for factor 2 and 3 may boost the performance slightly - and more significantly for Kepler cards and the GTX Titan (250 kHash for a non-overclocked Titan seems really low).
I think that using some inline PTX assembly for the xor_salsa implementation we can get another slight boost, and maybe also a reduction in kernel register count.
I have doubts about the potential and/or feasibility of the texture cache. The texture cache would work better for a very small scratchpad for sure - a small lookup table size increases the cache hit-to-miss ratio, but maybe it requires such a high LOOKUP_GAP value that any memory performance benefit is offset by the required extra computation.
Christian
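For readers unfamiliar with the LOOKUP_GAP idea mentioned above, here is a toy Python sketch of the trade-off. It is not the cudaMiner kernel: the `mix` function is a made-up 32-bit stand-in for the real scrypt block mix (xor_salsa), and `romix` is a simplified, hypothetical version of the scratchpad loop. The point is only that storing every gap-th entry shrinks the scratchpad while adding recomputation, and that the result stays the same.

```python
MASK = 0xFFFFFFFF


def mix(x):
    """Toy 32-bit mixing step (assumption: stands in for the real block mix)."""
    x = (x * 1664525 + 1013904223) & MASK
    return x ^ (x >> 13)


def romix(seed, n, gap=1):
    """Simplified scratchpad loop with a lookup gap.

    Returns (result, stored_entries, extra_mix_calls).
    """
    stored = []
    x = seed
    # Fill phase: keep only every gap-th scratchpad entry.
    for i in range(n):
        if i % gap == 0:
            stored.append(x)
        x = mix(x)
    extra = 0
    # Lookup phase: rebuild skipped entries from the nearest stored one.
    for _step in range(n):
        j = x % n
        v = stored[j // gap]
        for _redo in range(j % gap):
            v = mix(v)
            extra += 1
        x = mix(x ^ v)
    return x, len(stored), extra


if __name__ == "__main__":
    for gap in (1, 2, 3):
        result, entries, extra = romix(12345, 1024, gap)
        print(gap, entries, extra, hex(result))
```

A gap of 2 halves the memory footprint and a gap of 3 cuts it to about a third, at the cost of the extra mix calls counted in the last column. Whether that pays off depends on how bandwidth-starved the card is.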
|
|
|
|
Nomusss
Newbie
Offline
Activity: 19
Merit: 2
|
|
April 16, 2013, 06:50:04 PM |
|
Got 185 kh/s on GTX680 1214/6038
Thanks for the software!
|
|
|
|
portosTCM
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 16, 2013, 06:52:50 PM |
|
cudaMiner shows CPU usage near 100%. How can I fix it?
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 07:18:57 PM Last edit: April 16, 2013, 07:31:26 PM by cbuchner1 |
|
cudaMiner's inconsistent CPU usage is a topic that I will be working on. For now you can only play with the -i flag to see if it makes a difference.
I think I found out what is wrong with the texture cache. I was not computing the texel coordinates correctly - in particular I failed to add a block+warp specific texel offset. Results now do validate, but I see a performance degradation instead of a gain. I will have to determine whether it is better to use 2-dimensional texturing or a single 1-dimensional linear texture. I may even allow passing in the dimensionality via the -C flag directly.
Christian
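The kind of offset bug described above can be illustrated with a small Python sketch. This is a guess at the general shape of the indexing, not the actual cudaMiner code: it assumes each warp owns a contiguous region of one large scratchpad buffer, and all names (`texel_index`, `to_2d`, `warps_per_block`, `texels_per_warp`, `width`) are hypothetical.

```python
def texel_index(block, warp, local, warps_per_block, texels_per_warp):
    """Linear texel index for one warp's scratchpad entry.

    Assumption: warps are laid out back to back, so the block+warp
    offset must be added to the warp-local index.
    """
    base = (block * warps_per_block + warp) * texels_per_warp
    return base + local


def to_2d(index, width):
    """Map a linear texel index to (x, y) for a 2D texture of given width."""
    return index % width, index // width


if __name__ == "__main__":
    # Forgetting the base offset would make every warp read warp 0's data.
    idx = texel_index(block=1, warp=2, local=5,
                      warps_per_block=3, texels_per_warp=1024)
    print(idx, to_2d(idx, width=4096))
```

With a 1D linear texture only `texel_index` is needed; the 2D variant adds the `to_2d` step on top of the same linear index.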
|
|
|
|
FalconFour
|
|
April 16, 2013, 08:31:00 PM |
|
Well, I definitely appreciate that someone's put some work into an nVidia miner! Maybe I'm alone here, but I kinda think most of us *aren't* going to go out and buy all-new cards just to mine Litecoin. Maybe. Maybe not. I dunno. But the most valuable use I have for it now is going through a junk-pile at the shop and pulling out all the 8000-series and higher cards and building mining systems for them (while the shop owner and I work together mining Bitcoin, of course... hehe). That said, the best card that's been in the pile so far is a 9800GT (which was kinda impressive - thought it was an 8800). So I've got a 9800GT and an 8800GTX working right now with this cudaMiner.
Here's the problem I ran into. Both are experiencing all-over-the-map performance variations. The 8800GTX was previously cranking out 34-36khps (with accepted results), then when I moved to a 64-bit Windows 7 SP1 install (previously 32-bit Vista SP0 from the initial OEM install), it shot up to ~44khps. However, after updating drivers and allowing me to crank the fan speed higher, it fell through the floor and lingers around 16khps.
And that 9800GT? It was cranking out 16khps, pretty pathetic, under a 32-bit Win7 SP0 install. When I moved that up to Win7 SP1 x64, it again shot up to ~24khps, but that also wasn't stable - next time I restarted the miner, it's only doing... EIGHT... YES... EIGHT! 8khps.
I've been playing with the different driver versions, and it seems that cudaMiner won't run (just silently crashes/exits without any output other than the initial banner - not even an error log entry) with any drivers below version 300. Can't make any sense of it... :/
|
feed the bird: 187CXEVzakbzcANsyhpAAoF2k6KJsc55P1 (BTC) / LiRzzXnwamFCHoNnWqEkZk9HknRmjNT7nU (LTC)
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 09:08:45 PM Last edit: April 16, 2013, 10:24:32 PM by cbuchner1 |
|
CUDA drivers rel 304.54 (Linux) and 306.94 (Windows) or later are required for CUDA 5.0 apps like cudaMiner. Because I am not doing error checks yet, you will see the program crashing if these requirements are not met.
About 8800/9800 card performance: I am seeing the same performance variations on an nVidia 9600M GT on Windows: sometimes 4 kHash, sometimes 6 kHash.
On Linux I get a solid 9.6 kHash.
I am starting to believe there's something wrong with the Windows drivers for very old card models.
Could be that the device is not clocking up for CUDA workloads? Have you tried running any kind of DirectX or OpenGL app simultaneously, to see if that makes it get up to speed?
UPDATE: the texture cache feature seems to work in 1D and 2D modes now, but does not really make things faster yet. I do get accepted and verified shares though (happy!)
UPDATE2: I may have solved the excessive CPU utilization problem on Windows, too.
Christian
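As an illustration of the kind of error check that is missing, here is a minimal Python sketch that compares a driver version string against the minimum releases named at the top of this post (304.54 on Linux, 306.94 on Windows). How the installed version is actually obtained is left out, and the names `MIN_DRIVER`, `parse_version` and `driver_ok` are invented for this example, not part of cudaMiner.

```python
# Minimum driver releases quoted in the post above.
MIN_DRIVER = {"linux": (304, 54), "windows": (306, 94)}


def parse_version(text):
    """Turn a string like '314.22' into a comparable tuple (314, 22)."""
    return tuple(int(part) for part in text.split("."))


def driver_ok(platform, version):
    """Return True if the driver version meets the minimum for the platform."""
    return parse_version(version) >= MIN_DRIVER[platform]


if __name__ == "__main__":
    for platform, version in [("windows", "314.22"), ("linux", "295.40")]:
        verdict = "ok" if driver_ok(platform, version) else "too old"
        print(platform, version, verdict)
```

Comparing tuples of integers instead of raw strings avoids ordering mistakes such as "99.1" sorting after "304.54".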
|
|
|
|
FalconFour
|
|
April 16, 2013, 11:30:12 PM |
|
Well, these old cards don't have dynamic clocks for 2D/3D modes - which is why they get so damn hot while just sitting idle. I do however think that if Linux is giving so much of a performance boost, it'd be worth just dumping Ubuntu on these things to mine with them. They're "shell" computers anyway - optimally just going to sit up on a shelf connected to power and network, just being remote-controlled for mining. TONS of motherboards, hard drives, CPUs, memory sticks, and GPUs lying around that I'd love to put to work while the shop doesn't have to pay for power.
You got a recommendation for a Linux distro that'll do the job best?
|
|
|
|
FalconFour
|
|
April 16, 2013, 11:34:27 PM |
|
Whoa, never mind Linux. This just happened when I disabled Aero/desktop composition: From 8khash to 28khash. Interesting. Now, wonder what the 8800GTX will do...
I do think the resolution of the auto-tune is a bit sketchy though. It seems to fly through the khash/sec timings far faster than it can get an accurate reading, which results in many test results just being all over the place (20... 18... 20... 22... 18...). There's stuff going on in the background (like drawing on the screen) that I'm sure causes some bumps in the readings that it doesn't test twice. Maybe increase the test duration for each step, and lock into multiples of two? I couldn't imagine "13x3" would serve any better purpose than a rounded number like 14x2 or such... (or am I wrong there?)
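One way to make noisy autotune readings less jumpy is to measure each configuration several times and compare medians. The Python sketch below shows the idea only. It is not how cudaMiner's autotune works, and `median_rate`, `pick_best` and the canned readings are made up for illustration.

```python
import statistics


def median_rate(measure, repeats=5):
    """Call measure() several times and return the median reading."""
    samples = [measure() for _ in range(repeats)]
    return statistics.median(samples)


def pick_best(configs, measure_config, repeats=5):
    """Return the configuration with the highest median reading."""
    return max(
        configs,
        key=lambda cfg: median_rate(lambda: measure_config(cfg), repeats),
    )


if __name__ == "__main__":
    # Canned khash/s readings; "13x3" has one lucky outlier of 30.
    canned = {
        "14x2": iter([20, 18, 20, 22, 18]),
        "13x3": iter([19, 30, 19, 19, 19]),
    }
    print(pick_best(canned, lambda cfg: next(canned[cfg])))
```

With a single reading per step, the outlier of 30 would have made "13x3" look like the winner. The median ignores it, at the cost of a longer tuning run.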
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 11:42:01 PM |
|
From 8khash to 28khash. Interesting. Now, wonder what the 8800GTX will do...
That's more like what I would have expected from these cards. Strangely, when I enable texture caching, the performance measured during autotune is about 10-25% higher than without cache. But the performance achieved during actual mining is about 30% lower than without cache. So why does the performance advantage turn into a disadvantage? This discrepancy needs to be understood before I can put out another version. I've even tried to completely randomize the input data during autotune - but no change. I really want to get that measured gain into the actual mining. I hope it's not just an illusion.
|
|
|
|
cbuchner1 (OP)
|
|
April 16, 2013, 11:43:58 PM |
|
Any news on a secondary download source? Dropbox, github,sourceforge?
You can get the source code from github now, but not the binaries. For Linux compilation this should suffice.
|
|
|
|
FalconFour
|
|
April 17, 2013, 12:15:01 AM |
|
From 8khash to 28khash. Interesting. Now, wonder what the 8800GTX will do...
That's more like what I would have expected from these cards. Strangely, when I enable texture caching, the performance measured during autotune is about 10-25% higher than without cache. But the performance achieved during actual mining is about 30% lower than without cache. So why does the performance advantage turn into a disadvantage? This discrepancy needs to be understood before I can put out another version. I've even tried to completely randomize the input data during autotune - but no change. I really want to get that measured gain into the actual mining. I hope it's not just an illusion.
This could be related to the short auto-tune duration problem. Maybe the texture cache benefits for a very short time but starts deteriorating slowly (on the order of whole seconds, not milliseconds). Maybe try a narrow set of autotune parameters (it's unlikely that a card would ever see any autotune benefit in the sequential range from 20...100 iterations) and run longer tuning for each combination? Basically, not every cell in the autotune matrix needs to be checked, I think.
Also getting super-erratic behavior right now after updating drivers on the 8800GTX. It launches and identifies "compute capability 1.0", but it cranks out all zeroes on the autotune, then crashes (hard). That's with the latest driver I just installed, 314.22. :/
|
|
|
|
|