Bearclaw
Newbie
Offline
Activity: 52
Merit: 0
|
|
April 09, 2014, 01:54:01 PM |
|
I should try djm34 version because I don't get that sort of performance on scrypt with my gtx780ti. For scrypt-jane try to autotune first djm...
It was an EVGA 780Ti Superclocked with Skynet bios. OC was +135 (total 1180.6). The autotune for YAC works on my 780, so I will try the 750's when I get home tonight, as I have to leave for work in a few.
|
|
|
|
ltcnim
Legendary
Offline
Activity: 914
Merit: 1001
|
|
April 09, 2014, 02:01:14 PM Last edit: April 09, 2014, 02:11:21 PM by ltcnim |
|
Hmm, Linux compilation of cudaminer is borked.
cpu-miner.c won't build, most likely because Alexey used C++'isms... which works on Windows because I compile this module with the /TP flag in order to trick it into allowing inline declarations of variables (and other things requiring C99 support, which Visual Studio 2010 is lacking).
EDIT: it's fixed!
Christian
just compiled it under ubuntu 13.10 server, but something seems to be very wrong. It looks like only one card is used out of my 5x750Ti rig. Even with -d 0,1,2,3,4 only one card seems to be working (I guess that from the card temperatures) using scrypt-n.
Edit: switched back to tagged release 2014-2-28 and everything is back to normal.
|
|
|
|
cbuchner1 (OP)
|
|
April 09, 2014, 02:11:27 PM |
|
just compiled it under ubuntu 13.10 server, but something seems to be very wrong. It looks like only one card is used out of my 5x750Ti rig. Even with -d 0,1,2,3,4 only one card seems to be working (I guess that from the card temperatures):
Well, he made substantial changes in unfamiliar code with only a few days to spare, so it's expected that he broke a couple of things. Don't worry, I'll clean up.
Christian
|
|
|
|
ltcnim
Legendary
Offline
Activity: 914
Merit: 1001
|
|
April 09, 2014, 02:13:34 PM |
|
Just drop a line here when we can test it again; I'll happily help test as much as I can.
|
|
|
|
bigjme
|
|
April 09, 2014, 03:02:11 PM |
|
how are you getting the reported hashrate? one thing i have wanted is to be able to read the last reported hashrate for each cudaminer instance
|
Owner of: cudamining.co.uk
|
|
|
ltcnim
Legendary
Offline
Activity: 914
Merit: 1001
|
|
April 09, 2014, 03:10:42 PM |
|
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
|
|
|
|
bigjme
|
|
April 09, 2014, 03:55:07 PM |
|
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
straight over my head :p im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs
|
|
|
|
ltcnim
Legendary
Offline
Activity: 914
Merit: 1001
|
|
April 09, 2014, 04:15:05 PM |
|
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
straight over my head :p im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs
under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
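A rough sketch of that idea (start the miner from a small program and read its output line by line), written in Python only for illustration. The log line format and the regular expression are assumptions, not something confirmed in this thread, and `parse_hashrate` and `watch_miner` are hypothetical names. Adjust the pattern to whatever your build actually prints.

```python
import re
import subprocess

# Assumed log format (not confirmed by the thread), for example:
#   [2014-04-09 16:15:05] GPU #0: GeForce GTX 750 Ti, 265.12 khash/s
HASHRATE_RE = re.compile(r"GPU #(\d+):.*?(\d+(?:\.\d+)?) (k?hash/s)")


def parse_hashrate(line):
    """Return (gpu_index, rate, unit), or None if the line has no hashrate."""
    match = HASHRATE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), match.group(3)


def watch_miner(command):
    """Start the miner and yield parsed hashrate tuples as lines arrive."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams in case the miner logs to stderr
        text=True,
        bufsize=1,  # line-buffered on our side
    )
    for line in process.stdout:
        parsed = parse_hashrate(line)
        if parsed is not None:
            yield parsed
    process.wait()
```

Usage would look something like `for gpu, rate, unit in watch_miner(["cudaminer.exe", ...your usual arguments...])`, keeping the most recent value per GPU in a dict. One instance of `watch_miner` per miner process should be enough to track several instances.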
|
|
|
|
ManIkWeet
|
|
April 09, 2014, 04:56:24 PM |
|
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
straight over my head :p im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs
under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
Can even be done in Java
|
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
|
|
|
NuggetFlipper
Newbie
Offline
Activity: 4
Merit: 0
|
|
April 09, 2014, 05:24:34 PM |
|
I have a GT 650M, using cudaminer from Feb 9. Whenever I set clocks below 800 MHz, it mines at 65 C with no problem at 65 kH/s. When I increase the clocks even a little bit, the temps slowly rise to 90-100 C (wtf?) and the mining rate doesn't even change significantly. Is anyone else having this issue?
I would also appreciate it if cudaminer had backup pool support...
|
|
|
|
bigjme
|
|
April 09, 2014, 05:26:52 PM Last edit: April 09, 2014, 06:42:52 PM by bigjme |
|
parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
straight over my head :p im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs
under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
i will have to have a look. writing websites i can do. writing and manipulating batch files, im not so good at.
ok then, so i have been able to write code that reads the last reported hashrate from a file. the only problem is that after a while the file will get large, so i would need the file to be overwritten with every output line cudaminer gives, and that part i can not figure out. it would be soooo much easier if someone was able to figure out how to add a curl call to cudaminer :-(
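One way to keep that file from growing, sketched in Python under the assumption that a small wrapper program (not cudaminer itself) receives each output line: rewrite the whole file every time, so it only ever holds the latest line. `write_latest` and the file name `hashrate.txt` are made up for this example.

```python
import os
import tempfile


def write_latest(path, text):
    """Replace the contents of `path` with `text`, so the file never grows."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so a reader
    # never sees a half-written status file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text + "\n")
        os.replace(tmp_path, path)  # swap the new file in as a single step
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Calling `write_latest("hashrate.txt", line)` for every hashrate line leaves a one-line file that a website script can read at any time. On Windows the swap can fail if another program holds the target file open at that moment, so a retry may be needed there.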
|
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 09, 2014, 07:00:33 PM Last edit: April 09, 2014, 08:36:34 PM by scriptfu |
|
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched). I based block size on a multiple of SMs per card (e.g. 110 * 5 SMs on 750ti == 550). Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, confirmed by the 1gh dashboard.
My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    || 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  | 768     | 13987             | 17               | 16              | 1               | 94               |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    || -133   | -       | +2794             | +15              | +12             | +3              | -7               |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
‡ is default launch config.
Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
- Using CUDA 6 RC
- Compiled with relocatable device code support (--relocatable-device-code=true --compile, requires manual linking for both host and device objects)
- Removed maxrregcount to let compiler choose register count
The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing
Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae
edits:
- added default block size baseline for comparison
- clarified block size calculation
- added ± diff comparison
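For anyone wanting to try other configurations, the block size arithmetic above works out as in the short sketch below. The function names are made up for illustration; the only inputs taken from the post are the multiplier of 110, the 5 SMs of a 750 Ti, and the 768 threads per block.

```python
def block_count(multiplier, sm_count):
    """Blocks per launch, chosen as a whole multiple of the card's SM count."""
    return multiplier * sm_count


def total_threads(blocks, threads_per_block):
    """Total threads (one hash attempt each is an assumption) per kernel launch."""
    return blocks * threads_per_block


best_blocks = block_count(110, 5)            # 550, the best config in the table
best_threads = total_threads(best_blocks, 768)
default_threads = total_threads(683, 768)    # the default launch config
```

Note that the default of 683 blocks is not a whole multiple of 5 SMs, while 550 is. Whether that explains any of the difference in hashrate is not established by these measurements.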
|
|
|
|
bigjme
|
|
April 09, 2014, 07:04:59 PM |
|
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
| 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
almost 17MH/s for 1 750Ti?
|
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 09, 2014, 07:10:07 PM |
|
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
| 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
almost 17MH/s for 1 750Ti?
Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
|
|
|
|
bigjme
|
|
April 09, 2014, 07:11:18 PM Last edit: April 09, 2014, 07:24:37 PM by bigjme |
|
How on earth did you manage that? We haven't been able to get over 13MH/s.
Christian, i found an example on how to implement an RPC into a C++ program. i will try to see if i can get it working, but i don't have a clue what i am doing.
|
|
|
|
ManIkWeet
|
|
April 09, 2014, 07:35:00 PM |
|
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
| 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
almost 17MH/s for 1 750Ti?
Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 09, 2014, 07:43:52 PM |
|
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched). Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, confirmed by the 1gh dashboard.
My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
| 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
- Using CUDA 6 RC
- Compiled with relocatable device code support (--relocatable-device-code=true --compile, requires manual linking for both host and device objects)
- Removed maxrregcount to let compiler choose register count
The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing
Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae
Can you add to your table what the changes are between each line? Did you try on Windows? Must say I am a bit surprised by the 23MHash/s. You should run a little longer to make sure everything is stable.
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 09, 2014, 07:54:44 PM |
|
How on earth did you manage that? We havent been able to get over 13Mh/s
Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788
Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?
Could you clarify what you mean by failed hashes? If you're referring to ones that didn't pass CPU validation, yes, they are included in the hashrate average, but they are not included in the share metrics (I care more about these, as these are the canonical numbers by which one gets credited for work).
|
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 09, 2014, 08:03:11 PM Last edit: April 09, 2014, 08:23:36 PM by scriptfu |
|
Can you add to your table what are the change between each lines.
Sure! I have updated my post accordingly.
Did you try on windows ?
I haven't because I do not have a Windows rig, and likely will not test this because I do not want to reimage or deal with Windows taking over my boot record. See my diff in a previous post above for the changes I made. If you are capable of compiling this, I'd be very curious to see the results.
Must say I am a bit surprise bu the 23MHash/s. You should run a little longer to make sure everything is stable.
Configurations with the highest hashrates were stable enough to run in the sense that the program would not crash, however they were not stable enough to provide valid shares. For instance, 384 blocks x 768 threads @ 23213 khash/s attempted 27 shares, but only 16 were valid (less than half that of the 550x768 config).
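To compare configurations on accepted shares rather than raw hashrate, the success percentage can be recomputed from the share counts, as sketched below. The truncation to a whole number is inferred from the table (28 of 32 is 87.5 but is listed as 87), so treat it as an assumption; `success_percent` is a name made up for this example.

```python
def success_percent(accepted, attempted):
    """Accepted shares as a whole-number percentage, truncated (not rounded)."""
    return 100 * accepted // attempted


configs = {
    "550x768 (best)": (28, 32),
    "683x768 (default)": (16, 17),
    "384x768 (fastest)": (16, 27),
}
for name, (accepted, attempted) in configs.items():
    print(name, success_percent(accepted, attempted))
```

With the share counts quoted in this thread, the 384x768 config comes out well below the other two despite its higher raw hashrate.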
|
|
|
|
zelante
|
|
April 09, 2014, 08:40:23 PM |
|
|
|
|
|
|