bigjme
|
|
April 09, 2014, 08:45:51 PM |
|
Wow, I think at over 16 MH/s I would accept 1 validation error for every 12 accepted, lmao.
|
Owner of: cudamining.co.uk
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 09, 2014, 09:33:45 PM |
|
wow, i think at over 16Mh/s i would accept 1 validation error everything 12 accepted lmao
O.o o.O Impressive! What was done to achieve these numbers?
|
|
|
|
bigjme
|
|
April 09, 2014, 09:39:14 PM Last edit: April 09, 2014, 10:36:57 PM by bigjme |
|
Christian, I just found a webserver to implement into ccminer for giving some JSON-formatted output. I am able to implement it; I guess I need to add it into cpu-miner.c, as that's the base. But reading through the code, I have a few options for the hashrate value. Which of these do I need?
1. 1e-3 * hashrate
2. hashrate
EDIT: Also, since updating my drivers it seems I am unable to compile. I am getting the following message on new and old versions:
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.5.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "M:\Program Files\Microsoft Visual Studio 10.0\VC\bin" -I. -Icompat -Icompat\jansson -Icompat\getopt -I"..\pthreads\Pre-built.2\include" -I"..\curl-7.29.0\include" -I"..\OpenSSL-Win32\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" --keep --keep-dir Release -maxrregcount=64 --ptxas-options=-v --machine 32 --compile -cudart static -Xptxas -v,-abi=no -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MD " -o Release\fermi_kernel.cu.obj "M:\CUDAMINER\Building\CudaMiner-13-2-14\fermi_kernel.cu"" exited with code 1.
I can't test whether my webserver implementation works without compiling.
You know what, it's nice to have a good anti-virus, but not when I can't compile because it's too paranoid.
UPDATE: So I added all my webserver stuff. It's very basic additional code, about 12 lines, but it fails at the end of the compile with this:
1>cl : Command line warning D9025: overriding '/TC' with '/TP'
1> util.c
1> sha2.c
1> cpu-miner.c
1>m:\cudaminer\building\ccminer-0.5\cpu-miner.c(1389): error C2731: 'main' : function cannot be overloaded
1> m:\cudaminer\building\ccminer-0.5\cpu-miner.c(1388) : see declaration of 'main'
1> Generating Code...
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
any ideas?
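On the hashrate question, a minimal sketch may help. It assumes that hashrate is a double holding hashes per second, which is what the 1e-3 factor suggests (1e-3 * hashrate would then be kH/s). The helper name format_hashrate_json and the JSON field names are hypothetical and not part of ccminer; the sketch simply reports both values so the consumer can pick a unit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of ccminer: writes a small JSON object
 * describing the hashrate into buf.
 * Assumption: hashrate_hs is in hashes per second, so 1e-3 * hashrate_hs
 * is the same rate expressed in kH/s.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_hashrate_json(char *buf, size_t size, double hashrate_hs)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size,
                     "{\"hashrate_hs\": %.2f, \"hashrate_khs\": %.2f}",
                     hashrate_hs, 1e-3 * hashrate_hs);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

A webserver handler could call this with a buffer of a few hundred bytes and send the result as the response body.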
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 09, 2014, 11:20:31 PM |
|
I did a quick test of the code modification. I get quite a lot of "hash for nonce ... does not validate on cpu" messages. However, the shares are accepted and the speed is 35 MH/s (instead of 28~30, depending on clock speed and number of instances). What does this 768 correspond to? Is it the number of CUDA cores of the 750 Ti? (I need to see how this can be updated for the 780 Ti.)
I only modified cuda_hefty1.cu (I am lazy...), compiled with CUDA 5.5 (I didn't use --relocatable-device-code=true either) and, in principle, compute_3.5.
|
djm34 facebook pageBTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 10, 2014, 01:23:15 AM |
|
To what corresponds this 768, is this the number of cuda core of the 750ti ? (need to see how this can be updated to the 780ti).
Launching a CUDA kernel uses the following syntax (ignoring optional parameters for now):
kernel_name<<<blocks_per_grid, threads_per_block>>>(kernel_function_args...)
768 is the number of threads launched per block. The 750 Ti has 640 cores (128 per SM (multiprocessor), 5 SMs per card). The 780 Ti has 2880 cores (192 per SM, 15 SMs per card).
I used a very basic calculation, essentially choosing a block count that is some multiple of the number of SMs. In the case of the 780 Ti, that is 100 * SM count, or 100 * 15 == 1500. I haven't looked closely at the 780's specs, so one might run into a limit on how many blocks per grid the card can support.
You should be able to glean additional information from the following references:
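To make the launch arithmetic in this post concrete, here is a minimal host-side sketch in plain C. The function names are illustrative only and do not come from ccminer; the multiplier passed to blocks_for_device is a tuning value found by benchmarking, not something CUDA prescribes.

```c
#include <assert.h>

/* Total CUDA cores = cores per SM * number of SMs. */
unsigned total_cores(unsigned cores_per_sm, unsigned sm_count)
{
    return cores_per_sm * sm_count;
}

/* Block count chosen as a multiple of the SM count.
 * blocks_per_sm is the benchmarked multiplier (100 in the 780 Ti
 * example from the post). */
unsigned blocks_for_device(unsigned sm_count, unsigned blocks_per_sm)
{
    assert(sm_count > 0 && blocks_per_sm > 0);
    return sm_count * blocks_per_sm;
}
```

The result of blocks_for_device would be the first launch parameter and 768 the second, i.e. kernel_name<<<blocks, 768>>>(...).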
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 10, 2014, 02:16:03 AM |
|
The original version of the program uses 683 blocks and 768 threads per block. With your modification it uses 32 x 15 = 480 blocks and 768 threads per block. However, the number of threads is 524288, which in my opinion is the reason why I get "does not validate on cpu", and also why 683 was chosen, since it is just threads / threads_per_block (524288 / 768, rounded up). This gives me around 36 MH/s.
I changed 768 to 512 and then I get 39~40 MH/s with no rejects, but a high rate of "does not validate", which means a large fraction of the shares are just thrown away.
I have the feeling it is faster because it throws away a lot of things...
It would be interesting to have Christian's opinion on that. Is there a way to decrease the number of threads (assuming it works)?
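The block counts quoted in this post can be checked with a few lines of C. This is only a sketch of the arithmetic, with illustrative function names; it says nothing about how the kernel itself guards against surplus threads.

```c
#include <assert.h>

/* Smallest block count whose threads cover total_threads
 * (ceiling division). */
unsigned blocks_needed(unsigned total_threads, unsigned threads_per_block)
{
    assert(threads_per_block > 0);
    return (total_threads + threads_per_block - 1) / threads_per_block;
}

/* Threads actually launched by a <<<blocks, threads_per_block>>> grid. */
unsigned threads_launched(unsigned blocks, unsigned threads_per_block)
{
    return blocks * threads_per_block;
}
```

With 524288 threads and 768 threads per block, blocks_needed gives 683. A 480-block grid launches 368640 threads at 768 per block and 245760 at 512 per block, both well short of 524288.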
|
|
|
|
scriptfu
Newbie
Offline
Activity: 19
Merit: 0
|
|
April 10, 2014, 03:00:21 AM |
|
On the original version the program is using 683 blocks and 768 threads per block. With your modification it is using 32x15=480 and 768 threads/block. However the number of threads is 524288, which in my opinion is the reason why I get "does not validate on cpu" and why 683 got chosen, since it is just threads/threads_per_block. This gives me around 36 MH/s.
Yes, your numbers are correct, though it is not as simple as dividing the total number of threads by the desired threads per block. Not all of the 524288 threads can be executed simultaneously; the maximum number of resident threads for 3.x-5.x devices is 2048 per SM (10240 on the 750 Ti, for example). However, the remaining threads can be scheduled, and they are processed once resources become available as previous tasks complete.
I have the feeling it is faster because it throws away a lot of things...
This is indeed what happens when you get the "does not validate" error. The CPU tries to recreate the hash one last time before submitting it as proof, and the share gets dropped if it fails validation. Work in this case is simply trashed. I have not finished instrumenting the code fully to provide exact details. What I do have is verification from pools through a higher reported hashrate (calculated from the rate of valid shares) and, in particular, a correlated increase in valid share counts.
Would be interesting to have Christian's opinion on that. Is there a way to decrease the number of threads? (assuming it works)
Agreed, I will 100% defer to Christian on this subject.
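The resident-thread limit mentioned above can be sketched the same way. The function names are illustrative, and the second function considers only the per-SM thread limit; register and shared-memory usage, which are not known here, may lower the real number of resident blocks.

```c
#include <assert.h>

/* Upper bound on threads resident on the whole device at once. */
unsigned max_resident_threads(unsigned sm_count, unsigned max_threads_per_sm)
{
    return sm_count * max_threads_per_sm;
}

/* Blocks that fit on one SM when only the thread limit is considered.
 * Registers and shared memory can reduce this further. */
unsigned resident_blocks_per_sm(unsigned max_threads_per_sm,
                                unsigned threads_per_block)
{
    assert(threads_per_block > 0);
    return max_threads_per_sm / threads_per_block;
}
```

For a 750 Ti (5 SMs, 2048 threads per SM) the first function gives 10240. By the thread limit alone, 768 threads per block allows 2 resident blocks per SM and 512 allows 4.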
|
|
|
|
|
cbuchner1 (OP)
|
|
April 10, 2014, 07:02:20 AM Last edit: April 10, 2014, 07:30:31 AM by cbuchner1 |
|
How on earth did you manage that? We haven't been able to get over 13 MH/s.
Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788
This change is potentially dangerous, as the total number of threads run on the GPU is not aligned with the "throughput" variable used by the heavycoin scanhash function (passed as the variable "threads" into the function you modified). This could lead to overlapping shares being found (the same nonce leading to rejects), part of the nonce space being skipped (not actually a problem), or buffers being overrun (potentially serious).
You need to add some code to compute the throughput variable (= total number of GPU threads) based on device properties, e.g. in an early function call into the cuda_hefty1.cu module.
Christian
|
|
|
|
cbuchner1 (OP)
|
|
April 10, 2014, 07:21:59 AM |
|
any ideas?
Do you have two main functions in the code now? Does your extra main function use a different argument list?
int main(int argc, char **argv)
|
|
|
|
bigjme
|
|
April 10, 2014, 08:34:45 AM |
|
any ideas?
do you have two main functions in the code now? does your extra main function use a different argument list? int main(int argc, char **argv)
I checked all the code and there is no extra main function. The only occurrence of "main" in all the files is inside the word "domain". I will have to check through again and see what's going on.
|
|
|
|
ManIkWeet
|
|
April 10, 2014, 09:01:20 AM |
|
Suddenly I no longer feel sad that I sold that bitcoin at the price I did; I am sad that I didn't sell my other bitcoin at the higher price.
Edit: Just wondering: how can you people even make a profit with cards like the GTX 780?
|
BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
|
|
|
manydog
Newbie
Offline
Activity: 4
Merit: 0
|
|
April 10, 2014, 09:32:09 AM |
|
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
|
|
|
|
ManIkWeet
|
|
April 10, 2014, 09:37:13 AM |
|
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
9 hashes per second, sounds like my calculator can do more
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 10, 2014, 09:42:20 AM |
|
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
9 hashes per second, sounds like my calculator can do more
He didn't say what he was using...
|
|
|
|
manydog
Newbie
Offline
Activity: 4
Merit: 0
|
|
April 10, 2014, 09:55:48 AM |
|
I'm having trouble with mining dogecoin. Getting 9 h/s. Should I do anything to the script instead of autotune?
I downloaded the 337.50 beta update from Nvidia with no improvement! Here are my specs:
Windows 8 64-bit
Intel Pentium G2020 @ 2.90GHz 35 °C Ivy Bridge 22nm Technology
4.00GB Single-Channel DDR3 @ 665MHz (9-9-9-24)
Acer Aspire XC600 (SOCKET 0) 28 °C
1023MB NVIDIA GeForce 605 (Elitegroup) 58 °C
cudaminer.exe -H 1 -i 0 -l auto -C 1 -o stratum+tcp://stratum.teamdoge.com:3333 -O
|
|
|
|
ghur
|
|
April 10, 2014, 10:01:41 AM |
|
NVIDIA GeForce 605 (Elitegroup)
Ouch. I think you may just have to accept you're not going to mine with that.
|
doge: D8q8dR6tEAcaJ7U65jP6AAkiiL2CFJaHah Automated faucet, pays daily: Qoinpro
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 10, 2014, 10:06:41 AM |
|
NVIDIA GeForce 605 (Elitegroup)
Ouch. I think you may just have to accept you're not going to mine with that.
If it is Elitegroup, you should try to mine ~BCX~, the coin for the elite.
|
|
|
|
manydog
Newbie
Offline
Activity: 4
Merit: 0
|
|
April 10, 2014, 10:19:11 AM |
|
NVIDIA GeForce 605 (Elitegroup)
Ouch. I think you may just have to accept you're not going to mine with that.
If it is elitegroup, you should try to mine ~BCX~ the coin for the elite
Have mercy: DABceSeUHRfeRnByhP3yGHnHk8Fx136ehR
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
April 10, 2014, 10:31:17 AM |
|
NVIDIA GeForce 605 (Elitegroup)
Ouch. I think you may just have to accept you're not going to mine with that.
If it is elitegroup, you should try to mine ~BCX~ the coin for the elite
Have mercy: DABceSeUHRfeRnByhP3yGHnHk8Fx136ehR
Oula, don't fork the coin.
|
|
|
|
|