-ck (OP)
Legendary
Activity: 4284
Merit: 1645
Ruu \o/
October 15, 2012, 12:53:13 AM
Yes, that is interesting. I'm guessing you have underclocked your memory exceptionally low, as that was found to be an issue with the use of atomic ops. Some people found that a bump of 15 MHz in memory clock was enough to correct it. The lack of atomic functions there could lead to HW errors and lost shares. It's a tradeoff either way. The change was put in to make sure no shares were lost, which can happen with the old OpenCL code (though only a very small number would be lost).
Ah, OK! Thanks for the info. Yep, I'm at a 150 MHz memory clock. It's to prevent simultaneous nonce finds on different vectors from overwriting the result at the same address, right? To be honest, I prefer the tradeoff. I did the math a while ago on the probability of that happening: P = 1/(2^32) * 1/(2^32) = 1/(2^64). On a 1 GH/s card, that will happen on average once every ~585 years. I've been using this optimization tradeoff, which I posted, for more than a year now:

    #elif defined VECTORS2
        uint result = W[117].x ? 0u : W[3].x;
        result = W[117].y ? result : W[3].y;
        if (result) SETFOUND(result);
No, you're not quite right there, btw. There are a few issues that made me use the atomic ops instead:
- There is no way to return a nonce value of 0.
- Bitmasked nonce values can also be zero, meaning they get lost.
- It is not just vectors that find nonces at the same time; it's a whole wavefront of threads finding nonces at the same time and corrupting both values.
- Bitmasked nonce values from results found in the same global worksize can come out as the same value and overwrite each other.
- It consolidates the return values from different kernels and decreases the CPU usage of the return code that checks the nonce values.
Again, the odds are very small, but far from 1 in 2^64. Since bitcoin mining is a game of odds, I didn't see the point of losing that, provided you don't drop the hashrate, of course. It's unusual that some devices need a higher memory speed just for one atomic op, but clearly it's a massively memory-intensive operation that affects the whole wavefront. Considering that increasing RAM speed by 15 or 20 MHz would not even register in terms of extra power usage and heat generated, to me at least it seems the better option. But the beauty of free software is that you can do whatever you like to the code if you don't like the way I do it.
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
kano
Legendary
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
October 15, 2012, 03:46:37 AM Last edit: October 15, 2012, 03:57:27 AM by kano
After messing with the MMQ initialisation code for a while, I've rewritten it and solved the hang I was getting. It's not in any git yet.

Is anyone using (working) cgminer on an MMQ on Linux at the moment? If so, what do all of

    uname -r
    cat /etc/*release
    ls -las /dev/tty[AU]*

return?

As I've mentioned before, the current git code just hangs during initialisation for me on Xubuntu 11.04 and Fedora 17, and it's due to termios not handling the ACM device as a tty on both my Linux versions.

I'm going to redo the old frequency management code (cgminer doesn't currently contain the newer frequency code in BarbieMiner), then put up a pull request with just those 2 changes so people can try them out and let me know if there are any problems with them. The initialisation change is only for Linux. Redoing the threading will come next after that.
|
|
|
Lem
Newbie
Activity: 78
Merit: 0
October 15, 2012, 07:10:04 AM
I'm using 2.8.3 on Linux. I mine on many pools and hop on some of them, with the others as backups, so pool switches are common here. My only stratum pool is Slush. I have it in my configuration file as:

    "url" : "http://api.bitcoin.cz:8332"

so I leave it to cgminer to recognize and activate the stratum protocol, which it does:

    Switching pool 7 http://api.bitcoin.cz:8332 to stratum+tcp://api-stratum.bitcoin.cz:3333

But after cgminer has been running for a few hours, I always find this stratum pool marked "Enabled Dead". While mining on another pool, I see things like this in the log:

    [2012-10-15 01:31:50] Pool 7 http://api.bitcoin.cz:8332 not responding!
    [2012-10-15 01:31:51] Pool 7 http://api.bitcoin.cz:8332 alive

but when I check immediately after that, the Pools section shows the pool as "Enabled Dead", and it stays dead forever. As soon as I restart cgminer, the pool is alive again (so it's not a pool problem). I don't know whether it happens every time the pool doesn't respond for a while, or whenever my box changes its IP address. I'm just sure it happens at least every few hours. HTH.
|
|
|
Krak
October 15, 2012, 07:24:47 AM
I'm using 2.8.3 with Linux. I mine on many pools, and I do hop on some of them, having the others as backups, so pool switches are common here. My only stratum pool is Slush. I have it in my configuration file as: "url" : "http://api.bitcoin.cz:8332" so I leave it to cgminer to recognize and activate the stratum protocol, which it does: Switching pool 7 http://api.bitcoin.cz:8332 to stratum+tcp://api-stratum.bitcoin.cz:3333 But after a few hours that cgminer is running, I always find this stratum pool as "Enabled Dead". While mining on another pool, in the log I see things like: [2012-10-15 01:31:50] Pool 7 http://api.bitcoin.cz:8332 not responding! [2012-10-15 01:31:51] Pool 7 http://api.bitcoin.cz:8332 alive but, checking immediately after that, in the Pools section the pool is shown as "Enabled Dead", and it will remain dead forever. As soon as I restart cgminer, the pool is alive again (so it's not a pool problem). I don't know if it happens every time the pool doesn't respond for a while, or whenever my box changes its IP address. I'm just sure it happens at least every few hours. HTH.

I noticed this too when I used BTC Guild as a backup for a while. I just assumed they were sick of me using them to detect new blocks sooner without actually giving much work back; I had just as many getworks with them as with my primary pool.
BTC: 1KrakenLFEFg33A4f6xpwgv3UUoxrLPuGn
Vbs
October 15, 2012, 09:51:45 AM
    #elif defined VECTORS2
        uint result = W[117].x ? 0u : W[3].x;
        result = W[117].y ? result : W[3].y;
        if (result) SETFOUND(result);
No, you're not quite right there, btw. There are a few issues that made me use the atomic ops instead. There is no way to return a nonce value of 0. Bitmasked nonce values can also be zero, meaning they get lost. It is not just vectors that find nonces at the same time; it's a whole wavefront of threads finding nonces at the same time and corrupting both values. Bitmasked nonce values from results found in the same global worksize can come out as the same value and overwrite each other. It's to consolidate the return values from different kernels and decrease the CPU usage of the return code that checks the nonce values. Again, very small but far from 2^64. Since bitcoin mining is a game of odds, I didn't see the point of losing that, provided you don't drop the hashrate of course. It's unusual that some devices need a higher memory speed just for one atomic op, but clearly it's a massively memory-intensive operation that affects the whole wavefront. Considering increasing RAM speed by 15 or 20 would not even register in terms of extra power usage and heat generated, to me at least it seems a better option. But the beauty of free software is you can do whatever you like to the code if you don't like the way I do it.

Thanks for the detailed explanation! Some more food for thought: I think the bitmasked stuff was probably the biggest problem (because the fewer 1 bits a nonce has, the bigger the probability of it being lost, IIRC). That's why the code above doesn't use bitmasking to check for nonces; it uses the SETA opcode, IIRC (the C ?: operator compiles to a specific GPU ISA instruction). A specific nonce value of 0 also occurs with probability P = 1/(2^64) [P_finding_nonce = 1/(2^32), P_nonce_is_all_zeros = 1/(2^32)], so that's also fine by me. About the global worksize bitmasked problem: if bitmasks aren't used, the only way for overwrites to happen would be if 2 bitwise-identical nonces were found, correct?
I will also try the tiny memory overclock to see the hashrate difference when using atomic_add; I like to try all angles to solve a problem, thanks! By the way, since I haven't yet, 1 BTC donation sent! I can only imagine the number of hours you've spent writing and optimizing cgminer's code!
|
|
|
-ck (OP)
October 15, 2012, 10:35:17 AM
About the global worksize bitmasked problem, if not using bitmasks, the only way for overwrites to happen would be if 2 identical bitwise nonces were found, correct? Btw, since I haven't yet, 1 BTC donation sent! I can only imagine the number of hours you spent writing and optimizing cgminer's code!

With bitmasked ones, if the bitmask is only 127, two nonces only need to share the same low 7 bits to collide, and that can be any pair out of the 4 billion possible nonces. Again rare, but not 2^64 rare. With non-bitmasked ones, 2 or more nonces need to be found concurrently in a wavefront, and they can race on the array variable in FOUND. They can be completely different nonces in that case, since they're just trying to access exactly the same variable at the same time without protection. Instead of 2 nonces being flagged as existing, it could be 0, 1, 2 or a much larger number, and the slots used for the nonces could be anywhere. In other words, the only strictly correct way to do it, without any chance of error, is with atomic ops. How well the atomic ops are implemented in hardware, and how much they depend on software, is a totally different question that I can't answer, and AMD is unlikely to tell us. Thanks for the donation!
amigaman
October 15, 2012, 10:50:34 AM
Yes, but the load only drops for absolutely tiny amounts of time, and it may not even be visible in the reported GPU load, which isn't an accurate measure anyway.
I only show fan percentage when fan RPM is not available for that particular device. The reason? The fan percentage is just the value you have told the device to run at... it could happily say 55% while the fan has stopped spinning for some reason, which is inherently dangerous. The fan RPM is a monitor, not a setting.
Yes, that is right, but it is hard for me to tell whether 2857 RPM is OK or not. All 6 of my units run at slightly different RPMs for a given percentage. But the regulation uses temperature to calculate the needed percentage, so as long as the temperature is in an OK range, the fan percentage stays below the set maximum. When I see 52% I know all is well; at 55% I know something is getting too warm, and I either have to check the fans or switch my AC on. Fan RPM is needed too, so my request was to display the fan percentage in addition.
|
|
|
Vbs
October 15, 2012, 01:36:50 PM
With bitmasked ones, if the bitmask is only 127, 7 bits need to be in common with anything out of the 4 billion nonces. Again rare, but not 2^64 rare. With non-bitmasked ones, 2 or more nonces need to be found in a "wavefront" concurrently and can race on the array variable in FOUND. They can be completely different nonces in that case, since they're just trying to access exactly the same variable at the same time without protection. Instead of there being 2 nonces flagged as existing, it could be 0, 1, 2 or a much larger number, and the slots used for the nonces could be anywhere. In other words, the only "strictly correct" way to do it without there being any chance of error is with the atomic ops. Now how well implemented the atomic ops are in hardware and how much they depend on software is a totally different equation that I can't answer, and AMD is unlikely to tell us. Thanks for the donation.

Still, the odds for concurrency are very, very low. A nonce find can be modeled by a Poisson distribution with lambda = 1/(2^32) per hash. Taking the example of a 1GH/s card (the sum of all wavefronts' speed per second):
lambda = 1/(2^32)
Poisson(k) = (lambda^k) * exp(-lambda) / k!
Probability of finding 2 nonces (at the same time) per second:
P = Poisson(2) * 1E9 = 2.7105e-11 = ~0.0000000027%
|
|
|
Lem
October 15, 2012, 02:52:16 PM
Still, the odds for concurrency are very, very low. A nonce find can be modeled by a Poisson distribution with lambda = 1/(2^32). Taking the example of a 1GH/s card (sum of all wavefronts speed/second):
lambda = 1/(2^32)
Poisson(k) = (lambda^k) * exp(-lambda) / k!
Probability of finding 2 nonces (at the same time) per second:
P = Poisson(2) * 1E9 = 2.7105e-11 = ~0.0000000027%
Sorry, I haven't followed your whole discussion, but as soon as I read this message it looked a bit weird to me. I apologize in advance if I have misread or misunderstood something. In a Poisson distribution, the period of time taken into account is fixed. Let's call it T. Lambda is the mean of the Poisson distribution (and its variance too). So with lambda = 1/2^32 you're stating that you expect to find 1/2^32 nonces in T. That is, you expect to wait 2^32 Ts to find a single nonce. Are you sure? And what do you mean by Poisson(2)*1E9? Besides, your calculation looks wrong. With lambda = 1/2^32, P(2) is ... almost zero. P(2)*1E9 isn't much more, and we can round it to zero as well.
|
|
|
Vbs
October 15, 2012, 03:48:01 PM
Sorry, I haven't followed your whole discussion, but as soon as I read this message it looked a bit weird to me. I apologize in advance if I have misread or misunderstood something. In a Poisson distribution, the period of time taken into account is fixed. Let's call it T. Lambda is the mean of the Poisson distribution (and its variance too). So with lambda = 1/2^32 you're stating that you expect to find 1/2^32 nonces in T. That is, you expect to wait 2^32 Ts to find a single nonce. Are you sure? And what do you mean by Poisson(2)*1E9? Besides, your calculation looks wrong. With lambda = 1/2^32, P(2) is ... almost zero. P(2)*1E9 isn't much more, and we can round it to zero as well.

That's correct: you expect to find 1 nonce out of 2^32 cases, with T = 1/(GPU hashes per timeframe). In the example above, T = 1/(1E9). Each processed 32-bit hash has a probability of 1/(2^32) of being a nonce. So a card that processes 1.000.000.000 hashes/s finds one every (2^32)/1E9 = 4.3 seconds on average. My Poisson math above is for the case of 2 simultaneous nonce finds (that's why lambda = 1/(2^32) and not lambda = 1/(2^32)*1E9).
|
|
|
Lem
October 15, 2012, 05:31:14 PM
My above Poisson math is for the case of 2 simultaneous nonce finds (that's why lambda=1/(2^32) and not lambda=1/(2^32)*1E9).
This is what I don't understand too well (sorry to bother you). Let's assume, as you wrote, that your device has 1 GH/s of computing power. So what T would you like to choose?

If you choose T = 1 s, then the probability of finding one nonce in T is roughly the mean, lambda: 1/4.3 = 0.23, i.e. 23%. The probability of finding two nonces in T, according to the Poisson distribution, is ((0.2325^2)*e^-0.2325)/2! = 0.021, i.e. 2.1%.

If you choose T = 1/1E9 s, AKA the clock tick duration of your device, then I calculate the probability of simultaneously finding two or more nonces this way:
a) if your device processes hashes sequentially (one thread), of course there cannot be simultaneous nonces if we consider T = 1/1E9 s = 1/(GPU hashes per timeframe);
b) if your device processes more than one hash simultaneously (more threads), there can be simultaneous nonces, but each thread uses just a part of the device's computing power. Let's say we have five threads. Each thread is capable of 200 MH/s, so it finds a nonce in about 21.47 s (that is, 2^32 H / 200 MH/s). In 1/1E9 s each thread finds a mean of 1/21.47G nonces. The probability that at least N of our five threads find a nonce in the same clock tick is (1/21.47G)^N. We can say zero, and we don't need any Poisson distribution for it.
|
|
|
ryann
Member
Activity: 70
Merit: 10
October 15, 2012, 08:11:18 PM
I tried the new 2.8.3 and it's noticeably slower for some reason. My Mhash is the same as with 2.7.5, but the accepted shares are significantly lower. I usually get about 600 shares an hour; with this version I'm getting 120.
|
|
|
amigaman
October 15, 2012, 08:26:34 PM
Do you have a higher difficulty? If you use the stratum protocol and your workers submit many hashes, the pool increases the difficulty for you, so you get fewer shares but are paid more per share. cgminer shows that in the shares line.
|
|
|
-ck (OP)
October 15, 2012, 08:28:27 PM
I'm using 2.8.3 with Linux. I mine on many pools, and I do hop on some of them, having the others as backups, so pool switches are common here. My only stratum pool is Slush. I have it in my configuration file as: "url" : "http://api.bitcoin.cz:8332" so I leave it to cgminer to recognize and activate the stratum protocol, which it does: Switching pool 7 http://api.bitcoin.cz:8332 to stratum+tcp://api-stratum.bitcoin.cz:3333 But after a few hours that cgminer is running, I always find this stratum pool as "Enabled Dead". While mining on another pool, in the log I see things like: [2012-10-15 01:31:50] Pool 7 http://api.bitcoin.cz:8332 not responding! [2012-10-15 01:31:51] Pool 7 http://api.bitcoin.cz:8332 alive but, checking immediately after that, in the Pools section the pool is shown as "Enabled Dead", and it will remain dead forever. As soon as I restart cgminer, the pool is alive again (so it's not a pool problem). I don't know if it happens every time the pool doesn't respond for a while, or whenever my box changes its IP address. I'm just sure it happens at least every few hours. HTH.

I noticed this also when I used BTC Guild as a backup for a while. I just assumed they were sick of me using them for detecting new blocks sooner without actually giving much work back; I had just as many getworks with them as I did with my primary pool.

I haven't seen this myself, but there could well still be a bug in there. I'll keep auditing the code.
Vbs
October 15, 2012, 10:27:14 PM
This is what I don't understand too well (sorry to bother you).
Let's assume, as you wrote, that your device has 1GH/s of computing power.
So what T would you like to choose?
If you choose T=1 sec, then the probability of finding one nonce in T is the mean, so it is lambda. Lambda is 1/4.3=0.23, i.e. 23%. The probability of finding two nonces in T, according to the Poisson distribution, is ((0.2325^2)*e^-0.2325)/2!=0.021, i.e. 2.1%.

Depends on what question you are trying to solve. The question you answered above is "What is the probability of finding k nonces in a period of 1s? And in a period of 2s?" That was not the question I was answering. My question was "What is the probability of finding k nonces at the same time?"

If you choose T=1/1E9 sec, AKA the clock tick duration of your device, then I calculate the probability of simultaneously finding two or more nonces this way: a) if your device processes hashes sequentially (one thread), of course there cannot be simultaneous nonces if we consider T=1/1E9 sec =1/(GPU hashes per timeframe); b) if your device processes more than one hash simultaneously (more threads), there can be simultaneous nonces, but every thread uses just a part of the device computing power. Let's say we have five threads. Each thread is capable of 200MH/s, so it finds a nonce in about 21.47 sec (that is: 2^32 H / 200MH/s). In 1/1E9 sec each thread finds a mean of 1/21.47G nonces.

The problem is independent of thread execution time, because that is relatively constant. They all end at mostly the same time in a wavefront. The problem is: when they all end, how many have nonces?

The probability that at least N of our five threads find a nonce in the same clock is (1/21.47G)^N. We can say zero, and we don't need any Poisson distribution for it.

When n->inf and p->0 (with n*p fixed), the Binomial converges to the Poisson, so either gives the same results: https://en.wikipedia.org/wiki/Poisson_limit_theorem

I can rewrite everything in another way, using another example, assuming each wavefront is composed of 256 threads with 2 vectors each:
lambda = 1/(2^32)*256*2
Poisson(k) = (lambda^k) * exp(-lambda) / k!
Probability of finding 1 nonce in a wavefront:
P = Poisson(1) = 1.1921e-07
P = Binomial(1, 512, 1/(2^32)) = 1.1921e-07
Probability of finding 2 nonces in a wavefront:
P = Poisson(2) = 7.1054e-15
P = Binomial(2, 512, 1/(2^32)) = 7.0915e-15
Whichever way we look at the problem, the answer is the same: the probability of finding 2 nonces at the same time is about 2^24 (~1.7*10^7) times smaller than that of finding 1 nonce! In other words, ~17 Petahash/s (~0.017 Exahash/s) hardware would find 2 nonces simultaneously at about the same rate a 1GH/s card finds 1 nonce right now.
|
|
|
-ck (OP)
October 15, 2012, 10:34:38 PM
Don't forget modern GPUs have up to 2048 shaders...
aka_bigred
Newbie
Activity: 5
Merit: 0
October 15, 2012, 10:35:47 PM
I've noticed that the GPU core temperature is no longer displayed in the output of version 2.8.3 for my 5850 on Win7 x64. With 2.7.6 and earlier, it did output GPU temps. There has been no change in SDK/driver; the only change was the cgminer.exe binary. I looked over the config, and nothing looks too different. What am I missing in order to output/log GPU temps?

    >>"%LOGFILEANDPATH%" 2>&1 cgminer.exe -T --kernel-path "%PATHTOKERNELS%" --kernel phatk --device 0 --verbose --log 30 --gpu-vddc 1.088 --gpu-engine 977 --gpu-memclock 300 --intensity 10 --queue 8 --gpu-threads 4 --worksize 256 %POOLSSTRING%
Log file from cgminer 2.8.3:

    [2012-10-15 17:04:35] Started cgminer 2.8.3
    [2012-10-15 17:04:36] CL Platform 0 vendor: Advanced Micro Devices, Inc.
    [2012-10-15 17:04:36] CL Platform 0 name: AMD Accelerated Parallel Processing
    [2012-10-15 17:04:36] CL Platform 0 version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-10-15 17:04:36] Platform 0 devices: 1
    [2012-10-15 17:04:36] 0 Cypress
    [2012-10-15 17:04:36] Failed to ADL_Adapter_ID_Get. Error -1
    [2012-10-15 17:04:36] Failed to ADL_Adapter_ID_Get. Error -1
    [2012-10-15 17:04:36] GPU 0 ATI Radeon HD 5800 Series hardware monitoring enabled
    [2012-10-15 17:04:36] Setting GPU 0 engine clock to 977
    [2012-10-15 17:04:36] Setting GPU 0 memory clock to 300
    [2012-10-15 17:04:36] Setting GPU 0 voltage to 1.088
    [2012-10-15 17:04:36] Probing for an alive pool
    [2012-10-15 17:04:36] Testing pool http://localhost:9332
    [2012-10-15 17:04:36] HTTP request failed: Empty reply from server
    [2012-10-15 17:04:36] Stratum authorisation success for pool 0
    [2012-10-15 17:04:36] Pool 0 http://localhost:9332 active
    [2012-10-15 17:04:36] Init GPU thread 0 GPU 0 virtual GPU 0
    [2012-10-15 17:04:36] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-10-15 17:04:36] CL Platform name: AMD Accelerated Parallel Processing
    [2012-10-15 17:04:36] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-10-15 17:04:36] List of devices:
    [2012-10-15 17:04:36] 0 Cypress
    [2012-10-15 17:04:36] Selected 0: Cypress
    [2012-10-15 17:04:36] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-10-15 17:04:36] initCl() finished. Found Cypress
    [2012-10-15 17:04:36] Init GPU thread 1 GPU 0 virtual GPU 0
    [2012-10-15 17:04:36] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-10-15 17:04:36] CL Platform name: AMD Accelerated Parallel Processing
    [2012-10-15 17:04:36] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-10-15 17:04:36] List of devices:
    [2012-10-15 17:04:36] 0 Cypress
    [2012-10-15 17:04:36] Selected 0: Cypress
    [2012-10-15 17:04:37] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-10-15 17:04:37] initCl() finished. Found Cypress
    [2012-10-15 17:04:37] Init GPU thread 2 GPU 0 virtual GPU 0
    [2012-10-15 17:04:37] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-10-15 17:04:37] CL Platform name: AMD Accelerated Parallel Processing
    [2012-10-15 17:04:37] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-10-15 17:04:37] List of devices:
    [2012-10-15 17:04:37] 0 Cypress
    [2012-10-15 17:04:37] Selected 0: Cypress
    [2012-10-15 17:04:37] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-10-15 17:04:37] initCl() finished. Found Cypress
    [2012-10-15 17:04:37] Init GPU thread 3 GPU 0 virtual GPU 0
    [2012-10-15 17:04:37] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-10-15 17:04:37] CL Platform name: AMD Accelerated Parallel Processing
    [2012-10-15 17:04:37] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-10-15 17:04:37] List of devices:
    [2012-10-15 17:04:37] 0 Cypress
    [2012-10-1
    [2012-10-15 17:04:38] Submitting share d72f00f1 to pool 0
    [2012-10-15 17:04:39] Accepted d72f00f1 Diff 1/1 GPU 0 pool 0
    [2012-10-15 17:04:39] New block: 0000039f47f7ac4bdb5b4c64abb1a9c3...
    [2012-10-15 17:04:39] Stratum from pool 0 detected new block
    [2012-10-15 17:04:39] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-10-15 17:04:39] initCl() finished. Found Cypress
    [2012-10-15 17:04:39] 4 gpu miner threads started
    ...
    (30s):416.7M (avg):406.7Mh/s | Q:2803 A:7542 R:21 HW:0 E:269% U:5.6/m
    ...
    (30s):437.4M (avg):406.7Mh/s | Q:2810 A:7572 R:21 HW:0 E:269% U:5.6/m
    ...
    (30s):397.1M (avg):406.7Mh/s | Q:2811 A:7575 R:21 HW:0 E:269% U:5.6/m
    ...
    (30s):396.2M (avg):406.7Mh/s | Q:2812 A:7580 R:21 HW:0 E:270% U:5.6/m
    ..
Log file from cgminer 2.7.6:

    [2012-09-27 16:08:00] Started cgminer 2.7.6
    [2012-09-27 16:08:00] CL Platform 0 vendor: Advanced Micro Devices, Inc.
    [2012-09-27 16:08:00] CL Platform 0 name: AMD Accelerated Parallel Processing
    [2012-09-27 16:08:00] CL Platform 0 version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-09-27 16:08:00] Platform 0 devices: 1
    [2012-09-27 16:08:00] 0 Cypress
    [2012-09-27 16:08:01] Failed to ADL_Adapter_ID_Get. Error -1
    [2012-09-27 16:08:01] Failed to ADL_Adapter_ID_Get. Error -1
    [2012-09-27 16:08:01] GPU 0 ATI Radeon HD 5800 Series hardware monitoring enabled
    [2012-09-27 16:08:01] Setting GPU 0 engine clock to 980
    [2012-09-27 16:08:01] Setting GPU 0 memory clock to 300
    [2012-09-27 16:08:01] Setting GPU 0 voltage to 1.088
    [2012-09-27 16:08:01] Probing for an alive pool
    [2012-09-27 16:08:01] Testing pool http://localhost:8336
    [2012-09-27 16:08:01] Pool 0 http://localhost:8336 active
    [2012-09-27 16:08:01] Init GPU thread 0 GPU 0 virtual GPU 0
    [2012-09-27 16:08:01] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-09-27 16:08:01] Long-polling activated for http://localhost:8336/lp
    [2012-09-27 16:08:01] CL Platform name: AMD Accelerated Parallel Processing
    [2012-09-27 16:08:01] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-09-27 16:08:01] List of devices:
    [2012-09-27 16:08:01] 0 Cypress
    [2012-09-27 16:08:01] New block: 0000051dcdc2f44fd0bfc7859eb9921a...
    [2012-09-27 16:08:01] Selected 0: Cypress
    [2012-09-27 16:08:01] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-09-27 16:08:01] initCl() finished. Found Cypress
    [2012-09-27 16:08:01] Init GPU thread 1 GPU 0 virtual GPU 0
    [2012-09-27 16:08:01] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-09-27 16:08:01] CL Platform name: AMD Accelerated Parallel Processing
    [2012-09-27 16:08:01] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-09-27 16:08:01] List of devices:
    [2012-09-27 16:08:01] 0 Cypress
    [2012-09-27 16:08:01] Selected 0: Cypress
    [2012-09-27 16:08:01] Pool 0 http://localhost:8336 alive
    [2012-09-27 16:08:02] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-09-27 16:08:02] initCl() finished. Found Cypress
    [2012-09-27 16:08:02] Init GPU thread 2 GPU 0 virtual GPU 0
    [2012-09-27 16:08:02] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-09-27 16:08:02] CL Platform name: AMD Accelerated Parallel Processing
    [2012-09-27 16:08:02] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-09-27 16:08:02] List of devices:
    [2012-09-27 16:08:02] 0 Cypress
    [2012-09-27 16:08:02] Selected 0: Cypress
    [2012-09-27 16:08:02] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-09-27 16:08:03] initCl() finished. Found Cypress
    [2012-09-27 16:08:03] Init GPU thread 3 GPU 0 virtual GPU 0
    [2012-09-27 16:08:03] CL Platform vendor: Advanced Micro Devices, Inc.
    [2012-09-27 16:08:03] CL Platform name: AMD Accelerated Parallel Processing
    [2012-09-27 16:08:03] CL Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1)
    [2012-09-27 16:08:03] List of dev
    [2012-09-27 16:08:03] 0 Cypress
    [2012-09-27 16:08:03] Selected 0: Cypress
    [2012-09-27 16:08:04] Initialising kernel phatk120823.cl with bitalign, 2 vectors and worksize 256
    [2012-09-27 16:08:04] initCl() finished. Found Cypress
    [2012-09-27 16:08:04] 4 gpu miner threads started
    ...
    ...
    [2012-09-27 17:45:18] Accepted 21adfc25 Diff 1 GPU 0 pool 0
    [2012-09-27 17:45:18] GPU0 67.0C 3237RPM | (30s):408.2 (avg):407.7 Mh/s | A:562 R:3 HW:0 U:5.8/m I:10
    [2012-09-27 17:45:20] Accepted 729a0594 Diff 1 GPU 0 pool 0
    [2012-09-27 17:45:20] GPU0 67.0C 3250RPM | (30s):408.2 (avg):407.7 Mh/s | A:563 R:3 HW:0 U:5.8/m I:10
    [2012-09-27 17:45:34] Accepted 0a38cdc3 Diff 1 GPU 0 pool 0
    [2012-09-27 17:45:34] GPU0 67.0C 3227RPM | (30s):409.1 (avg):408.9 Mh/s | A:564 R:3 HW:0 U:5.8/m I:10
    [2012-09-27 17:45:40] Accepted 8b296886 Diff 1 GPU 0 pool 0
    [2012-09-27 17:45:40] GPU0 67.0C 3258RPM | (30s):408.7 (avg):409.4 Mh/s | A:565 R:3 HW:0 U:5.8/m I:10
|
|
|
ralree
October 15, 2012, 10:52:47 PM
Just updated to 2.8.3. I have to say I really don't like the new precision on numeric output:

    GPU 0: 73.0C 2900RPM | 604.2M/661.2Mh/s | A:3 R:0 HW:0 U: 7.03/m I: 9
    GPU 1: 74.0C 2912RPM | 595.1M/659.3Mh/s | A:5 R:0 HW:0 U:11.72/m I: 9
    GPU 2: 74.0C 2922RPM | 611.8M/658 Mh/s | A:10 R:0 HW:0 U:23.44/m I: 9

Could you put a .0 on there instead of all that blank space? It's a minor thing, but I like it better that way.

This is a side effect of trying to find a generic format that is aligned on the screen and fits values from 0 to 18,446,744,073,709,551,616 in a generic way, while still maintaining adequate precision for the relative rate for that device. It is not entirely straightforward, and what to do about zeroes is never going to be to everyone's satisfaction. 001.0 or 01.00 or 1.000? By the way, that massive value would show up as 18.45EH/s with the current scheme, so that it could show up aligned on the same screen as something with 0.001 H/s.

Thanks - I figured as much. The units allow a maximum of 3 digits on the left side, though, so you could simply lay it out with a space-padded 3-digit integer part and a 1-digit decimal part all the time, no matter what, meaning:

    604.2M/661.2Mh/s
    595.1M/659.3Mh/s
    611.8M/658.0Mh/s

In my case, and using your example:

    604.2M/661.2Mh/s
    595.1M/659.3Mh/s
    611.8M/658.0Mh/s
     18.5E/ 18.5Eh/s
1MANaTeEZoH4YkgMYz61E5y4s9BYhAuUjG
-ck (OP)
October 15, 2012, 10:55:36 PM
Yes, but that would make my 2.722 GH/s appear as only 2.7 GH/s, which is not accurate enough. Significant digits are the key.
Vbs
October 16, 2012, 12:15:37 AM
Don't forget modern GPUs have up to 2048 shaders...
Thanks! So, redoing the numbers... Assuming a worst-case scenario, with each wavefront composed of 2048 threads with 4 vectors each:
lambda = 1/(2^32)*2048*4
Poisson(k) = (lambda^k) * exp(-lambda) / k!
Probability of finding 1 nonce in a wavefront:
P = Poisson(1) = 1.9073e-06
P = Binomial(1, 8192, 1/(2^32)) = 1.9073e-06
Probability of finding 2 nonces in a wavefront:
P = Poisson(2) = 1.8190e-12
P = Binomial(2, 8192, 1/(2^32)) = 1.8188e-12
So, if they all end at the same time (or during the non-atomic memory write op), for every ~1.000.000 found nonces you'll throw one away (overwritten). With 2 vectors it's one every 2.1e6, and with 1 vector one every 4.2e6. On a 1GH/s card, where finding a nonce takes ~4.3s, it will take around 50 days of 24/7 runtime for this to happen once in the 4-vector case.