-ck (OP)
Legendary
Offline
Activity: 4284
Merit: 1645
Ruu \o/
|
|
February 24, 2012, 09:38:54 PM |
|
Quote:
    Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), which is infinitesimally close to 1/inf ~= 0. This allows for a further optimization of the code. Using a VECTORS2 example,

    #elif defined VECTORS2
        bool result = min(W[117].x, W[117].y);
        if (!result) {
            if (!W[117].x)
                output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
            else // if (!W[117].y)
                output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
        }

    Since min() takes care of the false positives, the 'else' branch is only taken when W[117].y == 0. The result in the KernelAnalyzer for a 5870 is:

    phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
    phatk "new"  -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362

This looks okay, but it's in the output path, so it's not hit very often and is unlikely to make a demonstrable performance change :\
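The 1/(4.3e9*4.3e9) figure in the quote can be checked with a few lines of arithmetic. This is only a sketch: it assumes each hash independently passes the share check with probability 1/2**32, and the names `P_SHARE` and `p_all_lanes_hit` are made up for illustration.

```python
from fractions import Fraction

# Assumption: every hash independently passes the share check
# with probability 1/2**32 (2**32 is about 4.3e9).
P_SHARE = Fraction(1, 2**32)


def p_all_lanes_hit(lanes: int) -> Fraction:
    """Probability that every lane of one vector work-item holds a share."""
    return P_SHARE ** lanes


# Two lanes (VECTORS2): 1/2**64, on the order of 5e-20.
print(float(p_all_lanes_hit(2)))
```

Under that assumption, a double hit within one uint2 work-item is rare enough that handling only one lane per work-item loses essentially nothing.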
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
Dyaheon
Member
Offline
Activity: 121
Merit: 10
|
|
February 24, 2012, 10:58:15 PM |
|
Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.
|
|
|
|
Diapolo
|
|
February 24, 2012, 10:58:25 PM |
|
I've got a nice idea for VECTORS2 and the nonce-check ^^ ... so the chance to get 2 positive nonces within a single uint2 work-item is extremely small, right? Will play around with it tomorrow and perhaps I'll do another commit for diakgcn.
Dia
|
|
|
|
bulanula
|
|
February 24, 2012, 11:07:35 PM |
|
Quote:
    Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.
What kernel is this? Still the phatk one? I've got 5870s. Can memory still be underclocked to 300 while still getting good performance? Thanks!
|
|
|
|
Dyaheon
|
|
February 24, 2012, 11:15:15 PM |
|
Quote:
    Quote:
        Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.
    What kernel is this? Still the phatk one? I've got 5870s. Can memory still be underclocked to 300 while still getting good performance? Thanks!
phatk, as I mentioned. And on SDK 2.1 and 2.4, yes, it can. Not sure about 2.6; I never used that with 5870s. Currently hashing away at 444.4 MH/s on a 950/300 5870, SDK 2.1, -g 1 -I 10 -w 256 -v 2, although -g 1 is probably not a good idea, just something I've stuck with. SDK 2.4 gives me the same, perhaps even a very slightly faster hashrate.
|
|
|
|
bulanula
|
|
February 24, 2012, 11:21:10 PM |
|
Quote:
    Quote:
        Quote:
            Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.
        What kernel is this? Still the phatk one? I've got 5870s. Can memory still be underclocked to 300 while still getting good performance? Thanks!
    phatk, as I mentioned. And on SDK 2.1 and 2.4, yes, it can. Not sure about 2.6; I never used that with 5870s. Currently hashing away at 444.4 MH/s on a 950/300 5870, SDK 2.1, -g 1 -I 10 -w 256 -v 2, although -g 1 is probably not a good idea, just something I've stuck with. SDK 2.4 gives me the same, perhaps even a very slightly faster hashrate.
Can you try 950/300 on SDK 2.1 and 2.4 and see what the difference is (make sure to delete the bins etc.)? Maybe also try 960 core / 300 memory? What OS, btw? Thanks!
|
|
|
|
Vbs
|
|
February 24, 2012, 11:32:20 PM |
|
Quote:
    Quote:
        Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), which is infinitesimally close to 1/inf ~= 0. This allows for a further optimization of the code. Using a VECTORS2 example,

        #elif defined VECTORS2
            bool result = min(W[117].x, W[117].y);
            if (!result) {
                if (!W[117].x)
                    output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
                else // if (!W[117].y)
                    output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
            }

        Since min() takes care of the false positives, the 'else' branch is only taken when W[117].y == 0. The result in the KernelAnalyzer for a 5870 is:

        phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
        phatk "new"  -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362

    This looks okay, but it's in the output path, so it's not hit very often and is unlikely to make a demonstrable performance change :\
True, and the better the branch prediction works with "if (!result)", the less often that path will be taken. I'll check how min() gets implemented at the low level.
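To make the 'else' reasoning concrete, here is a small Python model of the two branch shapes discussed in this thread. It is a sketch, not the kernel: `x` and `y` stand in for W[117].x and W[117].y, `nx` and `ny` for the values stored from W[3].x and W[3].y, and both function names are invented for illustration.

```python
def report_two_ifs(x: int, y: int, nx: int, ny: int) -> list:
    """Model of the variant with two independent 'if' checks."""
    found = []
    if min(x, y) == 0:          # at least one lane is zero
        if x == 0:
            found.append(nx)
        if y == 0:
            found.append(ny)
    return found


def report_if_else(x: int, y: int, nx: int, ny: int) -> list:
    """Model of the variant that replaces the second 'if' with 'else'."""
    found = []
    if min(x, y) == 0:          # at least one lane is zero
        if x == 0:
            found.append(nx)
        else:                   # x != 0 and min == 0, so y must be 0
            found.append(ny)
    return found


# The two variants agree unless both lanes are zero at the same time.
print(report_two_ifs(7, 0, 10, 11), report_if_else(7, 0, 10, 11))
print(report_two_ifs(0, 0, 10, 11), report_if_else(0, 0, 10, 11))
```

The only case where the variants differ is a double hit in one work-item, where the 'else' form reports only the first lane. That is the case the quoted probability estimate treats as negligible.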
|
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
February 24, 2012, 11:37:25 PM |
|
Quote:
    I've got a nice idea for VECTORS2 and the nonce-check ^^ ... so the chance to get 2 positive nonces within a single uint2 work-item is extremely small, right? Will play around with it tomorrow and perhaps I'll do another commit for diakgcn.
    Dia
The chance of getting a positive nonce is ALWAYS the same for each hash you do, no matter when you do it. If a single thread is idle it is wasted.

Edit: and aborting all threads when you find a nonce means you on average double the overhead of setting up work. (i.e. time wasted when the GPU could be mining)
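Because the chance per hash is constant, the expected share rate depends only on the hashrate. The sketch below assumes a difficulty-1 share needs 2**32 hashes on average; the function name is hypothetical, and 375.7 MH/s is simply a per-GPU figure reported elsewhere in this thread.

```python
HASHES_PER_SHARE = 2**32  # assumption: one difficulty-1 share per 2**32 hashes on average


def expected_shares_per_minute(mhs: float) -> float:
    """Expected shares per minute at a hashrate given in MH/s."""
    return mhs * 1e6 * 60 / HASHES_PER_SHARE


# A GPU running at 375.7 MH/s should average a bit over 5 shares per minute.
print(round(expected_shares_per_minute(375.7), 2))
```

This can be compared against the U: values (shares per minute) in cgminer status lines, which fluctuate around the expectation because share finding is random.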
|
|
|
|
Dyaheon
|
|
February 25, 2012, 12:06:31 AM |
|
Quote:
    Can you try 950/300 on SDK 2.1 and 2.4 and see what the difference is (make sure to delete the bins etc.)?
    Maybe also try 960 core / 300 memory?
    What OS, btw?
    Thanks!
Sorry, can't. The cards are in different computers, and the 5870 on the SDK 2.4 machine is on an extender, which slows down the hashrate somewhat. What I can do, however, is give you the difference between a 5970 @ 810/300 on 2.4 and a 5970 at the same clocks on 2.1.

SDK 2.4:
GPU 1: 51.5C 1569RPM | 375.7/375.7Mh/s | A: 98 R:0 HW:0 U: 4.86/m I:10
GPU 2: 55.0C 1569RPM | 375.7/375.7Mh/s | A: 97 R:0 HW:0 U: 4.81/m I:10

SDK 2.1:
GPU 0: 82.5C 3840RPM | 375.6/375.4Mh/s | A:457 R:2 HW:0 U: 5.27/m I:10
GPU 1: 82.5C 3840RPM | 375.4/375.4Mh/s | A:477 R:0 HW:0 U: 5.50/m I:10

You can multiply that by 960/810 or 950/810 to get a good estimate of a 5870's performance at 960 and 950 clocks, respectively. So it seems that 2.4 is very slightly better at 300 memclock and I 10, 1 thread. I haven't had time to test other settings, so it could vary. Oh, and the OS is 64-bit Lubuntu.
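The clock-ratio estimate suggested above works out as follows. This is a rough sketch that assumes hashrate scales linearly with core clock, as the post proposes; `scale_hashrate` is a made-up helper name.

```python
def scale_hashrate(mhs: float, from_clock: float, to_clock: float) -> float:
    """Estimate hashrate at another core clock, assuming linear scaling."""
    return mhs * to_clock / from_clock


measured = 375.7  # MH/s per GPU at 810 MHz core, from the status lines above

for clock in (950, 960):
    estimate = scale_hashrate(measured, 810, clock)
    print(f"{clock} MHz: about {estimate:.1f} MH/s")
```

The 950 MHz estimate lands within a few MH/s of the 444.4 MH/s reported earlier in the thread for a 950/300 5870, which suggests the linear assumption is reasonable at these clocks.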
|
|
|
|
-ck (OP)
|
|
February 25, 2012, 12:54:20 AM |
|
Quote:
    SDK 2.4:
    GPU 1: 51.5C 1569RPM | 375.7/375.7Mh/s | A: 98 R:0 HW:0 U: 4.86/m I:10
    GPU 2: 55.0C 1569RPM | 375.7/375.7Mh/s | A: 97 R:0 HW:0 U: 4.81/m I:10

    SDK 2.1:
    GPU 0: 82.5C 3840RPM | 375.6/375.4Mh/s | A:457 R:2 HW:0 U: 5.27/m I:10
    GPU 1: 82.5C 3840RPM | 375.4/375.4Mh/s | A:477 R:0 HW:0 U: 5.50/m I:10

    So it seems that 2.4 is very slightly better at 300 memclock and I 10, 1 thread.
I would say the difference is below noise levels, so I would say they perform identically on that hardware/software combo.
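For a sense of scale, the gap between the two SDKs can be put in relative terms. The sketch below uses the average-hashrate figures (the second number in each "x/y Mh/s" pair) from the quoted status lines; the variable and function names are illustrative only.

```python
from statistics import mean

# Average hashrates in MH/s, taken from the quoted status lines.
sdk24_avg = [375.7, 375.7]
sdk21_avg = [375.4, 375.4]


def relative_difference(a: list, b: list) -> float:
    """Relative difference between the means of two sets of readings."""
    return (mean(a) - mean(b)) / mean(b)


diff = relative_difference(sdk24_avg, sdk21_avg)
print(f"{diff * 100:.2f}% difference")
```

A gap under a tenth of a percent is smaller than the run-to-run variation one would usually expect from hashrate readings, which is consistent with calling it noise.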
|
|
|
|
Vbs
|
|
February 25, 2012, 01:21:58 AM |
|
Quote:
    Quote:
        Been testing some changes on phatk with the KernelAnalyzer and my own personal testing. Using a VECTORS2 example,

            bool result = W[117].x & W[117].y;

        gives a lot of false positives, changing it to

            bool result = min(W[117].x, W[117].y);

        is guaranteed to give yummy results! (same ALU #ops and fetch, no false positives on the next 'if')
    See now this is dangerous. Do you REALLY know how fast the "min" function is on all SDKs? Don't expect AMD to do the right thing and to guarantee it's as fast as &.
min(x,y) (http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/commonMin.html) gets implemented low-level as

    w: MIN_UINT R0.w, R0.x, PV1350.y

which *should* (I know, AMD...) be rather stable. The big problem with the alternative (&) is the huge number of false positives, since it's bitwise, like 01010011 & 10101100 = 00000000, which is bad for the branch predictor.

I'm testing now with a conservative approach (just this one change from default),

    #elif defined VECTORS2
        bool result = min(W[117].x, W[117].y);
        if (!result) {
            if (!W[117].x)
                output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
            if (!W[117].y)
                output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
        }

and got a slight (3~4 MH/s) increase (5850, SDK 2.5 from Cat 11.11).
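The bitwise example from the post can be reproduced directly. This Python sketch only models the two pre-checks on plain integers; `and_check` and `min_check` are hypothetical names, and nothing here says anything about how fast either operation is on a GPU.

```python
def and_check(x: int, y: int) -> bool:
    """True when the bitwise-AND pre-check would enter the output branch."""
    return (x & y) == 0


def min_check(x: int, y: int) -> bool:
    """True when the min() pre-check would enter the output branch."""
    return min(x, y) == 0


x, y = 0b01010011, 0b10101100   # the pair from the post: no bits in common

print(and_check(x, y))  # branch entered although neither value is zero
print(min_check(x, y))  # branch skipped, since neither value is zero
print(and_check(0, y), min_check(0, y))  # both enter when a lane really is zero
```

So & is zero whenever the two values share no set bits, while min() is zero only when at least one value is itself zero.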
|
|
|
|
-ck (OP)
|
|
February 25, 2012, 01:59:49 AM |
|
Quote:
    min(x,y) (http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/commonMin.html) gets implemented low-level as

        w: MIN_UINT R0.w, R0.x, PV1350.y

    which *should* (I know, AMD...) be rather stable. The big problem with the alternative (&) is the huge number of false positives, since it's bitwise, like 01010011 & 10101100 = 00000000, which is bad for the branch predictor.

    I'm testing now with a conservative approach (just this one change from default),

        #elif defined VECTORS2
            bool result = min(W[117].x, W[117].y);
            if (!result) {
                if (!W[117].x)
                    output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
                if (!W[117].y)
                    output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
            }

    and got a slight (3~4 MH/s) increase (5850, SDK 2.5 from Cat 11.11).
You can do the maths on false positives. You're greatly exaggerating the "HUGE NUMBER". It's about 1 share for 1 false positive. More so on 4 vectors (but no one uses them). That is not remotely common...

Increase eh? Call me sceptical to the core.

EDIT: I will look into it, but I'm so terrified of unintentionally breaking shit like I did last time. It was in this code specifically where the slowdown was, so you can imagine why I'm so resistant.
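One way to "do the maths" is sketched below. It rests on a strong assumption that may not hold for the real kernel: that the two checked words behave like independent, uniformly random 32-bit values. All names are invented for illustration, and the result depends entirely on that assumption.

```python
from fractions import Fraction

BITS = 32

# Assumption: x and y are independent, uniformly random 32-bit words.
p_zero = Fraction(1, 2**BITS)             # one given word is exactly zero

# x & y == 0 needs every bit position to avoid the (1, 1) combination.
p_and_zero = Fraction(3, 4) ** BITS

# min(x, y) == 0 needs at least one of the two words to be zero.
p_min_zero = 2 * p_zero - p_zero**2

# Cases where & enters the branch although neither word is zero.
p_and_false_positive = p_and_zero - p_min_zero

print(f"P(x & y == 0)     = {float(p_and_zero):.3e}")
print(f"P(min(x, y) == 0) = {float(p_min_zero):.3e}")
print(f"P(false positive) = {float(p_and_false_positive):.3e}")
```

How closely these figures match a real kernel depends on how the checked values are actually distributed there, so the numbers are a starting point for the discussion rather than a measurement.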
|
|
|
|
DutchBrat
|
|
February 25, 2012, 02:01:07 AM |
|
Hi Ckolivas,

Downloaded the new version and let it run for a while while I was watching some videos on my PC (Windows XP), running the new version with Dynamic Intensity (1 thread automatically disabled). Then I changed the Intensity to 8 (running a 5800) and the following weird message came up:

G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d R 1: b1e HiWn:g0 re U-:e3n.a7b1l/emd I 8

The intensity did get set to 8, so no problems there. It might just be cosmetic, but I thought I would put it out here anyway.

This is a dump of the entire screen:

cgminer version 2.3.1 - Started: [2012-02-24 23:13:58]
--------------------------------------------------------------------------------
(5s):282.2 (avg):280.3 Mh/s | Q:1541 A:583 R:1 HW:0 E:38% U:3.70/m
TQ: 2 ST: 4 SS: 0 DW: 84 NB: 12 LW: 0 GF: 3 RF: 0
Connected to http://mine2.btcguild.com:8332 with LP as user
Block: 00000a7c9a40539601dd382d3a7d13a0... Started: [01:35:44]
--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
GPU 0: 72.0C 2757RPM | 284.0/280.3Mh/s | A:583 R:1 HW:0 U: 3.70/m I: 8
--------------------------------------------------------------------------------

8 Intensity on gpu 0 set to 8
G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d R 1: b1e HiWn:g0 re U-:e3n.a7b1l/emd I 8
72.0 C F: 65% (2758 RPM) E: 900 MHz M: 800 Mhz V: 1.163V A: 98% P: 0%
Last initialised: [2012-02-24 23:14:03]
Intensity: 8
Thread 0: 282.8 Mh/s Enabled ALIVE
Thread 1: 2.0 Mh/s Enabled ALIVE

[E]nable [D]isable [I]ntensity [R]estart GPU [C]hange settings
Brat
|
|
|
|
-ck (OP)
|
|
February 25, 2012, 02:02:38 AM |
|
Quote:
    Hi Ckolivas,
    Downloaded the new version and let it run for a while while I was watching some videos on my PC (Windows XP), running the new version with Dynamic Intensity (1 thread automatically disabled). Then I changed the Intensity to 8 (running a 5800) and the following weird message came up:
    G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d R 1: b1e HiWn:g0 re U-:e3n.a7b1l/emd I 8
Yeah, the curses interface just scrambles output occasionally. Harmless.
|
|
|
|
omo
|
|
February 25, 2012, 02:43:42 AM |
|
Sometimes cgminer displays garbled info (e.g. the GPU 1 line below), and also the message "^[^B, sleeping for 30s":

cgminer version 2.3.1 - Started: [2012-02-24 23:38:00]
--------------------------------------------------------------------------------
(5s):336.42(avg):472.4 Mh/s | Q:77450 A:43434 R:39 HW:0 E:56% U:6.59/m
TQ: 33 STT 44 S: 30 DW: 1911 NB: 80 LW: 69 GF: 27 RF: 11
Connected to http://mmrpc.bitparking.com:80/ with LP as user ****
Block: 0000040d77cbcf9fb906c8b45953feef... Started: [10:32:52]
--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
GPU 0: 60.05 578 | 105.45105.0Mh/s | A: 9878R: 7 HW:0 U:1.50/m I: 3
GPU 1: 7765C 29469PM | 372.65367.44h/s | A:33556R:32 HW:0 U:5.09/m I: 9
--------------------------------------------------------------------------------

[2012-02-25 10:35:46] Accepted 00000000.b7ef9df9.0dcb577c GPU 1 thread 3 pool 0
^[^B, sleeping for 30s
[2012-02-25 10:36:06] Accepted 00000000.0428b7b0.81fe4c1c GPU 1 thread 2 pool 0
[2012-02-25 10:36:08] Accepted 00000000.ea1491ca.ebef84ea GPU 1 thread 3 pool 0
[2012-02-25 10:36:13] Accepted 00000000.8f4412eb.39157414 GPU 1 thread 2 pool 0
[2012-02-25 10:36:18] Accepted 00000000.f0ae9070.23e80435 GPU 1 thread 3 pool 0
^[^B, sleeping for 30s
[2012-02-25 10:36:37] Accepted 00000000.e112fd60.55ec2a41 GPU 0 thread 0 pool 0
[2012-02-25 10:36:50] Accepted 00000000.cd40f348.5ed1d95e GPU 1 thread 3 pool 0
^[^B, sleeping for 30s
[2012-02-25 10:37:02] Accepted 00000000.0915ea82.c0bdaab4 GPU 1 thread 2 pool 0
|
BTC:1Fu4TNpVPToxxhSXBNSvE9fz6X3dbYgB8q
|
|
|
Ed
Member
Offline
Activity: 69
Merit: 10
|
|
February 25, 2012, 07:05:52 AM |
|
New release: Version 2.2.7 - February 20, 2012
......
reject ratio higher for me, about 3% instead of 0.5% at 2.1.2

my conf.
p2pool 462b252 multi merged mining
bitcoind 0.6
atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

Version 2.3.1 - February 24, 2012
2.3.1-2 reject ratio still about 3% for me...
|
|
|
|
Andrew Vorobyov
|
|
February 25, 2012, 10:40:27 AM Last edit: February 25, 2012, 10:51:22 AM by Andrew Vorobyov |
|
[SOLVED] https://bitcointalk.org/index.php?topic=22554.0

Don't see temperature etc... AMD 2.4 SDK, Ubuntu 11.04

------------------------------------------------------------------------
cgminer 2.3.1
------------------------------------------------------------------------

Configuration Options Summary:

  OpenCL...............: FOUND. GPU mining support enabled
  ADL..................: SDK found, GPU monitoring support enabled

  BitForce.FPGAs.......: Disabled
  Icarus.FPGAs.........: Disabled

  CPU Mining...........: Disabled

Compilation............: make (or gmake)
  CPPFLAGS.............:
  CFLAGS...............: -O2 -Wall -march=native
  LDFLAGS..............: -lpthread
  LDADD................: -ldl -lcurl compat/jansson/libjansson.a -lpthread -lOpenCL -lncurses -lm

Installation...........: make install (as root if needed, with 'su' or 'sudo')
  prefix...............: /usr/local

In the miner I see only:

GPU 0: 169.2 / 181.7 Mh/s | A:2 R:0 HW:0 U:10.48/m I:10
Last initialised: [2012-02-25 13:39:33]
Intensity: 10
Thread 0: 170.6 Mh/s Enabled ALIVE
|
|
|
|
-ck (OP)
|
|
February 25, 2012, 12:19:58 PM Last edit: February 25, 2012, 12:32:25 PM by ckolivas |
|
Quote:
    New release: Version 2.2.7 - February 20, 2012
    ......
    reject ratio higher for me, about 3% instead of 0.5% at 2.1.2

    my conf.
    p2pool 462b252 multi merged mining
    bitcoind 0.6
    atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
    OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

    Version 2.3.1 - February 24, 2012
    2.3.1-2 reject ratio still about 3% for me...
cgminer supports the SUBMITOLD extension now and p2pool is telling cgminer to submit the stale shares. So yep, it's working.
|
|
|
|
Diapolo
|
|
February 25, 2012, 02:09:23 PM |
|
Quote:
    Quote:
        New release: Version 2.2.7 - February 20, 2012
        ......
        reject ratio higher for me, about 3% instead of 0.5% at 2.1.2

        my conf.
        p2pool 462b252 multi merged mining
        bitcoind 0.6
        atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
        OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

        Version 2.3.1 - February 24, 2012
        2.3.1-2 reject ratio still about 3% for me...
    cgminer supports the SUBMITOLD extension now and p2pool is telling cgminer to submit the stale shares. So yep, it's working.
I'm currently playing around with p2pool, too ... so no need to add --submit-stale, as this is forced, if needed, via SUBMITOLD, right? The LPs occur quite often for p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s ... could it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?
Dia
|
|
|
|
-ck (OP)
|
|
February 25, 2012, 02:11:32 PM |
|
Quote:
    I'm currently playing around with p2pool, too ... so no need to add --submit-stale, as this is forced, if needed, via SUBMITOLD, right? The LPs occur quite often for p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s ... could it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?
Yes to the first question, README to the second.
|
|
|
|
|