BOARBEAR
July 23, 2011, 10:43:17 PM
69xx version would be wonderful ;-P
To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file (download the latest 2011-07-17 version) and replace line 385:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with:

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!
Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.
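On the rationale question: the two statements compute the same value, so any difference has to come from how the compiler schedules them, not from the arithmetic. One possible explanation, which nobody in this thread confirms, is that `+=` evaluates the whole right-hand sum first and adds it to Vals[7] last, while the explicit form starts from Vals[7] and adds the terms left to right, handing the compiler a differently shaped expression. A minimal sketch in plain C (not OpenCL; `round_terms`, `update_compound` and `update_explicit` are made-up names standing in for the kernel's macros) shows that both orders give the same 32-bit result:

```c
#include <stdint.h>

/* Hypothetical stand-ins for Vals[3], P4(124) ... ch(124) in kernel.cl. */
typedef struct {
    uint32_t v3, p4, p3, p2, p1, s1, ch;
} round_terms;

/* Compound form: the right-hand side is summed first, then added to v7. */
uint32_t update_compound(uint32_t v7, round_terms t) {
    v7 += t.v3 + t.p4 + t.p3 + t.p2 + t.p1 + t.s1 + t.ch;
    return v7;
}

/* Explicit form: the chain starts at v7 and adds the terms left to right. */
uint32_t update_explicit(uint32_t v7, round_terms t) {
    v7 = v7 + t.v3 + t.p4 + t.p3 + t.p2 + t.p1 + t.s1 + t.ch;
    return v7;
}
```

Unsigned 32-bit addition wraps modulo 2^32 and is associative, so the grouping cannot change the result; only the instruction count reported by the compiler can differ.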

indio007
July 23, 2011, 11:49:00 PM
69xx version would be wonderful ;-P
To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file (download the latest 2011-07-17 version) and replace line 385:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with:

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!
Dia

Will it be slower on a 6850?

MiningBuddy
July 24, 2011, 02:32:56 AM
69xx version would be wonderful ;-P
To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file (download the latest 2011-07-17 version) and replace line 385:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with:

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!
Dia

Thanks, this gave me a 0.31 Mh/s increase per core on my 6990s.

Vince
July 24, 2011, 03:29:33 AM
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated but never used again? I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway. Here it is, from round 124 to the #ifdef VECTORS preprocessor command:

sharound(121);
// Optimized out all unused calculations
W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)
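The "no speed improvement" result is what dead-code elimination would predict: values that are computed but never read do not affect the output, so an optimizing compiler is free to drop them before any hand edit does. A small sketch in plain C, assuming nothing about the real kernel (`toy_mix`, `tail_with_dead_code` and `tail_trimmed` are hypothetical names, not phatk code):

```c
#include <stdint.h>

/* Rotate right; n must be between 1 and 31 to keep the shifts defined. */
uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* Toy mixing step, a stand-in for the kernel's s1/ch/P macros. */
uint32_t toy_mix(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* Tail with leftover work: two values are computed and never read. */
uint32_t tail_with_dead_code(uint32_t a, uint32_t b) {
    uint32_t unused1 = toy_mix(a) + b; /* dead: never read again */
    uint32_t unused2 = toy_mix(b) + a; /* dead: never read again */
    (void)unused1;
    (void)unused2;
    return a + toy_mix(b);
}

/* Hand-trimmed tail: only the value that reaches the result is kept. */
uint32_t tail_trimmed(uint32_t a, uint32_t b) {
    return a + toy_mix(b);
}
```

Both functions return the same value for every input, so a compiler that removes the dead statements ends up with the same work either way, which would match the unchanged hash rate reported above.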

Diapolo (OP)
July 24, 2011, 08:41:18 AM
69xx version would be wonderful ;-P
To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file (download the latest 2011-07-17 version) and replace line 385:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with:

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!
Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.

You are not the only one, but the compiler says it's one ALU OP less!
Dia

Diapolo (OP)
July 24, 2011, 08:46:06 AM
69xx version would be wonderful ;-P
To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file (download the latest 2011-07-17 version) and replace line 385:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with:

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!
Dia

Will it be slower on a 6850?

The 6850 is a VLIW5 design and will be slower ... this is really only for 69XX cards!

Diapolo (OP)
July 24, 2011, 08:48:26 AM
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated but never used again? I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway. Here it is, from round 124 to the #ifdef VECTORS preprocessor command:

sharound(121);
// Optimized out all unused calculations
W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again. And thanks again for your work!
Dia

Diapolo (OP)
July 25, 2011, 02:03:38 PM
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated but never used again? I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway. Here it is, from round 124 to the #ifdef VECTORS preprocessor command:

sharound(121);
// Optimized out all unused calculations
W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again. And thanks again for your work!
Dia

No chance. I tried different things and combinations, but the OpenCL compiler does it better than I do (again ^^).
Dia

Diapolo (OP)
July 29, 2011, 06:48:07 PM
Thanks for pointing me to that thread! Dia

Diapolo (OP)
July 30, 2011, 06:14:41 PM
I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus). Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.
Dia

Tx2000
July 30, 2011, 06:27:01 PM
I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus). Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.
Dia

Hopefully you can rework his version, because at the moment it doesn't work for me on Win7 / SDK 2.4 / latest GUIMiner. I replaced your phatk kernel with his and it just stays at connecting, spamming idle worker in the console.

pennytrader
July 31, 2011, 11:12:27 AM
I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus). Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.
Dia

Actually your kernel is fast for me:
Catalyst 11.6 + SDK 2.1, 975/300 gives me 313 Mh/s (your kernel)
Catalyst 11.6 + SDK 2.4, 975/300 gives me 311 Mh/s (latest from the original author; SDK 2.1 doesn't work)
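To put the two figures side by side, the relative difference can be worked out as below. This is only a sketch; `percent_gain` is a hypothetical helper, and the comparison is not like for like, because the two runs use different SDK versions (2.1 versus 2.4).

```c
/* Relative gain of `candidate` over `baseline`, in percent.
   Both arguments are hash rates in the same unit, e.g. Mh/s. */
double percent_gain(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}
```

Calling `percent_gain(311.0, 313.0)` gives a gain of a bit under two thirds of a percent for the modified kernel in this particular setup.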

deepceleron
July 31, 2011, 07:14:16 PM
I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

ssateneth
July 31, 2011, 11:26:38 PM
I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

This is the exact same problem I have, but I use GUIMiner.

Diapolo (OP)
August 01, 2011, 01:57:33 PM
I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options? Try deleting the *.elf files in the phatk directory!
Dia
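The *.elf files are presumably compiled kernel binaries cached by the miner (an assumption; the thread does not say), in which case a stale one could keep an old build in use after kernel.cl changes. A minimal C sketch of the clean-up step follows. `has_elf_suffix` and `remove_if_elf` are hypothetical helpers, and listing the phatk directory is left to the caller because that part is platform specific.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 when `name` ends in ".elf", otherwise 0. */
int has_elf_suffix(const char *name) {
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, ".elf") == 0;
}

/* Deletes `path` only if it looks like a cached .elf file.
   Returns 1 if deleted, 0 if skipped, -1 if the delete failed. */
int remove_if_elf(const char *path) {
    if (!has_elf_suffix(path)) {
        return 0; /* not a cache file: leave kernel.cl and friends alone */
    }
    return remove(path) == 0 ? 1 : -1;
}
```

Calling `remove_if_elf` on every entry of the phatk directory would leave kernel.cl untouched and force the miner to rebuild the kernel on the next start.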

deepceleron
August 01, 2011, 05:17:10 PM
I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options? Try deleting the *.elf files in the phatk directory!
Dia

Sorry for the confusion, that's regarding the 2.0 phatk that came out this weekend.

Diapolo (OP)
August 01, 2011, 05:59:37 PM
I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options? Try deleting the *.elf files in the phatk directory!
Dia

Sorry for the confusion, that's regarding the 2.0 phatk that came out this weekend.

Okay, but then please don't use this thread for phatk 2.0 support. I think you understand that.
Regards,
Dia

ssateneth
August 02, 2011, 09:10:31 AM
Alright diapolo, you got your work cut out for you. Phateus fixed the guiminer bug (It was in his kernel. 1 too many #'s in __init__). You do great work; you might want to join forces at this point though

Tx2000
August 02, 2011, 08:21:31 PM
Alright diapolo, you got your work cut out for you. Phateus fixed the guiminer bug (It was in his kernel. 1 too many #'s in __init__). You do great work; you might want to join forces at this point though

No way! Competition breeds innovation.