Author Topic: Modified Kernel for Phoenix 1.5  (Read 96717 times)
iopq
July 31, 2011, 10:20:05 AM
 #121

Quote
doesn't work when vectors are turned on
I am running SDK 2.1, it's either -v or --phatk2, doesn't work with both
How odd. Do you recall what the error message was, if any?

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(197): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[27] = P1(27) + P3(27);
                ^

Error limit reached.
100 errors detected in the compilation of "C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl".
Compilation terminated.
UniverseMan
July 31, 2011, 11:18:42 PM
 #122

I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder.
When I ran my phoenix with kernel options
Code:
-k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256
I got the following error:
Code:
user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to CTRL+Z to kill the process.

Not sure if this error is related to anything discussed before. But it's no big deal, as I've merely switched back to my previous kernel. Cheers!
bcforum
July 31, 2011, 11:23:32 PM
 #123

I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder.
When I ran my phoenix with kernel options
Code:
-k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256
I got the following error:
Code:
user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to CTRL+Z to kill the process.

Not sure if this error is related to anything discussed before. But it's no big deal, as I've merely switched back to my previous kernel. Cheers!

I get the same error with a similar setup.

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
Tx2000
August 01, 2011, 01:34:16 AM
 #124

I get "miner is idle" spam in the console multiple times a second, under Windows 7 x64, Catalyst 11.4, SDK 2.4, GuiMiner 2011-07-01. Essentially, it does not work.
Phateus (OP)
August 01, 2011, 05:14:36 PM
Last edit: August 01, 2011, 05:27:56 PM by Phateus
 #125

Phat, what is the effect of "LLLL" instead of "IIII" in the .py file? It seems to work even with IIII.

Thanks,
Dia

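Nothing that I have seen. I was trying to fix a bug with low WORKSIZE numbers which results in duplicate hashes (not sure if it is solved yet). The values are 32-bit. In Python's struct module, "I" is a 4-byte unsigned int (the 16-bit code is "H"), and "L" is also 4 bytes in standard mode, but with no prefix "L" uses the platform's native long, which can be 8 bytes on 64-bit Linux.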
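A quick way to check what the two format codes do on a given machine, and a possible explanation for the `struct.error` reported above on Ubuntu. This is only a sketch: it assumes Python's standard `struct` module and a 16-byte tail like `nonceRange.unit.data[64:]`, and the `tail` bytes and the `unpack_tail` helper are made up for illustration, not taken from the kernel's `__init__.py`.

```python
import struct

# Hypothetical 16-byte tail standing in for nonceRange.unit.data[64:].
tail = bytes(range(16))

# Native mode (no prefix): the size of 'L' follows the platform's C long,
# so 'LLLL' may need 16 bytes (e.g. Windows) or 32 bytes (typical 64-bit Linux).
native_long = struct.calcsize('LLLL')
native_int = struct.calcsize('IIII')

# Standard mode ('=' keeps native byte order): 'L' and 'I' are both 4 bytes.
standard_long = struct.calcsize('=LLLL')
standard_int = struct.calcsize('=IIII')


def unpack_tail(data):
    """Unpack four 32-bit words with a size that does not depend on the platform."""
    return struct.unpack('=IIII', data)


words = unpack_tail(tail)
print(native_long, native_int, standard_long, standard_int)
print(words)
```

If `native_long` prints 32 on the affected machine, that would match the "string argument of length 32" message, and switching the format to `IIII` (or `=LLLL`) would be one way around it.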

As for all of the other issues, I think there is a problem between SDK 2.1 and my kernel. I will try explicitly declaring the rotation constant as uint instead of int (that may fix the problem).
If anyone with SDK 2.1 wants to help out:
change
Code:
#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif
to
Code:
#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif
and
Code:
#define rot2(x, y) rotate(x, y)
to
Code:
#define rot2(x, y) rotate(x, (uint)(y))
If anyone tries this out, let me know if it changes anything.


I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...

Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above?  If I can't figure it out, I may just make all of the constants uint.

Also, one more thing, does "rotate(x, y)" compile to 1 instruction in SDK 2.1?  Running 2.4, explicitly using amd_bitalign does not improve performance (might be cleaner if I can just use rotate(x, y) regardless of whether BITALIGN is defined).
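On the rotate() versus amd_bitalign() question, the two forms of the macro should at least compute the same value, which a small emulation can check. This is a sketch in Python rather than OpenCL. It assumes the `amd_bitalign` semantics described in the `cl_amd_media_ops` extension (concatenate the two 32-bit sources and shift right by the low 5 bits of the third argument), and the helper names are made up. It says nothing about how many instructions either form compiles to on a given SDK.

```python
MASK32 = 0xFFFFFFFF


def rotate(x, y):
    """Emulate OpenCL rotate() on a 32-bit uint: rotate left by y bits."""
    y &= 31
    return ((x << y) | (x >> (32 - y))) & MASK32


def amd_bitalign(src0, src1, src2):
    """Emulate amd_bitalign as assumed above: (src0:src1) >> (src2 & 31)."""
    return (((src0 << 32) | src1) >> (src2 & 31)) & MASK32


def rot(x, y):
    """The BITALIGN form of the macro: amd_bitalign(x, x, 32 - y)."""
    return amd_bitalign(x, x, 32 - y)


# Compare both forms over a few sample words and every shift from 1 to 31.
samples = [0x00000001, 0x80000000, 0xDEADBEEF, 0x6A09E667]
agree = all(rot(x, y) == rotate(x, y) for x in samples for y in range(1, 32))
print(agree)
```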

I was also thinking of possibly just precompiling different versions of the kernel and using them, therefore, you'd be able to use the faster 2.4 kernel even if you use SDK 2.1.  I'm not sure if this is possible, but I will look into it.
Diapolo
August 01, 2011, 07:28:40 PM
 #126

I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...

Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above?  If I can't figure it out, I may just make all of the constants uint.

I really don't understand why the compiler needs so much help and why one has to use such ugly code to get the best performance ... I hope AMD can optimize the compiler, so that we can use clean and straightforward code. I tried to reorder the commands without changing the code itself and it saved 3 ALU OPs ... for nothing. That sucks so bad!

The SDK 2.1 compatibility was achieved via type casts in front of hex values in the code. Simply add (u) in front where you use such values.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Phateus (OP)
August 01, 2011, 08:16:15 PM
 #127

I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...

Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above?  If I can't figure it out, I may just make all of the constants uint.

I really don't understand why the compiler needs so much help and why one has to use such ugly code to get the best performance ... I hope AMD can optimize the compiler, so that we can use clean and straightforward code. I tried to reorder the commands without changing the code itself and it saved 3 ALU OPs ... for nothing. That sucks so bad!

The SDK 2.1 compatibility was achieved via type casts in front of hex values in the code. Simply add (u) in front where you use such values.

Dia

OMG yeah, I know... They really need to work on the compiler...

I actually work at the US Patent Office and work in instruction processing... VLIW is a fairly new area and there is a lot of new work coming out, so give it a couple of years (sigh)... What you have to remember is that compiling VLIW code is extremely complicated (the kernel itself only uses 21 registers) and most of the instructions have to be based solely on the previous instruction.


from Wikipedia [http://en.wikipedia.org/wiki/Very_long_instruction_word]
Quote
As a result, VLIW CPUs offer significant computational power with less hardware complexity (but greater compiler complexity) than is associated with most superscalar CPUs.

As is the case with any novel architectural approach, the concept is only as useful as code generation makes it. That is, the fact that a number of special-purpose instructions are available to facilitate certain complicated operations... is useless if compilers are unable to spot relevant source code constructs and generate target code that duly utilizes the CPU's advanced offerings. Therefore, programmers must be able to express their algorithms in a manner that makes the compiler's task easier.

With all of that said, it would be amazing if you could just write:
Code:
Init1();
for (int n = 0; n != 64; n++)
{
    SHARound();
}
Init2();
for (int n = 0; n != 64; n++)
{
    SHARound();
}
and let the compiler sort it out...
iopq
August 01, 2011, 09:59:10 PM
 #128


change
Code:
#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif
to
Code:
#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif
and
Code:
#define rot2(x, y) rotate(x, y)
to
Code:
#define rot2(x, y) rotate(x, (uint)(y))
If anyone tries this out, let me know if it changes anything.
this works on 2.1 SDK
Phateus (OP)
August 01, 2011, 11:16:49 PM
 #129


change
Code:
#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif
to
Code:
#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif
and
Code:
#define rot2(x, y) rotate(x, y)
to
Code:
#define rot2(x, y) rotate(x, (uint)(y))
If anyone tries this out, let me know if it changes anything.
this works on 2.1 SDK


Awesome, Thanks.  I'll implement the changes and release soon.

On another note, I was just searching through AMD's downloads, and KernelAnalyzer 1.9 came out today with "Support for AMD APP SDK 2.5."... I think someone said that SDK 2.5 is supposed to support BFI_INT natively, so maybe we can get some better performance with 2.5 *crosses fingers*
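For context on why native BFI_INT support would matter: a sketch, in Python for easy checking, of the bit-select operation that SHA-256's Ch function reduces to. It assumes BFI_INT computes `(src0 & src1) | (~src0 & src2)`, which corresponds to OpenCL's `bitselect(z, y, x)`. The function names are illustrative and are not taken from the kernel.

```python
MASK32 = 0xFFFFFFFF


def ch(x, y, z):
    """SHA-256 Ch: take bits from y where x is 1, from z where x is 0."""
    return ((x & y) ^ (~x & z)) & MASK32


def bfi(mask, a, b):
    """Assumed BFI_INT semantics: (mask & a) | (~mask & b) on 32-bit values."""
    return ((mask & a) | (~mask & b)) & MASK32


# Compare the two-step Ch against the single select over a few sample words.
samples = [0x00000000, 0xFFFFFFFF, 0x510E527F, 0x9B05688C, 0x1F83D9AB]
same = all(ch(x, y, z) == bfi(x, y, z)
           for x in samples for y in samples for z in samples)
print(same)
```

Under that assumption, a compiler that emits the select as a single instruction saves ALU ops in every round without any binary patching.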
joulesbeef
August 02, 2011, 01:21:35 AM
 #130

Quote
I think someone said that SDK 2.5 is supposed to support BFI_INT natively,

sounds like it

Quote
"In SDK 2.5 we are expanding that, along with other optimizations, to generate BFI instructions."

mooo for rent
Diapolo
August 02, 2011, 05:10:57 AM
 #131

Quote
I think someone said that SDK 2.5 is supposed to support BFI_INT natively,

sounds like it

Quote
"In SDK 2.5 we are expanding that, along with other optimizations, to generate BFI instructions."

Seems you are wrong (at least for now):

Quote
The optimization has been disabled in the current SDK due to a bug in the implementation that didn't get fixed in time.

By the way is there any official Download link for the KernelAnalyzer 1.9?

Dia

dishwara
August 02, 2011, 05:13:43 AM
 #132

By the way is there any official Download link for the KernelAnalyzer 1.9?
Dia
http://developer.amd.com/TOOLS/AMDAPPKERNELANALYZER/Pages/default.aspx
http://developer.amd.com/Downloads/AMDAPPKernelAnalyzer-v1.9.1016.msi
Diapolo
August 02, 2011, 05:34:27 AM
 #133


Thank you very much! But bad news: I checked phatk 2.0, my old and my new kernel versions, and all of them use fewer GPRs but 1-2 more ALU OPs ... SDK 2.5 is a sucker until (again) some optimisations have been done. Phat, how do you order the commands to achieve the best performance? Are you using the ASM code from KernelAnalyzer, or is it trial and error?

Dia

joulesbeef
August 02, 2011, 05:36:16 AM
 #134

Quote
Seems you are wrong (at least for now):

Read it again. He is asking whether it is in 2.4: he says he read that it will be in 2.5 and asks whether it isn't already in the current one, meaning 2.4. They answer that no, it was disabled in the current one (meaning 2.4) because it wasn't fixed in time.

At least that is how I read it. Note the dates of the posts: they have to be talking about 2.4.

Phateus (OP)
August 02, 2011, 05:41:36 AM
 #135


Thank you very much! But bad news: I checked phatk 2.0, my old and my new kernel versions, and all of them use fewer GPRs but 1-2 more ALU OPs ... SDK 2.5 is a sucker until (again) some optimisations have been done. Phat, how do you order the commands to achieve the best performance? Are you using the ASM code from KernelAnalyzer, or is it trial and error?

Dia

edit:  BTW, I always thought your numbers were a couple lower than mine because you defined OUTPUT_MASK as something like "0x10" or something... doing that makes all my numbers match the ones on your thread
lol.... mostly trial and error. Initially, for version 1.1, I looked at filling the gaps in the VLIW assembly (finding which VLIW5 bundles only had 4 instructions, and using barrier(0) instructions to see where in the assembly the OpenCL code is), but that took a LONG time, and I think I am done with that... (it turned out it only gave me like 3 ALU ops anyway).


Quote
Seems you are wrong (at least for now):

Read it again. He is asking whether it is in 2.4: he says he read that it will be in 2.5 and asks whether it isn't already in the current one, meaning 2.4. They answer that no, it was disabled in the current one (meaning 2.4) because it wasn't fixed in time.

At least that is how I read it. Note the dates of the posts: they have to be talking about 2.4.

Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow.


And I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link.
Diapolo
August 02, 2011, 05:45:26 AM
 #136


Thank you very much! But bad news: I checked phatk 2.0, my old and my new kernel versions, and all of them use fewer GPRs but 1-2 more ALU OPs ... SDK 2.5 is a sucker until (again) some optimisations have been done. Phat, how do you order the commands to achieve the best performance? Are you using the ASM code from KernelAnalyzer, or is it trial and error?

Dia

lol.... mostly trial and error. Initially, for version 1.1, I looked at filling the gaps in the VLIW assembly (finding which VLIW5 bundles only had 4 instructions, and using barrier(0) instructions to see where in the assembly the OpenCL code is), but that took a LONG time, and I think I am done with that... (it turned out it only gave me like 3 ALU ops anyway).


Quote
Seems you are wrong (at least for now):

Read it again. He is asking whether it is in 2.4: he says he read that it will be in 2.5 and asks whether it isn't already in the current one, meaning 2.4. They answer that no, it was disabled in the current one (meaning 2.4) because it wasn't fixed in time.

At least that is how I read it. Note the dates of the posts: they have to be talking about 2.4.

Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow.


And I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link.

Cat 11.8 preview and Cat 11.7 have the SDK 2.5 runtime, so my tests are real :-/.

Dia

Phateus (OP)
August 02, 2011, 05:46:45 AM
 #137

oooh, I will have to try that out... boo for AMD
joulesbeef
August 02, 2011, 05:55:32 AM
 #138

Quote
And I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link.

I'm still getting miner idle errors in guiminer  with  VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2

is it just guiminer?

edit:works fine with aoclbf 1.75.. i wonder why guiminer has such trouble

speed 318 over 315 with diablo 7-17

Diapolo
August 02, 2011, 05:56:07 AM
 #139

You are using the OpenCL rotate() instead of amd_bitalign(), what's the benefit here (is it the same under the hood)?

Dia

pennytrader
August 02, 2011, 05:57:59 AM
 #140

With Catalyst 11.6 + SDK 2.1, at a 975/300 setting, I'm only getting 176 MH/s with phatk 2.1.

With Diapolo's kernel, I was able to get 314 MH/s.

please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL