Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels
pallas (OP)
February 27, 2015, 10:34:32 PM
 #261

I believe multiple threads help with algos which use gpu ram: groestl does not. Only WS and intensity matter. TC is a buffer in ram so not relevant as well.
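As a rough illustration of why WS and intensity are the two knobs that shape a kernel launch, here is a minimal sketch in plain C. It assumes, as in cgminer-derived miners, that intensity I requests about 2^I work-items per launch, and it rounds that up to a multiple of the worksize because OpenCL 1.x requires the global size to be divisible by the local size. The function names `global_work_size` and `work_groups` are made up for this example and are not sgminer's API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: intensity I asks for roughly 2^I work-items per launch.
 * Check your miner's source for the exact rule it applies. */
size_t global_work_size(unsigned intensity, size_t worksize)
{
    assert(worksize > 0 && intensity < 31);
    size_t threads = (size_t)1 << intensity;
    /* Round up so the global size is a multiple of the local size (WS). */
    size_t rem = threads % worksize;
    return rem ? threads + (worksize - rem) : threads;
}

/* Number of work-groups the device has to schedule for one launch. */
size_t work_groups(unsigned intensity, size_t worksize)
{
    return global_work_size(intensity, worksize) / worksize;
}
```

Under these assumptions, raising intensity by one doubles the work per launch, while WS only changes how that work is split into groups, which is why the best WS tends to depend on the card.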

utahjohn
February 27, 2015, 10:41:24 PM
 #262

OK, that makes sense :)
So I can still play with WS in your new OCL kernel? (I think the best WS may be card-specific.)
HR
February 28, 2015, 09:35:46 AM
 #263


Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR

pallas (OP)
February 28, 2015, 09:51:37 AM
 #264

Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
GitHub: for a couple of files it's not worth it, IMHO

utahjohn
February 28, 2015, 10:41:20 AM
 #265

Quote from: pallas
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
GitHub: for a couple of files it's not worth it, IMHO

There is interest. I am leaving DMD groestl as I am sick of the crap there. HR has convinced me to move to DigiByte, which is myriad-groestl? IDK yet, I have to download the wallet and blockchain ...
pallas (OP)
February 28, 2015, 11:29:29 AM
 #266

Quote from: utahjohn
Quote from: pallas
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
GitHub: for a couple of files it's not worth it, IMHO
There is interest. I am leaving DMD groestl as I am sick of the crap there. HR has convinced me to move to DigiByte, which is myriad-groestl? IDK yet, I have to download the wallet and blockchain ...

I think it's multi-algo, including myr-groestl, skein, etc.

Heavyiron
March 01, 2015, 07:06:44 PM
 #267

Hello pallas, and thank you for your kernel.
I have groestl code from smelter (the first GPU miner for Quark). Maybe it has some tricks that are useful for your work. It was rather fast on the Radeon HD 5xxx series, but it is not adapted for sgminer and I don't have the skills to do that job.
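
For anyone reading the code below for the first time: almost every line extracts one byte of a 32-bit state word and uses it to index a precomputed table, XORing the lookups together. Here is a minimal sketch of that pattern in plain C. The names `ext_byte`, `demo_table` and `mix_word` are invented for this illustration, and the "table" is a placeholder, not the real Groestl T-tables.

```c
#include <assert.h>
#include <stdint.h>

/* Byte k of a 32-bit word, with k = 0 the least-significant byte.
 * On a little-endian device this is what as_uchar4(n).x/.y/.z/.w yield. */
uint32_t ext_byte(uint32_t n, unsigned k)
{
    assert(k < 4);
    return (n >> (8u * k)) & 0xffu;
}

/* Placeholder for tables T0..T3 (hypothetical, NOT the Groestl tables):
 * table t simply places the input byte in byte lane t. */
uint32_t demo_table(unsigned t, uint32_t byte)
{
    return byte << (8u * t);
}

/* One output word: XOR of four lookups, where lookup k is keyed by
 * byte k of a different input word, as in the dst[..] ^= T[..] lines. */
uint32_t mix_word(const uint32_t src[4])
{
    uint32_t out = 0;
    for (unsigned k = 0; k < 4; ++k)
        out ^= demo_table(k, ext_byte(src[k], k));
    return out;
}
```

With this placeholder table, `mix_word` just gathers the "diagonal" bytes: the input {0x111111AA, 0x2222BB22, 0x33CC3333, 0xDD444444} gives 0xDDCCBBAA. In the real kernel, the choice of which source word feeds which table plays the role of that diagonal.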

Code:
#define CONSTANT __constant
#define LOCAL __local
#define GLOBAL __global
#define RESTRICT restrict
#define GLOBALID (uint)(get_global_id(0))
#define LOCALID get_local_id(0)

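// Extract byte k of a 32-bit word as a uint; .x is the least-significant byte on little-endian devices.
#define EXT_BYTE32_0(n) ((uint)(as_uchar4((uint)(n)).x))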
#define EXT_BYTE32_1(n) ((uint)(as_uchar4((uint)(n)).y))
#define EXT_BYTE32_2(n) ((uint)(as_uchar4((uint)(n)).z))
#define EXT_BYTE32_3(n) ((uint)(as_uchar4((uint)(n)).w))

#define groestl_EXT_BYTE_0(n) EXT_BYTE32_0(n)
#define groestl_EXT_BYTE_1(n) EXT_BYTE32_1(n)
#define groestl_EXT_BYTE_2(n) EXT_BYTE32_2(n)
#define groestl_EXT_BYTE_3(n) EXT_BYTE32_3(n)


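// One round of the P permutation on a 32-word (1024-bit) state, as used by Groestl-512.
// First XORs the round constant (column index << 4, combined with r) into byte 0 of each
// column's first word, then fills dst from the T0..T7 table lookups. Note that src is modified.
#define groestl_PMIX(src, dst, r)\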
src[ 0] ^= (r);\
src[ 2] ^= 0x00000010u^(r);\
src[ 4] ^= 0x00000020u^(r);\
src[ 6] ^= 0x00000030u^(r);\
src[ 8] ^= 0x00000040u^(r);\
src[10] ^= 0x00000050u^(r);\
src[12] ^= 0x00000060u^(r);\
src[14] ^= 0x00000070u^(r);\
src[16] ^= 0x00000080u^(r);\
src[18] ^= 0x00000090u^(r);\
src[20] ^= 0x000000a0u^(r);\
src[22] ^= 0x000000b0u^(r);\
src[24] ^= 0x000000c0u^(r);\
src[26] ^= 0x000000d0u^(r);\
src[28] ^= 0x000000e0u^(r);\
src[30] ^= 0x000000f0u^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];

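// One round of the Q permutation on the same 32-word state. Complements the state and XORs
// the round constant into the top byte of each column's second word (so r is presumably
// passed already shifted into bits 24..31), then fills dst from the T0..T7 table lookups.
#define groestl_QMIX(src, dst, r)\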
src[ 0] = ~src[ 0];\
src[ 1] ^= ~(r);\
src[ 2] = ~src[ 2];\
src[ 3] ^= 0xefffffffu^(r);\
src[ 4] = ~src[ 4];\
src[ 5] ^= 0xdfffffffu^(r);\
src[ 6] = ~src[ 6];\
src[ 7] ^= 0xcfffffffu^(r);\
src[ 8] = ~src[ 8];\
src[ 9] ^= 0xbfffffffu^(r);\
src[10] = ~src[10];\
src[11] ^= 0xafffffffu^(r);\
src[12] = ~src[12];\
src[13] ^= 0x9fffffffu^(r);\
src[14] = ~src[14];\
src[15] ^= 0x8fffffffu^(r);\
src[16] = ~src[16];\
src[17] ^= 0x7fffffffu^(r);\
src[18] = ~src[18];\
src[19] ^= 0x6fffffffu^(r);\
src[20] = ~src[20];\
src[21] ^= 0x5fffffffu^(r);\
src[22] = ~src[22];\
src[23] ^= 0x4fffffffu^(r);\
src[24] = ~src[24];\
src[25] ^= 0x3fffffffu^(r);\
src[26] = ~src[26];\
src[27] ^= 0x2fffffffu^(r);\
src[28] = ~src[28];\
src[29] ^= 0x1fffffffu^(r);\
src[30] = ~src[30];\
src[31] ^= 0x0fffffffu^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];

// global
const CONSTANT UINT32 groestl_T_init[256*8] =
{
0xa5f432c6UL,0x84976ff8UL,0x99b05eeeUL,0x8d8c7af6UL,0x0d17e8ffUL,0xbddc0ad6UL,0xb1c816deUL,0x54fc6d91UL,0x50f09060UL,0x03050702UL,0xa9e02eceUL,0x7d87d156UL,0x192bcce7UL,0x62a613b5UL,0xe6317c4dUL,0x9ab559ecUL,0x45cf408fUL,0x9dbca31fUL,0x40c04989UL,0x879268faUL,0x153fd0efUL,0xeb2694b2UL,0xc940ce8eUL,0x0b1de6fbUL,0xec2f6e41UL,0x67a91ab3UL,0xfd1c435fUL,0xea256045UL,0xbfdaf923UL,0xf7025153UL,0x96a145e4UL,0x5bed769bUL,0xc25d2875UL,0x1c24c5e1UL,0xaee9d43dUL,0x6abef24cUL,0x5aee826cUL,0x41c3bd7eUL,0x0206f3f5UL,0x4fd15283UL,0x5ce48c68UL,0xf4075651UL,0x345c8dd1UL,0x0818e1f9UL,0x93ae4ce2UL,0x73953eabUL,0x53f59762UL,0x3f416b2aUL,0x0c141c08UL,0x52f66395UL,0x65afe946UL,0x5ee27f9dUL,0x28784830UL,0xa1f8cf37UL,0x0f111b0aUL,0xb5c4eb2fUL,0x091b150eUL,0x365a7e24UL,0x9bb6ad1bUL,0x3d4798dfUL,0x266aa7cdUL,0x69bbf54eUL,0xcd4c337fUL,0x9fba50eaUL,0x1b2d3f12UL,0x9eb9a41dUL,0x749cc458UL,0x2e724634UL,0x2d774136UL,0xb2cd11dcUL,0xee299db4UL,0xfb164d5bUL,0xf601a5a4UL,0x4dd7a176UL,0x61a314b7UL,0xce49347dUL,0x7b8ddf52UL,0x3e429fddUL,0x7193cd5eUL,0x97a2b113UL,0xf504a2a6UL,0x68b801b9UL,0x00000000UL,0x2c74b5c1UL,0x60a0e040UL,0x1f21c2e3UL,0xc8433a79UL,0xed2c9ab6UL,0xbed90dd4UL,0x46ca478dUL,0xd9701767UL,0x4bddaf72UL,0xde79ed94UL,0xd467ff98UL,0xe82393b0UL,0x4ade5b85UL,0x6bbd06bbUL,0x2a7ebbc5UL,0xe5347b4fUL,0x163ad7edUL,0xc554d286UL,0xd762f89aUL,0x55ff9966UL,0x94a7b611UL,0xcf4ac08aUL,0x1030d9e9UL,0x060a0e04UL,0x819866feUL,0xf00baba0UL,0x44ccb478UL,0xbad5f025UL,0xe33e754bUL,0xf30eaca2UL,0xfe19445dUL,0xc05bdb80UL,0x8a858005UL,0xadecd33fUL,0xbcdffe21UL,0x48d8a870UL,0x040cfdf1UL,0xdf7a1963UL,0xc1582f77UL,0x759f30afUL,0x63a5e742UL,0x30507020UL,0x1a2ecbe5UL,0x0e12effdUL,0x6db708bfUL,0x4cd45581UL,0x143c2418UL,0x355f7926UL,0x2f71b2c3UL,0xe13886beUL,0xa2fdc835UL,0xcc4fc788UL,0x394b652eUL,0x57f96a93UL,0xf20d5855UL,0x829d61fcUL,0x47c9b37aUL,0xacef27c8UL,0xe73288baUL,0x2b7d4f32UL,0x95a442e6UL,0xa0fb3bc0UL,0x98b3aa19UL,0xd168f69eUL,0x7f8122a3UL,0x66aaee44UL,0x7e82d654UL,0xabe6dd3bUL,0x839e950bUL,0xca45c98cUL,0x297bbcc7U
L,0xd36e056bUL,0x3c446c28UL,0x798b2ca7UL,0xe23d81bcUL,0x1d273116UL,0x769a37adUL,0x3b4d96dbUL,0x56fa9e64UL,0x4ed2a674UL,0x1e223614UL,0xdb76e492UL,0x0a1e120cUL,0x6cb4fc48UL,0xe4378fb8UL,0x5de7789fUL,0x6eb20fbdUL,0xef2a6943UL,0xa6f135c4UL,0xa8e3da39UL,0xa4f7c631UL,0x37598ad3UL,0x8b8674f2UL,0x325683d5UL,0x43c54e8bUL,0x59eb856eUL,0xb7c218daUL,0x8c8f8e01UL,0x64ac1db1UL,0xd26df19cUL,0xe03b7249UL,0xb4c71fd8UL,0xfa15b9acUL,0x0709faf3UL,0x256fa0cfUL,0xafea20caUL,0x8e897df4UL,0xe9206747UL,0x18283810UL,0xd5640b6fUL,0x888373f0UL,0x6fb1fb4aUL,0x7296ca5cUL,0x246c5438UL,0xf1085f57UL,0xc7522173UL,0x51f36497UL,0x2365aecbUL,0x7c8425a1UL,0x9cbf57e8UL,0x21635d3eUL,0xdd7cea96UL,0xdc7f1e61UL,0x86919c0dUL,0x85949b0fUL,0x90ab4be0UL,0x42c6ba7cUL,0xc4572671UL,0xaae529ccUL,0xd873e390UL,0x050f0906UL,0x0103f4f7UL,0x12362a1cUL,0xa3fe3cc2UL,0x5fe18b6aUL,0xf910beaeUL,0xd06b0269UL,0x91a8bf17UL,0x58e87199UL,0x2769533aUL,0xb9d0f727UL,0x384891d9UL,0x1335deebUL,0xb3cee52bUL,0x33557722UL,0xbbd604d2UL,0x709039a9UL,0x89808707UL,0xa7f2c133UL,0xb6c1ec2dUL,0x22665a3cUL,0x92adb815UL,0x2060a9c9UL,0x49db5c87UL,0xff1ab0aaUL,0x7888d850UL,0x7a8e2ba5UL,0x8f8a8903UL,0xf8134a59UL,0x809b9209UL,0x1739231aUL,0xda751065UL,0x315384d7UL,0xc651d584UL,0xb8d303d0UL,0xc35edc82UL,0xb0cbe229UL,0x7799c35aUL,0x11332d1eUL,0xcb463d7bUL,0xfc1fb7a8UL,0xd6610c6dUL,0x3a4e622cUL,
0xf432c6c6UL,0x976ff8f8UL,0xb05eeeeeUL,0x8c7af6f6UL,0x17e8ffffUL,0xdc0ad6d6UL,0xc816dedeUL,0xfc6d9191UL,0xf0906060UL,0x05070202UL,0xe02ececeUL,0x87d15656UL,0x2bcce7e7UL,0xa613b5b5UL,0x317c4d4dUL,0xb559ececUL,0xcf408f8fUL,0xbca31f1fUL,0xc0498989UL,0x9268fafaUL,0x3fd0efefUL,0x2694b2b2UL,0x40ce8e8eUL,0x1de6fbfbUL,0x2f6e4141UL,0xa91ab3b3UL,0x1c435f5fUL,0x25604545UL,0xdaf92323UL,0x02515353UL,0xa145e4e4UL,0xed769b9bUL,0x5d287575UL,0x24c5e1e1UL,0xe9d43d3dUL,0xbef24c4cUL,0xee826c6cUL,0xc3bd7e7eUL,0x06f3f5f5UL,0xd1528383UL,0xe48c6868UL,0x07565151UL,0x5c8dd1d1UL,0x18e1f9f9UL,0xae4ce2e2UL,0x953eababUL,0xf5976262UL,0x416b2a2aUL,0x141c0808UL,0xf6639595UL,0xafe94646UL,0xe27f9d9dUL,0x78483030UL,0xf8cf3737UL,0x111b0a0aUL,0xc4eb2f2fUL,0x1b150e0eUL,0x5a7e2424UL,0xb6ad1b1bUL,0x4798dfdfUL,0x6aa7cdcdUL,0xbbf54e4eUL,0x4c337f7fUL,0xba50eaeaUL,0x2d3f1212UL,0xb9a41d1dUL,0x9cc45858UL,0x72463434UL,0x77413636UL,0xcd11dcdcUL,0x299db4b4UL,0x164d5b5bUL,0x01a5a4a4UL,0xd7a17676UL,0xa314b7b7UL,0x49347d7dUL,0x8ddf5252UL,0x429fddddUL,0x93cd5e5eUL,0xa2b11313UL,0x04a2a6a6UL,0xb801b9b9UL,0x00000000UL,0x74b5c1c1UL,0xa0e04040UL,0x21c2e3e3UL,0x433a7979UL,0x2c9ab6b6UL,0xd90dd4d4UL,0xca478d8dUL,0x70176767UL,0xddaf7272UL,0x79ed9494UL,0x67ff9898UL,0x2393b0b0UL,0xde5b8585UL,0xbd06bbbbUL,0x7ebbc5c5UL,0x347b4f4fUL,0x3ad7ededUL,0x54d28686UL,0x62f89a9aUL,0xff996666UL,0xa7b61111UL,0x4ac08a8aUL,0x30d9e9e9UL,0x0a0e0404UL,0x9866fefeUL,0x0baba0a0UL,0xccb47878UL,0xd5f02525UL,0x3e754b4bUL,0x0eaca2a2UL,0x19445d5dUL,0x5bdb8080UL,0x85800505UL,0xecd33f3fUL,0xdffe2121UL,0xd8a87070UL,0x0cfdf1f1UL,0x7a196363UL,0x582f7777UL,0x9f30afafUL,0xa5e74242UL,0x50702020UL,0x2ecbe5e5UL,0x12effdfdUL,0xb708bfbfUL,0xd4558181UL,0x3c241818UL,0x5f792626UL,0x71b2c3c3UL,0x3886bebeUL,0xfdc83535UL,0x4fc78888UL,0x4b652e2eUL,0xf96a9393UL,0x0d585555UL,0x9d61fcfcUL,0xc9b37a7aUL,0xef27c8c8UL,0x3288babaUL,0x7d4f3232UL,0xa442e6e6UL,0xfb3bc0c0UL,0xb3aa1919UL,0x68f69e9eUL,0x8122a3a3UL,0xaaee4444UL,0x82d65454UL,0xe6dd3b3bUL,0x9e950b0bUL,0x45c98c8cUL,0x7bbcc7c7U
L,0x6e056b6bUL,0x446c2828UL,0x8b2ca7a7UL,0x3d81bcbcUL,0x27311616UL,0x9a37adadUL,0x4d96dbdbUL,0xfa9e6464UL,0xd2a67474UL,0x22361414UL,0x76e49292UL,0x1e120c0cUL,0xb4fc4848UL,0x378fb8b8UL,0xe7789f9fUL,0xb20fbdbdUL,0x2a694343UL,0xf135c4c4UL,0xe3da3939UL,0xf7c63131UL,0x598ad3d3UL,0x8674f2f2UL,0x5683d5d5UL,0xc54e8b8bUL,0xeb856e6eUL,0xc218dadaUL,0x8f8e0101UL,0xac1db1b1UL,0x6df19c9cUL,0x3b724949UL,0xc71fd8d8UL,0x15b9acacUL,0x09faf3f3UL,0x6fa0cfcfUL,0xea20cacaUL,0x897df4f4UL,0x20674747UL,0x28381010UL,0x640b6f6fUL,0x8373f0f0UL,0xb1fb4a4aUL,0x96ca5c5cUL,0x6c543838UL,0x085f5757UL,0x52217373UL,0xf3649797UL,0x65aecbcbUL,0x8425a1a1UL,0xbf57e8e8UL,0x635d3e3eUL,0x7cea9696UL,0x7f1e6161UL,0x919c0d0dUL,0x949b0f0fUL,0xab4be0e0UL,0xc6ba7c7cUL,0x57267171UL,0xe529ccccUL,0x73e39090UL,0x0f090606UL,0x03f4f7f7UL,0x362a1c1cUL,0xfe3cc2c2UL,0xe18b6a6aUL,0x10beaeaeUL,0x6b026969UL,0xa8bf1717UL,0xe8719999UL,0x69533a3aUL,0xd0f72727UL,0x4891d9d9UL,0x35deebebUL,0xcee52b2bUL,0x55772222UL,0xd604d2d2UL,0x9039a9a9UL,0x80870707UL,0xf2c13333UL,0xc1ec2d2dUL,0x665a3c3cUL,0xadb81515UL,0x60a9c9c9UL,0xdb5c8787UL,0x1ab0aaaaUL,0x88d85050UL,0x8e2ba5a5UL,0x8a890303UL,0x134a5959UL,0x9b920909UL,0x39231a1aUL,0x75106565UL,0x5384d7d7UL,0x51d58484UL,0xd303d0d0UL,0x5edc8282UL,0xcbe22929UL,0x99c35a5aUL,0x332d1e1eUL,0x463d7b7bUL,0x1fb7a8a8UL,0x610c6d6dUL,0x4e622c2cUL,
0x32c6c6a5UL,0x6ff8f884UL,0x5eeeee99UL,0x7af6f68dUL,0xe8ffff0dUL,0x0ad6d6bdUL,0x16dedeb1UL,0x6d919154UL,0x90606050UL,0x07020203UL,0x2ececea9UL,0xd156567dUL,0xcce7e719UL,0x13b5b562UL,0x7c4d4de6UL,0x59ecec9aUL,0x408f8f45UL,0xa31f1f9dUL,0x49898940UL,0x68fafa87UL,0xd0efef15UL,0x94b2b2ebUL,0xce8e8ec9UL,0xe6fbfb0bUL,0x6e4141ecUL,0x1ab3b367UL,0x435f5ffdUL,0x604545eaUL,0xf92323bfUL,0x515353f7UL,0x45e4e496UL,0x769b9b5bUL,0x287575c2UL,0xc5e1e11cUL,0xd43d3daeUL,0xf24c4c6aUL,0x826c6c5aUL,0xbd7e7e41UL,0xf3f5f502UL,0x5283834fUL,0x8c68685cUL,0x565151f4UL,0x8dd1d134UL,0xe1f9f908UL,0x4ce2e293UL,0x3eabab73UL,0x97626253UL,0x6b2a2a3fUL,0x1c08080cUL,0x63959552UL,0xe9464665UL,0x7f9d9d5eUL,0x48303028UL,0xcf3737a1UL,0x1b0a0a0fUL,0xeb2f2fb5UL,0x150e0e09UL,0x7e242436UL,0xad1b1b9bUL,0x98dfdf3dUL,0xa7cdcd26UL,0xf54e4e69UL,0x337f7fcdUL,0x50eaea9fUL,0x3f12121bUL,0xa41d1d9eUL,0xc4585874UL,0x4634342eUL,0x4136362dUL,0x11dcdcb2UL,0x9db4b4eeUL,0x4d5b5bfbUL,0xa5a4a4f6UL,0xa176764dUL,0x14b7b761UL,0x347d7dceUL,0xdf52527bUL,0x9fdddd3eUL,0xcd5e5e71UL,0xb1131397UL,0xa2a6a6f5UL,0x01b9b968UL,0x00000000UL,0xb5c1c12cUL,0xe0404060UL,0xc2e3e31fUL,0x3a7979c8UL,0x9ab6b6edUL,0x0dd4d4beUL,0x478d8d46UL,0x176767d9UL,0xaf72724bUL,0xed9494deUL,0xff9898d4UL,0x93b0b0e8UL,0x5b85854aUL,0x06bbbb6bUL,0xbbc5c52aUL,0x7b4f4fe5UL,0xd7eded16UL,0xd28686c5UL,0xf89a9ad7UL,0x99666655UL,0xb6111194UL,0xc08a8acfUL,0xd9e9e910UL,0x0e040406UL,0x66fefe81UL,0xaba0a0f0UL,0xb4787844UL,0xf02525baUL,0x754b4be3UL,0xaca2a2f3UL,0x445d5dfeUL,0xdb8080c0UL,0x8005058aUL,0xd33f3fadUL,0xfe2121bcUL,0xa8707048UL,0xfdf1f104UL,0x196363dfUL,0x2f7777c1UL,0x30afaf75UL,0xe7424263UL,0x70202030UL,0xcbe5e51aUL,0xeffdfd0eUL,0x08bfbf6dUL,0x5581814cUL,0x24181814UL,0x79262635UL,0xb2c3c32fUL,0x86bebee1UL,0xc83535a2UL,0xc78888ccUL,0x652e2e39UL,0x6a939357UL,0x585555f2UL,0x61fcfc82UL,0xb37a7a47UL,0x27c8c8acUL,0x88babae7UL,0x4f32322bUL,0x42e6e695UL,0x3bc0c0a0UL,0xaa191998UL,0xf69e9ed1UL,0x22a3a37fUL,0xee444466UL,0xd654547eUL,0xdd3b3babUL,0x950b0b83UL,0xc98c8ccaUL,0xbcc7c729U
L,0x056b6bd3UL,0x6c28283cUL,0x2ca7a779UL,0x81bcbce2UL,0x3116161dUL,0x37adad76UL,0x96dbdb3bUL,0x9e646456UL,0xa674744eUL,0x3614141eUL,0xe49292dbUL,0x120c0c0aUL,0xfc48486cUL,0x8fb8b8e4UL,0x789f9f5dUL,0x0fbdbd6eUL,0x694343efUL,0x35c4c4a6UL,0xda3939a8UL,0xc63131a4UL,0x8ad3d337UL,0x74f2f28bUL,0x83d5d532UL,0x4e8b8b43UL,0x856e6e59UL,0x18dadab7UL,0x8e01018cUL,0x1db1b164UL,0xf19c9cd2UL,0x724949e0UL,0x1fd8d8b4UL,0xb9acacfaUL,0xfaf3f307UL,0xa0cfcf25UL,0x20cacaafUL,0x7df4f48eUL,0x674747e9UL,0x38101018UL,0x0b6f6fd5UL,0x73f0f088UL,0xfb4a4a6fUL,0xca5c5c72UL,0x54383824UL,0x5f5757f1UL,0x217373c7UL,0x64979751UL,0xaecbcb23UL,0x25a1a17cUL,0x57e8e89cUL,0x5d3e3e21UL,0xea9696ddUL,0x1e6161dcUL,0x9c0d0d86UL,0x9b0f0f85UL,0x4be0e090UL,0xba7c7c42UL,0x267171c4UL,0x29ccccaaUL,0xe39090d8UL,0x09060605UL,0xf4f7f701UL,0x2a1c1c12UL,0x3cc2c2a3UL,0x8b6a6a5fUL,0xbeaeaef9UL,0x026969d0UL,0xbf171791UL,0x71999958UL,0x533a3a27UL,0xf72727b9UL,0x91d9d938UL,0xdeebeb13UL,0xe52b2bb3UL,0x77222233UL,0x04d2d2bbUL,0x39a9a970UL,0x87070789UL,0xc13333a7UL,0xec2d2db6UL,0x5a3c3c22UL,0xb8151592UL,0xa9c9c920UL,0x5c878749UL,0xb0aaaaffUL,0xd8505078UL,0x2ba5a57aUL,0x8903038fUL,0x4a5959f8UL,0x92090980UL,0x231a1a17UL,0x106565daUL,0x84d7d731UL,0xd58484c6UL,0x03d0d0b8UL,0xdc8282c3UL,0xe22929b0UL,0xc35a5a77UL,0x2d1e1e11UL,0x3d7b7bcbUL,0xb7a8a8fcUL,0x0c6d6dd6UL,0x622c2c3aUL,
0xc6c6a597UL,0xf8f884ebUL,0xeeee99c7UL,0xf6f68df7UL,0xffff0de5UL,0xd6d6bdb7UL,0xdedeb1a7UL,0x91915439UL,0x606050c0UL,0x02020304UL,0xcecea987UL,0x56567dacUL,0xe7e719d5UL,0xb5b56271UL,0x4d4de69aUL,0xecec9ac3UL,0x8f8f4505UL,0x1f1f9d3eUL,0x89894009UL,0xfafa87efUL,0xefef15c5UL,0xb2b2eb7fUL,0x8e8ec907UL,0xfbfb0bedUL,0x4141ec82UL,0xb3b3677dUL,0x5f5ffdbeUL,0x4545ea8aUL,0x2323bf46UL,0x5353f7a6UL,0xe4e496d3UL,0x9b9b5b2dUL,0x7575c2eaUL,0xe1e11cd9UL,0x3d3dae7aUL,0x4c4c6a98UL,0x6c6c5ad8UL,0x7e7e41fcUL,0xf5f502f1UL,0x83834f1dUL,0x68685cd0UL,0x5151f4a2UL,0xd1d134b9UL,0xf9f908e9UL,0xe2e293dfUL,0xabab734dUL,0x626253c4UL,0x2a2a3f54UL,0x08080c10UL,0x95955231UL,0x4646658cUL,0x9d9d5e21UL,0x30302860UL,0x3737a16eUL,0x0a0a0f14UL,0x2f2fb55eUL,0x0e0e091cUL,0x24243648UL,0x1b1b9b36UL,0xdfdf3da5UL,0xcdcd2681UL,0x4e4e699cUL,0x7f7fcdfeUL,0xeaea9fcfUL,0x12121b24UL,0x1d1d9e3aUL,0x585874b0UL,0x34342e68UL,0x36362d6cUL,0xdcdcb2a3UL,0xb4b4ee73UL,0x5b5bfbb6UL,0xa4a4f653UL,0x76764decUL,0xb7b76175UL,0x7d7dcefaUL,0x52527ba4UL,0xdddd3ea1UL,0x5e5e71bcUL,0x13139726UL,0xa6a6f557UL,0xb9b96869UL,0x00000000UL,0xc1c12c99UL,0x40406080UL,0xe3e31fddUL,0x7979c8f2UL,0xb6b6ed77UL,0xd4d4beb3UL,0x8d8d4601UL,0x6767d9ceUL,0x72724be4UL,0x9494de33UL,0x9898d42bUL,0xb0b0e87bUL,0x85854a11UL,0xbbbb6b6dUL,0xc5c52a91UL,0x4f4fe59eUL,0xeded16c1UL,0x8686c517UL,0x9a9ad72fUL,0x666655ccUL,0x11119422UL,0x8a8acf0fUL,0xe9e910c9UL,0x04040608UL,0xfefe81e7UL,0xa0a0f05bUL,0x787844f0UL,0x2525ba4aUL,0x4b4be396UL,0xa2a2f35fUL,0x5d5dfebaUL,0x8080c01bUL,0x05058a0aUL,0x3f3fad7eUL,0x2121bc42UL,0x707048e0UL,0xf1f104f9UL,0x6363dfc6UL,0x7777c1eeUL,0xafaf7545UL,0x42426384UL,0x20203040UL,0xe5e51ad1UL,0xfdfd0ee1UL,0xbfbf6d65UL,0x81814c19UL,0x18181430UL,0x2626354cUL,0xc3c32f9dUL,0xbebee167UL,0x3535a26aUL,0x8888cc0bUL,0x2e2e395cUL,0x9393573dUL,0x5555f2aaUL,0xfcfc82e3UL,0x7a7a47f4UL,0xc8c8ac8bUL,0xbabae76fUL,0x32322b64UL,0xe6e695d7UL,0xc0c0a09bUL,0x19199832UL,0x9e9ed127UL,0xa3a37f5dUL,0x44446688UL,0x54547ea8UL,0x3b3bab76UL,0x0b0b8316UL,0x8c8cca03UL,0xc7c72995U
L,0x6b6bd3d6UL,0x28283c50UL,0xa7a77955UL,0xbcbce263UL,0x16161d2cUL,0xadad7641UL,0xdbdb3badUL,0x646456c8UL,0x74744ee8UL,0x14141e28UL,0x9292db3fUL,0x0c0c0a18UL,0x48486c90UL,0xb8b8e46bUL,0x9f9f5d25UL,0xbdbd6e61UL,0x4343ef86UL,0xc4c4a693UL,0x3939a872UL,0x3131a462UL,0xd3d337bdUL,0xf2f28bffUL,0xd5d532b1UL,0x8b8b430dUL,0x6e6e59dcUL,0xdadab7afUL,0x01018c02UL,0xb1b16479UL,0x9c9cd223UL,0x4949e092UL,0xd8d8b4abUL,0xacacfa43UL,0xf3f307fdUL,0xcfcf2585UL,0xcacaaf8fUL,0xf4f48ef3UL,0x4747e98eUL,0x10101820UL,0x6f6fd5deUL,0xf0f088fbUL,0x4a4a6f94UL,0x5c5c72b8UL,0x38382470UL,0x5757f1aeUL,0x7373c7e6UL,0x97975135UL,0xcbcb238dUL,0xa1a17c59UL,0xe8e89ccbUL,0x3e3e217cUL,0x9696dd37UL,0x6161dcc2UL,0x0d0d861aUL,0x0f0f851eUL,0xe0e090dbUL,0x7c7c42f8UL,0x7171c4e2UL,0xccccaa83UL,0x9090d83bUL,0x0606050cUL,0xf7f701f5UL,0x1c1c1238UL,0xc2c2a39fUL,0x6a6a5fd4UL,0xaeaef947UL,0x6969d0d2UL,0x1717912eUL,0x99995829UL,0x3a3a2774UL,0x2727b94eUL,0xd9d938a9UL,0xebeb13cdUL,0x2b2bb356UL,0x22223344UL,0xd2d2bbbfUL,0xa9a97049UL,0x0707890eUL,0x3333a766UL,0x2d2db65aUL,0x3c3c2278UL,0x1515922aUL,0xc9c92089UL,0x87874915UL,0xaaaaff4fUL,0x505078a0UL,0xa5a57a51UL,0x03038f06UL,0x5959f8b2UL,0x09098012UL,0x1a1a1734UL,0x6565dacaUL,0xd7d731b5UL,0x8484c613UL,0xd0d0b8bbUL,0x8282c31fUL,0x2929b052UL,0x5a5a77b4UL,0x1e1e113cUL,0x7b7bcbf6UL,0xa8a8fc4bUL,0x6d6dd6daUL,0x2c2c3a58UL,
0xc6a597f4UL,0xf884eb97UL,0xee99c7b0UL,0xf68df78cUL,0xff0de517UL,0xd6bdb7dcUL,0xdeb1a7c8UL,0x915439fcUL,0x6050c0f0UL,0x02030405UL,0xcea987e0UL,0x567dac87UL,0xe719d52bUL,0xb56271a6UL,0x4de69a31UL,0xec9ac3b5UL,0x8f4505cfUL,0x1f9d3ebcUL,0x894009c0UL,0xfa87ef92UL,0xef15c53fUL,0xb2eb7f26UL,0x8ec90740UL,0xfb0bed1dUL,0x41ec822fUL,0xb3677da9UL,0x5ffdbe1cUL,0x45ea8a25UL,0x23bf46daUL,0x53f7a602UL,0xe496d3a1UL,0x9b5b2dedUL,0x75c2ea5dUL,0xe11cd924UL,0x3dae7ae9UL,0x4c6a98beUL,0x6c5ad8eeUL,0x7e41fcc3UL,0xf502f106UL,0x834f1dd1UL,0x685cd0e4UL,0x51f4a207UL,0xd134b95cUL,0xf908e918UL,0xe293dfaeUL,0xab734d95UL,0x6253c4f5UL,0x2a3f5441UL,0x080c1014UL,0x955231f6UL,0x46658cafUL,0x9d5e21e2UL,0x30286078UL,0x37a16ef8UL,0x0a0f1411UL,0x2fb55ec4UL,0x0e091c1bUL,0x2436485aUL,0x1b9b36b6UL,0xdf3da547UL,0xcd26816aUL,0x4e699cbbUL,0x7fcdfe4cUL,0xea9fcfbaUL,0x121b242dUL,0x1d9e3ab9UL,0x5874b09cUL,0x342e6872UL,0x362d6c77UL,0xdcb2a3cdUL,0xb4ee7329UL,0x5bfbb616UL,0xa4f65301UL,0x764decd7UL,0xb76175a3UL,0x7dcefa49UL,0x527ba48dUL,0xdd3ea142UL,0x5e71bc93UL,0x139726a2UL,0xa6f55704UL,0xb96869b8UL,0x00000000UL,0xc12c9974UL,0x406080a0UL,0xe31fdd21UL,0x79c8f243UL,0xb6ed772cUL,0xd4beb3d9UL,0x8d4601caUL,0x67d9ce70UL,0x724be4ddUL,0x94de3379UL,0x98d42b67UL,0xb0e87b23UL,0x854a11deUL,0xbb6b6dbdUL,0xc52a917eUL,0x4fe59e34UL,0xed16c13aUL,0x86c51754UL,0x9ad72f62UL,0x6655ccffUL,0x119422a7UL,0x8acf0f4aUL,0xe910c930UL,0x0406080aUL,0xfe81e798UL,0xa0f05b0bUL,0x7844f0ccUL,0x25ba4ad5UL,0x4be3963eUL,0xa2f35f0eUL,0x5dfeba19UL,0x80c01b5bUL,0x058a0a85UL,0x3fad7eecUL,0x21bc42dfUL,0x7048e0d8UL,0xf104f90cUL,0x63dfc67aUL,0x77c1ee58UL,0xaf75459fUL,0x426384a5UL,0x20304050UL,0xe51ad12eUL,0xfd0ee112UL,0xbf6d65b7UL,0x814c19d4UL,0x1814303cUL,0x26354c5fUL,0xc32f9d71UL,0xbee16738UL,0x35a26afdUL,0x88cc0b4fUL,0x2e395c4bUL,0x93573df9UL,0x55f2aa0dUL,0xfc82e39dUL,0x7a47f4c9UL,0xc8ac8befUL,0xbae76f32UL,0x322b647dUL,0xe695d7a4UL,0xc0a09bfbUL,0x199832b3UL,0x9ed12768UL,0xa37f5d81UL,0x446688aaUL,0x547ea882UL,0x3bab76e6UL,0x0b83169eUL,0x8cca0345UL,0xc729957bU
L,0x6bd3d66eUL,0x283c5044UL,0xa779558bUL,0xbce2633dUL,0x161d2c27UL,0xad76419aUL,0xdb3bad4dUL,0x6456c8faUL,0x744ee8d2UL,0x141e2822UL,0x92db3f76UL,0x0c0a181eUL,0x486c90b4UL,0xb8e46b37UL,0x9f5d25e7UL,0xbd6e61b2UL,0x43ef862aUL,0xc4a693f1UL,0x39a872e3UL,0x31a462f7UL,0xd337bd59UL,0xf28bff86UL,0xd532b156UL,0x8b430dc5UL,0x6e59dcebUL,0xdab7afc2UL,0x018c028fUL,0xb16479acUL,0x9cd2236dUL,0x49e0923bUL,0xd8b4abc7UL,0xacfa4315UL,0xf307fd09UL,0xcf25856fUL,0xcaaf8feaUL,0xf48ef389UL,0x47e98e20UL,0x10182028UL,0x6fd5de64UL,0xf088fb83UL,0x4a6f94b1UL,0x5c72b896UL,0x3824706cUL,0x57f1ae08UL,0x73c7e652UL,0x975135f3UL,0xcb238d65UL,0xa17c5984UL,0xe89ccbbfUL,0x3e217c63UL,0x96dd377cUL,0x61dcc27fUL,0x0d861a91UL,0x0f851e94UL,0xe090dbabUL,0x7c42f8c6UL,0x71c4e257UL,0xccaa83e5UL,0x90d83b73UL,0x06050c0fUL,0xf701f503UL,0x1c123836UL,0xc2a39ffeUL,0x6a5fd4e1UL,0xaef94710UL,0x69d0d26bUL,0x17912ea8UL,0x995829e8UL,0x3a277469UL,0x27b94ed0UL,0xd938a948UL,0xeb13cd35UL,0x2bb356ceUL,0x22334455UL,0xd2bbbfd6UL,0xa9704990UL,0x07890e80UL,0x33a766f2UL,0x2db65ac1UL,0x3c227866UL,0x15922aadUL,0xc9208960UL,0x874915dbUL,0xaaff4f1aUL,0x5078a088UL,0xa57a518eUL,0x038f068aUL,0x59f8b213UL,0x0980129bUL,0x1a173439UL,0x65daca75UL,0xd731b553UL,0x84c61351UL,0xd0b8bbd3UL,0x82c31f5eUL,0x29b052cbUL,0x5a77b499UL,0x1e113c33UL,0x7bcbf646UL,0xa8fc4b1fUL,0x6dd6da61UL,0x2c3a584eUL,
0xa597f4a5UL,0x84eb9784UL,0x99c7b099UL,0x8df78c8dUL,0x0de5170dUL,0xbdb7dcbdUL,0xb1a7c8b1UL,0x5439fc54UL,0x50c0f050UL,0x03040503UL,0xa987e0a9UL,0x7dac877dUL,0x19d52b19UL,0x6271a662UL,0xe69a31e6UL,0x9ac3b59aUL,0x4505cf45UL,0x9d3ebc9dUL,0x4009c040UL,0x87ef9287UL,0x15c53f15UL,0xeb7f26ebUL,0xc90740c9UL,0x0bed1d0bUL,0xec822fecUL,0x677da967UL,0xfdbe1cfdUL,0xea8a25eaUL,0xbf46dabfUL,0xf7a602f7UL,0x96d3a196UL,0x5b2ded5bUL,0xc2ea5dc2UL,0x1cd9241cUL,0xae7ae9aeUL,0x6a98be6aUL,0x5ad8ee5aUL,0x41fcc341UL,0x02f10602UL,0x4f1dd14fUL,0x5cd0e45cUL,0xf4a207f4UL,0x34b95c34UL,0x08e91808UL,0x93dfae93UL,0x734d9573UL,0x53c4f553UL,0x3f54413fUL,0x0c10140cUL,0x5231f652UL,0x658caf65UL,0x5e21e25eUL,0x28607828UL,0xa16ef8a1UL,0x0f14110fUL,0xb55ec4b5UL,0x091c1b09UL,0x36485a36UL,0x9b36b69bUL,0x3da5473dUL,0x26816a26UL,0x699cbb69UL,0xcdfe4ccdUL,0x9fcfba9fUL,0x1b242d1bUL,0x9e3ab99eUL,0x74b09c74UL,0x2e68722eUL,0x2d6c772dUL,0xb2a3cdb2UL,0xee7329eeUL,0xfbb616fbUL,0xf65301f6UL,0x4decd74dUL,0x6175a361UL,0xcefa49ceUL,0x7ba48d7bUL,0x3ea1423eUL,0x71bc9371UL,0x9726a297UL,0xf55704f5UL,0x6869b868UL,0x00000000UL,0x2c99742cUL,0x6080a060UL,0x1fdd211fUL,0xc8f243c8UL,0xed772cedUL,0xbeb3d9beUL,0x4601ca46UL,0xd9ce70d9UL,0x4be4dd4bUL,0xde3379deUL,0xd42b67d4UL,0xe87b23e8UL,0x4a11de4aUL,0x6b6dbd6bUL,0x2a917e2aUL,0xe59e34e5UL,0x16c13a16UL,0xc51754c5UL,0xd72f62d7UL,0x55ccff55UL,0x9422a794UL,0xcf0f4acfUL,0x10c93010UL,0x06080a06UL,0x81e79881UL,0xf05b0bf0UL,0x44f0cc44UL,0xba4ad5baUL,0xe3963ee3UL,0xf35f0ef3UL,0xfeba19feUL,0xc01b5bc0UL,0x8a0a858aUL,0xad7eecadUL,0xbc42dfbcUL,0x48e0d848UL,0x04f90c04UL,0xdfc67adfUL,0xc1ee58c1UL,0x75459f75UL,0x6384a563UL,0x30405030UL,0x1ad12e1aUL,0x0ee1120eUL,0x6d65b76dUL,0x4c19d44cUL,0x14303c14UL,0x354c5f35UL,0x2f9d712fUL,0xe16738e1UL,0xa26afda2UL,0xcc0b4fccUL,0x395c4b39UL,0x573df957UL,0xf2aa0df2UL,0x82e39d82UL,0x47f4c947UL,0xac8befacUL,0xe76f32e7UL,0x2b647d2bUL,0x95d7a495UL,0xa09bfba0UL,0x9832b398UL,0xd12768d1UL,0x7f5d817fUL,0x6688aa66UL,0x7ea8827eUL,0xab76e6abUL,0x83169e83UL,0xca0345caUL,0x29957b29U
L,0xd3d66ed3UL,0x3c50443cUL,0x79558b79UL,0xe2633de2UL,0x1d2c271dUL,0x76419a76UL,0x3bad4d3bUL,0x56c8fa56UL,0x4ee8d24eUL,0x1e28221eUL,0xdb3f76dbUL,0x0a181e0aUL,0x6c90b46cUL,0xe46b37e4UL,0x5d25e75dUL,0x6e61b26eUL,0xef862aefUL,0xa693f1a6UL,0xa872e3a8UL,0xa462f7a4UL,0x37bd5937UL,0x8bff868bUL,0x32b15632UL,0x430dc543UL,0x59dceb59UL,0xb7afc2b7UL,0x8c028f8cUL,0x6479ac64UL,0xd2236dd2UL,0xe0923be0UL,0xb4abc7b4UL,0xfa4315faUL,0x07fd0907UL,0x25856f25UL,0xaf8feaafUL,0x8ef3898eUL,0xe98e20e9UL,0x18202818UL,0xd5de64d5UL,0x88fb8388UL,0x6f94b16fUL,0x72b89672UL,0x24706c24UL,0xf1ae08f1UL,0xc7e652c7UL,0x5135f351UL,0x238d6523UL,0x7c59847cUL,0x9ccbbf9cUL,0x217c6321UL,0xdd377cddUL,0xdcc27fdcUL,0x861a9186UL,0x851e9485UL,0x90dbab90UL,0x42f8c642UL,0xc4e257c4UL,0xaa83e5aaUL,0xd83b73d8UL,0x050c0f05UL,0x01f50301UL,0x12383612UL,0xa39ffea3UL,0x5fd4e15fUL,0xf94710f9UL,0xd0d26bd0UL,0x912ea891UL,0x5829e858UL,0x27746927UL,0xb94ed0b9UL,0x38a94838UL,0x13cd3513UL,0xb356ceb3UL,0x33445533UL,0xbbbfd6bbUL,0x70499070UL,0x890e8089UL,0xa766f2a7UL,0xb65ac1b6UL,0x22786622UL,0x922aad92UL,0x20896020UL,0x4915db49UL,0xff4f1affUL,0x78a08878UL,0x7a518e7aUL,0x8f068a8fUL,0xf8b213f8UL,0x80129b80UL,0x17343917UL,0xdaca75daUL,0x31b55331UL,0xc61351c6UL,0xb8bbd3b8UL,0xc31f5ec3UL,0xb052cbb0UL,0x77b49977UL,0x113c3311UL,0xcbf646cbUL,0xfc4b1ffcUL,0xd6da61d6UL,0x3a584e3aUL,
0x97f4a5f4UL,0xeb978497UL,0xc7b099b0UL,0xf78c8d8cUL,0xe5170d17UL,0xb7dcbddcUL,0xa7c8b1c8UL,0x39fc54fcUL,0xc0f050f0UL,0x04050305UL,0x87e0a9e0UL,0xac877d87UL,0xd52b192bUL,0x71a662a6UL,0x9a31e631UL,0xc3b59ab5UL,0x05cf45cfUL,0x3ebc9dbcUL,0x09c040c0UL,0xef928792UL,0xc53f153fUL,0x7f26eb26UL,0x0740c940UL,0xed1d0b1dUL,0x822fec2fUL,0x7da967a9UL,0xbe1cfd1cUL,0x8a25ea25UL,0x46dabfdaUL,0xa602f702UL,0xd3a196a1UL,0x2ded5bedUL,0xea5dc25dUL,0xd9241c24UL,0x7ae9aee9UL,0x98be6abeUL,0xd8ee5aeeUL,0xfcc341c3UL,0xf1060206UL,0x1dd14fd1UL,0xd0e45ce4UL,0xa207f407UL,0xb95c345cUL,0xe9180818UL,0xdfae93aeUL,0x4d957395UL,0xc4f553f5UL,0x54413f41UL,0x10140c14UL,0x31f652f6UL,0x8caf65afUL,0x21e25ee2UL,0x60782878UL,0x6ef8a1f8UL,0x14110f11UL,0x5ec4b5c4UL,0x1c1b091bUL,0x485a365aUL,0x36b69bb6UL,0xa5473d47UL,0x816a266aUL,0x9cbb69bbUL,0xfe4ccd4cUL,0xcfba9fbaUL,0x242d1b2dUL,0x3ab99eb9UL,0xb09c749cUL,0x68722e72UL,0x6c772d77UL,0xa3cdb2cdUL,0x7329ee29UL,0xb616fb16UL,0x5301f601UL,0xecd74dd7UL,0x75a361a3UL,0xfa49ce49UL,0xa48d7b8dUL,0xa1423e42UL,0xbc937193UL,0x26a297a2UL,0x5704f504UL,0x69b868b8UL,0x00000000UL,0x99742c74UL,0x80a060a0UL,0xdd211f21UL,0xf243c843UL,0x772ced2cUL,0xb3d9bed9UL,0x01ca46caUL,0xce70d970UL,0xe4dd4bddUL,0x3379de79UL,0x2b67d467UL,0x7b23e823UL,0x11de4adeUL,0x6dbd6bbdUL,0x917e2a7eUL,0x9e34e534UL,0xc13a163aUL,0x1754c554UL,0x2f62d762UL,0xccff55ffUL,0x22a794a7UL,0x0f4acf4aUL,0xc9301030UL,0x080a060aUL,0xe7988198UL,0x5b0bf00bUL,0xf0cc44ccUL,0x4ad5bad5UL,0x963ee33eUL,0x5f0ef30eUL,0xba19fe19UL,0x1b5bc05bUL,0x0a858a85UL,0x7eecadecUL,0x42dfbcdfUL,0xe0d848d8UL,0xf90c040cUL,0xc67adf7aUL,0xee58c158UL,0x459f759fUL,0x84a563a5UL,0x40503050UL,0xd12e1a2eUL,0xe1120e12UL,0x65b76db7UL,0x19d44cd4UL,0x303c143cUL,0x4c5f355fUL,0x9d712f71UL,0x6738e138UL,0x6afda2fdUL,0x0b4fcc4fUL,0x5c4b394bUL,0x3df957f9UL,0xaa0df20dUL,0xe39d829dUL,0xf4c947c9UL,0x8befacefUL,0x6f32e732UL,0x647d2b7dUL,0xd7a495a4UL,0x9bfba0fbUL,0x32b398b3UL,0x2768d168UL,0x5d817f81UL,0x88aa66aaUL,0xa8827e82UL,0x76e6abe6UL,0x169e839eUL,0x0345ca45UL,0x957b297bU
L,0xd66ed36eUL,0x50443c44UL,0x558b798bUL,0x633de23dUL,0x2c271d27UL,0x419a769aUL,0xad4d3b4dUL,0xc8fa56faUL,0xe8d24ed2UL,0x28221e22UL,0x3f76db76UL,0x181e0a1eUL,0x90b46cb4UL,0x6b37e437UL,0x25e75de7UL,0x61b26eb2UL,0x862aef2aUL,0x93f1a6f1UL,0x72e3a8e3UL,0x62f7a4f7UL,0xbd593759UL,0xff868b86UL,0xb1563256UL,0x0dc543c5UL,0xdceb59ebUL,0xafc2b7c2UL,0x028f8c8fUL,0x79ac64acUL,0x236dd26dUL,0x923be03bUL,0xabc7b4c7UL,0x4315fa15UL,0xfd090709UL,0x856f256fUL,0x8feaafeaUL,0xf3898e89UL,0x8e20e920UL,0x20281828UL,0xde64d564UL,0xfb838883UL,0x94b16fb1UL,0xb8967296UL,0x706c246cUL,0xae08f108UL,0xe652c752UL,0x35f351f3UL,0x8d652365UL,0x59847c84UL,0xcbbf9cbfUL,0x7c632163UL,0x377cdd7cUL,0xc27fdc7fUL,0x1a918691UL,0x1e948594UL,0xdbab90abUL,0xf8c642c6UL,0xe257c457UL,0x83e5aae5UL,0x3b73d873UL,0x0c0f050fUL,0xf5030103UL,0x38361236UL,0x9ffea3feUL,0xd4e15fe1UL,0x4710f910UL,0xd26bd06bUL,0x2ea891a8UL,0x29e858e8UL,0x74692769UL,0x4ed0b9d0UL,0xa9483848UL,0xcd351335UL,0x56ceb3ceUL,0x44553355UL,0xbfd6bbd6UL,0x49907090UL,0x0e808980UL,0x66f2a7f2UL,0x5ac1b6c1UL,0x78662266UL,0x2aad92adUL,0x89602060UL,0x15db49dbUL,0x4f1aff1aUL,0xa0887888UL,0x518e7a8eUL,0x068a8f8aUL,0xb213f813UL,0x129b809bUL,0x34391739UL,0xca75da75UL,0xb5533153UL,0x1351c651UL,0xbbd3b8d3UL,0x1f5ec35eUL,0x52cbb0cbUL,0xb4997799UL,0x3c331133UL,0xf646cb46UL,0x4b1ffc1fUL,0xda61d661UL,0x584e3a4eUL,
0xf4a5f432UL,0x9784976fUL,0xb099b05eUL,0x8c8d8c7aUL,0x170d17e8UL,0xdcbddc0aUL,0xc8b1c816UL,0xfc54fc6dUL,0xf050f090UL,0x05030507UL,0xe0a9e02eUL,0x877d87d1UL,0x2b192bccUL,0xa662a613UL,0x31e6317cUL,0xb59ab559UL,0xcf45cf40UL,0xbc9dbca3UL,0xc040c049UL,0x92879268UL,0x3f153fd0UL,0x26eb2694UL,0x40c940ceUL,0x1d0b1de6UL,0x2fec2f6eUL,0xa967a91aUL,0x1cfd1c43UL,0x25ea2560UL,0xdabfdaf9UL,0x02f70251UL,0xa196a145UL,0xed5bed76UL,0x5dc25d28UL,0x241c24c5UL,0xe9aee9d4UL,0xbe6abef2UL,0xee5aee82UL,0xc341c3bdUL,0x060206f3UL,0xd14fd152UL,0xe45ce48cUL,0x07f40756UL,0x5c345c8dUL,0x180818e1UL,0xae93ae4cUL,0x9573953eUL,0xf553f597UL,0x413f416bUL,0x140c141cUL,0xf652f663UL,0xaf65afe9UL,0xe25ee27fUL,0x78287848UL,0xf8a1f8cfUL,0x110f111bUL,0xc4b5c4ebUL,0x1b091b15UL,0x5a365a7eUL,0xb69bb6adUL,0x473d4798UL,0x6a266aa7UL,0xbb69bbf5UL,0x4ccd4c33UL,0xba9fba50UL,0x2d1b2d3fUL,0xb99eb9a4UL,0x9c749cc4UL,0x722e7246UL,0x772d7741UL,0xcdb2cd11UL,0x29ee299dUL,0x16fb164dUL,0x01f601a5UL,0xd74dd7a1UL,0xa361a314UL,0x49ce4934UL,0x8d7b8ddfUL,0x423e429fUL,0x937193cdUL,0xa297a2b1UL,0x04f504a2UL,0xb868b801UL,0x00000000UL,0x742c74b5UL,0xa060a0e0UL,0x211f21c2UL,0x43c8433aUL,0x2ced2c9aUL,0xd9bed90dUL,0xca46ca47UL,0x70d97017UL,0xdd4bddafUL,0x79de79edUL,0x67d467ffUL,0x23e82393UL,0xde4ade5bUL,0xbd6bbd06UL,0x7e2a7ebbUL,0x34e5347bUL,0x3a163ad7UL,0x54c554d2UL,0x62d762f8UL,0xff55ff99UL,0xa794a7b6UL,0x4acf4ac0UL,0x301030d9UL,0x0a060a0eUL,0x98819866UL,0x0bf00babUL,0xcc44ccb4UL,0xd5bad5f0UL,0x3ee33e75UL,0x0ef30eacUL,0x19fe1944UL,0x5bc05bdbUL,0x858a8580UL,0xecadecd3UL,0xdfbcdffeUL,0xd848d8a8UL,0x0c040cfdUL,0x7adf7a19UL,0x58c1582fUL,0x9f759f30UL,0xa563a5e7UL,0x50305070UL,0x2e1a2ecbUL,0x120e12efUL,0xb76db708UL,0xd44cd455UL,0x3c143c24UL,0x5f355f79UL,0x712f71b2UL,0x38e13886UL,0xfda2fdc8UL,0x4fcc4fc7UL,0x4b394b65UL,0xf957f96aUL,0x0df20d58UL,0x9d829d61UL,0xc947c9b3UL,0xefacef27UL,0x32e73288UL,0x7d2b7d4fUL,0xa495a442UL,0xfba0fb3bUL,0xb398b3aaUL,0x68d168f6UL,0x817f8122UL,0xaa66aaeeUL,0x827e82d6UL,0xe6abe6ddUL,0x9e839e95UL,0x45ca45c9UL,0x7b297bbcU
L,0x6ed36e05UL,0x443c446cUL,0x8b798b2cUL,0x3de23d81UL,0x271d2731UL,0x9a769a37UL,0x4d3b4d96UL,0xfa56fa9eUL,0xd24ed2a6UL,0x221e2236UL,0x76db76e4UL,0x1e0a1e12UL,0xb46cb4fcUL,0x37e4378fUL,0xe75de778UL,0xb26eb20fUL,0x2aef2a69UL,0xf1a6f135UL,0xe3a8e3daUL,0xf7a4f7c6UL,0x5937598aUL,0x868b8674UL,0x56325683UL,0xc543c54eUL,0xeb59eb85UL,0xc2b7c218UL,0x8f8c8f8eUL,0xac64ac1dUL,0x6dd26df1UL,0x3be03b72UL,0xc7b4c71fUL,0x15fa15b9UL,0x090709faUL,0x6f256fa0UL,0xeaafea20UL,0x898e897dUL,0x20e92067UL,0x28182838UL,0x64d5640bUL,0x83888373UL,0xb16fb1fbUL,0x967296caUL,0x6c246c54UL,0x08f1085fUL,0x52c75221UL,0xf351f364UL,0x652365aeUL,0x847c8425UL,0xbf9cbf57UL,0x6321635dUL,0x7cdd7ceaUL,0x7fdc7f1eUL,0x9186919cUL,0x9485949bUL,0xab90ab4bUL,0xc642c6baUL,0x57c45726UL,0xe5aae529UL,0x73d873e3UL,0x0f050f09UL,0x030103f4UL,0x3612362aUL,0xfea3fe3cUL,0xe15fe18bUL,0x10f910beUL,0x6bd06b02UL,0xa891a8bfUL,0xe858e871UL,0x69276953UL,0xd0b9d0f7UL,0x48384891UL,0x351335deUL,0xceb3cee5UL,0x55335577UL,0xd6bbd604UL,0x90709039UL,0x80898087UL,0xf2a7f2c1UL,0xc1b6c1ecUL,0x6622665aUL,0xad92adb8UL,0x602060a9UL,0xdb49db5cUL,0x1aff1ab0UL,0x887888d8UL,0x8e7a8e2bUL,0x8a8f8a89UL,0x13f8134aUL,0x9b809b92UL,0x39173923UL,0x75da7510UL,0x53315384UL,0x51c651d5UL,0xd3b8d303UL,0x5ec35edcUL,0xcbb0cbe2UL,0x997799c3UL,0x3311332dUL,0x46cb463dUL,0x1ffc1fb7UL,0x61d6610cUL,0x4e3a4e62UL
};

// local table
LOCAL UINT32 groestl_T_local[256*8];
const UINT32 LOCAL *groestl_T0 = &groestl_T_local[0 * 256];
const UINT32 LOCAL *groestl_T1 = &groestl_T_local[1 * 256];
const UINT32 LOCAL *groestl_T2 = &groestl_T_local[2 * 256];
const UINT32 LOCAL *groestl_T3 = &groestl_T_local[3 * 256];
const UINT32 LOCAL *groestl_T4 = &groestl_T_local[4 * 256];
const UINT32 LOCAL *groestl_T5 = &groestl_T_local[5 * 256];
const UINT32 LOCAL *groestl_T6 = &groestl_T_local[6 * 256];
const UINT32 LOCAL *groestl_T7 = &groestl_T_local[7 * 256];

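// init, once per kernel: each work-item copies every WORKSIZE-th entry
// (this assumes WORKSIZE divides 256 * 8 evenly)
UINT32 nLocalId = LOCALID;
{
    for(i = 0; i < 256 * 8; i += WORKSIZE)
        groestl_T_local[i + nLocalId] = groestl_T_init[i + nLocalId];
}
// note: the work-group must synchronize (a local-memory barrier) before any
// work-item reads groestl_T_local; that step is not shown in this fragment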

// declarations
UINT32 hash[32]; // hash[16..31] - scratch buffer

UINT32 groestl_BuffB[32];
UINT32 groestl_BuffC[32];
unsigned groestl_i;
unsigned index;

// inlined function body
groestl_BuffC[16] = hash[16] = 0x80;
groestl_BuffC[17] = hash[17] = 0;
groestl_BuffC[18] = hash[18] = 0;
groestl_BuffC[19] = hash[19] = 0;
groestl_BuffC[20] = hash[20] = 0;
groestl_BuffC[21] = hash[21] = 0;
groestl_BuffC[22] = hash[22] = 0;
groestl_BuffC[23] = hash[23] = 0;
groestl_BuffC[24] = hash[24] = 0;
groestl_BuffC[25] = hash[25] = 0;
groestl_BuffC[26] = hash[26] = 0;
groestl_BuffC[27] = hash[27] = 0;
groestl_BuffC[28] = hash[28] = 0;
groestl_BuffC[29] = hash[29] = 0;
groestl_BuffC[30] = hash[30] = 0;
hash[31] = 0x01000000;
groestl_BuffC[31] = 0x01020000L;

#pragma unroll 16
for (groestl_i = 0; groestl_i < 16; groestl_i++)
{
    groestl_BuffC[groestl_i] = hash[groestl_i];
}

// Q permutation on the message block: 14 rounds, two per loop pass; result ends up in hash[]
for (groestl_i = 0; groestl_i < 0x0d000000u; groestl_i += 0x01000000u)
{
    groestl_QMIX(hash, groestl_BuffB, groestl_i)
    groestl_i += 0x01000000u;
    groestl_QMIX(groestl_BuffB, hash, groestl_i)
}

// P permutation on (chaining value ^ message): 14 rounds; result ends up in groestl_BuffC[]
for (groestl_i = 0; groestl_i < 13; ++groestl_i)
{
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
}

// finish the compression: hash = P(h ^ m) ^ Q(m) ^ h; only word 31 of h is non-zero
#pragma unroll 32
for (groestl_i = 0; groestl_i < 32-1; groestl_i++)
{
    hash[groestl_i] ^= groestl_BuffC[groestl_i];
    groestl_BuffB[groestl_i] = hash[groestl_i];
}
hash[31] ^= 0x00020000UL ^ groestl_BuffC[31];
groestl_BuffB[31] = hash[31];

// output transformation: P on the new chaining value, 14 rounds; result ends up in groestl_BuffB[]
for (groestl_i = 0; groestl_i < 14;)
{
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
}

// P(x) ^ x, keeping only the last 16 words as the digest
#pragma unroll 16
for (groestl_i = 0; groestl_i < 16; ++groestl_i)
{
    hash[groestl_i] = groestl_BuffB[16+groestl_i] ^ hash[16+groestl_i];
}
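
A note on the groestl_T_init table in the code above: its eight 256-entry sub-tables look like 4-byte windows over one circular 8-byte sequence per S-box output, with the bytes being the GF(2^8) multiples {3,5,7,2,2,3,4,5} of that output. The plain C sketch below rebuilds single entries that way. It is only an illustration: the helper names (gf_double, tbox_bytes, tbox_word) are made up here, and the S-box outputs used in the check (0x63 for input 0, 0x7c for input 1) are the standard AES values, not something stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) with the AES reduction polynomial 0x11b. */
static uint8_t gf_double(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

/* Fill seq[0..7] with the multiples {3,5,7,2,2,3,4,5} of the S-box output s. */
static void tbox_bytes(uint8_t s, uint8_t seq[8])
{
    uint8_t x2 = gf_double(s);
    uint8_t x4 = gf_double(x2);
    uint8_t x3 = (uint8_t)(x2 ^ s);
    uint8_t x5 = (uint8_t)(x4 ^ s);
    uint8_t x7 = (uint8_t)(x4 ^ x2 ^ s);

    seq[0] = x3; seq[1] = x5; seq[2] = x7; seq[3] = x2;
    seq[4] = x2; seq[5] = x3; seq[6] = x4; seq[7] = x5;
}

/* 32-bit word for sub-table k (0..7): four consecutive bytes of the
   circular sequence starting at offset k, most significant byte first. */
static uint32_t tbox_word(uint8_t s, unsigned k)
{
    uint8_t seq[8];
    uint32_t w = 0;
    unsigned j;

    assert(k < 8);
    tbox_bytes(s, seq);
    for (j = 0; j < 4; j++)
        w = (w << 8) | seq[(k + j) & 7];
    return w;
}
```

With s = 0x63, tbox_word(s, 0) through tbox_word(s, 7) give 0xa5f432c6, 0xf432c6c6, 0x32c6c6a5, 0xc6c6a597, 0xc6a597f4, 0xa597f4a5, 0x97f4a5f4 and 0xf4a5f432, which are the first entries of the eight sub-tables posted above. If the pattern holds for all entries, the table could be generated at startup instead of being pasted in.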
pallas (OP)
Legendary
Activity: 2716
Merit: 1094
Black Belt Developer
March 05, 2015, 08:45:32 AM
 #268


Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR

Myr-Groestl must do SHA256 as well, IIRC - of course pure Groestl is faster.

myr-groestl should be faster because it has a single round of groestl (14 iterations) + sha; groestlcoin is groestl + groestl again, so slower.
It's just that I do not have enough free time to work on all these algos...
Now wolf0 just did a fantastic job on whirlpoolx and I want to understand the magic ;-)
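
The speed comparison above can be put into rough numbers by counting 14-round permutation calls. The C sketch below does that under stated assumptions: an 80-byte block header, groestlcoin as Groestl-512 applied twice, and myr-groestl as one Groestl-512 followed by SHA-256 over the 64-byte digest. The function names are hypothetical, and the count ignores everything except rounds and SHA-256 blocks, so it is an estimate, not a benchmark.

```c
#include <assert.h>
#include <stddef.h>

enum { GROESTL512_BLOCK = 128, GROESTL512_ROUNDS = 14, SHA256_BLOCK = 64 };

/* Blocks after padding: message + 1 marker byte + 8-byte counter, rounded up. */
static size_t padded_blocks(size_t msg_len, size_t block_len)
{
    assert(block_len > 0);
    return (msg_len + 1 + 8 + block_len - 1) / block_len;
}

/* Groestl-512 round applications: P and Q for every block, plus one P
   for the output transformation. */
static size_t groestl512_rounds(size_t msg_len)
{
    size_t blocks = padded_blocks(msg_len, GROESTL512_BLOCK);
    return (2 * blocks + 1) * GROESTL512_ROUNDS;
}

/* Assumed groestlcoin PoW: Groestl-512 over the header, then over the 64-byte digest. */
static size_t groestlcoin_rounds(size_t header_len)
{
    return groestl512_rounds(header_len) + groestl512_rounds(64);
}

/* Assumed myr-groestl PoW: the Groestl-512 part only (SHA-256 is counted separately). */
static size_t myr_groestl_rounds(size_t header_len)
{
    return groestl512_rounds(header_len);
}

/* SHA-256 compression blocks needed for the 64-byte Groestl digest. */
static size_t myr_groestl_sha256_blocks(void)
{
    return padded_blocks(64, SHA256_BLOCK);
}
```

For an 80-byte header this gives 84 Groestl rounds for groestlcoin against 42 Groestl rounds plus 2 SHA-256 blocks for myr-groestl, which matches the "single groestl + sha" versus "groestl + groestl" argument.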

pallas (OP)
Legendary
Activity: 2716
Merit: 1094
Black Belt Developer
March 05, 2015, 08:56:35 AM
 #269


Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR

Myr-Groestl must do SHA256 as well, IIRC - of course pure Groestl is faster.

myr-groestl should be faster because it has a single round of groestl (14 iterations) + sha; groestlcoin is groestl + groestl again, so slower.
It's just that I do not have enough free time to work on all these algos...
Now wolf0 just did a fantastic job on whirlpoolx and I want to understand the magic ;-)

Haha, you ain't seen impressive yet! Check the thread, I'm about to post again!

OMG, this means a lot less reading and TV this week for me LoL!

smolen
Hero Member
Activity: 524
Merit: 500
March 06, 2015, 04:26:47 AM
 #270

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on the Radeon HD 5xxx series.
Today my code is obsolete; it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code.

Of course I gave you bad advice. Good one is way out of your price range.
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
March 06, 2015, 09:04:19 AM
 #271

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code

I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a much different architecture.
On a side note, interest in mining groestlcoin based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon, the others are dying, dead or with very little reward anyway.

smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
March 07, 2015, 03:55:11 AM
 #272

It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.
Yes, transpose, do bitsliced calculation and transpose back, that will work. Does GCN have something like PMOVMSKB?
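To make the "transpose, compute, transpose back" idea concrete, here is a minimal CPU-side sketch in Python (not kernel code). All function names are made up for illustration. It assumes Groestl's MixBytes works in the same GF(2^8) field as AES (polynomial x^8+x^4+x^3+x+1), so the multiply-by-2 step becomes a handful of XORs on whole bit planes. `movemask` only models what PMOVMSKB computes, it says nothing about what GCN offers.

```python
def to_bitplanes(values):
    """Transpose up to 32 bytes into 8 bit planes (plane k holds bit k of every byte)."""
    planes = [0] * 8
    for lane, value in enumerate(values):
        for bit in range(8):
            planes[bit] |= ((value >> bit) & 1) << lane
    return planes


def from_bitplanes(planes, count):
    """Inverse transpose: rebuild `count` bytes from 8 bit planes."""
    return [
        sum(((planes[bit] >> lane) & 1) << bit for bit in range(8))
        for lane in range(count)
    ]


def xtime_bitsliced(p):
    """Multiply every lane by 2 in GF(2^8) using only plane moves and XORs."""
    # 0x1B has bits 0, 1, 3 and 4 set, so those planes pick up the old top plane.
    return [p[7], p[0] ^ p[7], p[1], p[2] ^ p[7], p[3] ^ p[7], p[4], p[5], p[6]]


def xtime_scalar(b):
    """Byte-at-a-time reference for the same multiplication."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def movemask(values):
    """Model of PMOVMSKB: the top bit of each byte gathered into one mask."""
    return to_bitplanes(values)[7]
```

With 32 lanes each plane fits in one 32-bit register, so one XOR handles 32 bytes at once; the cost moves into the two transposes.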

Of course I gave you bad advice. Good one is way out of your price range.
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
March 09, 2015, 08:45:42 AM
 #273

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code

I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a much different architecture.
On a side note, interest in mining groestlcoin based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon, the others are dying, dead or with very little reward anyway.

It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.

I've seen what you achieved on whirlpoolx: assuming a similar improvement can be made on groestl as well, that would mean more than 80 Mh/s.
Now, since the time I can dedicate to such a project is a few minutes a day, it would take months. Volunteers? :-)

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
March 16, 2015, 12:46:38 PM
 #274

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code

I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a much different architecture.
On a side note, interest in mining groestlcoin based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon, the others are dying, dead or with very little reward anyway.

It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.

I've seen what you achieved on whirlpoolx: assuming a similar improvement can be made on groestl as well, that would mean more than 80 Mh/s.
Now, since the time I can dedicate to such a project is a few minutes a day, it would take months. Volunteers? :-)

Hey - a few hours ago, I remembered your OpenCL frustrations with 14.9 and above, and decided to take a look at your Groestlcoin code again. Fixed it up just a little bit, and while the resulting binaries aren't quite as good as the ones using GCN assembly, they outperform the original OpenCL on its intended driver.

Stock Pallas' OpenCL, available in the OP, used with 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinpallas-03162015.png
Modified version of that OpenCL, used with the same 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinwolf-03162015.png

I'm not running old drivers on any rig right now, and I don't intend to change that in the near future, so comparing my numbers to the numbers in the OP, 290X goes from 26.4MH/s to 29.11MH/s - substantial.
Other cards, as well as clocks and such are in the screenshot. Oh, and I know memclock doesn't matter here, but I set it to 1500 by force of habit.

thanks Wolf0, but I already got over 34, see op. (experimental v2, bin some posts ago)
it's 2-3% faster than asm version.
it's only for Hawaii and 14.12, though; 14.9 is damned!
next step is bitslicing, but I do not have the time to work on it ;-)

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
March 16, 2015, 01:12:34 PM
 #275

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code

I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a much different architecture.
On a side note, interest in mining groestlcoin based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon, the others are dying, dead or with very little reward anyway.

It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.

I've seen what you achieved on whirlpoolx: assuming a similar improvement can be made on groestl as well, that would mean more than 80 Mh/s.
Now, since the time I can dedicate to such a project is a few minutes a day, it would take months. Volunteers? :-)

Hey - a few hours ago, I remembered your OpenCL frustrations with 14.9 and above, and decided to take a look at your Groestlcoin code again. Fixed it up just a little bit, and while the resulting binaries aren't quite as good as the ones using GCN assembly, they outperform the original OpenCL on its intended driver.

Stock Pallas' OpenCL, available in the OP, used with 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinpallas-03162015.png
Modified version of that OpenCL, used with the same 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinwolf-03162015.png

I'm not running old drivers on any rig right now, and I don't intend to change that in the near future, so comparing my numbers to the numbers in the OP, 290X goes from 26.4MH/s to 29.11MH/s - substantial.
Other cards, as well as clocks and such are in the screenshot. Oh, and I know memclock doesn't matter here, but I set it to 1500 by force of habit.

thanks Wolf0, but I already got over 34, see op. (experimental v2, bin some posts ago)
it's 2-3% faster than asm version.
it's only for Hawaii and 14.12, though; 14.9 is damned!
next step is bitslicing, but I do not have the time to work on it ;-)

As I said - I noticed. However, notice the 280X speeds? You haven't been able to create binaries that good for any chip but Hawaii, AFAIK.

I do not have the card so I can't test it, but I know that on hawaii it can use two wavefronts, but only 1 on tahiti.
Does your kernel run 2 wavefronts on tahiti, as the asm version does?

iju76
Full Member
***
Offline Offline

Activity: 194
Merit: 100


View Profile
March 16, 2015, 01:44:41 PM
 #276

win7-64 -- sgminer-5-dev-neoscrypt-windows-new2 -- dr-14.7

http://s001.radikal.ru/i194/1503/f3/09a2627a6270.png
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
March 16, 2015, 03:33:40 PM
 #277

I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1's bitsliced and byteshuffled code
Hi,

Because of my curiosity I really had to check that bitsliced code :D and well... I must say that NV has better instructions to do it:
__byte_perm(x, 0, 1010)>>s:  this could be emulated by an AND, a MAD24 and a SHR: 3 cycles instead of 2.
__byte_perm(x, 0, 3232)>>s:  SHR, MAD24, SHR: also 3 instead of 2.
__byte_perm(x, y, 5410)      :  SHL, BFE      2 instead of 1 instr.  (Even the Intel SSE has many instructions for these things since ages :S)
And there are lots of bitwise logical instructions where NV is 2x faster because NV has a 3 op logic instruction with all the possible 16*16 logic operator combinations.
There is shuffling between 4 lanes: that is not a problem on GCN with ds_swizzle, otherwise it needs LDS on OpenCL.
I've just checked the GCN 1.3 ISA manual and (at least there) I found neither byte_swizzle nor 3 operand logic instructions.

Anyways, it would be interesting to see how this totally different approach performs compared to the table based one.
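For anyone who wants to check those emulations, here is a small Python model (a sketch, not GPU code). It assumes the selectors above are hex (0x1010, 0x3232, 0x5410) and that only the low 3 bits of each selector nibble are used, as with CUDA's `__byte_perm`. The helper names are made up, and the last helper uses a plain shift-and-merge, which is my own decomposition and not necessarily the exact instruction pair listed above.

```python
MASK32 = 0xFFFFFFFF


def byte_perm(x, y, selector):
    """Reference model of __byte_perm: bytes 0-3 come from x, bytes 4-7 from y."""
    pool = (x & MASK32) | ((y & MASK32) << 32)
    result = 0
    for n in range(4):
        index = (selector >> (4 * n)) & 0x7      # low 3 bits of nibble n
        byte = (pool >> (8 * index)) & 0xFF
        result |= byte << (8 * n)
    return result


def dup_low16(x):
    """Selector 0x1010: AND, then multiply by 0x10001 (fits a 24-bit multiply)."""
    return ((x & 0xFFFF) * 0x10001) & MASK32


def dup_high16(x):
    """Selector 0x3232: SHR by 16, then the same multiply."""
    return (((x & MASK32) >> 16) * 0x10001) & MASK32


def pack_low16(x, y):
    """Selector 0x5410: low half of x in the low half, low half of y on top."""
    return (x & 0xFFFF) | ((y << 16) & MASK32)
```

The trailing `>>s` from the original expressions is left out, since it is one extra shift on either architecture.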
smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
March 16, 2015, 07:47:11 PM
 #278

Yes, transpose, do bitsliced calculation and transpose back, that will work. Does GCN have something like PMOVMSKB?
I don't think so.
Maybe VCC (vector condition code) will do the trick, so normal and bitsliced operations could be cheaply interleaved

NV has a 3 op logic instruction with all the possible 16*16 logic operator combinations.
I've just checked the GCN 1.3 ISA manual and (at least there) I found neither byte_swizzle nor 3 operand logic instructions.
Yes, AMD's GCN is outplayed by VPTERNLOGD and VPTERNLOGQ from Intel AVX512 and LOP3.LUT by NVidia :(
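A quick way to see what a 3-operand logic instruction buys is to model it as an 8-bit truth table applied bitwise. The Python sketch below assumes the usual convention where the table index is built from the three input bits with `a` as the most significant one; `lop3` here is just an illustrative function name, not a call into any real library.

```python
MASK32 = 0xFFFFFFFF


def lop3(a, b, c, lut):
    """Apply the 8-bit truth table `lut` to every bit position of a, b, c."""
    result = 0
    for index in range(8):
        if (lut >> index) & 1:
            # Select the bit positions where (a, b, c) match this table row.
            ta = a if index & 4 else ~a
            tb = b if index & 2 else ~b
            tc = c if index & 1 else ~c
            result |= ta & tb & tc
    return result & MASK32


# Table constants are derived by applying the function to 0xF0, 0xCC, 0xAA.
LUT_XOR3 = 0xF0 ^ 0xCC ^ 0xAA                              # a ^ b ^ c
LUT_MAJ = (0xF0 & 0xCC) | (0xF0 & 0xAA) | (0xCC & 0xAA)    # majority
LUT_SEL = (0xF0 & 0xCC) | (~0xF0 & 0xAA) & 0xFF            # a ? b : c
```

Without such an instruction, a three-input XOR or majority needs two or more two-operand ops, which is where the 2x figure for logic-heavy bitsliced code comes from. A bit select like `LUT_SEL` is the one case GCN does cover with a single BFI.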

Of course I gave you bad advice. Good one is way out of your price range.
smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
March 16, 2015, 07:54:01 PM
 #279

Hmm... it's one hell of a lot harder to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage though...
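As to why two VGPRs can matter so much: on GCN the number of wavefronts a SIMD can keep in flight is bounded by the register budget. The Python sketch below assumes 256 VGPRs per lane, allocation in blocks of 4 and a cap of 10 waves per SIMD, and it ignores SGPR and LDS limits. The function name and the 130 vs 128 figures are illustrative only; the actual register counts of these kernels are not given in the thread.

```python
def gcn_waves_per_simd(vgprs, granularity=4, vgpr_file=256, max_waves=10):
    """Estimate wavefronts per SIMD from a kernel's VGPR count (1..256)."""
    # Registers are handed out in blocks, so round the request up first.
    allocated = -(-vgprs // granularity) * granularity
    return min(max_waves, vgpr_file // allocated)


# Illustrative numbers: dropping from 130 to 128 VGPRs doubles the waves.
before = gcn_waves_per_simd(130)
after = gcn_waves_per_simd(128)
```

Under these assumptions a kernel sitting just above 128 registers runs one wave per SIMD, and shaving it down to 128 allows two.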

Of course I gave you bad advice. Good one is way out of your price range.
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
March 16, 2015, 09:02:43 PM
 #280

Hmm... it's one hell of a lot harder to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage though...

Rotations seem to hurt reg usage a bit. The source REALLY needs cleaning, but IMO, it's rather well done code by Pallas. I'm not really used to seeing anyone with a semblance of clue doing AMD miners. :P

Now I've put some parts of the code (e.g. the list of rbtts) in pragma unrolled for loops and it looks much better ;-)
