Bitcoin Forum
Author Topic: [ANN] [SKC] Skeincoin 0.9.3.1 | Skein-SHA2  (Read 161507 times)
feeleep
November 14, 2013, 02:44:48 PM
 #181

Just noticed on skc.coinmine.pl:
Quote
Pool is closed. If you want to withdraw your coins please send an email to coinmine.pl@gmail.com with account details and coin balance.
Why?

My mistake, not this pool ;)

"Governments are good at cutting off the heads of a centrally controlled networks like Napster, but pure P2P networks like Gnutella and Tor seem to be holding their own." -- Satoshi
asor
November 14, 2013, 05:09:14 PM
Last edit: November 14, 2013, 05:20:49 PM by asor
 #182

Quote
Were you around when Orbitcoin happened? Or were you there when Gamecoin was dumped? Or have you recently seen COL? If you still think like that, you have yet to understand the norms of cryptos.

I don't really need another rebuttal.  I was trying to inject some common sense, but ended up with your paranoid reply.

I understand greedy unskilled cocksuckers is the norm for coins.  As for the condescension, I don't need a lecture.  I've been around the block a few times.

What I was trying to get across is that if someone had a GPU miner and had put the effort into it, they would be more likely to release it than to kill a worthless coin and slip into the darkness.  Hyping and then dumping X coin isn't quite the same.  Come to me with proof that the coin dev has a GPU implementation.

I am running two AMD quad-core Phenom IIs and a Pentium P6000 in a laptop.  I would imagine i7s and i5s on their own would do better than my whole setup. I come in at ~7-8 Mhash/s. A little bit of MATH and REASON would show that reorder has an office full of machines at his disposal and is not tuning a GPU miner. When GPU miners hit, you will know it.  You will pray for a diff of 11 again.

I have a half-working GPU (OpenCL) implementation for SKC, but it's currently for testing purposes only (it is not a fully working miner). It's the first time I've written code for GPUs, so the code is poorly written, with some register spilling (which doesn't affect the result but lowers performance) and some other issues. So I think other developers can create much better code. I can't say anything about hashing speed or the like... but for OpenCL developers it should be pretty simple to implement the given algorithm...
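For anyone picking this up: the Skein side of the algorithm is built from 64-bit add, rotate and XOR operations (the Threefish MIX step), while many GPUs work on 32-bit registers. Below is a minimal C sketch, not taken from any existing miner, of one MIX step and of a 64-bit rotate composed from two 32-bit halves. The function names are made up for illustration.

```c
#include <stdint.h>

/* Reference 64-bit rotate-left; n must be in 1..63. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* The same rotation built from 32-bit halves; n must be in 1..63. */
static uint64_t rotl64_halves(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t new_lo, new_hi;

    if (n >= 32) {              /* rotating by 32 just swaps the halves */
        uint32_t t = lo;
        lo = hi;
        hi = t;
        n -= 32;
    }
    if (n == 0)                 /* pure swap, nothing left to shift */
        return ((uint64_t)hi << 32) | lo;

    /* bits shifted out of one half enter the bottom of the other */
    new_lo = (lo << n) | (hi >> (32 - n));
    new_hi = (hi << n) | (lo >> (32 - n));
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* One Threefish MIX step: add, rotate, XOR. */
static void mix(uint64_t *x0, uint64_t *x1, unsigned rot)
{
    *x0 += *x1;
    *x1 = rotl64(*x1, rot) ^ *x0;
}
```

Calling `mix(&a, &b, 46)` on two state words applies one such step with rotation 46. Whether the split rotate is actually faster than a native 64-bit rotate depends on the compiler and hardware, so treat it as something to benchmark rather than a given.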
Gunther
November 14, 2013, 05:42:07 PM
 #183

It just irritates me a little bit that the developer doesn't give any feedback on our comments. What does he want with this coin?
The Skeincoin website hasn't been updated since November.

Is it worth it to keep mining SKC, or not?

I think the coin is something different than other alt-coins, but without feedback it's just another coin, waiting to fade away.

And Asor: great work. Maybe the coin will finally get the attention it deserves! Mining is going very fast at the moment with 5 computers (CPU, dual/quad-core). Will it stay that way, or will it get more difficult, just like Litecoin?

milly6
November 14, 2013, 05:46:12 PM
 #184

Quote from: Gunther on November 14, 2013, 05:42:07 PM
It just irritates me a little bit that the developer doesn't give any feedback on our comments. What does he want with this coin?
The Skeincoin website hasn't been updated since November.

Is it worth it to keep mining SKC, or not?

I think the coin is something different than other alt-coins, but without feedback it's just another coin, waiting to fade away.

And Asor: great work. Maybe the coin will finally get the attention it deserves! Mining is going very fast at the moment with 5 computers (CPU, dual/quad-core). Will it stay that way, or will it get more difficult, just like Litecoin?



It is November.

Eyes open, No Fear. Be Safe! Trinity: Currency Without Bias
Gunther
November 14, 2013, 05:52:55 PM
 #185

Quote from: milly6 on November 14, 2013, 05:46:12 PM
Quote from: Gunther on November 14, 2013, 05:42:07 PM
It just irritates me a little bit that the developer doesn't give any feedback on our comments. What does he want with this coin?
The Skeincoin website hasn't been updated since November.

Is it worth it to keep mining SKC, or not?

I think the coin is something different than other alt-coins, but without feedback it's just another coin, waiting to fade away.

And Asor: great work. Maybe the coin will finally get the attention it deserves! Mining is going very fast at the moment with 5 computers (CPU, dual/quad-core). Will it stay that way, or will it get more difficult, just like Litecoin?

It is November.

Got ya... ;)
Red Kendra (OP)
November 14, 2013, 06:33:46 PM
 #186

I will offer a small reward of 2 LTC and 512 SKC for the first working, open-source GPU miner for Skeincoin.

As asor mentioned, it shouldn't be too difficult to implement for someone with GPU programming experience.

Others feel free to add to this bounty.
Red Kendra (OP)
November 14, 2013, 07:05:25 PM
Last edit: November 17, 2013, 02:21:07 PM by Red Kendra
 #187

CPU mining guide - Digital Ocean

1. Sign up for an account at Digital Ocean
   Consider using this referral code - https://www.digitalocean.com/?refcode=5ab24fa2bd9a
2. Create a droplet
   Enter host name
   A 512 MB/1 CPU droplet works fine
   Select a region
   Choose Ubuntu 13.04 x64 as your operating system
   Choose "Create Droplet"
3. Once the droplet is created, you will receive its IP address and root password by e-mail
4. Use PuTTY or a similar SSH client to connect to the IP of your droplet
5. Log in as root with the password provided
6. Paste the following script into your SSH session

Code:
{
# Make swap space
sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile

# Install libraries
apt-get install build-essential curl git libboost-all-dev libdb++5.3-dev libdb5.3++-dev libminiupnpc-dev libssl-dev m4 -y

# Install GMP
cd ~/
rm -rf gmp-5.1.2.tar.bz2 gmp-5.1.2
wget http://mirrors.kernel.org/gnu/gmp/gmp-5.1.2.tar.bz2
tar xjvf gmp-5.1.2.tar.bz2
cd gmp-5.1.2
./configure --enable-cxx
make
make install
# Go back to the home directory first so the cleanup removes the right paths
cd ~/
rm -rf gmp-5.1.2.tar.bz2 gmp-5.1.2

# Install Skeincoin
git clone git://github.com/skeincoin/skeincoin.git
cd ~/skeincoin/src
make -f makefile.unix USE_UPNP=- DEBUGFLAGS="" CXXFLAGS=

sudo cp ./skeincoind /usr/local/bin/

mkdir ~/.skeincoin
echo "rpcuser=myskeincoinusername
rpcpassword=$(cat /dev/urandom | tr -cd '[:alnum:]' | head -c32)
listen=1
server=1
maxconnections=200
gen=1
genproclimit=-1" > ~/.skeincoin/skeincoin.conf


skeincoind --daemon

}

7. Monitor your daemon with:

Code:
skeincoind getinfo

8. Send your coins to another wallet with:

Code:
skeincoind sendtoaddress <skeincoinaddress> <amount>
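
To get a feel for what the daemon's numbers mean, the difficulty can be turned into a rough expected time per block. The sketch below assumes the Bitcoin convention that difficulty 1 corresponds to about 2^32 hash attempts on average; whether Skeincoin's difficulty follows exactly that convention is an assumption here, and `expected_block_seconds` is a hypothetical helper, not part of skeincoind.

```c
/* Expected seconds to find one block, assuming Bitcoin-style difficulty
 * (difficulty 1 is taken to mean about 2^32 hash attempts on average).
 * Returns -1.0 for non-positive inputs. */
static double expected_block_seconds(double difficulty, double hashes_per_second)
{
    if (difficulty <= 0.0 || hashes_per_second <= 0.0)
        return -1.0;                            /* invalid input */
    return difficulty * 4294967296.0 / hashes_per_second;   /* 2^32 */
}
```

With figures mentioned in this thread, a difficulty of 11 and about 8 MH/s spread over a few CPUs, this works out to roughly 5,900 seconds per block on average, i.e. a bit over an hour and a half. It is only an average; individual blocks can come much sooner or later.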
reorder
November 14, 2013, 08:47:05 PM
 #188

I'm not claiming the bounty, but here is my kernel. It uses the poclbm API and hashes at ~140 MH/s on a 7870 at 1150 MHz. It probably works best with a modern SDK and GCN, being vectorized and all. Feel free to use it however you like.
Code:
#include "crycl.h"

inline uint sha256_res(uint16 data)
{
    u32 temp1, W[62];
    vstore16(TOBE32V16(data), 0, W);

#define S0(x) (ROL32(x, 25) ^ ROL32(x, 14) ^  SHR(x, 3))
#define S1(x) (ROL32(x, 15) ^ ROL32(x, 13) ^  SHR(x, 10))

#define S2(x) (ROL32(x, 30) ^ ROL32(x, 19) ^ ROL32(x, 10))
#define S3(x) (ROL32(x, 26) ^ ROL32(x, 21) ^ ROL32(x, 7))

#define F0(y, x, z) bitselect(z, y, z ^ x)
#define F1(x, y, z) bitselect(z, y, x)

#define R(t)                                    \
(                                               \
    W[t] = S1(W[t -  2]) + W[t -  7] +          \
           S0(W[t - 15]) + W[t - 16]            \
)

#define RD(t)                                   \
(                                               \
    S1(W[t -  2]) + W[t -  7] +                 \
           S0(W[t - 15]) + W[t - 16]            \
)

#define P(a,b,c,d,e,f,g,h,x,K)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + K + x;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);  \
}
#define PS(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + S;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);              \
}

#define PSLAST(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    d += h + S3(e) + F1(e,f,g) + S;              \
}

    uint8 state = (uint8)(0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19);
    uint8 vars = state;

    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 0], 0x428A2F98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 1], 0x71374491 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[ 2], 0xB5C0FBCF );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[ 3], 0xE9B5DBA5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[ 4], 0x3956C25B );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[ 5], 0x59F111F1 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[ 6], 0x923F82A4 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[ 7], 0xAB1C5ED5 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 8], 0xD807AA98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 9], 0x12835B01 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[10], 0x243185BE );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[11], 0x550C7DC3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[12], 0x72BE5D74 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[13], 0x80DEB1FE );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[14], 0x9BDC06A7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[15], 0xC19BF174 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(16), 0xE49B69C1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(17), 0xEFBE4786 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(18), 0x0FC19DC6 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(19), 0x240CA1CC );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(20), 0x2DE92C6F );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(21), 0x4A7484AA );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(22), 0x5CB0A9DC );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(23), 0x76F988DA );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(24), 0x983E5152 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(25), 0xA831C66D );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(26), 0xB00327C8 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(27), 0xBF597FC7 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(28), 0xC6E00BF3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(29), 0xD5A79147 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(30), 0x06CA6351 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(31), 0x14292967 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(32), 0x27B70A85 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(33), 0x2E1B2138 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(34), 0x4D2C6DFC );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(35), 0x53380D13 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(36), 0x650A7354 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(37), 0x766A0ABB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(38), 0x81C2C92E );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(39), 0x92722C85 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(40), 0xA2BFE8A1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(41), 0xA81A664B );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(42), 0xC24B8B70 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(43), 0xC76C51A3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(44), 0xD192E819 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(45), 0xD6990624 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(46), 0xF40E3585 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(47), 0x106AA070 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(48), 0x19A4C116 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(49), 0x1E376C08 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(50), 0x2748774C );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(51), 0x34B0BCB5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(52), 0x391C0CB3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(53), 0x4ED8AA4A );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(54), 0x5B9CCA4F );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(55), 0x682E6FF3 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(56), 0x748F82EE );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(57), 0x78A5636F );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(58), 0x84C87814 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(59), 0x8CC70208 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(60), 0x90BEFFFA );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(61), 0xA4506CEB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, RD(62), 0xBEF9A3F7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, RD(63), 0xC67178F2 );

    state += vars;
    vars = state;

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0x428A2F98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x71374491 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0xB5C0FBCF );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0xE9B5DBA5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x3956C25B );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x59F111F1 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x923F82A4 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0 + 0xAB1C5ED5 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0 + 0xD807AA98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x12835B01 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0x243185BE );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0x550C7DC3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x72BE5D74 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x80DEB1FE );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x9BDC06A7 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 512 + 0xC19BF174 );

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0xE49B69C1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x01400000 + 0xEFBE4786 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x00205000 + 0x0FC19DC6 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x00005088 + 0x240CA1CC );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x22000800 + 0x2DE92C6F );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x22550014 + 0x4A7484AA );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x05089742 + 0x5CB0A9DC );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xa0000020 + 0x76F988DA );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5a880000 + 0x983E5152 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x005c9400 + 0xA831C66D );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0016d49d + 0xB00327C8 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xfa801f00 + 0xBF597FC7 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xd33225d0 + 0xC6E00BF3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x11675959 + 0xD5A79147 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xf6e6bfda + 0x06CA6351 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb30c1549 + 0x14292967 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x08b2b050 + 0x27B70A85 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x9d7c4c27 + 0x2E1B2138 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0ce2a393 + 0x4D2C6DFC );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x88e6e1ea + 0x53380D13 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xa52b4335 + 0x650A7354 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x67a16f49 + 0x766A0ABB );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd732016f + 0x81C2C92E );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x4eeb2e91 + 0x92722C85 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5dbf55e5 + 0xA2BFE8A1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x8eee2335 + 0xA81A664B );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe2bc5ec2 + 0xC24B8B70 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xa83f4394 + 0xC76C51A3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x45ad78f7 + 0xD192E819 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x36f3d0cd + 0xD6990624 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd99c05e8 + 0xF40E3585 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb0511dc7 + 0x106AA070 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x69bc7ac4 + 0x19A4C116 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0xbd11375b + 0x1E376C08 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe3ba71e5 + 0x2748774C );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x3b209ff2 + 0x34B0BCB5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x18feee17 + 0x391C0CB3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0xe25ad9e7 + 0x4ED8AA4A );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x13375046 + 0x5B9CCA4F );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x0515089d + 0x682E6FF3 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x4f0d0f04 + 0x748F82EE );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x2627484e + 0x78A5636F );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x310128d2 + 0x84C87814 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xc668b434 + 0x8CC70208 );
    PSLAST( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x420841cc + 0x90BEFFFA );

    return vars.s7 + state.s7;
}

#define rolhackl(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n)); \
    return as_ulong((uint2)(bitselect(t.s0, t.s1, (uint)(1 << (n)) - 1), bitselect(t.s0, t.s1, (uint)(~((1 << (n)) - 1))))); \
}

rolhackl(8)
rolhackl(9)
rolhackl(10)
rolhackl(13)
rolhackl(14)
rolhackl(17)
rolhackl(19)
rolhackl(22)
rolhackl(24)
rolhackl(25)
rolhackl(27)
rolhackl(29)
rolhackl(30)

#define rolhackr(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n - 32)); \
    return as_ulong((uint2)(bitselect(t.s1, t.s0, (uint)(1 << (n - 32)) - 1), bitselect(t.s1, t.s0, (uint)(~((1 << (n - 32)) - 1))))); \
}

rolhackr(33)
rolhackr(34)
rolhackr(35)
rolhackr(36)
rolhackr(37)
rolhackr(39)
rolhackr(42)
rolhackr(43)
rolhackr(44)
rolhackr(46)
rolhackr(49)
rolhackr(50)
rolhackr(54)
rolhackr(56)

#define SKEIN_ROL_0_0(x) rol46(x)
#define SKEIN_ROL_0_1(x) rol36(x)
#define SKEIN_ROL_0_2(x) rol19(x)
#define SKEIN_ROL_0_3(x) rol37(x)
#define SKEIN_ROL_1_0(x) rol33(x)
#define SKEIN_ROL_1_1(x) rol27(x)
#define SKEIN_ROL_1_2(x) rol14(x)
#define SKEIN_ROL_1_3(x) rol42(x)
#define SKEIN_ROL_2_0(x) rol17(x)
#define SKEIN_ROL_2_1(x) rol49(x)
#define SKEIN_ROL_2_2(x) rol36(x)
#define SKEIN_ROL_2_3(x) rol39(x)
#define SKEIN_ROL_3_0(x) rol44(x)
#define SKEIN_ROL_3_1(x) rol9(x)
#define SKEIN_ROL_3_2(x) rol54(x)
#define SKEIN_ROL_3_3(x) rol56(x)
#define SKEIN_ROL_4_0(x) rol39(x)
#define SKEIN_ROL_4_1(x) rol30(x)
#define SKEIN_ROL_4_2(x) rol34(x)
#define SKEIN_ROL_4_3(x) rol24(x)
#define SKEIN_ROL_5_0(x) rol13(x)
#define SKEIN_ROL_5_1(x) rol50(x)
#define SKEIN_ROL_5_2(x) rol10(x)
#define SKEIN_ROL_5_3(x) rol17(x)
#define SKEIN_ROL_6_0(x) rol25(x)
#define SKEIN_ROL_6_1(x) rol29(x)
#define SKEIN_ROL_6_2(x) rol39(x)
#define SKEIN_ROL_6_3(x) rol43(x)
#define SKEIN_ROL_7_0(x) rol8(x)
#define SKEIN_ROL_7_1(x) rol35(x)
#define SKEIN_ROL_7_2(x) rol56(x)
#define SKEIN_ROL_7_3(x) rol22(x)

#define SKEIN_KS_PARITY         0x1BD11BDAA9FC1A22UL

#define SKEIN_R512(p0,p1,p2,p3,p4,p5,p6,p7,ROTS)                      \
    X.s##p0 += X.s##p1; \
    X.s##p2 += X.s##p3; \
    X.s##p4 += X.s##p5; \
    X.s##p6 += X.s##p7; \
    X.s##p1 = SKEIN_ROL_ ## ROTS ## _0(X.s##p1) ^ X.s##p0; \
    X.s##p3 = SKEIN_ROL_ ## ROTS ## _1(X.s##p3) ^ X.s##p2; \
    X.s##p5 = SKEIN_ROL_ ## ROTS ## _2(X.s##p5) ^ X.s##p4; \
    X.s##p7 = SKEIN_ROL_ ## ROTS ## _3(X.s##p7) ^ X.s##p6;

#define SKEIN_I512(R)                                                     \
    X.s0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
    X.s1   += ks[((R)+2) % 9];                                        \
    X.s2   += ks[((R)+3) % 9];                                        \
    X.s3   += ks[((R)+4) % 9];                                        \
    X.s4   += ks[((R)+5) % 9];                                        \
    X.s5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
    X.s6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
    X.s7   += ks[((R)+8) % 9] +     (R)+1;                            \

#define SKEIN_R512_8_rounds(R) \
        SKEIN_R512(0,1,2,3,4,5,6,7, 0);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 1);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 2);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 3);   \
        SKEIN_I512(2*(R));                              \
        SKEIN_R512(0,1,2,3,4,5,6,7, 4);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 5);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 6);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 7);   \
        SKEIN_I512(2*(R)+1);

inline ulong8 skein512_mid_impl(ulong8 X, ulong2 msg)
{
    u64 ts[3], ks[9];

    vstore8(X, 0, ks);
    X.s01 += msg;

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 80;
    ts[1] = 176UL << 56;
    ts[2] = 0xB000000000000050UL;

    X.s5 += 80;
    X.s6 += 176UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    X.s01 ^= msg;
    vstore8(X, 0, ks);

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 8UL;
    ts[1] = 255UL << 56;
    ts[2] = 0xFF00000000000008UL;

    X.s5 += 8UL;
    X.s6 += 255UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    return TOLE64V8(X);
}

__kernel void search(const u64 state0, const u64 state1, const u64 state2, const u64 state3,
                     const u64 state4, const u64 state5, const u64 state6, const u64 state7,
                     const u32 data16, const u32 data17, const u32 data18,
                     const u32 base,
                     __global u32* output)
{
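    /* Keep the nonce private to each work-item: a `local` variable is shared
       by the whole work-group, so every work-item would write to it at once. */
    u32 nonce = FROMLE32(base) + get_global_id(0);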
    ulong8 state = (ulong8)(FROMLE64(state0), FROMLE64(state1), FROMLE64(state2), FROMLE64(state3),
                          FROMLE64(state4), FROMLE64(state5), FROMLE64(state6), FROMLE64(state7));

    ulong2 msg = as_ulong2((uint4)(data16, data17, data18, TOBE32(nonce)));

    if(sha256_res(as_uint16(skein512_mid_impl(state, msg)))/* & 0xc0ffffff*/)
        return;
    output[OUTPUT_SIZE] = output[nonce & OUTPUT_MASK] = nonce;
}
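
A note on the second run of rounds in `sha256_res` (the `PS` calls): they take no message words, and the values added to each round constant look like the precomputed message schedule of the padding block that SHA-256 appends after a 64-byte input (here the 512-bit Skein output). The C sketch below is one way to check that reading. The helper names are made up for illustration and are not part of the kernel or of crycl.h.

```c
#include <stdint.h>

/* 32-bit rotate-right; n must be in 1..31. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 small sigma functions used by the message schedule. */
static uint32_t sigma0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t sigma1(uint32_t x)
{
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Fill w[0..63] with the message schedule of the padding block that
 * follows a message of exactly 64 bytes (512 bits). */
static void padding_block_schedule(uint32_t w[64])
{
    int t;

    w[0] = 0x80000000u;         /* the single 1 bit that starts the padding */
    for (t = 1; t < 15; t++)
        w[t] = 0;
    w[15] = 512;                /* message length in bits */

    for (t = 16; t < 64; t++)
        w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
}
```

The values for t = 16 to 20 come out as 0x80000000, 0x01400000, 0x00205000, 0x00005088 and 0x22000800, the same numbers that are added to the round constants in the corresponding `PS` lines, which is consistent with that reading.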

asor
November 14, 2013, 10:39:34 PM
 #189

Quote from: reorder on November 14, 2013, 08:47:05 PM
I'm not claiming the bounty, but here is my kernel. It uses the poclbm API and hashes at ~140 MH/s on a 7870 at 1150 MHz. It probably works best with a modern SDK and GCN, being vectorized and all. Feel free to use it however you like.
Code:
#include "crycl.h"

inline uint sha256_res(uint16 data)
{
    u32 temp1, W[62];
    vstore16(TOBE32V16(data), 0, W);

#define S0(x) (ROL32(x, 25) ^ ROL32(x, 14) ^  SHR(x, 3))
#define S1(x) (ROL32(x, 15) ^ ROL32(x, 13) ^  SHR(x, 10))

#define S2(x) (ROL32(x, 30) ^ ROL32(x, 19) ^ ROL32(x, 10))
#define S3(x) (ROL32(x, 26) ^ ROL32(x, 21) ^ ROL32(x, 7))

#define F0(y, x, z) bitselect(z, y, z ^ x)
#define F1(x, y, z) bitselect(z, y, x)

#define R(t)                                    \
(                                               \
    W[t] = S1(W[t -  2]) + W[t -  7] +          \
           S0(W[t - 15]) + W[t - 16]            \
)

#define RD(t)                                   \
(                                               \
    S1(W[t -  2]) + W[t -  7] +                 \
           S0(W[t - 15]) + W[t - 16]            \
)

#define P(a,b,c,d,e,f,g,h,x,K)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + K + x;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);  \
}
#define PS(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + S;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);              \
}

#define PSLAST(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    d += h + S3(e) + F1(e,f,g) + S;              \
}

    uint8 state = (uint8)(0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19);
    uint8 vars = state;

    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 0], 0x428A2F98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 1], 0x71374491 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[ 2], 0xB5C0FBCF );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[ 3], 0xE9B5DBA5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[ 4], 0x3956C25B );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[ 5], 0x59F111F1 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[ 6], 0x923F82A4 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[ 7], 0xAB1C5ED5 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 8], 0xD807AA98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 9], 0x12835B01 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[10], 0x243185BE );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[11], 0x550C7DC3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[12], 0x72BE5D74 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[13], 0x80DEB1FE );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[14], 0x9BDC06A7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[15], 0xC19BF174 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(16), 0xE49B69C1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(17), 0xEFBE4786 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(18), 0x0FC19DC6 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(19), 0x240CA1CC );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(20), 0x2DE92C6F );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(21), 0x4A7484AA );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(22), 0x5CB0A9DC );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(23), 0x76F988DA );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(24), 0x983E5152 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(25), 0xA831C66D );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(26), 0xB00327C8 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(27), 0xBF597FC7 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(28), 0xC6E00BF3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(29), 0xD5A79147 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(30), 0x06CA6351 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(31), 0x14292967 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(32), 0x27B70A85 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(33), 0x2E1B2138 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(34), 0x4D2C6DFC );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(35), 0x53380D13 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(36), 0x650A7354 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(37), 0x766A0ABB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(38), 0x81C2C92E );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(39), 0x92722C85 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(40), 0xA2BFE8A1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(41), 0xA81A664B );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(42), 0xC24B8B70 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(43), 0xC76C51A3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(44), 0xD192E819 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(45), 0xD6990624 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(46), 0xF40E3585 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(47), 0x106AA070 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(48), 0x19A4C116 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(49), 0x1E376C08 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(50), 0x2748774C );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(51), 0x34B0BCB5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(52), 0x391C0CB3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(53), 0x4ED8AA4A );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(54), 0x5B9CCA4F );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(55), 0x682E6FF3 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(56), 0x748F82EE );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(57), 0x78A5636F );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(58), 0x84C87814 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(59), 0x8CC70208 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(60), 0x90BEFFFA );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(61), 0xA4506CEB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, RD(62), 0xBEF9A3F7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, RD(63), 0xC67178F2 );

    state += vars;
    vars = state;

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0x428A2F98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x71374491 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0xB5C0FBCF );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0xE9B5DBA5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x3956C25B );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x59F111F1 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x923F82A4 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0 + 0xAB1C5ED5 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0 + 0xD807AA98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x12835B01 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0x243185BE );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0x550C7DC3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x72BE5D74 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x80DEB1FE );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x9BDC06A7 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 512 + 0xC19BF174 );

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0xE49B69C1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x01400000 + 0xEFBE4786 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x00205000 + 0x0FC19DC6 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x00005088 + 0x240CA1CC );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x22000800 + 0x2DE92C6F );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x22550014 + 0x4A7484AA );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x05089742 + 0x5CB0A9DC );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xa0000020 + 0x76F988DA );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5a880000 + 0x983E5152 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x005c9400 + 0xA831C66D );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0016d49d + 0xB00327C8 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xfa801f00 + 0xBF597FC7 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xd33225d0 + 0xC6E00BF3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x11675959 + 0xD5A79147 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xf6e6bfda + 0x06CA6351 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb30c1549 + 0x14292967 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x08b2b050 + 0x27B70A85 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x9d7c4c27 + 0x2E1B2138 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0ce2a393 + 0x4D2C6DFC );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x88e6e1ea + 0x53380D13 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xa52b4335 + 0x650A7354 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x67a16f49 + 0x766A0ABB );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd732016f + 0x81C2C92E );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x4eeb2e91 + 0x92722C85 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5dbf55e5 + 0xA2BFE8A1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x8eee2335 + 0xA81A664B );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe2bc5ec2 + 0xC24B8B70 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xa83f4394 + 0xC76C51A3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x45ad78f7 + 0xD192E819 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x36f3d0cd + 0xD6990624 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd99c05e8 + 0xF40E3585 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb0511dc7 + 0x106AA070 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x69bc7ac4 + 0x19A4C116 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0xbd11375b + 0x1E376C08 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe3ba71e5 + 0x2748774C );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x3b209ff2 + 0x34B0BCB5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x18feee17 + 0x391C0CB3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0xe25ad9e7 + 0x4ED8AA4A );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x13375046 + 0x5B9CCA4F );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x0515089d + 0x682E6FF3 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x4f0d0f04 + 0x748F82EE );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x2627484e + 0x78A5636F );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x310128d2 + 0x84C87814 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xc668b434 + 0x8CC70208 );
    PSLAST( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x420841cc + 0x90BEFFFA );

    return vars.s7 + state.s7;
}

#define rolhackl(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n)); \
    return as_ulong((uint2)(bitselect(t.s0, t.s1, (uint)(1 << (n)) - 1), bitselect(t.s0, t.s1, (uint)(~((1 << (n)) - 1))))); \
}

rolhackl(8)
rolhackl(9)
rolhackl(10)
rolhackl(13)
rolhackl(14)
rolhackl(17)
rolhackl(19)
rolhackl(22)
rolhackl(24)
rolhackl(25)
rolhackl(27)
rolhackl(29)
rolhackl(30)

#define rolhackr(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n - 32)); \
    return as_ulong((uint2)(bitselect(t.s1, t.s0, (uint)(1 << (n - 32)) - 1), bitselect(t.s1, t.s0, (uint)(~((1 << (n - 32)) - 1))))); \
}

rolhackr(33)
rolhackr(34)
rolhackr(35)
rolhackr(36)
rolhackr(37)
rolhackr(39)
rolhackr(42)
rolhackr(43)
rolhackr(44)
rolhackr(46)
rolhackr(49)
rolhackr(50)
rolhackr(54)
rolhackr(56)

#define SKEIN_ROL_0_0(x) rol46(x)
#define SKEIN_ROL_0_1(x) rol36(x)
#define SKEIN_ROL_0_2(x) rol19(x)
#define SKEIN_ROL_0_3(x) rol37(x)
#define SKEIN_ROL_1_0(x) rol33(x)
#define SKEIN_ROL_1_1(x) rol27(x)
#define SKEIN_ROL_1_2(x) rol14(x)
#define SKEIN_ROL_1_3(x) rol42(x)
#define SKEIN_ROL_2_0(x) rol17(x)
#define SKEIN_ROL_2_1(x) rol49(x)
#define SKEIN_ROL_2_2(x) rol36(x)
#define SKEIN_ROL_2_3(x) rol39(x)
#define SKEIN_ROL_3_0(x) rol44(x)
#define SKEIN_ROL_3_1(x) rol9(x)
#define SKEIN_ROL_3_2(x) rol54(x)
#define SKEIN_ROL_3_3(x) rol56(x)
#define SKEIN_ROL_4_0(x) rol39(x)
#define SKEIN_ROL_4_1(x) rol30(x)
#define SKEIN_ROL_4_2(x) rol34(x)
#define SKEIN_ROL_4_3(x) rol24(x)
#define SKEIN_ROL_5_0(x) rol13(x)
#define SKEIN_ROL_5_1(x) rol50(x)
#define SKEIN_ROL_5_2(x) rol10(x)
#define SKEIN_ROL_5_3(x) rol17(x)
#define SKEIN_ROL_6_0(x) rol25(x)
#define SKEIN_ROL_6_1(x) rol29(x)
#define SKEIN_ROL_6_2(x) rol39(x)
#define SKEIN_ROL_6_3(x) rol43(x)
#define SKEIN_ROL_7_0(x) rol8(x)
#define SKEIN_ROL_7_1(x) rol35(x)
#define SKEIN_ROL_7_2(x) rol56(x)
#define SKEIN_ROL_7_3(x) rol22(x)

#define SKEIN_KS_PARITY         0x1BD11BDAA9FC1A22UL

#define SKEIN_R512(p0,p1,p2,p3,p4,p5,p6,p7,ROTS)                      \
    X.s##p0 += X.s##p1; \
    X.s##p2 += X.s##p3; \
    X.s##p4 += X.s##p5; \
    X.s##p6 += X.s##p7; \
    X.s##p1 = SKEIN_ROL_ ## ROTS ## _0(X.s##p1) ^ X.s##p0; \
    X.s##p3 = SKEIN_ROL_ ## ROTS ## _1(X.s##p3) ^ X.s##p2; \
    X.s##p5 = SKEIN_ROL_ ## ROTS ## _2(X.s##p5) ^ X.s##p4; \
    X.s##p7 = SKEIN_ROL_ ## ROTS ## _3(X.s##p7) ^ X.s##p6;

#define SKEIN_I512(R)                                                     \
    X.s0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
    X.s1   += ks[((R)+2) % 9];                                        \
    X.s2   += ks[((R)+3) % 9];                                        \
    X.s3   += ks[((R)+4) % 9];                                        \
    X.s4   += ks[((R)+5) % 9];                                        \
    X.s5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
    X.s6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
    X.s7   += ks[((R)+8) % 9] +     (R)+1;                            \

#define SKEIN_R512_8_rounds(R) \
        SKEIN_R512(0,1,2,3,4,5,6,7, 0);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 1);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 2);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 3);   \
        SKEIN_I512(2*(R));                              \
        SKEIN_R512(0,1,2,3,4,5,6,7, 4);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 5);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 6);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 7);   \
        SKEIN_I512(2*(R)+1);

inline ulong8 skein512_mid_impl(ulong8 X, ulong2 msg)
{
    u64 ts[3], ks[9];

    vstore8(X, 0, ks);
    X.s01 += msg;

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 80;
    ts[1] = 176UL << 56;
    ts[2] = 0xB000000000000050UL;

    X.s5 += 80;
    X.s6 += 176UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    X.s01 ^= msg;
    vstore8(X, 0, ks);

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 8UL;
    ts[1] = 255UL << 56;
    ts[2] = 0xFF00000000000008UL;

    X.s5 += 8UL;
    X.s6 += 255UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    return TOLE64V8(X);
}

__kernel void search(const u64 state0, const u64 state1, const u64 state2, const u64 state3,
                     const u64 state4, const u64 state5, const u64 state6, const u64 state7,
                     const u32 data16, const u32 data17, const u32 data18,
                     const u32 base,
                     __global u32* output)
{
    /* Private per-work-item nonce: a __local variable would be shared (and raced on) by the whole work-group. */
    u32 nonce = FROMLE32(base) + get_global_id(0);
    ulong8 state = (ulong8)(FROMLE64(state0), FROMLE64(state1), FROMLE64(state2), FROMLE64(state3),
                          FROMLE64(state4), FROMLE64(state5), FROMLE64(state6), FROMLE64(state7));

    ulong2 msg = as_ulong2((uint4)(data16, data17, data18, TOBE32(nonce)));

    if(sha256_res(as_uint16(skein512_mid_impl(state, msg)))/* & 0xc0ffffff*/)
        return;
    output[OUTPUT_SIZE] = output[nonce & OUTPUT_MASK] = nonce;
}
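
A note on the F0 and F1 macros in the kernel above: OpenCL's bitselect(a, b, c) takes each result bit from b where c is 1 and from a where c is 0, so F1 works out to SHA-256's Ch and F0 to Maj. The host-side C sketch below is not part of the posted kernel; bitselect32, maj_ref, ch_ref and count_mismatches are names invented here for illustration. It checks the two bitselect forms against the textbook definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for OpenCL bitselect(a, b, c):
   each result bit comes from b where c is 1, and from a where c is 0. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & ~c) | (b & c);
}

/* Same argument order as the kernel's F0 and F1 macros. */
static inline uint32_t f0(uint32_t y, uint32_t x, uint32_t z) {
    return bitselect32(z, y, z ^ x);
}
static inline uint32_t f1(uint32_t x, uint32_t y, uint32_t z) {
    return bitselect32(z, y, x);
}

/* Textbook SHA-256 Maj and Ch. */
static inline uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}
static inline uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (~x & z);
}

/* Both functions are purely bitwise, so the 8 all-zero/all-one input
   combinations cover every case. Returns the number of mismatches. */
static inline int count_mismatches(void) {
    int bad = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint32_t a = (i & 1u) ? 0xFFFFFFFFu : 0u;
        uint32_t b = (i & 2u) ? 0xFFFFFFFFu : 0u;
        uint32_t c = (i & 4u) ? 0xFFFFFFFFu : 0u;
        if (f0(a, b, c) != maj_ref(a, b, c)) bad++;
        if (f1(a, b, c) != ch_ref(a, b, c)) bad++;
    }
    return bad;
}

/* Convenience wrapper: aborts if the bitselect forms ever disagree. */
static inline void self_check(void) {
    assert(count_mismatches() == 0);
}
```

Calling self_check() from any main or test harness is enough; the point of the bitselect form is that it can map to a single select instruction on hardware that has one.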

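The rolhackl and rolhackr macros above build a 64-bit rotate out of two 32-bit rotates plus bitselect, presumably because 32-bit rotates are cheaper on the target GPU. A rough host-side C sketch of the same idea follows. It assumes the little-endian lane order in which s0 of as_uint2 is the low half, and the helper names (rol_split_lo, rol_split_hi, count_rot_mismatches) are made up for this note. It compares the split version with a plain 64-bit rotate for every amount the kernel instantiates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain rotates for reference; n must be 1..31 and 1..63 respectively. */
static inline uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32u - n)); }
static inline uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64u - n)); }

/* Stand-in for OpenCL bitselect: bit from b where c is 1, else from a. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & ~c) | (b & c);
}

/* Mirrors rolhackl(n), valid for 0 < n < 32. */
static inline uint64_t rol_split_lo(uint64_t l, unsigned n) {
    uint32_t t0 = rotl32((uint32_t)l, n);          /* t.s0: rotated low half  */
    uint32_t t1 = rotl32((uint32_t)(l >> 32), n);  /* t.s1: rotated high half */
    uint32_t mask = ((uint32_t)1 << n) - 1u;       /* bits that cross halves  */
    uint32_t lo = bitselect32(t0, t1, mask);
    uint32_t hi = bitselect32(t0, t1, ~mask);
    return ((uint64_t)hi << 32) | lo;
}

/* Mirrors rolhackr(n), valid for 32 < n < 64: rotate by n - 32 and swap halves. */
static inline uint64_t rol_split_hi(uint64_t l, unsigned n) {
    unsigned m = n - 32u;
    uint32_t t0 = rotl32((uint32_t)l, m);
    uint32_t t1 = rotl32((uint32_t)(l >> 32), m);
    uint32_t mask = ((uint32_t)1 << m) - 1u;
    uint32_t lo = bitselect32(t1, t0, mask);
    uint32_t hi = bitselect32(t1, t0, ~mask);
    return ((uint64_t)hi << 32) | lo;
}

/* Compares both variants with rotl64 for the amounts used by the kernel.
   Returns the number of mismatches for the given input word. */
static inline int count_rot_mismatches(uint64_t x) {
    static const unsigned lo_amounts[] = {8, 9, 10, 13, 14, 17, 19, 22, 24, 25, 27, 29, 30};
    static const unsigned hi_amounts[] = {33, 34, 35, 36, 37, 39, 42, 43, 44, 46, 49, 50, 54, 56};
    int bad = 0;
    for (size_t i = 0; i < sizeof lo_amounts / sizeof lo_amounts[0]; i++)
        if (rol_split_lo(x, lo_amounts[i]) != rotl64(x, lo_amounts[i])) bad++;
    for (size_t i = 0; i < sizeof hi_amounts / sizeof hi_amounts[0]; i++)
        if (rol_split_hi(x, hi_amounts[i]) != rotl64(x, hi_amounts[i])) bad++;
    return bad;
}

/* Convenience wrapper: aborts if the split rotate ever disagrees. */
static inline void self_check(void) {
    assert(count_rot_mismatches(0x0123456789ABCDEFULL) == 0);
}
```

Neither variant handles a rotation by exactly 32, which is fine here because none of the Skein-512 rotation amounts in the kernel is 32.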

Thanks, reorder! I've never used poclbm, though. What is "crycl.h"? I couldn't find any information on where to get it...
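
For anyone puzzling over the magic numbers in skein512_mid_impl in the quoted kernel: they look consistent with the Skein tweak layout for an 80-byte input, where T0 carries the byte position and the top byte of T1 carries the final/first flags and the block type. The C sketch below is only an illustration under that reading. make_tweak, key_parity and the T1_*/TYPE_* names are invented here; they are not taken from crycl.h or from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Flag and type fields in the top byte of tweak word T1 (Skein layout). */
#define T1_FINAL 0x80u  /* last block of this stage   */
#define T1_FIRST 0x40u  /* first block of this stage  */
#define TYPE_MSG 48u    /* message block type         */
#define TYPE_OUT 63u    /* output transform type      */

typedef struct { uint64_t ts[3]; } tweak_t;

/* Builds the three tweak words the way the kernel's ts[] array is laid out.
   position: bytes processed so far (small enough to fit in T0 alone). */
static inline tweak_t make_tweak(uint64_t position, unsigned top_byte) {
    tweak_t t;
    t.ts[0] = position;
    t.ts[1] = (uint64_t)top_byte << 56;
    t.ts[2] = t.ts[0] ^ t.ts[1];  /* Threefish: third word is the XOR of the first two */
    return t;
}

/* Extended key word ks[8]: XOR of the eight key words and the parity constant. */
static inline uint64_t key_parity(const uint64_t ks[8]) {
    uint64_t p = 0x1BD11BDAA9FC1A22ULL;  /* SKEIN_KS_PARITY from the kernel */
    for (int i = 0; i < 8; i++) p ^= ks[i];
    return p;
}

/* Convenience wrapper: aborts if the derived words differ from the kernel's constants. */
static inline void self_check(void) {
    tweak_t msg = make_tweak(80, T1_FINAL | TYPE_MSG);
    tweak_t out = make_tweak(8, T1_FINAL | T1_FIRST | TYPE_OUT);
    assert(msg.ts[1] == (176ULL << 56) && msg.ts[2] == 0xB000000000000050ULL);
    assert(out.ts[1] == (255ULL << 56) && out.ts[2] == 0xFF00000000000008ULL);
}
```

Read this way, the first group of rounds is the final message block (the last 16 bytes of an 80-byte header, on top of a midstate computed on the host) and the second group is the output transform over an 8-byte counter.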
klintay
Legendary

Activity: 1775
Merit: 1032


Value will be measured in sats


November 15, 2013, 03:18:18 AM
 #190

Not to claim the bounty, but here is my kernel. It uses the poclbm API and hashes at ~140 MH/s on a 7870 at 1150 MHz. It probably works best with a modern SDK and GCN, since it uses vector types throughout. Feel free to use it however you like.
Code:
#include "crycl.h"

inline uint sha256_res(uint16 data)
{
    u32 temp1, W[62];
    vstore16(TOBE32V16(data), 0, W);

#define S0(x) (ROL32(x, 25) ^ ROL32(x, 14) ^  SHR(x, 3))
#define S1(x) (ROL32(x, 15) ^ ROL32(x, 13) ^  SHR(x, 10))

#define S2(x) (ROL32(x, 30) ^ ROL32(x, 19) ^ ROL32(x, 10))
#define S3(x) (ROL32(x, 26) ^ ROL32(x, 21) ^ ROL32(x, 7))

#define F0(y, x, z) bitselect(z, y, z ^ x)
#define F1(x, y, z) bitselect(z, y, x)

#define R(t)                                    \
(                                               \
    W[t] = S1(W[t -  2]) + W[t -  7] +          \
           S0(W[t - 15]) + W[t - 16]            \
)

#define RD(t)                                   \
(                                               \
    S1(W[t -  2]) + W[t -  7] +                 \
           S0(W[t - 15]) + W[t - 16]            \
)

#define P(a,b,c,d,e,f,g,h,x,K)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + K + x;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);  \
}
#define PS(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    temp1 = h + S3(e) + F1(e,f,g) + S;      \
    d += temp1; h = temp1 + S2(a) + F0(a,b,c);              \
}

#define PSLAST(a,b,c,d,e,f,g,h,S)                  \
{                                               \
    d += h + S3(e) + F1(e,f,g) + S;              \
}

    uint8 state = (uint8)(0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19);
    uint8 vars = state;

    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 0], 0x428A2F98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 1], 0x71374491 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[ 2], 0xB5C0FBCF );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[ 3], 0xE9B5DBA5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[ 4], 0x3956C25B );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[ 5], 0x59F111F1 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[ 6], 0x923F82A4 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[ 7], 0xAB1C5ED5 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, W[ 8], 0xD807AA98 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, W[ 9], 0x12835B01 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, W[10], 0x243185BE );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, W[11], 0x550C7DC3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, W[12], 0x72BE5D74 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, W[13], 0x80DEB1FE );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, W[14], 0x9BDC06A7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, W[15], 0xC19BF174 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(16), 0xE49B69C1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(17), 0xEFBE4786 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(18), 0x0FC19DC6 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(19), 0x240CA1CC );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(20), 0x2DE92C6F );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(21), 0x4A7484AA );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(22), 0x5CB0A9DC );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(23), 0x76F988DA );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(24), 0x983E5152 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(25), 0xA831C66D );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(26), 0xB00327C8 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(27), 0xBF597FC7 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(28), 0xC6E00BF3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(29), 0xD5A79147 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(30), 0x06CA6351 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(31), 0x14292967 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(32), 0x27B70A85 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(33), 0x2E1B2138 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(34), 0x4D2C6DFC );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(35), 0x53380D13 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(36), 0x650A7354 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(37), 0x766A0ABB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(38), 0x81C2C92E );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(39), 0x92722C85 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(40), 0xA2BFE8A1 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(41), 0xA81A664B );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(42), 0xC24B8B70 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(43), 0xC76C51A3 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(44), 0xD192E819 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(45), 0xD6990624 );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(46), 0xF40E3585 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(47), 0x106AA070 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(48), 0x19A4C116 );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(49), 0x1E376C08 );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(50), 0x2748774C );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(51), 0x34B0BCB5 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(52), 0x391C0CB3 );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(53), 0x4ED8AA4A );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, R(54), 0x5B9CCA4F );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, R(55), 0x682E6FF3 );
    P( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, R(56), 0x748F82EE );
    P( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, R(57), 0x78A5636F );
    P( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, R(58), 0x84C87814 );
    P( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, R(59), 0x8CC70208 );
    P( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, R(60), 0x90BEFFFA );
    P( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, R(61), 0xA4506CEB );
    P( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, RD(62), 0xBEF9A3F7 );
    P( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, RD(63), 0xC67178F2 );

    state += vars;
    vars = state;

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0x428A2F98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x71374491 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0xB5C0FBCF );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0xE9B5DBA5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x3956C25B );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x59F111F1 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x923F82A4 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0 + 0xAB1C5ED5 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0 + 0xD807AA98 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0 + 0x12835B01 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0 + 0x243185BE );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0 + 0x550C7DC3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0 + 0x72BE5D74 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0 + 0x80DEB1FE );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0 + 0x9BDC06A7 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 512 + 0xC19BF174 );

    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x80000000 + 0xE49B69C1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x01400000 + 0xEFBE4786 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x00205000 + 0x0FC19DC6 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x00005088 + 0x240CA1CC );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x22000800 + 0x2DE92C6F );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x22550014 + 0x4A7484AA );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x05089742 + 0x5CB0A9DC );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xa0000020 + 0x76F988DA );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5a880000 + 0x983E5152 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x005c9400 + 0xA831C66D );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0016d49d + 0xB00327C8 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xfa801f00 + 0xBF597FC7 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xd33225d0 + 0xC6E00BF3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x11675959 + 0xD5A79147 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xf6e6bfda + 0x06CA6351 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb30c1549 + 0x14292967 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x08b2b050 + 0x27B70A85 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x9d7c4c27 + 0x2E1B2138 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x0ce2a393 + 0x4D2C6DFC );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x88e6e1ea + 0x53380D13 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0xa52b4335 + 0x650A7354 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x67a16f49 + 0x766A0ABB );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd732016f + 0x81C2C92E );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x4eeb2e91 + 0x92722C85 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x5dbf55e5 + 0xA2BFE8A1 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x8eee2335 + 0xA81A664B );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe2bc5ec2 + 0xC24B8B70 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xa83f4394 + 0xC76C51A3 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x45ad78f7 + 0xD192E819 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0x36f3d0cd + 0xD6990624 );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0xd99c05e8 + 0xF40E3585 );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0xb0511dc7 + 0x106AA070 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x69bc7ac4 + 0x19A4C116 );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0xbd11375b + 0x1E376C08 );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0xe3ba71e5 + 0x2748774C );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0x3b209ff2 + 0x34B0BCB5 );
    PS( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x18feee17 + 0x391C0CB3 );
    PS( vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, 0xe25ad9e7 + 0x4ED8AA4A );
    PS( vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, 0x13375046 + 0x5B9CCA4F );
    PS( vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, 0x0515089d + 0x682E6FF3 );
    PS( vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, vars.s7, 0x4f0d0f04 + 0x748F82EE );
    PS( vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, vars.s6, 0x2627484e + 0x78A5636F );
    PS( vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, vars.s5, 0x310128d2 + 0x84C87814 );
    PS( vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, vars.s4, 0xc668b434 + 0x8CC70208 );
    PSLAST( vars.s4, vars.s5, vars.s6, vars.s7, vars.s0, vars.s1, vars.s2, vars.s3, 0x420841cc + 0x90BEFFFA );

    return vars.s7 + state.s7;
}

#define rolhackl(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n)); \
    return as_ulong((uint2)(bitselect(t.s0, t.s1, (uint)(1 << (n)) - 1), bitselect(t.s0, t.s1, (uint)(~((1 << (n)) - 1))))); \
}

rolhackl(8)
rolhackl(9)
rolhackl(10)
rolhackl(13)
rolhackl(14)
rolhackl(17)
rolhackl(19)
rolhackl(22)
rolhackl(24)
rolhackl(25)
rolhackl(27)
rolhackl(29)
rolhackl(30)

#define rolhackr(n) \
inline ulong rol ## n  (ulong l) \
{ \
    uint2 t = rotate(as_uint2(l), (n - 32)); \
    return as_ulong((uint2)(bitselect(t.s1, t.s0, (uint)(1 << (n - 32)) - 1), bitselect(t.s1, t.s0, (uint)(~((1 << (n - 32)) - 1))))); \
}

rolhackr(33)
rolhackr(34)
rolhackr(35)
rolhackr(36)
rolhackr(37)
rolhackr(39)
rolhackr(42)
rolhackr(43)
rolhackr(44)
rolhackr(46)
rolhackr(49)
rolhackr(50)
rolhackr(54)
rolhackr(56)

#define SKEIN_ROL_0_0(x) rol46(x)
#define SKEIN_ROL_0_1(x) rol36(x)
#define SKEIN_ROL_0_2(x) rol19(x)
#define SKEIN_ROL_0_3(x) rol37(x)
#define SKEIN_ROL_1_0(x) rol33(x)
#define SKEIN_ROL_1_1(x) rol27(x)
#define SKEIN_ROL_1_2(x) rol14(x)
#define SKEIN_ROL_1_3(x) rol42(x)
#define SKEIN_ROL_2_0(x) rol17(x)
#define SKEIN_ROL_2_1(x) rol49(x)
#define SKEIN_ROL_2_2(x) rol36(x)
#define SKEIN_ROL_2_3(x) rol39(x)
#define SKEIN_ROL_3_0(x) rol44(x)
#define SKEIN_ROL_3_1(x) rol9(x)
#define SKEIN_ROL_3_2(x) rol54(x)
#define SKEIN_ROL_3_3(x) rol56(x)
#define SKEIN_ROL_4_0(x) rol39(x)
#define SKEIN_ROL_4_1(x) rol30(x)
#define SKEIN_ROL_4_2(x) rol34(x)
#define SKEIN_ROL_4_3(x) rol24(x)
#define SKEIN_ROL_5_0(x) rol13(x)
#define SKEIN_ROL_5_1(x) rol50(x)
#define SKEIN_ROL_5_2(x) rol10(x)
#define SKEIN_ROL_5_3(x) rol17(x)
#define SKEIN_ROL_6_0(x) rol25(x)
#define SKEIN_ROL_6_1(x) rol29(x)
#define SKEIN_ROL_6_2(x) rol39(x)
#define SKEIN_ROL_6_3(x) rol43(x)
#define SKEIN_ROL_7_0(x) rol8(x)
#define SKEIN_ROL_7_1(x) rol35(x)
#define SKEIN_ROL_7_2(x) rol56(x)
#define SKEIN_ROL_7_3(x) rol22(x)

#define SKEIN_KS_PARITY         0x1BD11BDAA9FC1A22UL

#define SKEIN_R512(p0,p1,p2,p3,p4,p5,p6,p7,ROTS)                      \
    X.s##p0 += X.s##p1; \
    X.s##p2 += X.s##p3; \
    X.s##p4 += X.s##p5; \
    X.s##p6 += X.s##p7; \
    X.s##p1 = SKEIN_ROL_ ## ROTS ## _0(X.s##p1) ^ X.s##p0; \
    X.s##p3 = SKEIN_ROL_ ## ROTS ## _1(X.s##p3) ^ X.s##p2; \
    X.s##p5 = SKEIN_ROL_ ## ROTS ## _2(X.s##p5) ^ X.s##p4; \
    X.s##p7 = SKEIN_ROL_ ## ROTS ## _3(X.s##p7) ^ X.s##p6;

#define SKEIN_I512(R)                                                     \
    X.s0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
    X.s1   += ks[((R)+2) % 9];                                        \
    X.s2   += ks[((R)+3) % 9];                                        \
    X.s3   += ks[((R)+4) % 9];                                        \
    X.s4   += ks[((R)+5) % 9];                                        \
    X.s5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
    X.s6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
    X.s7   += ks[((R)+8) % 9] +     (R)+1;                            \

#define SKEIN_R512_8_rounds(R) \
        SKEIN_R512(0,1,2,3,4,5,6,7, 0);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 1);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 2);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 3);   \
        SKEIN_I512(2*(R));                              \
        SKEIN_R512(0,1,2,3,4,5,6,7, 4);   \
        SKEIN_R512(2,1,4,7,6,5,0,3, 5);   \
        SKEIN_R512(4,1,6,3,0,5,2,7, 6);   \
        SKEIN_R512(6,1,0,7,2,5,4,3, 7);   \
        SKEIN_I512(2*(R)+1);

inline ulong8 skein512_mid_impl(ulong8 X, ulong2 msg)
{
    u64 ts[3], ks[9];

    vstore8(X, 0, ks);
    X.s01 += msg;

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 80;
    ts[1] = 176UL << 56;
    ts[2] = 0xB000000000000050UL;

    X.s5 += 80;
    X.s6 += 176UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    X.s01 ^= msg;
    vstore8(X, 0, ks);

    ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
            ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;

    ts[0] = 8UL;
    ts[1] = 255UL << 56;
    ts[2] = 0xFF00000000000008UL;

    X.s5 += 8UL;
    X.s6 += 255UL << 56;

    SKEIN_R512_8_rounds( 0);
    SKEIN_R512_8_rounds( 1);
    SKEIN_R512_8_rounds( 2);
    SKEIN_R512_8_rounds( 3);
    SKEIN_R512_8_rounds( 4);
    SKEIN_R512_8_rounds( 5);
    SKEIN_R512_8_rounds( 6);
    SKEIN_R512_8_rounds( 7);
    SKEIN_R512_8_rounds( 8);

    return TOLE64V8(X);
}

__kernel void search(const u64 state0, const u64 state1, const u64 state2, const u64 state3,
                     const u64 state4, const u64 state5, const u64 state6, const u64 state7,
                     const u32 data16, const u32 data17, const u32 data18,
                     const u32 base,
                     __global u32* output)
{
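    /* The nonce is per work-item, so keep it private; a __local variable is shared by the whole work-group. */
    const u32 nonce = FROMLE32(base) + get_global_id(0);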
    ulong8 state = (ulong8)(FROMLE64(state0), FROMLE64(state1), FROMLE64(state2), FROMLE64(state3),
                          FROMLE64(state4), FROMLE64(state5), FROMLE64(state6), FROMLE64(state7));

    ulong2 msg = as_ulong2((uint4)(data16, data17, data18, TOBE32(nonce)));

    if(sha256_res(as_uint16(skein512_mid_impl(state, msg)))/* & 0xc0ffffff*/)
        return;
    output[OUTPUT_SIZE] = output[nonce & OUTPUT_MASK] = nonce;
}


Great work, man! It is a bit above my computing knowledge, to be honest; this is all new to me. I will give it a shot though... time to do some reading :D
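A side note for anyone reading the kernel above: the hard-coded ts[2] values and the ks[8] word follow the Threefish key schedule, where the third tweak word is the XOR of the first two and the extra key word is the XOR of the eight key words with SKEIN_KS_PARITY. The host-side C sketch below only re-derives those constants as a sanity check. The helper names (tweak2, key_parity, check_tweaks) are made up for illustration and are not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>

#define SKEIN_KS_PARITY 0x1BD11BDAA9FC1A22ULL

/* Third tweak word: XOR of the first two tweak words. */
static inline uint64_t tweak2(uint64_t t0, uint64_t t1)
{
    return t0 ^ t1;
}

/* Extra key word ks[8]: XOR of the eight key words and the parity constant. */
static inline uint64_t key_parity(const uint64_t ks[8])
{
    uint64_t p = SKEIN_KS_PARITY;
    for (int i = 0; i < 8; i++)
        p ^= ks[i];
    return p;
}

/* Re-derive the two ts[2] constants that the kernel hard-codes. */
static inline void check_tweaks(void)
{
    /* Message block: 80 bytes processed, top byte 176 (0xB0). */
    assert(tweak2(80, 176ULL << 56) == 0xB000000000000050ULL);
    /* Output block: 8 bytes processed, top byte 255 (0xFF). */
    assert(tweak2(8, 255ULL << 56) == 0xFF00000000000008ULL);
}
```

Calling check_tweaks() once at miner start-up would be enough to catch a typo in either constant.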
reorder
Sr. Member
Activity: 462
Merit: 250
November 15, 2013, 05:18:55 AM
 #191

Thanks, reorder! But I have never used poclbm. What is "crycl.h"? I found no information on where I can get it...

It is just a bunch of (pretty obvious) macros I reuse between projects; it is not part of the original poclbm. Here it is:
Code:
#ifndef CRYCL_H
#define CRYCL_H


/* Arguments are parenthesized so that expressions such as ROR(x, a + b) expand correctly. */
#define ROL(x, n)       rotate((x), (ulong)(n))
#define ROR(x, n)       rotate((x), (ulong)(64 - (n)))
#define ROL32(x, n)     rotate((x), (uint)(n))
#define ROR32(x, n)     rotate((x), (uint)(32 - (n)))
#define SWAP32(a)       (as_uint(as_uchar4(a).wzyx))
#define SWAP64(n)       (as_ulong(as_uchar8(n).s76543210))


#define SHL(x, n)   ((x) << (n))
#define SHR(x, n)   ((x) >> (n))

#ifdef ENDIAN_LITTLE
#define TOBE64(x) SWAP64(x)
#define TOLE64(x) (x)
#define TOBE32(x) SWAP32(x)
#define TOLE32(x) (x)
#define TOBE64V8(v) (ulong8)(SWAP64(v.s0), SWAP64(v.s1), SWAP64(v.s2), SWAP64(v.s3), SWAP64(v.s4), SWAP64(v.s5), SWAP64(v.s6), SWAP64(v.s7))
#define TOLE64V8(v) (v)
#define TOBE32V16(v) (uint16)(SWAP32(v.s0), SWAP32(v.s1), SWAP32(v.s2), SWAP32(v.s3), \
    SWAP32(v.s4), SWAP32(v.s5), SWAP32(v.s6), SWAP32(v.s7), \
    SWAP32(v.s8), SWAP32(v.s9), SWAP32(v.sa), SWAP32(v.sb), \
    SWAP32(v.sc), SWAP32(v.sd), SWAP32(v.se), SWAP32(v.sf))
#define TOLE32V16(v) (v)
#define TOBE32V8(v) (uint8)(SWAP32(v.s0), SWAP32(v.s1), SWAP32(v.s2), SWAP32(v.s3), \
    SWAP32(v.s4), SWAP32(v.s5), SWAP32(v.s6), SWAP32(v.s7))
#define TOLE32V8(v) (v)

#else

#define TOBE64(x) (x)
#define TOLE64(x) SWAP64(x)
#define TOBE32(x) (x)
#define TOLE32(x) SWAP32(x)
#define TOBE64V8(v) (v)
#define TOLE64V8(v) (ulong8)(SWAP64(v.s0), SWAP64(v.s1), SWAP64(v.s2), SWAP64(v.s3), SWAP64(v.s4), SWAP64(v.s5), SWAP64(v.s6), SWAP64(v.s7))
#define TOBE32V16(v) (v)
#define TOLE32V16(v) (uint16)(SWAP32(v.s0), SWAP32(v.s1), SWAP32(v.s2), SWAP32(v.s3), \
    SWAP32(v.s4), SWAP32(v.s5), SWAP32(v.s6), SWAP32(v.s7), \
    SWAP32(v.s8), SWAP32(v.s9), SWAP32(v.sa), SWAP32(v.sb), \
    SWAP32(v.sc), SWAP32(v.sd), SWAP32(v.se), SWAP32(v.sf))
#define TOBE32V8(v) (v)
#define TOLE32V8(v) (uint8)(SWAP32(v.s0), SWAP32(v.s1), SWAP32(v.s2), SWAP32(v.s3), \
    SWAP32(v.s4), SWAP32(v.s5), SWAP32(v.s6), SWAP32(v.s7))
#endif

#define FROMLE64(x) TOLE64(x)
#define FROMBE64(x) TOBE64(x)
#define FROMLE32(x) TOLE32(x)
#define FROMBE32(x) TOBE32(x)
#define FROMBE64V8(x) TOBE64V8(x)
#define FROMLE64V8(x) TOLE64V8(x)
#define FROMBE32V8(x) TOBE32V8(x)
#define FROMLE32V8(x) TOLE32V8(x)
#define FROMBE32V16(x) TOBE32V16(x)
#define FROMLE32V16(x) TOLE32V16(x)

typedef ulong u64;
typedef uint u32;
typedef uchar u8;

__constant ulong16 zerol16 = (ulong16)(0UL);
__constant uint16 zeroi16 = (uint16)(0UL);

#endif

I would rather not share the modified poclbm itself, as I am embarrassed by all the monkey patching and dirty code. :) You are better off plugging this into reaper or even cgminer anyway; just make sure to define GPU_AMD and ENDIAN_LITTLE at compilation.
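To make the ENDIAN_LITTLE part concrete: with that macro defined, TOBE32/TOBE64 expand to SWAP32/SWAP64 (a byte-order reversal) and TOLE32/TOLE64 are no-ops. Below is a small host-side C sketch of what the two swaps compute. The function names swap32 and swap64 are illustrative only; the header itself does this with as_uchar4/as_uchar8 swizzles on the GPU.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word, like SWAP32 in crycl.h. */
static inline uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Reverse the eight bytes of a 64-bit word, like SWAP64 in crycl.h:
 * swap each half, then exchange the halves. */
static inline uint64_t swap64(uint64_t x)
{
    uint64_t lo = swap32((uint32_t)x);
    uint64_t hi = swap32((uint32_t)(x >> 32));
    return (lo << 32) | hi;
}
```

For example, swap32(0x11223344) gives 0x44332211. Applying either swap twice returns the original value, which is why the FROM* macros can simply reuse the TO* ones.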

You will probably also need this; it is how the midstate for the kernel is computed:
Code:
int skeinhashmid
    (
    unsigned char *out,
    const unsigned char *in
    )

    {
    Skein_512_Ctxt_t ctx;

    Skein_512_Init  (&ctx,8*64);
    Skein_512_Update(&ctx,in,(size_t) 80);
    memcpy(out, ctx.X, 64);

    return 0;
    }
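For anyone wiring this into a miner, here is a rough C sketch of how the host side might split the work for the kernel. It assumes that Skein_512_Update consumes the first 64-byte block of the 80-byte header (so ctx.X is the chaining value after that block) and that the remaining 16 bytes reach the kernel as data16..data18 plus the nonce. The struct and function names are hypothetical, and the plain memcpy assumes a little-endian host; a real miner may need extra byte swapping depending on how it stores work.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the kernel's scalar arguments. */
typedef struct {
    uint64_t state[8]; /* state0..state7: midstate, i.e. 64 bytes of ctx.X */
    uint32_t data[3];  /* data16..data18: header bytes 64..75              */
} skein_kernel_args;

/* Fill the argument block from a 64-byte midstate and an 80-byte header. */
static inline void pack_kernel_args(skein_kernel_args *a,
                                    const unsigned char midstate[64],
                                    const unsigned char header[80])
{
    memcpy(a->state, midstate, sizeof a->state);  /* eight 64-bit words */
    memcpy(a->data, header + 64, sizeof a->data); /* three 32-bit words */
    /* Header bytes 76..79 are the nonce slot; the kernel fills that word
     * per work-item from base + get_global_id(0). */
}
```

Each state[i] and data[i] would then be passed as one scalar kernel argument, in the order the search kernel declares them.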
Gunther
Legendary
Activity: 840
Merit: 1000
November 15, 2013, 07:48:18 AM
Last edit: November 15, 2013, 08:40:53 AM by Gunther
 #192

CPU mining guide - Digital Ocean

1. Sign up for an account at Digital Ocean
   Consider using this referral code - https://www.digitalocean.com/?refcode=5ab24fa2bd9a
2. Create a droplet
   Enter a host name
   A 512 MB/1 CPU droplet works fine
   Select a region
   Choose Ubuntu 13.04 x64 as your operating system
   Choose "Create Droplet"
3. Once completed, you will receive your droplet IP and root password via e-mail
4. Use PuTTY or a similar SSH client to connect to the IP of your droplet
5. Log in as root with the password provided
6. Paste the following script into your SSH client

Code:
{
# Make swap space
sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile

# Install libraries
apt-get install build-essential curl git libboost-all-dev libdb++5.3-dev libdb5.3++-dev libminiupnpc-dev libssl-dev m4 -y

# Install GMP
cd ~/
rm -rf gmp-5.1.2.tar.bz2 gmp-5.1.2
wget http://mirrors.kernel.org/gnu/gmp/gmp-5.1.2.tar.bz2
tar xjvf gmp-5.1.2.tar.bz2
cd gmp-5.1.2
./configure --enable-cxx
make
make install
rm -rf gmp-5.1.2.tar.bz2 gmp-5.1.2
cd ~/

# Install Skeincoin
git clone https://github.com/skeincoin/skeincoin.git
cd ~/skeincoin/src
make -f makefile.unix USE_UPNP=- DEBUGFLAGS="" CXXFLAGS=

sudo cp ./skeincoind /usr/local/bin/

mkdir ~/.skeincoin
echo "rpcuser=myskeincoinusername
rpcpassword=$(cat /dev/urandom | tr -cd '[:alnum:]' | head -c32)
listen=1
server=1
maxconnections=200
gen=1
genproclimit=-1" > ~/.skeincoin/skeincoin.conf


skeincoind --daemon

}

7. Monitor your daemon with:

Code:
skeincoind getinfo

8. Send your coins to another wallet with:

Code:
skeincoind sendtoaddress <skeincoinaddress> <amount>


Thank you. I installed it yesterday but have not received any balance so far. Is that normal?
I cannot tell whether it is mining or not.
melnikalex
Member
Activity: 60
Merit: 10
November 15, 2013, 09:05:35 AM
 #193

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?
reorder
Sr. Member
Activity: 462
Merit: 250
November 15, 2013, 09:28:56 AM
 #194

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.
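For readers wondering why bitselect matters here: bitselect(a, b, c) takes each bit from b where c is 1 and from a where c is 0, and the rolhackl/rolhackr macros in the kernel lean on it to build a 64-bit rotate from two 32-bit rotates. The plain C sketch below mirrors the rolhackl case (rotation by less than 32) so it can be checked against an ordinary 64-bit rotate on the host. All function names are illustrative, and nothing here says how any particular compiler maps bitselect to BFI_INT.

```c
#include <stdint.h>

/* OpenCL-style bitselect: bits of b where c is 1, bits of a where c is 0. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* 32-bit rotate-left, valid for 0 < n < 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* 64-bit rotate-left by n (0 < n < 32) from two 32-bit rotates:
 * rotate both halves, then exchange their low n bits. */
static inline uint64_t rol64_split(uint64_t v, unsigned n)
{
    uint32_t lo = rotl32((uint32_t)v, n);
    uint32_t hi = rotl32((uint32_t)(v >> 32), n);
    uint32_t mask = (1u << n) - 1u;               /* low n bits */
    uint32_t new_lo = bitselect32(lo, hi, mask);  /* low bits come from hi */
    uint32_t new_hi = bitselect32(lo, hi, ~mask); /* low bits come from lo */
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Ordinary 64-bit rotate-left for comparison, valid for 0 < n < 64. */
static inline uint64_t rol64_ref(uint64_t v, unsigned n)
{
    return (v << n) | (v >> (64 - n));
}
```

Comparing rol64_split against rol64_ref for every n from 1 to 31 is a cheap way to convince yourself the trick is exact. The rolhackr variant does the same with n - 32 and the two halves swapped.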
feeleep
Legendary
Activity: 1197
Merit: 1000
November 15, 2013, 09:31:15 AM
 #195

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.

Guys, just let me know when you want to test your GPU miners on the pool, as I would need to change the share difficulty; otherwise you will flood my pool with low-diff shares :)

reorder
Sr. Member
Activity: 462
Merit: 250
November 15, 2013, 09:36:09 AM
 #196

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.

Guys, just let me know when you want to test your GPU miners on the pool, as I would need to change the share difficulty; otherwise you will flood my pool with low-diff shares :)
In fact your pool keeps up pretty well at up to 160 MH/s, but trying 280 MH/s after some optimisations started giving me ~30% stales :)
feeleep
Legendary
Activity: 1197
Merit: 1000
November 15, 2013, 09:37:32 AM
 #197

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.

Guys, just let me know when you want to test your GPU miners on the pool, as I would need to change the share difficulty; otherwise you will flood my pool with low-diff shares :)
In fact your pool keeps up pretty well at up to 160 MH/s, but trying 280 MH/s after some optimisations started giving me ~30% stales :)

Probably because of the low share difficulty. I will change it now so you can try again.

feeleep
Legendary
Activity: 1197
Merit: 1000
November 15, 2013, 09:51:10 AM
 #198

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.

Guys, just let me know when you want to test your GPU miners on the pool, as I would need to change the share difficulty; otherwise you will flood my pool with low-diff shares :)
In fact your pool keeps up pretty well at up to 160 MH/s, but trying 280 MH/s after some optimisations started giving me ~30% stales :)

You are sending quite a lot of duplicates...

reorder
Sr. Member
Activity: 462
Merit: 250
November 15, 2013, 09:54:03 AM
 #199

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.

Guys, just let me know when you want to test your GPU miners on the pool, as I would need to change the share difficulty; otherwise you will flood my pool with low-diff shares :)
In fact your pool keeps up pretty well at up to 160 MH/s, but trying 280 MH/s after some optimisations started giving me ~30% stales :)

Probably because of the low share difficulty. I will change it now so you can try again.

It is probably due to poclbm not being able to restart work immediately, but it still looks like this now:
Code:
skc.coinmine.pl:6400 0:1:Pitcairn 72.0[277.086 MH/s (~181 MH/s)] [Rej: 96/275 (34.91%)]

I am sure cgminer can do better.
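For reference, the percentage in that status line is just rejected shares over total shares: 96 of 275 is about 34.91%. A tiny C sketch of that arithmetic follows. The function names are made up, and scaling the raw rate by the accepted fraction is only a rough estimate of my own here; poclbm may derive its "~181 MH/s" figure differently.

```c
/* Percentage of submitted shares that were rejected. */
static inline double reject_percent(unsigned rejected, unsigned total)
{
    return total ? 100.0 * rejected / total : 0.0;
}

/* Rough effective rate: the raw hash rate scaled by the accepted fraction. */
static inline double accepted_rate(double raw_mhs, unsigned rejected, unsigned total)
{
    if (total == 0)
        return raw_mhs; /* no shares yet, nothing to scale by */
    return raw_mhs * (double)(total - rejected) / total;
}
```

With the numbers from the status line, accepted_rate(277.086, 96, 275) comes out at roughly 180 MH/s, which is in the same ballpark as the estimate shown.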
melnikalex
Member
Activity: 60
Merit: 10
November 15, 2013, 12:18:20 PM
 #200

reorder, which driver version and OpenCL SDK version does your kernel compile and work with?

It should work with any, but to get BFI_INT for bitselect without binary patching you are going to need driver 11.9+ and SDK 2.6+ respectively.
AMD Kernel Analyzer 2 cannot compile it with Catalyst 13.10 and SDK 2.8: unhandled exception =( Could you tell me which versions (Catalyst and SDK) you use? Thanks