Bitcoin Forum
May 30, 2024, 12:32:37 PM *
Author Topic: [ANN] ccminer 2.3 - opensource - GPL (tpruvot)  (Read 500007 times)
Epsylon3 (OP)
ccminer/cpuminer developer
September 30, 2017, 03:26:29 PM
 #2541

The last release is built with CUDA 9, and it's generally the same speed or slower. (And I'm now on vacation from public ccminer work, like alexis.)

BTC: 1FhDPLPpw18X4srecguG3MxJYe4a1JsZnd - My Projects: ccminer - cpuminer-multi - yiimp - Forum threads : ccminer - cpuminer-multi - yiimp
krnlx
September 30, 2017, 03:44:49 PM
 #2542

Quote:
It is still fine if you can release whatever dirty miner you have rather than nothing at all, please? ;D

Last time, when I released the skunk miner, I received a lot of whiny messages about bugs, wrong hashrate, etc. So I will test it better and try to speed it up. Developing is my hobby at the moment, so I code in my free time.

BTW, there is a faster version (6.1 MH/s on a Ti) that will be released in about a month.
antantti
September 30, 2017, 03:53:12 PM
 #2543

Quote:
With the lack of ccminer tips in 2017

Some things never change. All that timeskunktravelraptorhash mania and you never released any closed beta preview miners?

You should. Just because you can and you deserve it.

Epsylon3 (OP)
ccminer/cpuminer developer
September 30, 2017, 03:59:53 PM
 #2544

Actually, it has changed... I guess users are poor after paying sp.

antantti
September 30, 2017, 04:01:48 PM
 #2545

You should make them pay you first ;D

sp_
Team Black developer
September 30, 2017, 04:40:41 PM
 #2546

Quote:
There are 18 algos in xevan and only 2 of them are simple copy-paste. My miner is dirty and not fast enough to release (3.3 MH/s on a 1070, 5.4-5.5 MH/s on a 1080 Ti).

Here is a broken bmw-1024 (based on the ccminer 2.2 implementation).
Can you correct the code, please?

Code:
__global__ __launch_bounds__(32, 8)
void quark_bmw512_gpu_hash_128(uint32_t threads, uint64_t *const __restrict__ g_hash, const uint32_t *const __restrict__ g_nonceVector)
{
    const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
    if (thread < threads)
    {
        const uint32_t hashPosition = (g_nonceVector == NULL) ? thread : g_nonceVector[thread];

        uint64_t *inpHash = &g_hash[8 * hashPosition];

        uint2 __align__(16) msg0[16];
        uint2 __align__(16) msg1[16] = { 0 };
        uint2 __align__(16) h[16];

        uint2x4* phash = (uint2x4*)inpHash;
        uint2x4* outpt = (uint2x4*)msg0;
        outpt[0] = __ldg4(&phash[0]);
        outpt[1] = __ldg4(&phash[1]);

        // bmw
        msg1[0] = vectorize(0x80UL);
        msg1[15] = vectorize(1024UL);

        for (int i = 0; i < 16; ++i) h[i] = vectorize(BMW512_IV[i]);

        Compression512(msg0, h);
        Compression512(msg1, msg0);

        for (int i = 0; i < 16; ++i) h[i] = vectorize(BMW512_FINAL[i]);

        Compression512(msg1, h);

        //#pragma unroll
        // for (int i = 0; i < 8; ++i) hash->h8[i] = SWAP8(msg1[i + 8]);

        inpHash[0] = cuda_swab32(devectorize(msg1[0 + 8]));
        inpHash[1] = cuda_swab32(devectorize(msg1[1 + 8]));
        inpHash[2] = cuda_swab32(devectorize(msg1[2 + 8]));
        inpHash[3] = cuda_swab32(devectorize(msg1[3 + 8]));
        inpHash[4] = cuda_swab32(devectorize(msg1[4 + 8]));
        inpHash[5] = cuda_swab32(devectorize(msg1[5 + 8]));
        inpHash[6] = cuda_swab32(devectorize(msg1[6 + 8]));
        inpHash[7] = cuda_swab32(devectorize(msg1[7 + 8]));
    }
}

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
krnlx
September 30, 2017, 04:50:39 PM
 #2547

You don't need cuda_swab32 here:

 inpHash[0] = cuda_swab32(devectorize(msg1[0 + 8]));

And it will not work anyway, because devectorize produces a uint64_t, not a uint32_t.
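
To make the point above concrete, here is a minimal host-side C++ sketch of what happens when a 64-bit word goes through a 32-bit byte swap. The names `Word2`, `devectorize_host`, `swab32_host`, `store_with_swab32` and `store_plain` are hypothetical stand-ins, not ccminer functions, and the sketch assumes that devectorize packs the `.x` half as the low 32 bits and the `.y` half as the high 32 bits.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's uint2: x = low half, y = high half.
struct Word2 { uint32_t x, y; };

// Assumed behaviour of devectorize(): pack the two halves into one 64-bit word.
inline uint64_t devectorize_host(Word2 v) {
    return (static_cast<uint64_t>(v.y) << 32) | v.x;
}

// A plain 32-bit byte swap, standing in for cuda_swab32.
inline uint32_t swab32_host(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// What the posted store effectively does: narrow to 32 bits, swap, widen again.
inline uint64_t store_with_swab32(Word2 v) {
    return swab32_host(static_cast<uint32_t>(devectorize_host(v)));
}

// The suggested alternative: store the 64-bit word unchanged.
inline uint64_t store_plain(Word2 v) {
    return devectorize_host(v);
}
```

For a word with x = 0x44332211 and y = 0x88776655, `store_plain` keeps all 64 bits, while `store_with_swab32` returns only the byte-swapped low half, so the high 32 bits of every output word are lost.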
pr0ximus
September 30, 2017, 05:00:06 PM
 #2548

Quote from: antantti
You should make them pay you first ;D

That wouldn't make him any different from sp. He chose to keep it free, and THAT defines him :) (And yes, he deserves donations for what he has done. Shame that it did not help him much.)
sp_
Team Black developer
September 30, 2017, 05:07:53 PM
 #2549

What's up with yiimp today? The difficulty is reduced, and hashrate is lost.



antantti
September 30, 2017, 05:12:31 PM
 #2550

Quote from: pr0ximus
That wouldn't make him any different from sp. He chose to keep it free, and THAT defines him :) (And yes, he deserves donations for what he has done. Shame that it did not help him much.)

It worked on lbc.

Epsylon3 is about the last man standing here; everyone else left for obvious reasons. Next time some "new" profitable algo to mine pops up, I suggest you PM him.

sp_
Team Black developer
September 30, 2017, 05:23:32 PM
 #2551

Quote from: krnlx
You don't need cuda_swab32 here:
inpHash[0] = cuda_swab32(devectorize(msg1[0 + 8]));
And it will not work anyway, because devectorize produces a uint64_t, not a uint32_t.

Right, but it's still broken.

Code:

__constant__ uint64_t BMW512_IV[] = {
(0x8081828384858687), (0x88898A8B8C8D8E8F),
(0x9091929394959697), (0x98999A9B9C9D9E9F),
(0xA0A1A2A3A4A5A6A7), (0xA8A9AAABACADAEAF),
(0xB0B1B2B3B4B5B6B7), (0xB8B9BABBBCBDBEBF),
(0xC0C1C2C3C4C5C6C7), (0xC8C9CACBCCCDCECF),
(0xD0D1D2D3D4D5D6D7), (0xD8D9DADBDCDDDEDF),
(0xE0E1E2E3E4E5E6E7), (0xE8E9EAEBECEDEEEF),
(0xF0F1F2F3F4F5F6F7), (0xF8F9FAFBFCFDFEFF)
};
__constant__ uint64_t BMW512_FINAL[16] =
{
0xAAAAAAAAAAAAAAA0UL, 0xAAAAAAAAAAAAAAA1UL, 0xAAAAAAAAAAAAAAA2UL, 0xAAAAAAAAAAAAAAA3UL,
0xAAAAAAAAAAAAAAA4UL, 0xAAAAAAAAAAAAAAA5UL, 0xAAAAAAAAAAAAAAA6UL, 0xAAAAAAAAAAAAAAA7UL,
0xAAAAAAAAAAAAAAA8UL, 0xAAAAAAAAAAAAAAA9UL, 0xAAAAAAAAAAAAAAAAUL, 0xAAAAAAAAAAAAAAABUL,
0xAAAAAAAAAAAAAAACUL, 0xAAAAAAAAAAAAAAADUL, 0xAAAAAAAAAAAAAAAEUL, 0xAAAAAAAAAAAAAAAFUL
};
__global__ __launch_bounds__(32, 8)
void quark_bmw512_gpu_hash_128(uint32_t threads, uint64_t *const __restrict__ g_hash, const uint32_t *const __restrict__ g_nonceVector)
{
    const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
    if (thread < threads)
    {
        const uint32_t hashPosition = (g_nonceVector == NULL) ? thread : g_nonceVector[thread];

        uint64_t *inpHash = &g_hash[8 * hashPosition];
        uint32_t *outHash = (uint32_t *)inpHash;
        uint2 __align__(16) msg0[16];
        uint2 __align__(16) msg1[16] = { 0 };
        uint2 __align__(16) h[16];

        uint2x4* phash = (uint2x4*)inpHash;
        uint2x4* outpt = (uint2x4*)msg0;
        outpt[0] = __ldg4(&phash[0]);
        outpt[1] = __ldg4(&phash[1]);

        // bmw
        msg1[0] = vectorize(0x80UL);
        msg1[15] = vectorize(1024UL);

        for (int i = 0; i < 16; ++i) h[i] = vectorize(BMW512_IV[i]);

        Compression512(msg0, h);
        Compression512(msg1, msg0);

        for (int i = 0; i < 16; ++i) h[i] = vectorize(BMW512_FINAL[i]);

        Compression512(msg1, h);

        outHash[0] = msg1[0 + 8].x;
        outHash[1] = msg1[0 + 8].y;
        outHash[2] = msg1[1 + 8].x;
        outHash[3] = msg1[1 + 8].y;
        outHash[4] = msg1[2 + 8].x;
        outHash[5] = msg1[2 + 8].y;
        outHash[6] = msg1[3 + 8].x;
        outHash[7] = msg1[3 + 8].y;
        outHash[8] = msg1[4 + 8].x;
        outHash[9] = msg1[4 + 8].y;
        outHash[10] = msg1[5 + 8].x;
        outHash[11] = msg1[5 + 8].y;
        outHash[12] = msg1[6 + 8].x;
        outHash[13] = msg1[6 + 8].y;
        outHash[14] = msg1[7 + 8].x;
        outHash[15] = msg1[7 + 8].y;
    }
}
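
As a side note on the sixteen 32-bit stores above: on a little-endian target (CUDA GPUs and x86 hosts both are), writing the `.x` half and then the `.y` half of each word should produce the same bytes as writing the eight words as plain 64-bit values. The host-side C++ sketch below checks that under the assumption that `.x` is the low half; `Word2`, `store_halves`, `store_words` and `layouts_match` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for CUDA's uint2: x = low half, y = high half.
struct Word2 { uint32_t x, y; };

// Store eight words as sixteen 32-bit halves, x first, like the kernel above.
inline void store_halves(const Word2 *src, uint32_t *out) {
    for (int i = 0; i < 8; ++i) {
        out[2 * i]     = src[i].x;
        out[2 * i + 1] = src[i].y;
    }
}

// Store the same eight words as 64-bit values.
inline void store_words(const Word2 *src, uint64_t *out) {
    for (int i = 0; i < 8; ++i)
        out[i] = (static_cast<uint64_t>(src[i].y) << 32) | src[i].x;
}

// True when both layouts produce identical bytes (expected on little-endian machines).
inline bool layouts_match(const Word2 *src) {
    uint32_t halves[16];
    uint64_t words[8];
    store_halves(src, halves);
    store_words(src, words);
    return std::memcmp(halves, words, sizeof(halves)) == 0;
}
```

If the two layouts match, the choice between split stores and 64-bit stores does not change the output bytes, so whatever still breaks this version would have to come from elsewhere in the kernel.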

krnlx
September 30, 2017, 06:04:11 PM
 #2552

Quote from: sp_
Compression512(msg0, h);

Compression512(h, msg0);

santan
September 30, 2017, 06:13:26 PM
 #2553

Sp_ is preparing his next 0.5 BTC by asking
sp_
Team Black developer
September 30, 2017, 06:20:25 PM
Last edit: September 30, 2017, 06:51:49 PM by sp_
 #2554

Next up is groestl 1024.

broken..


Code:

// Macros and table for Wolf's OpenCL Groestl implementation

#define BYTE(x, y) (bfe((uint32_t)((x) >> ((y >= 32U) ? 32U : 0U)), (y) - (((y) >= 32) ? 32U : 0), 8U))

#define B64_0(x) ((uint8_t)(x))
#define B64_1(x)    BYTE((x), 8U)
#define B64_2(x)    BYTE((x), 16U)
#define B64_3(x)    BYTE((x), 24U)
#define B64_4(x)    BYTE((x), 32U)
#define B64_5(x)    BYTE((x), 40U)
#define B64_6(x)    BYTE((x), 48U)
#define B64_7(x)    BYTE((x), 56U)

#define GROESTL_RBTT(d, Hval, b0, b1, b2, b3, b4, b5, b6, b7)\
d = (T0[B64_0(Hval[b0])] ^ T1[B64_1(Hval[b1])] ^ T2[B64_2(Hval[b2])] ^ T3[B64_3(Hval[b3])] \
^ SWAPDWORDS(T0[B64_4(Hval[b4])]) ^ SWAPDWORDS(T1[B64_5(Hval[b5])]) ^ SWAPDWORDS(T2[B64_6(Hval[b6])]) ^ SWAPDWORDS(T3[B64_7(Hval[b7])]));

#define PC64(j, r)  (uint64_t)((j) | (r))
#define QC64(j, r) ROTL64(((uint64_t)(r)) ^ (~((uint64_t)(j))), 56UL)

static const __constant__ uint64_t T0_G[] =
{
0xc6a597f4a5f432c6UL, 0xf884eb9784976ff8UL, 0xee99c7b099b05eeeUL, 0xf68df78c8d8c7af6UL,
0xff0de5170d17e8ffUL, 0xd6bdb7dcbddc0ad6UL, 0xdeb1a7c8b1c816deUL, 0x915439fc54fc6d91UL,
0x6050c0f050f09060UL, 0x0203040503050702UL, 0xcea987e0a9e02eceUL, 0x567dac877d87d156UL,
0xe719d52b192bcce7UL, 0xb56271a662a613b5UL, 0x4de69a31e6317c4dUL, 0xec9ac3b59ab559ecUL,
0x8f4505cf45cf408fUL, 0x1f9d3ebc9dbca31fUL, 0x894009c040c04989UL, 0xfa87ef92879268faUL,
0xef15c53f153fd0efUL, 0xb2eb7f26eb2694b2UL, 0x8ec90740c940ce8eUL, 0xfb0bed1d0b1de6fbUL,
0x41ec822fec2f6e41UL, 0xb3677da967a91ab3UL, 0x5ffdbe1cfd1c435fUL, 0x45ea8a25ea256045UL,
0x23bf46dabfdaf923UL, 0x53f7a602f7025153UL, 0xe496d3a196a145e4UL, 0x9b5b2ded5bed769bUL,
0x75c2ea5dc25d2875UL, 0xe11cd9241c24c5e1UL, 0x3dae7ae9aee9d43dUL, 0x4c6a98be6abef24cUL,
0x6c5ad8ee5aee826cUL, 0x7e41fcc341c3bd7eUL, 0xf502f1060206f3f5UL, 0x834f1dd14fd15283UL,
0x685cd0e45ce48c68UL, 0x51f4a207f4075651UL, 0xd134b95c345c8dd1UL, 0xf908e9180818e1f9UL,
0xe293dfae93ae4ce2UL, 0xab734d9573953eabUL, 0x6253c4f553f59762UL, 0x2a3f54413f416b2aUL,
0x080c10140c141c08UL, 0x955231f652f66395UL, 0x46658caf65afe946UL, 0x9d5e21e25ee27f9dUL,
0x3028607828784830UL, 0x37a16ef8a1f8cf37UL, 0x0a0f14110f111b0aUL, 0x2fb55ec4b5c4eb2fUL,
0x0e091c1b091b150eUL, 0x2436485a365a7e24UL, 0x1b9b36b69bb6ad1bUL, 0xdf3da5473d4798dfUL,
0xcd26816a266aa7cdUL, 0x4e699cbb69bbf54eUL, 0x7fcdfe4ccd4c337fUL, 0xea9fcfba9fba50eaUL,
0x121b242d1b2d3f12UL, 0x1d9e3ab99eb9a41dUL, 0x5874b09c749cc458UL, 0x342e68722e724634UL,
0x362d6c772d774136UL, 0xdcb2a3cdb2cd11dcUL, 0xb4ee7329ee299db4UL, 0x5bfbb616fb164d5bUL,
0xa4f65301f601a5a4UL, 0x764decd74dd7a176UL, 0xb76175a361a314b7UL, 0x7dcefa49ce49347dUL,
0x527ba48d7b8ddf52UL, 0xdd3ea1423e429fddUL, 0x5e71bc937193cd5eUL, 0x139726a297a2b113UL,
0xa6f55704f504a2a6UL, 0xb96869b868b801b9UL, 0x0000000000000000UL, 0xc12c99742c74b5c1UL,
0x406080a060a0e040UL, 0xe31fdd211f21c2e3UL, 0x79c8f243c8433a79UL, 0xb6ed772ced2c9ab6UL,
0xd4beb3d9bed90dd4UL, 0x8d4601ca46ca478dUL, 0x67d9ce70d9701767UL, 0x724be4dd4bddaf72UL,
0x94de3379de79ed94UL, 0x98d42b67d467ff98UL, 0xb0e87b23e82393b0UL, 0x854a11de4ade5b85UL,
0xbb6b6dbd6bbd06bbUL, 0xc52a917e2a7ebbc5UL, 0x4fe59e34e5347b4fUL, 0xed16c13a163ad7edUL,
0x86c51754c554d286UL, 0x9ad72f62d762f89aUL, 0x6655ccff55ff9966UL, 0x119422a794a7b611UL,
0x8acf0f4acf4ac08aUL, 0xe910c9301030d9e9UL, 0x0406080a060a0e04UL, 0xfe81e798819866feUL,
0xa0f05b0bf00baba0UL, 0x7844f0cc44ccb478UL, 0x25ba4ad5bad5f025UL, 0x4be3963ee33e754bUL,
0xa2f35f0ef30eaca2UL, 0x5dfeba19fe19445dUL, 0x80c01b5bc05bdb80UL, 0x058a0a858a858005UL,
0x3fad7eecadecd33fUL, 0x21bc42dfbcdffe21UL, 0x7048e0d848d8a870UL, 0xf104f90c040cfdf1UL,
0x63dfc67adf7a1963UL, 0x77c1ee58c1582f77UL, 0xaf75459f759f30afUL, 0x426384a563a5e742UL,
0x2030405030507020UL, 0xe51ad12e1a2ecbe5UL, 0xfd0ee1120e12effdUL, 0xbf6d65b76db708bfUL,
0x814c19d44cd45581UL, 0x1814303c143c2418UL, 0x26354c5f355f7926UL, 0xc32f9d712f71b2c3UL,
0xbee16738e13886beUL, 0x35a26afda2fdc835UL, 0x88cc0b4fcc4fc788UL, 0x2e395c4b394b652eUL,
0x93573df957f96a93UL, 0x55f2aa0df20d5855UL, 0xfc82e39d829d61fcUL, 0x7a47f4c947c9b37aUL,
0xc8ac8befacef27c8UL, 0xbae76f32e73288baUL, 0x322b647d2b7d4f32UL, 0xe695d7a495a442e6UL,
0xc0a09bfba0fb3bc0UL, 0x199832b398b3aa19UL, 0x9ed12768d168f69eUL, 0xa37f5d817f8122a3UL,
0x446688aa66aaee44UL, 0x547ea8827e82d654UL, 0x3bab76e6abe6dd3bUL, 0x0b83169e839e950bUL,
0x8cca0345ca45c98cUL, 0xc729957b297bbcc7UL, 0x6bd3d66ed36e056bUL, 0x283c50443c446c28UL,
0xa779558b798b2ca7UL, 0xbce2633de23d81bcUL, 0x161d2c271d273116UL, 0xad76419a769a37adUL,
0xdb3bad4d3b4d96dbUL, 0x6456c8fa56fa9e64UL, 0x744ee8d24ed2a674UL, 0x141e28221e223614UL,
0x92db3f76db76e492UL, 0x0c0a181e0a1e120cUL, 0x486c90b46cb4fc48UL, 0xb8e46b37e4378fb8UL,
0x9f5d25e75de7789fUL, 0xbd6e61b26eb20fbdUL, 0x43ef862aef2a6943UL, 0xc4a693f1a6f135c4UL,
0x39a872e3a8e3da39UL, 0x31a462f7a4f7c631UL, 0xd337bd5937598ad3UL, 0xf28bff868b8674f2UL,
0xd532b156325683d5UL, 0x8b430dc543c54e8bUL, 0x6e59dceb59eb856eUL, 0xdab7afc2b7c218daUL,
0x018c028f8c8f8e01UL, 0xb16479ac64ac1db1UL, 0x9cd2236dd26df19cUL, 0x49e0923be03b7249UL,
0xd8b4abc7b4c71fd8UL, 0xacfa4315fa15b9acUL, 0xf307fd090709faf3UL, 0xcf25856f256fa0cfUL,
0xcaaf8feaafea20caUL, 0xf48ef3898e897df4UL, 0x47e98e20e9206747UL, 0x1018202818283810UL,
0x6fd5de64d5640b6fUL, 0xf088fb83888373f0UL, 0x4a6f94b16fb1fb4aUL, 0x5c72b8967296ca5cUL,
0x3824706c246c5438UL, 0x57f1ae08f1085f57UL, 0x73c7e652c7522173UL, 0x975135f351f36497UL,
0xcb238d652365aecbUL, 0xa17c59847c8425a1UL, 0xe89ccbbf9cbf57e8UL, 0x3e217c6321635d3eUL,
0x96dd377cdd7cea96UL, 0x61dcc27fdc7f1e61UL, 0x0d861a9186919c0dUL, 0x0f851e9485949b0fUL,
0xe090dbab90ab4be0UL, 0x7c42f8c642c6ba7cUL, 0x71c4e257c4572671UL, 0xccaa83e5aae529ccUL,
0x90d83b73d873e390UL, 0x06050c0f050f0906UL, 0xf701f5030103f4f7UL, 0x1c12383612362a1cUL,
0xc2a39ffea3fe3cc2UL, 0x6a5fd4e15fe18b6aUL, 0xaef94710f910beaeUL, 0x69d0d26bd06b0269UL,
0x17912ea891a8bf17UL, 0x995829e858e87199UL, 0x3a2774692769533aUL, 0x27b94ed0b9d0f727UL,
0xd938a948384891d9UL, 0xeb13cd351335deebUL, 0x2bb356ceb3cee52bUL, 0x2233445533557722UL,
0xd2bbbfd6bbd604d2UL, 0xa9704990709039a9UL, 0x07890e8089808707UL, 0x33a766f2a7f2c133UL,
0x2db65ac1b6c1ec2dUL, 0x3c22786622665a3cUL, 0x15922aad92adb815UL, 0xc92089602060a9c9UL,
0x874915db49db5c87UL, 0xaaff4f1aff1ab0aaUL, 0x5078a0887888d850UL, 0xa57a518e7a8e2ba5UL,
0x038f068a8f8a8903UL, 0x59f8b213f8134a59UL, 0x0980129b809b9209UL, 0x1a1734391739231aUL,
0x65daca75da751065UL, 0xd731b553315384d7UL, 0x84c61351c651d584UL, 0xd0b8bbd3b8d303d0UL,
0x82c31f5ec35edc82UL, 0x29b052cbb0cbe229UL, 0x5a77b4997799c35aUL, 0x1e113c3311332d1eUL,
0x7bcbf646cb463d7bUL, 0xa8fc4b1ffc1fb7a8UL, 0x6dd6da61d6610c6dUL, 0x2c3a584e3a4e622cUL
};

__device__ __forceinline__
void GroestlPQ(uint64_t *HM, const uint64_t *H, const uint64_t *M, uint64_t *T0, uint64_t *T1, uint64_t *T2, uint64_t *T3)
{
    uint64_t QM[16];
    for (int i = 0; i < 16; ++i) HM[i] = H[i] ^ M[i];

    for (int i = 0; i < 16; ++i) QM[i] = M[i];

    for (int i = 0; i < 14; ++i)
    {
        uint64_t Tmp1[16], Tmp2[16];

        #pragma unroll
        for (int x = 0; x < 16; ++x)
        {
            Tmp1[x] = HM[x] ^ PC64(x << 4, i);
            Tmp2[x] = QM[x] ^ QC64(x << 4, i);
        }

        GROESTL_RBTT(HM[0], Tmp1, 0, 1, 2, 3, 4, 5, 6, 11);
        GROESTL_RBTT(HM[1], Tmp1, 1, 2, 3, 4, 5, 6, 7, 12);
        GROESTL_RBTT(HM[2], Tmp1, 2, 3, 4, 5, 6, 7, 8, 13);
        GROESTL_RBTT(HM[3], Tmp1, 3, 4, 5, 6, 7, 8, 9, 14);
        GROESTL_RBTT(HM[4], Tmp1, 4, 5, 6, 7, 8, 9, 10, 15);
        GROESTL_RBTT(HM[5], Tmp1, 5, 6, 7, 8, 9, 10, 11, 0);
        GROESTL_RBTT(HM[6], Tmp1, 6, 7, 8, 9, 10, 11, 12, 1);
        GROESTL_RBTT(HM[7], Tmp1, 7, 8, 9, 10, 11, 12, 13, 2);
        GROESTL_RBTT(HM[8], Tmp1, 8, 9, 10, 11, 12, 13, 14, 3);
        GROESTL_RBTT(HM[9], Tmp1, 9, 10, 11, 12, 13, 14, 15, 4);
        GROESTL_RBTT(HM[10], Tmp1, 10, 11, 12, 13, 14, 15, 0, 5);
        GROESTL_RBTT(HM[11], Tmp1, 11, 12, 13, 14, 15, 0, 1, 6);
        GROESTL_RBTT(HM[12], Tmp1, 12, 13, 14, 15, 0, 1, 2, 7);
        GROESTL_RBTT(HM[13], Tmp1, 13, 14, 15, 0, 1, 2, 3, 8);
        GROESTL_RBTT(HM[14], Tmp1, 14, 15, 0, 1, 2, 3, 4, 9);
        GROESTL_RBTT(HM[15], Tmp1, 15, 0, 1, 2, 3, 4, 5, 10);

        GROESTL_RBTT(QM[0], Tmp2, 1, 3, 5, 11, 0, 2, 4, 6);
        GROESTL_RBTT(QM[1], Tmp2, 2, 4, 6, 12, 1, 3, 5, 7);
        GROESTL_RBTT(QM[2], Tmp2, 3, 5, 7, 13, 2, 4, 6, 8);
        GROESTL_RBTT(QM[3], Tmp2, 4, 6, 8, 14, 3, 5, 7, 9);
        GROESTL_RBTT(QM[4], Tmp2, 5, 7, 9, 15, 4, 6, 8, 10);
        GROESTL_RBTT(QM[5], Tmp2, 6, 8, 10, 0, 5, 7, 9, 11);
        GROESTL_RBTT(QM[6], Tmp2, 7, 9, 11, 1, 6, 8, 10, 12);
        GROESTL_RBTT(QM[7], Tmp2, 8, 10, 12, 2, 7, 9, 11, 13);
        GROESTL_RBTT(QM[8], Tmp2, 9, 11, 13, 3, 8, 10, 12, 14);
        GROESTL_RBTT(QM[9], Tmp2, 10, 12, 14, 4, 9, 11, 13, 15);
        GROESTL_RBTT(QM[10], Tmp2, 11, 13, 15, 5, 10, 12, 14, 0);
        GROESTL_RBTT(QM[11], Tmp2, 12, 14, 0, 6, 11, 13, 15, 1);
        GROESTL_RBTT(QM[12], Tmp2, 13, 15, 1, 7, 12, 14, 0, 2);
        GROESTL_RBTT(QM[13], Tmp2, 14, 0, 2, 8, 13, 15, 1, 3);
        GROESTL_RBTT(QM[14], Tmp2, 15, 1, 3, 9, 14, 0, 2, 4);
        GROESTL_RBTT(QM[15], Tmp2, 0, 2, 4, 10, 15, 1, 3, 5);
    }

    for (int i = 0; i < 16; ++i) HM[i] ^= QM[i] ^ H[i];
}

__device__ __forceinline__
void GroestlP(uint64_t *Out, const uint64_t *In, uint64_t *T0, uint64_t *T1, uint64_t *T2, uint64_t *T3)
{
    for (int i = 0; i < 16; ++i) Out[i] = In[i];

    for (int i = 0; i < 14; ++i)
    {
        uint64_t H[16];

        #pragma unroll
        for (int x = 0; x < 16; ++x)
            H[x] = Out[x] ^ PC64(x << 4, i);

        GROESTL_RBTT(Out[0], H, 0, 1, 2, 3, 4, 5, 6, 11);
        GROESTL_RBTT(Out[1], H, 1, 2, 3, 4, 5, 6, 7, 12);
        GROESTL_RBTT(Out[2], H, 2, 3, 4, 5, 6, 7, 8, 13);
        GROESTL_RBTT(Out[3], H, 3, 4, 5, 6, 7, 8, 9, 14);
        GROESTL_RBTT(Out[4], H, 4, 5, 6, 7, 8, 9, 10, 15);
        GROESTL_RBTT(Out[5], H, 5, 6, 7, 8, 9, 10, 11, 0);
        GROESTL_RBTT(Out[6], H, 6, 7, 8, 9, 10, 11, 12, 1);
        GROESTL_RBTT(Out[7], H, 7, 8, 9, 10, 11, 12, 13, 2);
        GROESTL_RBTT(Out[8], H, 8, 9, 10, 11, 12, 13, 14, 3);
        GROESTL_RBTT(Out[9], H, 9, 10, 11, 12, 13, 14, 15, 4);
        GROESTL_RBTT(Out[10], H, 10, 11, 12, 13, 14, 15, 0, 5);
        GROESTL_RBTT(Out[11], H, 11, 12, 13, 14, 15, 0, 1, 6);
        GROESTL_RBTT(Out[12], H, 12, 13, 14, 15, 0, 1, 2, 7);
        GROESTL_RBTT(Out[13], H, 13, 14, 15, 0, 1, 2, 3, 8);
        GROESTL_RBTT(Out[14], H, 14, 15, 0, 1, 2, 3, 4, 9);
        GROESTL_RBTT(Out[15], H, 15, 0, 1, 2, 3, 4, 5, 10);
    }
}

#define WORKSIZE 64

__device__ __forceinline__
void GroestlCompress(uint64_t *State, uint64_t *Msg, uint64_t *T0, uint64_t *T1, uint64_t *T2, uint64_t *T3)
{
    uint64_t Output[16];

    GroestlPQ(Output, State, Msg, T0, T1, T2, T3);

    for (int i = 0; i < 16; ++i) State[i] = Output[i];
}

__global__
void quark_groestl512_gpu_hash_128(uint32_t threads, uint64_t *const __restrict__ g_hash)
{
    uint64_t T0[256], T1[256], T2[256], T3[256];

    const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
    if (thread < threads)
    {
        const uint32_t hashPosition = thread;

        uint64_t *inpHash = &g_hash[8 * hashPosition];

        for (int i = 0; i < 256; i += WORKSIZE)
        {
            const uint64_t tmp = T0_G[i];
            T0[i] = tmp;
            T1[i] = ROTL64(tmp, 8UL);
            T2[i] = ROTL64(tmp, 16UL);
            T3[i] = ROTL64(tmp, 24UL);
        }

        uint64_t M[16] = { 0 }, H[16] = { 0 }, H2[16];

        #pragma unroll
        for (int i = 0; i < 16; ++i) M[i] = cuda_swab32(((uint32_t *)inpHash)[i]);

        //M[8] = 0x80UL;
        //M[15] = 0x0100000000000000UL;

        H[15] = 0x0002000000000000UL;

        GroestlCompress(H, M, T0, T1, T2, T3);

        M[0] = 0x80UL;

        for (int i = 1; i < 15; ++i) M[i] = 0x00UL;

        M[15] = 0x0200000000000000UL;

        GroestlCompress(H, M, T0, T1, T2, T3);

        GroestlP(H2, H, T0, T1, T2, T3);

        // vstore8(VSWAP8(((ulong8 *)H2)[1] ^ ((ulong8 *)H)[1]), 0, hash->h8);

        #pragma unroll
        for (int i = 8; i < 16; ++i)
        {
            H2[i] = (H2[i] ^ H[i]);
        }
        inpHash[0] = H2[0 + 8];
        inpHash[1] = H2[1 + 8];
        inpHash[2] = H2[2 + 8];
        inpHash[3] = H2[3 + 8];
        inpHash[4] = H2[4 + 8];
        inpHash[5] = H2[5 + 8];
        inpHash[6] = H2[6 + 8];
        inpHash[7] = H2[7 + 8];
    }
}

sp_
Team Black developer
September 30, 2017, 06:21:52 PM
 #2555

Quote from: santan
Sp_ is preparing his next 0.5 BTC by asking

There are several bounties on an open-source CUDA xevan miner.

krnlx can earn 0.25-0.5 BTC in bounties if he publishes his work.

krnlx
September 30, 2017, 06:32:56 PM
 #2556

Quote from: sp_
Next up is groestl 1024.

broken..



It needs a complete rewrite. simd512 will be the most difficult.
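
One concrete symptom in the posted kernel, offered as a reading of the code and not as something stated in the thread: the table-fill loop runs `for (int i = 0; i < 256; i += WORKSIZE)` in every thread, and T0..T3 are per-thread arrays. A loop with that stride typically belongs to a cooperative fill, where each thread starts at its own local index and the table is shared by the block. The host-side C++ sketch below simply counts how many of the 256 slots get written in each variant; `slots_written`, `kTableSize` and `kWorkSize` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

constexpr int kTableSize = 256;
constexpr int kWorkSize  = 64;   // WORKSIZE in the posted code

// Counts how many table slots get written when `lanes` cooperating threads
// each run: for (i = start; i < 256; i += kWorkSize) table[i] = ...;
// With per_lane_offset = true each lane starts at its own index (cooperative
// fill); with false every lane starts at 0, as in the posted loop.
inline int slots_written(int lanes, bool per_lane_offset) {
    std::array<bool, kTableSize> written{};   // all false initially
    for (int lane = 0; lane < lanes; ++lane) {
        const int start = per_lane_offset ? lane : 0;
        for (int i = start; i < kTableSize; i += kWorkSize) written[i] = true;
    }
    int count = 0;
    for (bool w : written) count += w ? 1 : 0;
    return count;
}
```

With 64 lanes, the cooperative variant covers all 256 entries, while the start-at-zero variant only ever touches indices 0, 64, 128 and 192. With per-thread arrays, as posted, each thread would end up with its own copy of that mostly uninitialized table.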
antantti
September 30, 2017, 06:33:52 PM
 #2557

Quote from: santan
Sp_ is preparing his next 0.5 BTC by asking
Quote from: sp_
krnlx can earn 0.25-0.5 BTC in bounties if he publishes his work.

:) :) :)

krnlx
September 30, 2017, 06:34:34 PM
 #2558

Quote from: sp_
krnlx can earn 0.25-0.5 BTC in bounties if he publishes his work.

They want a Windows version. I don't have any Windows computers.
pr0ximus
September 30, 2017, 06:39:11 PM
 #2559

Quote from: sp_
krnlx can earn 0.25-0.5 BTC in bounties if he publishes his work.

Quote from: krnlx
They want a Windows version. I don't have any Windows computers.

Make a Linux version and ask someone you know to port it to Windows (just like skunk) :)
sp_
Team Black developer
September 30, 2017, 06:46:54 PM
 #2560

I can port it to Windows and will do it for free. PM me.
