traitrkng
Newbie
Offline
Activity: 40
Merit: 0
|
|
April 07, 2016, 10:39:30 AM |
|
I think it's best to write here instead of opening a new topic. I downloaded gnu-miner.exe and its dependencies. Everything works fine (I see the MH/s and blocks). My question is how I can collect the (eventual) Siacoin block reward. Also, are there any instructions for installing this plugin? https://github.com/droghio/Sia-GPU-Miner/releases/tag/v2-ui
|
|
|
|
Genoil
|
|
July 12, 2016, 11:31:14 AM Last edit: July 12, 2016, 02:03:16 PM by Genoil |
|
Hey, I got bored with ETH and decided to give this a spin. Forked the SIA GPU Miner written in C and added a few things:
- MSVC project files for those who want to hack on it using Windows.
- pool support using the -Q parameter (works similar to the Go miner)
- took the CL kernel from the Go miner to improve performance (roughly +18% on a 7950)
- x64 Windows binaries zipped up in the bin folder.
Planning to improve Nvidia OpenCL performance using some inline PTX asm. This should bring performance close(r) to the native CUDA miners around. Perhaps improve the CL performance a bit, but that's a crowded market these days. If you like poking around in CL, there's an undocumented command line parameter (-k filename) that allows you to provide a custom CL kernel.
https://github.com/Genoil/Sia-GPU-Miner
Have fun!
[edit] It looks like the optimized kernel using uint2 needs some work. The client reports found blocks, but my hashrate doesn't show any longer on nanopool. I'll wait a little while, may need to revert back and pay closer attention to what results it gives. Stay tuned.
Fixed.
|
ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d BTC: 1Nu2fMCEBjmnLzqb8qUJpKgq5RoEWFhNcW
|
|
|
Amph
Legendary
Offline
Activity: 3206
Merit: 1069
|
|
July 12, 2016, 11:48:23 AM |
|
Is CUDA that unfriendly to code with, that it's better to go with OpenCL first?
|
|
|
|
Genoil
|
|
July 12, 2016, 11:59:40 AM |
|
No, quite the opposite. But I don't have any CUDA GPUs at the moment. Sold my 970 for a good price just before the RX 480 hit the market, with the idea of exchanging it for a 1070. But now with the 1060 coming, I'll wait for that, as I'm only really interested in the architecture, not the performance. With ethash I learned that some inline PTX assembly in OpenCL can do a lot to bring Nvidia OpenCL up to par with CUDA. It means a whole lot less host and device code to maintain.
|
|
|
|
vaulter
|
|
July 12, 2016, 12:18:33 PM |
|
How about dual ETH+SIA on NVIDIA? What do you think of it?
|
|
|
|
Genoil
|
|
July 12, 2016, 12:22:08 PM |
|
I *think* of it, just like everybody. Problem with the shares seems fixed. The kernel is fine; it had something to do with the query string and ampersands...
|
|
|
|
Genoil
|
|
July 12, 2016, 09:49:10 PM |
|
I'm curious what this does on CUDA devices. In the CL kernel, replace ror64 with this and change the calls to ror64_2 to ror64:

static uint2 ror64(const uint2 a, const int offset) {
    uint2 result;
    if (offset < 32) {
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.x) : "r"(a.x), "r"(a.y), "r"(offset));
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.y) : "r"(a.y), "r"(a.x), "r"(offset));
    }
    else {
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.x) : "r"(a.y), "r"(a.x), "r"(offset));
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(offset));
    }
    return result;
}

I was on AWS to try it but then realized I need Compute 3.5 at least. Is there any reasonably priced Maxwell in the cloud yet?
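For readers without an Nvidia card at hand, the rotate above can be checked on the CPU. The following is a minimal sketch in plain C, not code from either miner: `u32x2`, `shf_r_wrap`, `ror64_halves`, `ror64_ref`, `split64` and `join64` are hypothetical names, and `shf_r_wrap` is a software model that assumes the funnel shift takes the low word first, the high word second, and clamps the shift amount with `& 31` in wrap mode.

```c
#include <stdint.h>

/* Two 32-bit halves of a 64-bit word, mirroring uint2 (x = low, y = high). */
typedef struct { uint32_t x, y; } u32x2;

/* Assumed software model of "shf.r.wrap.b32 d, lo, hi, n":
   shift the 64-bit value hi:lo right by (n & 31) and keep the low 32 bits. */
uint32_t shf_r_wrap(uint32_t lo, uint32_t hi, uint32_t n) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (n & 31));
}

/* Same branch structure as the posted ror64, built on the model above. */
u32x2 ror64_halves(u32x2 a, int offset) {
    u32x2 r;
    if (offset < 32) {
        r.x = shf_r_wrap(a.x, a.y, (uint32_t)offset);
        r.y = shf_r_wrap(a.y, a.x, (uint32_t)offset);
    } else {
        /* For offsets >= 32 the halves swap roles; wrap mode keeps n & 31. */
        r.x = shf_r_wrap(a.y, a.x, (uint32_t)offset);
        r.y = shf_r_wrap(a.x, a.y, (uint32_t)offset);
    }
    return r;
}

/* Reference: ordinary 64-bit rotate right, valid for 0 < n < 64. */
uint64_t ror64_ref(uint64_t v, int n) {
    return (v >> n) | (v << (64 - n));
}

/* Conversions between the 64-bit and two-half representations. */
u32x2 split64(uint64_t v) {
    u32x2 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

uint64_t join64(u32x2 v) {
    return ((uint64_t)v.y << 32) | v.x;
}
```

Comparing `join64(ror64_halves(split64(v), n))` against `ror64_ref(v, n)` for the rotation amounts the kernel uses (16, 24, 32, 63) is a cheap sanity check before touching the GPU code.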
|
|
|
|
miningpoolhub
Legendary
Offline
Activity: 1456
Merit: 1006
Mining Pool Hub
|
|
July 13, 2016, 12:31:47 AM |
|
Any plans for stratum support?
miningpoolhub would like to support a new stratum protocol and open a Siacoin pool.
|
|
|
|
miningpoolhub
Legendary
Offline
Activity: 1456
Merit: 1006
Mining Pool Hub
|
|
July 13, 2016, 01:55:43 AM |
|
I just request one thing for the spec: pass the NETWORK diff, as well as the share diff, in there. If a working miner comes out, I can test and open the pool. Maybe you can implement that?
|
|
|
|
cijulangboy
|
|
July 13, 2016, 04:00:56 AM |
|
It's working for me now too, but the rig shows activity on only one of the cards. Speed is about 32 MH/s. OK, thanks for the information.
|
|
|
|
sp_
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
July 13, 2016, 06:59:47 AM |
|
Klaus_t's CUDA port is doing 355 MH/s on the 750 Ti (64-bit, CUDA 8.0). Genoil's OpenCL kernel does around 303 MH/s. But the Klaus_t version is getting a lot of rejects on the pool.
https://github.com/KlausT/Sia-CUDA-Miner
|
|
|
|
Amph
Legendary
Offline
Activity: 3206
Merit: 1069
|
|
July 13, 2016, 07:04:12 AM |
|
Klaus's version still needs more than one instance; should be an easy fix for someone who knows how to code...
|
|
|
|
Genoil
|
|
July 13, 2016, 07:06:34 AM |
|
What does it do with the funnel shift asm? Or did you already paste that in? (BTW, it is NOT my kernel. I only provided Windows build files and added pool support.) My fork is also still single-threaded. I wasn't actually planning to build a proper miner around it. Just a bit of kernel play.
|
|
|
|
sp_
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
July 13, 2016, 07:07:00 AM |
|
Measurements at the wall for the 750 Ti:
Genoil: 60 W @ 303 MH/s
Klaus_t: 40 W @ 356 MH/s
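To put the two readings on one scale, hash rate per watt is the obvious figure. A tiny sketch in C, where `mhs_per_watt` is a hypothetical helper and the inputs are simply the numbers posted above:

```c
/* Efficiency in MH/s per watt, from a hash rate in MH/s and a wall reading in W. */
double mhs_per_watt(double mhs, double watts) {
    return mhs / watts;
}
```

With the posted numbers, `mhs_per_watt(303, 60)` is about 5.05 and `mhs_per_watt(356, 40)` is about 8.9, so on this card the CUDA port does roughly 1.76 times the work per watt.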
|
|
|
|
sp_
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
July 13, 2016, 07:22:03 AM |
|
I managed to add some more hash by rewriting to uint2 like this:

#undef rotr64
#undef __byte_perm_64
#undef __swap_hilo
__inline__ __device__ uint2 rotr64(const uint2 a, const int offset)
{
    uint2 result;
    if (offset < 32)
    {
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.x) : "r"(a.x), "r"(a.y), "r"(offset));
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.y) : "r"(a.y), "r"(a.x), "r"(offset));
    }
    else
    {
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.x) : "r"(a.y), "r"(a.x), "r"(offset));
        asm("shf.r.wrap.b32 %0, %1, %2, %3;" : "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(offset));
    }
    return result;
}

__device__ __forceinline__ uint2 __byte_perm_64(const uint2 source, const uint32_t grab1, const uint32_t grab2)
{
    uint2 r;
    asm("prmt.b32 %0, %1, %2, %3;" : "=r"(r.x) : "r"(source.x), "r"(source.y), "r"(grab1));
    asm("prmt.b32 %0, %1, %2, %3;" : "=r"(r.y) : "r"(source.x), "r"(source.y), "r"(grab2));
    return r;
}

__device__ __forceinline__ uint2 __swap_hilo(const uint2 source)
{
    uint2 r;
    r.x = source.y;
    r.y = source.x;
    return r;
}
__global__ void __launch_bounds__(blocksize, 4) nonceGrind_SP(const uint2 * __restrict__ headerIn, uint2 * __restrict__ hashOut, uint2 * __restrict__ nonceOut, const uint2 * __restrict__ v1, uint32_t target)
{
    uint2 header[10], h[4], v[16];
    uint32_t id = (blockDim.x * blockIdx.x + threadIdx.x)*npt;

    #pragma unroll
    for (int i = 0; i < 10; i++)
        header[i] = headerIn[i];

    for (int n = id; n < id + npt; n++)
    {
        ((uint32_t*)header)[8] = n;

        v[2] = vectorize(0x5BF2CD1EF9D6B596u) + header[4];
        v[14] = __swap_hilo(~vectorize(0x1f83d9abfb41bd6bu) ^ v[2]);
        v[10] = vectorize(0x3c6ef372fe94f82bu) + v[14];
        v[6] = __byte_perm_64(vectorize(0x1f83d9abfb41bd6bu) ^ v[10], 0x6543, 0x2107);
        v[2] = v[2] + v[6] + header[5];
        v[14] = __byte_perm_64(v[14] ^ v[2], 0x5432, 0x1076);
        v[10] = v[10] + v[14];
        v[6] = rotr64(v[6] ^ v[10], 63);

        v[3] = vectorize(0x130C253729B586Au) + header[6];
        v[15] = __swap_hilo(vectorize(0x5be0cd19137e2179u) ^ v[3]);
        v[11] = vectorize(0xa54ff53a5f1d36f1u) + v[15];
        v[7] = __byte_perm_64(vectorize(0x5be0cd19137e2179u) ^ v[11], 0x6543, 0x2107);
        v[3] = v[3] + v[7] + header[7];
        v[15] = __byte_perm_64(v[15] ^ v[3], 0x5432, 0x1076);
        v[11] = v[11] + v[15];
        v[7] = rotr64(v[7] ^ v[11], 63);

        v[0] = v1[0] + v1[5] + header[8];
        v[15] = __swap_hilo(v[15] ^ v[0]);
        v[10] = v[10] + v[15];
        v[5] = __byte_perm_64(v1[5] ^ v[10], 0x6543, 0x2107);
        v[0] = v[0] + v[5] + header[9];
        v[15] = __byte_perm_64(v[15] ^ v[0], 0x5432, 0x1076);
        v[10] = v[10] + v[15];
        v[5] = rotr64(v[5] ^ v[10], 63);
....
etc
You also need to copy some functions from ccminer:

__device__ __forceinline__ uint64_t devectorize(uint2 x)
{
    uint64_t result;
    asm("mov.b64 %0,{%1,%2}; \n\t" : "=l"(result) : "r"(x.x), "r"(x.y));
    return result;
}

__device__ __forceinline__ uint2 vectorize(const uint64_t x)
{
    uint2 result;
    asm("mov.b64 {%0,%1},%2; \n\t" : "=r"(result.x), "=r"(result.y) : "l"(x));
    return result;
}

static __device__ __forceinline__ uint2 operator^ (uint2 a, uint32_t b) { return make_uint2(a.x ^ b, a.y); }
static __device__ __forceinline__ uint2 operator^ (uint2 a, uint2 b) { return make_uint2(a.x ^ b.x, a.y ^ b.y); }
static __device__ __forceinline__ uint2 operator& (uint2 a, uint2 b) { return make_uint2(a.x & b.x, a.y & b.y); }
static __device__ __forceinline__ uint2 operator| (uint2 a, uint2 b) { return make_uint2(a.x | b.x, a.y | b.y); }
static __device__ __forceinline__ uint2 operator~ (uint2 a) { return make_uint2(~a.x, ~a.y); }
static __device__ __forceinline__ void operator^= (uint2 &a, uint2 b) { a = a ^ b; }

static __device__ __forceinline__ uint2 operator+ (uint2 a, uint2 b)
{
    uint2 result;
    asm("add.cc.u32 %0,%2,%4; \n\t"
        "addc.u32 %1,%3,%5; \n\t"
        : "=r"(result.x), "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(b.x), "r"(b.y));
    return result;
}

static __device__ __forceinline__ uint2 operator+ (uint2 a, uint32_t b)
{
    uint2 result;
    asm("add.cc.u32 %0,%2,%4; \n\t"
        "addc.u32 %1,%3,%5; \n\t"
        : "=r"(result.x), "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(b), "r"(0));
    return result;
}

static __device__ __forceinline__ uint2 operator- (uint2 a, uint32_t b)
{
    uint2 result;
    asm("sub.cc.u32 %0,%2,%4; \n\t"
        "subc.u32 %1,%3,%5; \n\t"
        : "=r"(result.x), "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(b), "r"(0));
    return result;
}

static __device__ __forceinline__ uint2 operator- (uint2 a, uint2 b)
{
    uint2 result;
    asm("sub.cc.u32 %0,%2,%4; \n\t"
        "subc.u32 %1,%3,%5; \n\t"
        : "=r"(result.x), "=r"(result.y) : "r"(a.x), "r"(a.y), "r"(b.x), "r"(b.y));
    return result;
}
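The selector constants in the code above are easier to follow with a CPU model. The sketch below, in plain C, is an illustration and not ccminer code: `u32x2`, `prmt`, `byte_perm_64`, `swap_hilo` and `add_carry` are hypothetical stand-ins, and `prmt` models only the default byte-select mode with selector nibbles 0 to 7 (no sign replication). Under that assumption, the selector pair (0x6543, 0x2107) is a 64-bit rotate right by 24, (0x5432, 0x1076) is a rotate right by 16, and swapping the halves is a rotate by 32. The last helper models the add.cc/addc pair as a 32-bit add with carry into the high word.

```c
#include <stdint.h>

/* Two 32-bit halves of a 64-bit word, mirroring uint2 (x = low, y = high). */
typedef struct { uint32_t x, y; } u32x2;

u32x2 split64(uint64_t v) {
    u32x2 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

uint64_t join64(u32x2 v) {
    return ((uint64_t)v.y << 32) | v.x;
}

/* Assumed model of "prmt.b32 d, a, b, sel" in default mode: the 8 bytes of
   b:a are numbered 0..7 and each selector nibble picks one output byte. */
uint32_t prmt(uint32_t a, uint32_t b, uint32_t sel) {
    uint64_t pool = ((uint64_t)b << 32) | a;
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t idx = (sel >> (4 * i)) & 0x7;              /* source byte index */
        uint32_t byte = (uint32_t)(pool >> (8 * idx)) & 0xFF;
        out |= byte << (8 * i);
    }
    return out;
}

/* Byte-granular rotate of a 64-bit value held as two halves. */
u32x2 byte_perm_64(u32x2 s, uint32_t grab1, uint32_t grab2) {
    u32x2 r = { prmt(s.x, s.y, grab1), prmt(s.x, s.y, grab2) };
    return r;
}

/* Rotate by 32 is just a swap of the halves. */
u32x2 swap_hilo(u32x2 s) {
    u32x2 r = { s.y, s.x };
    return r;
}

/* 64-bit add on halves: low add, then high add plus the carry out of the low add. */
u32x2 add_carry(u32x2 a, u32x2 b) {
    u32x2 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y + (r.x < a.x);   /* unsigned wraparound signals a carry */
    return r;
}
```

For example, feeding 0x0123456789ABCDEF through `byte_perm_64` with (0x6543, 0x2107) gives 0xABCDEF0123456789, which is the input rotated right by 24 bits. Together with the rotate by 63 done via funnel shift, these are the four rotation amounts of the BLAKE2b mixing function.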
|
|
|
|
sp_
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
July 13, 2016, 07:30:24 AM |
|
Quote: "But the klaus_t version is getting a lot of rejects on the pool."
Looks like the rejects are gone with this small code change. There seems to be a bug in the multi-nonce retrieval. Here I return only one solution per iteration, and it works.

if (*((uint32_t*)h) <= target)
{
    int i = 0; // must be initialized: the slot search below is commented out, so slot 0 is always used
    /*
    uint64_t tmp = devectorize(header[4]);
    for (i = 0; i < MAXRESULTS; i++)
    {
        tmp = atomicCAS(&((uint64_t *)nonceOut)[i], 0, tmp);
        if (tmp == 0)
            break;
    }
    */
    nonceOut[i] = header[4]; // uint64_t tmp = devectorize(header[4])
    hashOut[i * 4] = h[0];
    v[1] = v[1] + v[6] + header[0];
    v[12] = __swap_hilo(v[12] ^ v[1]);
    v[11] = v[11] + v[12];
    v[1] = v[1] + __byte_perm_64(v[6] ^ v[11], 0x6543, 0x2107) + header[2];
    v[3] = v[3] + v[4] + header[5];
    v[14] = __swap_hilo(v[14] ^ v[3]);
    v[9] = v[9] + v[14];
    v[3] = v[3] + __byte_perm_64(v[4] ^ v[9], 0x6543, 0x2107) + header[3];
    hashOut[i * 4 + 1] = vectorize(0xbb67ae8584caa73b) ^ v[1] ^ (v[9] + __byte_perm_64(v[14] ^ v[3], 0x5432, 0x1076));
    hashOut[i * 4 + 2] = vectorize(0x3c6ef372fe94f82b) ^ v[2] ^ (v[10] + __byte_perm_64(v[15] ^ v[0], 0x5432, 0x1076));
    hashOut[i * 4 + 3] = vectorize(0xa54ff53a5f1d36f1) ^ v[3] ^ (v[11] + __byte_perm_64(v[12] ^ v[1], 0x5432, 0x1076));
    return;
}
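Reading the commented-out loop as posted, one possible cause of the rejects is that `tmp` is overwritten with the value returned by the compare-and-swap, so after a failed attempt the next slot would receive the occupant's nonce rather than this thread's. The loop also leaves `i` equal to MAXRESULTS when every slot is taken. The following host-side sketch in C11 shows a slot claim that avoids both; it is an assumption about the intent, not the fix that went into either miner, and `claim_slot` plus the value of `MAXRESULTS` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAXRESULTS 8   /* hypothetical capacity; the real value is not shown in the thread */

/* Try to store `nonce` in the first empty (zero) slot.
   Returns the slot index on success, or -1 if every slot is already taken.
   Note: a nonce value of 0 cannot be told apart from an empty slot. */
int claim_slot(_Atomic uint64_t *slots, uint64_t nonce) {
    for (int i = 0; i < MAXRESULTS; i++) {
        uint64_t expected = 0;   /* reset every iteration; never reuse the old value */
        if (atomic_compare_exchange_strong(&slots[i], &expected, nonce))
            return i;            /* this caller now owns slot i */
    }
    return -1;                   /* full: the caller must not write any result */
}
```

On the GPU the same shape would use atomicCAS and compare its return value against 0 without assigning it back to the nonce variable, and the hash would only be written when a valid index comes back.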
|
|
|
|
marvykkio
|
|
July 13, 2016, 01:13:23 PM |
|
Why is everyone crazy for Siacoin? It's worth nothing: 1 SC = 0.00000091 BTC
|
|
|
|
Amph
Legendary
Offline
Activity: 3206
Merit: 1069
|
|
July 13, 2016, 01:23:32 PM |
|
Because you can mine a shitton: with one stupid rig of six GPUs you can do almost 30k coins a day. Keep mining useless Ethereum, which is not profitable by a long shot anymore...
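The two figures in this exchange can be combined into a rough daily gross. A minimal sketch in C, with `btc_per_day` as a hypothetical helper and both inputs taken from the posts above (about 30k SC per day, 0.00000091 BTC per SC):

```c
/* Gross daily revenue in BTC, before electricity and pool fees. */
double btc_per_day(double coins_per_day, double btc_per_coin) {
    return coins_per_day * btc_per_coin;
}
```

`btc_per_day(30000, 0.00000091)` comes to roughly 0.0273 BTC per day for the six-GPU rig, which is an upper bound since the post says "almost" 30k coins.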
|
|
|
|
BTCBusinessConsult
|
|
July 13, 2016, 01:31:02 PM |
|
So is there an EXE for those of us who want to hobby mine this and have ZERO understanding of the last 3 pages of text? lol
|
|
|
|
navydude
|
|
July 13, 2016, 02:36:20 PM |
|
Ummm, yeah! Me too.
|
|
|
|
|