crz
Member
Offline
Activity: 116
Merit: 10
July 22, 2014, 02:35:09 PM
I uploaded a new copy. Give it another shot.
Let me know the outcome.
Carlo
Working perfectly, thanks!
Branding/Graphics, UI/UX Designer, Front/Back-end Developer. (IRC (freenode): crz)

antonio8
Legendary
Offline
Activity: 1386
Merit: 1000
July 22, 2014, 02:58:47 PM
I uploaded a new copy. Give it another shot.
Let me know the outcome.
Carlo
Is this with whirlcoin also, like djm34's? Just curious before downloading.
If you are going to leave your BTC on an exchange please send it to this address instead 1GH3ub3UUHbU5qDJW5u3E9jZ96ZEmzaXtG, I will at least use the money better than someone who steals it from the exchange. Thanks

djm34
Legendary
Offline
Activity: 1400
Merit: 1050
July 22, 2014, 03:09:27 PM
I just wanted to say thanks for all the solid work, guys, on the new versions of ccminer, nvminer and that profit calculator. I just completed my AMD to NVIDIA conversion and wanted to share my progress. Many headaches, but it's now rocking and rolling. 900+ pages and going strong! https://i.imgur.com/WyZVcYi.jpg
Nice, I am also replacing my 280X with 750 Ti (40 to 65), which will save about half of the power costs.
Here is a recommendation: don't pick up any used cards on eBay. I got sent some cards that were problematic, and they all came off eBay. Luckily I was able to swap them out for new ones thanks to Amazon's lenient returns policy.
Thanks for the recommendation. I only use new cards with warranty. I had a series of 280X from Sapphire where the capacitors just blew after 2 weeks. It was no issue getting them refunded.
lol, buying a used card, you probably get cards that were used 24/7 nonstop by miners.
djm34 facebook page. BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw

cayars
July 22, 2014, 03:40:06 PM
I uploaded a new copy. Give it another shot.
Let me know the outcome.
Carlo
Is this with whirlcoin also, like djm34's? Just curious before downloading.
Yes, and the description also gives hash rates for each algo.

tarzanbigcity
July 22, 2014, 03:48:01 PM
Any plans to work in the split miner layout into the next few versions of ccminer or nvminer?

sp_
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
July 22, 2014, 04:12:03 PM
djm34:
ccminer/x13/cuda_shabal512.cu:
#define PERM_ELT(xa0, xa1, xb0, xb1, xb2, xb3, xc, xm) { \
    xa0 = T32((xa0 \
        ^ (((xa1 << 15) | (xa1 >> 17)) * 5U) \
        ^ xc) * 3U) \
        ^ xb1 ^ (xb2 & ~xb3) ^ xm; \
    xb0 = T32(~(((xb0 << 1) | (xb0 >> 31)) ^ xa0)); \
}
Rewrite to:
#define PERM_ELT(xa0, xa1, xb0, xb1, xb2, xb3, xc, xm) { \
    xa0 = T32((xa0 \
        ^ (SPH_ROTL32(xa1, 15) * 5U) \
        ^ xc) * 3U) \
        ^ xb1 ^ (xb2 & ~xb3) ^ xm; \
    xb0 = T32(~(SPH_ROTL32(xb0, 1) ^ xa0)); \
}
I have disassembled the latest binary, and the funnel shift is still not used in many of the algorithms (x11, x13, etc.). I am too lazy to make a build, and I don't have a Maxwell card here to test.
(cuda_shabal512.cu, compute_35)
.reg .b32 %rhs;
shl.b32 %lhs, %r321, 17;
shr.b32 %rhs, %r321, 15;
add.u32 %r322, %lhs, %rhs;
can be reduced to a single funnel-shift instruction.
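Here is a minimal sketch of what the one-instruction version could look like. It assumes CUDA's `__funnelshift_l(lo, hi, shift)` intrinsic (compute capability 3.2 and later) for the device path and uses a plain shift/or fallback everywhere else. `rotl32_sketch` and `SKETCH_HD` are hypothetical names, not ccminer's actual `SPH_ROTL32`. The second function mirrors the `shl`/`shr`/`add` sequence above. Because the two shifted halves never share a bit, the add gives the same result as a rotate left by 17.

```c
#include <assert.h>
#include <stdint.h>

/* Lets the same helper compile for host and device under nvcc. */
#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

/* Hypothetical helper: rotate a 32-bit word left by n bits. */
static inline SKETCH_HD uint32_t rotl32_sketch(uint32_t x, unsigned n)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 320)
    /* Funnel shift of the pair (hi = x, lo = x): one instruction on sm_32+. */
    return __funnelshift_l(x, x, n);
#else
    /* Portable fallback; masking keeps n == 0 from shifting by 32 (undefined in C). */
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}

/* What the disassembled sequence computes: shl 17, shr 15, then add. */
static inline uint32_t shl_shr_add_17(uint32_t x)
{
    uint32_t lhs = x << 17; /* low 17 bits are zero */
    uint32_t rhs = x >> 15; /* only the low 17 bits can be set */
    return lhs + rhs;       /* no overlapping bits, so add == or == rotate */
}
```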

sp_
July 22, 2014, 04:29:45 PM
cuda_x11_luffa512.cu:
#define TWEAK(a0,a1,a2,a3,j)\
    a0 = (a0<<(j))|(a0>>(32-(j)));\
    a1 = (a1<<(j))|(a1>>(32-(j)));\
    a2 = (a2<<(j))|(a2>>(32-(j)));\
    a3 = (a3<<(j))|(a3>>(32-(j)));
------->
#define TWEAK(a0,a1,a2,a3,j)\
    a0 = SPH_ROTL32(a0, j);\
    a1 = SPH_ROTL32(a1, j);\
    a2 = SPH_ROTL32(a2, j);\
    a3 = SPH_ROTL32(a3, j);
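This is a CPU-side sketch for checking that the two TWEAK forms agree. `ROTL32_SKETCH` is a hypothetical stand-in for `SPH_ROTL32`, whose real definition is not shown in this thread. Both macros are wrapped in `do { } while (0)` so they behave as a single statement. The comparison covers shifts 1 to 31 only, because with j = 0 the original `a >> (32 - j)` shifts by 32, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SPH_ROTL32; masking keeps n == 0 well defined. */
#define ROTL32_SKETCH(x, n) \
    (((x) << ((n) & 31)) | ((x) >> ((32 - (n)) & 31)))

/* Original shift/or form (only valid for 1 <= j <= 31). */
#define TWEAK_SHIFT(a0, a1, a2, a3, j) do { \
    a0 = (a0 << (j)) | (a0 >> (32 - (j))); \
    a1 = (a1 << (j)) | (a1 >> (32 - (j))); \
    a2 = (a2 << (j)) | (a2 >> (32 - (j))); \
    a3 = (a3 << (j)) | (a3 >> (32 - (j))); \
} while (0)

/* Rotate-macro form suggested above. */
#define TWEAK_ROTL(a0, a1, a2, a3, j) do { \
    a0 = ROTL32_SKETCH(a0, j); \
    a1 = ROTL32_SKETCH(a1, j); \
    a2 = ROTL32_SKETCH(a2, j); \
    a3 = ROTL32_SKETCH(a3, j); \
} while (0)

/* Returns 1 if both forms agree on the given words for every j in 1..31. */
static int tweak_forms_agree(uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3)
{
    for (int j = 1; j < 32; j++) {
        uint32_t s0 = w0, s1 = w1, s2 = w2, s3 = w3;
        uint32_t r0 = w0, r1 = w1, r2 = w2, r3 = w3;
        TWEAK_SHIFT(s0, s1, s2, s3, j);
        TWEAK_ROTL(r0, r1, r2, r3, j);
        if (s0 != r0 || s1 != r1 || s2 != r2 || s3 != r3)
            return 0;
    }
    return 1;
}
```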

opus.224
July 22, 2014, 04:30:05 PM
Does anyone mine JPC at dwarfpool and get a lot of "booo's"? For three days or so (since the fork), I have only been getting about 88-90% good shares shown in the console (Ubuntu 14.04 x64). But the admin of dwarfpool checked my shares, and they are up to 100% accepted. Why does it show only ~90% good shares on my side when everything I send is accepted on the pool? I never had that problem with JPC before.
Hi. Mining there with 3x750ti, no booo's at all in my case. Using official ccminer v1.2, getting 5.2-5.5 MH/s per card.
Same here! More than 10% booos. Normally it was 0.5-1.0%. Running 6x750Ti, winXP_32.

sp_
July 22, 2014, 04:34:11 PM
cuda_x11_luffa512.cu:
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = (a0<<2) | (a0>>(30));\
    a0 ^= a4;\
    a4 = (a4<<14) | (a4>>(18));\
    a4 ^= a0;\
    a0 = (a0<<10) | (a0>>(22));\
    a0 ^= a4;\
    a4 = (a4<<1) | (a4>>(31));
-------->
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = SPH_ROTL32(a0, 2);\
    a0 ^= a4;\
    a4 = SPH_ROTL32(a4, 14);\
    a4 ^= a0;\
    a0 = SPH_ROTL32(a0, 10);\
    a0 ^= a4;\
    a4 = SPH_ROTL32(a4, 1);
cuda_x11_cubehash512.cu:
#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))
------->
#define ROTATEUPWARDS7(a) (SPH_ROTL32(a, 7))
#define ROTATEUPWARDS11(a) (SPH_ROTL32(a, 11))
etc.
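The same kind of check works for MIXWORD, again with a hypothetical `ROTL32_SKETCH` standing in for `SPH_ROTL32`. The last step of the original rotates `a4` (`a4 = (a4<<1) | (a4>>(31))`), so the rotate version has to pass `a4` there too. Passing `a0` instead would change the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SPH_ROTL32. */
#define ROTL32_SKETCH(x, n) \
    (((x) << ((n) & 31)) | ((x) >> ((32 - (n)) & 31)))

/* Original shift/or MIXWORD from cuda_x11_luffa512.cu. */
#define MIXWORD_SHIFT(a0, a4) do { \
    a4 ^= a0; a0 = (a0 << 2) | (a0 >> 30);  \
    a0 ^= a4; a4 = (a4 << 14) | (a4 >> 18); \
    a4 ^= a0; a0 = (a0 << 10) | (a0 >> 22); \
    a0 ^= a4; a4 = (a4 << 1) | (a4 >> 31);  \
} while (0)

/* Rotate form; note that the final rotate is applied to a4. */
#define MIXWORD_ROTL(a0, a4) do { \
    a4 ^= a0; a0 = ROTL32_SKETCH(a0, 2);  \
    a0 ^= a4; a4 = ROTL32_SKETCH(a4, 14); \
    a4 ^= a0; a0 = ROTL32_SKETCH(a0, 10); \
    a0 ^= a4; a4 = ROTL32_SKETCH(a4, 1);  \
} while (0)

/* Returns 1 if both versions leave identical (a0, a4) pairs. */
static int mixword_forms_agree(uint32_t x0, uint32_t x4)
{
    uint32_t s0 = x0, s4 = x4, r0 = x0, r4 = x4;
    MIXWORD_SHIFT(s0, s4);
    MIXWORD_ROTL(r0, r4);
    return s0 == r0 && s4 == r4;
}
```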

djm34
July 22, 2014, 04:37:12 PM
djm34:
ccminer/x13/cuda_shabal512.cu:
#define PERM_ELT(xa0, xa1, xb0, xb1, xb2, xb3, xc, xm) { \
    xa0 = T32((xa0 \
        ^ (((xa1 << 15) | (xa1 >> 17)) * 5U) \
        ^ xc) * 3U) \
        ^ xb1 ^ (xb2 & ~xb3) ^ xm; \
    xb0 = T32(~(((xb0 << 1) | (xb0 >> 31)) ^ xa0)); \
}
Rewrite to:
#define PERM_ELT(xa0, xa1, xb0, xb1, xb2, xb3, xc, xm) { \
    xa0 = T32((xa0 \
        ^ (SPH_ROTL32(xa1, 15) * 5U) \
        ^ xc) * 3U) \
        ^ xb1 ^ (xb2 & ~xb3) ^ xm; \
    xb0 = T32(~(SPH_ROTL32(xb0, 1) ^ xa0)); \
}
I have disassembled the latest binary, and the funnel shift is still not used in many of the algorithms (x11, x13, etc.). I am too lazy to make a build, and I don't have a Maxwell card here to test.
(cuda_shabal512.cu, compute_35)
.reg .b32 %rhs;
shl.b32 %lhs, %r321, 17;
shr.b32 %rhs, %r321, 15;
add.u32 %r322, %lhs, %rhs;
I tried it on some algos, but it doesn't make much difference, mostly because ccminer doesn't spend a lot of time in these algos. Shabal represents less than 5% of the overall time. Things which need improvement:
on 750 Ti: echo, groestl, whirlpool, hamsi (13%, 12.1%, 10.4%, 9.9% respectively)
on 780 Ti: hamsi, groestl, echo, fugue (15.9%, 12.5%, 12.1%, 7% respectively); whirlpool only 6.9%
(numbers are from an unreleased Xxx algo.)
So sure, one can certainly gain a little on luffa and shabal, but it won't improve the overall performance of the algo (I actually tried on luffa...). The card isn't bottlenecked by computing time (I mean calculation) but by reading those giant lookup tables (actually I am wondering if it wouldn't be faster to replace them with the original calculation...).
If you can tell me how to make
.reg .b32 %rhs;
shl.b32 %lhs, %r321, 17;
shr.b32 %rhs, %r321, 15;
add.u32 %r322, %lhs, %rhs;
in one line, I will be happy though...
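The argument above is essentially Amdahl's law. If a kernel takes a fraction p of the total hashing time and is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). Here is a small sketch with the post's own shabal figure plugged in for illustration:

```c
#include <assert.h>

/* Overall speedup when a fraction p of the runtime is made s times faster. */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound: the accelerated part takes no time at all. */
static double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

With p = 0.05, doubling shabal's speed gives about 1.026x overall, and even an infinitely fast shabal tops out near 1.053x. A kernel at 13% (echo on the 750 Ti) has a limit of about 1.15x, so there is more headroom there.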

djm34
July 22, 2014, 04:39:46 PM
cuda_x11_luffa512.cu:
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = (a0<<2) | (a0>>(30));\
    a0 ^= a4;\
    a4 = (a4<<14) | (a4>>(18));\
    a4 ^= a0;\
    a0 = (a0<<10) | (a0>>(22));\
    a0 ^= a4;\
    a4 = (a4<<1) | (a4>>(31));
-------->
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = SPH_ROTL32(a0, 2);\
    a0 ^= a4;\
    a4 = SPH_ROTL32(a4, 14);\
    a4 ^= a0;\
    a0 = SPH_ROTL32(a0, 10);\
    a0 ^= a4;\
    a4 = SPH_ROTL32(a4, 1);
cuda_x11_cubehash512.cu:
#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))
------->
#define ROTATEUPWARDS7(a) (SPH_ROTL32(a, 7))
#define ROTATEUPWARDS11(a) (SPH_ROTL32(a, 11))
etc.
I tried those... no difference (unless they are inside a big loop, it doesn't make much difference).

sp_
July 22, 2014, 04:56:20 PM
djm34: if you can tell me how to make
.reg .b32 %rhs;
shl.b32 %lhs, %r321, 17;
shr.b32 %rhs, %r321, 15;
add.u32 %r322, %lhs, %rhs;
in one line, I will be happy though...
Compile a new version with my changes, disassemble, and you will see. Did you try to tweak the register count? Fixing it at 80 for all compute versions seems a bit strange.
Fatbin ptx code:
================
arch = sm_35
code version = [3,2]
producer = cuda
host = windows
compile_size = 32bit
compressed
identifier = C:/CCMiner/nvminer/x13/cuda_shabal512.cu
ptxasOptions = -v -abi=no -v -maxrregcount=80

sp_
July 22, 2014, 05:12:26 PM
In hamsi and its SBOX macro there seem to be read-after-write register dependency stalls.
Register dependency: read-after-write register dependency. The instruction's result can be read ~24 cycles later.
#define SBOX(a, b, c, d) { \
    uint32_t t; \
    t = (a); \
    (a) &= (c);        /* stall */ \
    (a) ^= (d);        /* stall */ \
    (c) ^= (b); \
    (c) ^= (a);        /* stall */ \
    (d) |= t; \
    (d) ^= (b);        /* stall */ \
    t ^= (c); \
    (b) = (d); \
    (d) |= t;          /* stall */ \
    (d) ^= (a);        /* stall */ \
    (a) &= (b);        /* stall */ \
    t ^= (a);          /* stall */ \
    (b) ^= (d); \
    (b) ^= t;          /* stall */ \
    (a) = (c); \
    (c) = (b);         /* stall */ \
    (b) = (d);         /* stall */ \
    (d) = SPH_T32(~t); /* stall */ \
}
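Before reordering SBOX to hide latency, it helps to have a CPU reference to check any rewrite against. This sketch transcribes the macro as a C function (the names are hypothetical) and checks one structural property. Evaluated on single bits, the 4-input, 4-output mapping is a permutation of the 16 possible inputs, and any reordered version must preserve that exactly.

```c
#include <assert.h>
#include <stdint.h>

/* C transcription of the SBOX macro; each bit position of
   (a, b, c, d) = (s[0], s[1], s[2], s[3]) is an independent 4-bit S-box. */
static void sbox_ref(uint32_t s[4])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], t;
    t = a;
    a &= c; a ^= d;
    c ^= b; c ^= a;
    d |= t; d ^= b;
    t ^= c;
    b = d;
    d |= t; d ^= a;
    a &= b;
    t ^= a;
    b ^= d; b ^= t;
    /* Output shuffle from the macro: a = c, c = b, b = d, d = ~t. */
    s[0] = c; s[1] = d; s[2] = b; s[3] = ~t;
}

/* Runs the S-box on one 4-bit value x using bit 0 of each word:
   a = bit 3, b = bit 2, c = bit 1, d = bit 0 (same packing for the result). */
static unsigned sbox_bit0(unsigned x)
{
    uint32_t s[4] = { (x >> 3) & 1u, (x >> 2) & 1u, (x >> 1) & 1u, x & 1u };
    sbox_ref(s);
    return ((s[0] & 1u) << 3) | ((s[1] & 1u) << 2) | ((s[2] & 1u) << 1) | (s[3] & 1u);
}

/* Returns 1 if the 16 possible inputs map to 16 distinct outputs. */
static int sbox_is_permutation(void)
{
    int seen[16] = { 0 };
    for (unsigned x = 0; x < 16; x++) {
        if (seen[sbox_bit0(x)]++)
            return 0;
    }
    return 1;
}
```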

Neo.op
July 22, 2014, 05:22:03 PM
Wondering how long this takes to compile? If I remember correctly, last time around things took 2 hrs or so. Edit: It always gets stuck at z24x13_fugue512_.... What algo is that exactly?
LYR | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓███ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█████ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓████▓███ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓████▓▓▓▓████ ▓▓▓▓▓▓▓▓▓▓▄█████▓▓▓▓▓▓▓▓█████▄ ▓▓▓▓▓▓▓▓▓▓▓▓████▓▓▓▓▓▓▓▓████ ▓▓▓▓▓▓▓▓▓▓▓▄▄▓████▓▓▓▓████ ▓▓▓▓▓▓▓▓▓▓███▄▓▓████▓███ ▓▓▓▓▓▓▓▓▓██████▄▓▓█████ ▓▓▓▓▓▓▓▓█████████▄▓███ ▓▓▓▓▓▓▓██████▀▓███▄▓█ ▓▓▓▓▓▓██████▓▓▓█████ ▓▓▓▓▓██████▓▓▓▓█████ ▓▓▓▓██████▄▄▄▄▄██████ ▓▓▓██████████████████ ▓▓███████████████████ ▓██████▓▓▓▓▓▓▓▓▓▓█████ ██████▓▓▓▓▓▓▓▓▓▓▓█████ ██████▓▓▓▓▓▓▓▓▓▓▓▓█████ | LYRA Loyalty Rewards on Blockchain
══════════════════════[ Main Features ]══════════════════════ ✓Customizable tokens ███✓NFT███ ✓DeFi███ ✓DEX███ ✓Revenue share | | | |

djm34
July 22, 2014, 05:42:25 PM
djm34: if you can tell me how to make
.reg .b32 %rhs;
shl.b32 %lhs, %r321, 17;
shr.b32 %rhs, %r321, 15;
add.u32 %r322, %lhs, %rhs;
in one line, I will be happy though...
Compile a new version with my changes, disassemble, and you will see. Did you try to tweak the register count? Fixing it at 80 for all compute versions seems a bit strange.
Fatbin ptx code:
================
arch = sm_35
code version = [3,2]
producer = cuda
host = windows
compile_size = 32bit
compressed
identifier = C:/CCMiner/nvminer/x13/cuda_shabal512.cu
ptxasOptions = -v -abi=no -v -maxrregcount=80
You don't really need to disassemble; PTX files are written during compilation for each kernel... Regarding maxrregcount, well, it is a maximum register count: if a kernel doesn't require that many, it will use fewer (and Christian did it). I played a bit with it; the problem is that the big kernels don't like it that much. With the current maxrregcount, the occupancy is around 50% for most of the kernels, and decreasing it doesn't really increase performance. Actually, it only works for whirlpool, where using 64 registers gives somewhat better performance (but I get lots of spilled bytes...). Regarding shabal, it uses 66 registers for an occupancy of 37.5%. I guess I could decrease the register count a bit, but as I said, the computing time spent on shabal (I just checked) is only 1.4% on the GTX 750 Ti and 1.% on the 780 Ti (luffa: 3.1% and 3.9%). Shabal is actually already one of the fastest algorithms of the bunch...
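djm34's 37.5% figure for shabal at 66 registers can be reproduced with the usual register-limited occupancy arithmetic. This is a simplified sketch that assumes:

- 65536 registers and 64 resident warps per SM (both sm_35 and sm_50),
- registers allocated per warp in units of 256,
- a 16-block-per-SM cap,
- shared memory ignored,
- 256-thread blocks (the post does not give the block size).

```c
#include <assert.h>

/* ASSUMED per-SM limits (sm_35 / sm_50 style); shared memory is ignored. */
#define REGS_PER_SM       65536
#define MAX_WARPS_PER_SM  64
#define MAX_BLOCKS_PER_SM 16
#define REG_ALLOC_UNIT    256 /* registers are allocated per warp in these units */
#define WARP_SIZE         32

/* Fraction of the SM's warp slots that can be resident, given the
   registers per thread and the block size. */
static double occupancy_estimate(int regs_per_thread, int threads_per_block)
{
    int warps_per_block = (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;

    /* Round each warp's register footprint up to the allocation unit. */
    int regs_per_warp = regs_per_thread * WARP_SIZE;
    regs_per_warp = (regs_per_warp + REG_ALLOC_UNIT - 1) / REG_ALLOC_UNIT * REG_ALLOC_UNIT;

    /* Resident blocks are limited by registers, warp slots and the block cap. */
    int blocks = REGS_PER_SM / (regs_per_warp * warps_per_block);
    if (MAX_WARPS_PER_SM / warps_per_block < blocks)
        blocks = MAX_WARPS_PER_SM / warps_per_block;
    if (MAX_BLOCKS_PER_SM < blocks)
        blocks = MAX_BLOCKS_PER_SM;

    return (double)(blocks * warps_per_block) / MAX_WARPS_PER_SM;
}
```

Under these assumptions, 66 and 80 registers both land at 37.5%. At 64 registers a fourth block fits per SM (50%), which may be part of why capping whirlpool at 64 helped despite the spills.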

djm34
July 22, 2014, 05:45:35 PM
Wondering how long this takes to compile? If I remember correctly, last time around things took 2 hrs or so. Edit: It always gets stuck at z24x13_fugue512_.... What algo is that exactly?
lol, you haven't compiled whirlpool yet, the wait isn't over... (upgrade to CUDA 6.0 or 6.5, it will be faster)

djm34
July 22, 2014, 05:48:33 PM
In hamsi and its SBOX macro there seem to be read-after-write register dependency stalls.
Register dependency: read-after-write register dependency. The instruction's result can be read ~24 cycles later.
#define SBOX(a, b, c, d) { \
    uint32_t t; \
    t = (a); \
    (a) &= (c);        /* stall */ \
    (a) ^= (d);        /* stall */ \
    (c) ^= (b); \
    (c) ^= (a);        /* stall */ \
    (d) |= t; \
    (d) ^= (b);        /* stall */ \
    t ^= (c); \
    (b) = (d); \
    (d) |= t;          /* stall */ \
    (d) ^= (a);        /* stall */ \
    (a) &= (b);        /* stall */ \
    t ^= (a);          /* stall */ \
    (b) ^= (d); \
    (b) ^= t;          /* stall */ \
    (a) = (c); \
    (c) = (b);         /* stall */ \
    (b) = (d);         /* stall */ \
    (d) = SPH_T32(~t); /* stall */ \
}
I wrote one; it got removed though...
{
    uint32_t t;
    t = a;
    asm("and.b32 %0,%0,%1;" : "+r"(a) : "r"(c));
    asm("xor.b32 %0,%0,%1;" : "+r"(a) : "r"(d));
    asm("xor.b32 %0,%0,%1;" : "+r"(c) : "r"(b));
    asm("xor.b32 %0,%0,%1;" : "+r"(c) : "r"(a));
    asm( "or.b32 %0,%0,%1;" : "+r"(d) : "r"(t));
    asm("xor.b32 %0,%0,%1;" : "+r"(d) : "r"(b));
    asm("xor.b32 %0,%0,%1;" : "+r"(t) : "r"(c));
    b = d;
    asm( "or.b32 %0,%0,%1;" : "+r"(d) : "r"(t));
    asm("xor.b32 %0,%0,%1;" : "+r"(d) : "r"(a));
    asm("and.b32 %0,%0,%1;" : "+r"(a) : "r"(b));
    asm("xor.b32 %0,%0,%1;" : "+r"(t) : "r"(a));
    asm("xor.b32 %0,%0,%1;" : "+r"(b) : "r"(d));
    asm("xor.b32 %0,%0,%1;" : "+r"(b) : "r"(t));
    a = c;
    c = b;
    b = d;
    asm("not.b32 %0,%1;" : "=r"(d) : "r"(t));
    //asm("xor.b32 %0,%0,0xFFFFFFFF;" : "+r"(d));
}
I assumed it was the same whether using several asm statements or one with several lines (knowing that I would need to declare an additional temp variable).
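On the question of several `asm` statements versus one: NVIDIA's inline PTX documentation shows a single `asm` statement declaring its own temporaries inside `{ }`. The braces give the `.reg` names a local scope, so the statement can be inlined repeatedly without name clashes. Below is a minimal sketch of that pattern. `xor3_sketch` and `SKETCH_HD` are hypothetical names, and the host branch exists only so the code can be checked on a CPU. Nothing here shows whether ptxas schedules the result any differently from separate statements.

```c
#include <assert.h>
#include <stdint.h>

/* Lets the same helper compile for host and device under nvcc. */
#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

/* Hypothetical helper: a ^ b ^ c in ONE inline-asm statement on the device. */
static inline SKETCH_HD uint32_t xor3_sketch(uint32_t a, uint32_t b, uint32_t c)
{
#if defined(__CUDA_ARCH__)
    uint32_t r;
    /* The braces open a PTX scope, so the temporary "tmp" is local to this statement. */
    asm("{\n\t"
        ".reg .b32 tmp;\n\t"
        "xor.b32 tmp, %1, %2;\n\t"
        "xor.b32 %0, tmp, %3;\n\t"
        "}"
        : "=r"(r) : "r"(a), "r"(b), "r"(c));
    return r;
#else
    /* Host fallback so the same source builds and can be tested on the CPU. */
    return a ^ b ^ c;
#endif
}
```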

Neo.op
July 22, 2014, 06:07:43 PM
Wondering how long this takes to compile? If I remember correctly, last time around things took 2 hrs or so. Edit: It always gets stuck at z24x13_fugue512_.... What algo is that exactly?
lol, you haven't compiled whirlpool yet, the wait isn't over... (upgrade to CUDA 6.0 or 6.5, it will be faster)
Yep, still stuck, going along slowly. BTW, any chance of a whirlpool-only ccminer, like the fresh one?

sp_
July 22, 2014, 06:23:10 PM
Why not compute 2 hashes in parallel?
#define SBOX_pipelined(a, b, c, d, a1, b1, c1, d1) { \
    uint32_t t, t1; \
    t = (a); \
    t1 = (a1); \
    (a) &= (c); \
    (a1) &= (c1); \
    (a) ^= (d); \
    (a1) ^= (d1); \
    (c) ^= (b); \
    (c1) ^= (b1); \
    (c) ^= (a); \
    (c1) ^= (a1); \
    (d) |= t; \
    (d1) |= t1; \
    (d) ^= (b); \
    (d1) ^= (b1); \
    t ^= (c); \
    t1 ^= (c1); \
Etc....
Wouldn't that remove the stalls?
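Here is a sketch of the two-way interleaving, completing the pattern above with hypothetical names. It processes two independent states so that each dependent pair of instructions in one state has an unrelated instruction from the other state between them. The check only confirms that the interleaved version gives the same outputs as two separate SBOX calls. Whether it actually removes stalls on the GPU depends on how ptxas schedules the code and on the extra registers it needs, so that would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: one SBOX step on (a, b, c, d) = (s[0], s[1], s[2], s[3]). */
static void sbox_ref(uint32_t s[4])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], t;
    t = a;
    a &= c; a ^= d; c ^= b; c ^= a;
    d |= t; d ^= b; t ^= c; b = d;
    d |= t; d ^= a; a &= b; t ^= a;
    b ^= d; b ^= t;
    s[0] = c; s[1] = d; s[2] = b; s[3] = ~t; /* a = c, c = b, b = d, d = ~t */
}

/* Two independent states, interleaved statement by statement. */
static void sbox_x2(uint32_t s[4], uint32_t u[4])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t a1 = u[0], b1 = u[1], c1 = u[2], d1 = u[3];
    uint32_t t = a, t1 = a1;
    a &= c;  a1 &= c1;
    a ^= d;  a1 ^= d1;
    c ^= b;  c1 ^= b1;
    c ^= a;  c1 ^= a1;
    d |= t;  d1 |= t1;
    d ^= b;  d1 ^= b1;
    t ^= c;  t1 ^= c1;
    b = d;   b1 = d1;
    d |= t;  d1 |= t1;
    d ^= a;  d1 ^= a1;
    a &= b;  a1 &= b1;
    t ^= a;  t1 ^= a1;
    b ^= d;  b1 ^= d1;
    b ^= t;  b1 ^= t1;
    s[0] = c;  s[1] = d;  s[2] = b;  s[3] = ~t;
    u[0] = c1; u[1] = d1; u[2] = b1; u[3] = ~t1;
}

/* Returns 1 if the interleaved version matches two separate reference calls. */
static int x2_matches_ref(uint32_t seed)
{
    uint32_t p[4] = { seed, seed * 2654435761u, ~seed, seed ^ 0xA5A5A5A5u };
    uint32_t q[4] = { ~seed, seed + 1u, seed ^ 0x0F0F0F0Fu, seed * 40503u };
    uint32_t p_ref[4], q_ref[4];
    for (int i = 0; i < 4; i++) { p_ref[i] = p[i]; q_ref[i] = q[i]; }
    sbox_ref(p_ref);
    sbox_ref(q_ref);
    sbox_x2(p, q);
    for (int i = 0; i < 4; i++)
        if (p[i] != p_ref[i] || q[i] != q_ref[i])
            return 0;
    return 1;
}
```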

unitedminers
July 22, 2014, 07:24:34 PM
Is it possible to change the fan speed, GPU engine clock, etc., and read the GPU temps with ccMiner (Xubuntu)?