djm34
Legendary
Activity: 1400
Merit: 1050
July 22, 2014, 10:58:22 PM
What is the password to download the x15 file - 07/15/2014?
DA4AF09FE5377715856BA0B10A29C95867053ECBF4105DBDD8957DA78B4127E49E4717DD667CEEF B
I don't understand why nobody remembers it... not and this is not damn it,
I can't remember either; I never put a password anywhere... (not sure what you downloaded, actually...)
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw

djm34
July 22, 2014, 11:03:32 PM
Replace SBOX with sbox_pipelined
In the code:
SBOX(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18); \
SBOX(hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
SBOX(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A); \
SBOX(hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
SBOX(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C); \
SBOX(hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
SBOX(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E); \
SBOX(hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \
------>
sbox_pipelined(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18, hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
sbox_pipelined(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A, hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
sbox_pipelined(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C, hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
sbox_pipelined(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E, hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \
OK, I tried it, but again it doesn't make a difference.

bitcoinvideos
July 23, 2014, 03:31:33 AM
Just thought I'd say that DeepCoin is still hella mineable...very ninja type launch on Qubit algo...very under the radar

tsiv
July 23, 2014, 05:27:44 AM
Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash; the result is spectacularly unimpressive. The best I've come up with breaks even with the current single-thread-per-hash implementation. Well, almost: it's actually a percent slower AND loses compute 2.0 compatibility due to using shuffle. On the other hand, it performs a lot more reasonably with various launch configurations: 15 blocks of 32 threads works out equally well as the original 8x60 magic bullet for the 750 Ti. At this point I'm starting to think I'll just forget about that part and start looking for something else to improve.

I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something other than a 750 Ti in them would be willing to take it for a spin, I'd appreciate it. I've added the number of SMX/SMM/Whateverthingmabobs to the miner thread start-up info; you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads is a power of 2. 4/8/16/32/64 are the best bets.
https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

sp_
Legendary
Activity: 2954
Merit: 1087
Team Black developer
July 23, 2014, 05:43:47 AM
Replace SBOX with sbox_pipelined ...
ok I tried, but again it doesn't make a difference.
But it does when you convert the data structure to 64-bit. Put hamsi_s00 in the upper 32 bits of the register and hamsi_s01 in the lower 32 bits; then you process twice the data with the same assembly instructions you had before (but in 64-bit).
// 64-bit PTX operands need the "l" constraint; "r" is for 32-bit registers
uint64_t t;
t = a;
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(c));
asm("xor.b64 %0,%0,%1;" : "+l"(a) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(a));
asm("or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(c));
b = d;
asm("or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(a));
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(a));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(t));
a = c;
c = b;
b = d;
asm("not.b64 %0,%1;" : "=l"(d) : "l"(t));
....
In x13/cuda_x13_hamsi512.cu, #define ROUND_BIG(rc, alpha) { should be rewritten to operate on 64-bit integers.

PVmining
July 23, 2014, 07:15:35 AM
Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, result is spectacularly unimpressive.
Thanks for trying it!

Bombadil
July 23, 2014, 07:52:03 AM
Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, result is spectacularly unimpressive. ...
Wolf0 also started on modding your ccminer-mod: https://bitcointalk.org/index.php?topic=701910.0

DrAlco
Newbie
Activity: 43
Merit: 0
July 23, 2014, 11:23:21 AM
I'm still curious as to how it runs on other hardware...
Improved hashrate by about 70 H/s on a 780 Ti: up from 320 to about 390 (using 8x60). It also doesn't seem to hang and bring the system to its knees when using all GFX cards.

djm34
July 23, 2014, 11:32:15 AM
But it does when you convert the data structure to 64-bit. ... should be rewritten to operate on 64-bit integers.
The problem is that it would be necessary to convert the entire algo to 64-bit, as conversions from 32 to 64-bit are rather slow... (won't happen this week)

tsiv
July 23, 2014, 12:19:07 PM
Improved hashrate by about 70 H/s on a 780 Ti: up from 320 to about 390 (using 8x60). It also doesn't seem to hang and bring the system to its knees when using all GFX cards.
That seems to be in line with the ~18% improvement I saw when benchmarking only the AES part of the kernel. Have you tried other configs? 390 is still pretty low for a 780 Ti; I think people were getting the best results with 4x120 on the 780 Ti.

cayars
July 23, 2014, 12:37:45 PM
tsiv,
Wouldn't you want the block size to be a multiple of 32? I.e. 32, 64, 96, 128

cayars
July 23, 2014, 12:50:54 PM
Hey Christian, You taking a siesta?

Bombadil
July 23, 2014, 01:01:04 PM
Hey Christian, You taking a siesta?
Christian is our Satoshi Nakamoto, if you know what I mean

cayars
July 23, 2014, 01:09:36 PM (last edit: July 23, 2014, 01:22:58 PM by cayars)
Yeah, lately that is true. I think djm34 has as many algos in ccminer as Christian does now, if not more.
Carlo
EDIT:
CCMiner algos:
anime (C&C)
cryptonight (tsiv)
dmd-gr (Bombadil)
fresh (djm34)
fugue256 (C&C)
groestl (C&C)
heavy (C&C, based on reorder's cgminer code)
jackpot (C&C)
mjollnir (C&C, based on reorder's cgminer code)
myr-gr (C&C)
nist5 (C&C)
quark (C&C)
qubit (djm34)
Whirlcoin (djm34)
x11 (C&C)
x13 (C&C)
x14 (djm34)
x15 (djm34)

1 Bombadil
1 tsiv
5 djm34
11 C&C

Soon:
boolberry - C&C???
ppl - djm34???

sp_
July 23, 2014, 01:12:18 PM
The problem is that it would be necessary to convert the entire algo to 64-bit, as conversions from 32 to 64-bit are rather slow... (won't happen this week)
But the reward could be significant. From your previous comment:
Things which need improvement:
on 750 Ti: echo, groestl, whirlpool, hamsi (13%, 12.1%, 10.4%, 9.9% respectively)
on 780 Ti: hamsi, groestl, echo, fugue (15.9%, 12.5%, 12.1%, 7% respectively); whirlpool only 6.9%
Keep up the good work.

djm34
July 23, 2014, 01:40:40 PM
x17 added to my github repository: https://github.com/djm34/ccminer
Windows binaries here: https://mega.co.nz/#!EEEElQ7Z!J77zXN1d6pTgHgGIhsJ1BzUkuE8IPyqS4_QyP7lm3Wk (compiled with CUDA 6.5)
ccminer -a x17
donation: XjPqpkCPoYJJYdQRrVByU7ySpVyeqJmSGU

djm34
July 23, 2014, 01:50:07 PM
But the reward could be significant. From your previous comment: ... Keep up the good work.
Yes, but there is also a new algo coming too...

cayars
July 23, 2014, 02:06:44 PM
CCMiner algos:
anime (C&C)
cryptonight (tsiv)
dmd-gr (Bombadil)
fresh (djm34)
fugue256 (C&C)
groestl (C&C)
heavy (C&C, based on reorder's cgminer code)
jackpot (C&C)
mjollnir (C&C, based on reorder's cgminer code)
myr-gr (C&C)
nist5 (C&C)
quark (C&C)
qubit (djm34)
Whirlcoin (djm34)
x11 (C&C)
x13 (C&C)
x14 (djm34)
x15 (djm34)
x17 (djm34)

1 Bombadil
1 tsiv
6 djm34
11 C&C

djm34 is on a massive roll!

tsiv
July 23, 2014, 02:17:03 PM
tsiv,
Wouldn't you want the block size to be a multiple of 32? I.e. 32, 64, 96, 128
Yeah, full warps do sound tasty, and we're starting to get there. The launch config isn't really about threads per block anymore: the kernels are starting to use more than one thread per hash, so the launch config is actually hashes per block and blocks per grid. For example, the kernels I modified earlier now run eight threads per hash, so they're already at full warp size with four hashes per block. The latest experimental build takes the slowest kernel, which runs only a single thread per hash in the latest committed source, and spreads it out across four threads per hash. Again, that's a full warp at eight hashes per block, while four hashes per block remains kinda iffy.

tarzanbigcity
July 23, 2014, 02:29:02 PM
When I run this build I get "Unable to query number of CUDA device! Is an nVidia driver installed?"
I'm running 2x EVGA 750 Ti SC on driver 337.88. Ideas?