Bitcoin Forum
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426876 times)
djm34
Legendary
*
Offline Offline

Activity: 1400
Merit: 1050


View Profile WWW
July 22, 2014, 10:58:22 PM
 #18141

What is the password to download the x15 file (07/15/2014)?

DA4AF09FE5377715856BA0B10A29C95867053ECBF4105DBDD8957DA78B4127E49E4717DD667CEEF B

I don't understand why nobody remembers it...  Grin

not and this is not
damn it, I can't remember either  Grin

 Embarrassed
I never put any password anywhere... (not sure what you downloaded actually...)

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
djm34
July 22, 2014, 11:03:32 PM
 #18142

Replace SBOX with sbox_pipelined

In the code:

SBOX(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18); \
SBOX(hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
SBOX(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A); \
SBOX(hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
SBOX(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C); \
SBOX(hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
SBOX(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E); \
SBOX(hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \


------>

   sbox_pipelined(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18,hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
   sbox_pipelined(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A,hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
   sbox_pipelined(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C,hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
   sbox_pipelined(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E,hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \

ok I tried, but again it doesn't make a difference.

bitcoinvideos
Sr. Member
****
Offline Offline

Activity: 251
Merit: 250



View Profile
July 23, 2014, 03:31:33 AM
 #18143

Just thought I'd say that DeepCoin is still hella mineable...very ninja type launch on Qubit algo...very under the radar
tsiv
Full Member
***
Offline Offline

Activity: 137
Merit: 100


View Profile
July 23, 2014, 05:27:44 AM
 #18144

Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, and the result is spectacularly unimpressive. The best I've come up with breaks even with the current single-thread-per-hash implementation. Well, almost. It's actually a percent slower AND loses compute 2.0 compatibility due to using shuffle. On the other hand, it performs a lot more reasonably with various launch configurations: 15 blocks of 32 threads works out equally well as the original 8x60 magic bullet for the 750 Ti.

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something else than a 750 Ti in would be willing to take it for a spin, I'd appreciate it. I've added the number for SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip
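
A rough host-side sketch of the rule of thumb above (block count a multiple of the SM count, thread count a power of 2 from 4 to 64). It is not taken from the miner's source. The names LaunchConfig, candidate_configs and format_config are made up for illustration, the SM count and total thread budget are inputs you supply yourself, and reading "8x60" as threads x blocks is an assumption based on the examples in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One candidate launch configuration (hypothetical helper type).
struct LaunchConfig {
    int threads;  // threads per block
    int blocks;   // blocks in the grid
};

// Enumerates configs where threads is a power of two from 4 to 64, blocks is a
// multiple of sm_count, and threads * blocks equals total_threads.
// Nothing here queries the GPU; both inputs come from the caller.
std::vector<LaunchConfig> candidate_configs(int sm_count, int total_threads) {
    std::vector<LaunchConfig> out;
    for (int threads = 4; threads <= 64; threads *= 2) {
        if (total_threads % threads != 0) continue;  // would not divide evenly
        int blocks = total_threads / threads;
        if (blocks % sm_count == 0) out.push_back({threads, blocks});
    }
    return out;
}

// Formats a config in the "8x60" style (assumed to mean threads x blocks).
std::string format_config(const LaunchConfig &c) {
    return std::to_string(c.threads) + "x" + std::to_string(c.blocks);
}
```

With 5 SMMs (750 Ti) and the 480 threads implied by 8x60, this yields 4x120, 8x60, 16x30 and 32x15, which covers both the 8x60 setting and the "15 blocks of 32 threads" case mentioned above. Which of them is actually fastest still has to be benchmarked per card.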
sp_
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
July 23, 2014, 05:43:47 AM
 #18145

Replace SBOX with sbox_pipelined

In the code:

SBOX(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18); \
SBOX(hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
SBOX(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A); \
SBOX(hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
SBOX(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C); \
SBOX(hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
SBOX(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E); \
SBOX(hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \


------>

   sbox_pipelined(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18,hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
   sbox_pipelined(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A,hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
   sbox_pipelined(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C,hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
   sbox_pipelined(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E,hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \

ok I tried, but again it doesn't make a difference.

But it does when you convert the data structure to 64-bit. Put hamsi_s00 in the upper 32 bits of the register and hamsi_s01 in the lower 32 bits. Then you process twice the data with the same assembly instructions that you had previously (but in 64-bit).


// Note: 64-bit operands use the "l" constraint ("r" is for 32-bit registers).
uint64_t t;
t = a;
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(c));
asm("xor.b64 %0,%0,%1;" : "+l"(a) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(a));
asm( "or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(c));
b=d;
asm( "or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(a));
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(a));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(t));
a=c;
c=b;
b=d;
asm("not.b64 %0,%1;" : "=l"(d) : "l"(t));....


In x13 / cuda_x13_hamsi512.cu, the macro that starts with

#define ROUND_BIG(rc, alpha) {

should be rewritten to operate on 64-bit integers.
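
A minimal host-side C++ sketch of the packing idea, not the actual ccminer kernel. The S-box sequence uses only AND/OR/XOR/NOT, so every bit position is independent and one 64-bit pass should give the same result as two separate 32-bit passes. The names sbox, pack and packed_matches_scalar and the sample values are made up for illustration; the operation order follows the asm sequence quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Same operation order as the asm sequence in the post, written as plain
// bitwise C++ so the one template serves both 32-bit and 64-bit words.
template <typename T>
void sbox(T &a, T &b, T &c, T &d) {
    T t = a;
    a &= c; a ^= d; c ^= b; c ^= a;
    d |= t; d ^= b; t ^= c; b = d;
    d |= t; d ^= a; a &= b; t ^= a;
    b ^= d; b ^= t;
    a = c; c = b; b = d; d = ~t;
}

// Hypothetical helper: first word in the upper 32 bits, second in the lower.
uint64_t pack(uint32_t hi, uint32_t lo) {
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Returns true if one 64-bit S-box pass matches two separate 32-bit passes.
bool packed_matches_scalar() {
    // Arbitrary sample values standing in for two columns of the Hamsi state.
    uint32_t a0 = 0x01234567u, b0 = 0x89abcdefu, c0 = 0xdeadbeefu, d0 = 0x0badf00du;
    uint32_t a1 = 0xfeedfaceu, b1 = 0xcafebabeu, c1 = 0x13579bdfu, d1 = 0x2468ace0u;

    uint64_t A = pack(a0, a1), B = pack(b0, b1), C = pack(c0, c1), D = pack(d0, d1);

    sbox(a0, b0, c0, d0);  // first 32-bit column
    sbox(a1, b1, c1, d1);  // second 32-bit column
    sbox(A, B, C, D);      // both columns at once

    return A == pack(a0, a1) && B == pack(b0, b1) &&
           C == pack(c0, c1) && D == pack(d0, d1);
}
```

This only checks the S-box step. Any step that shifts or rotates within a 32-bit word would need per-half handling, which is presumably part of why the whole round has to be reworked rather than just this macro.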


Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + triple mining.. https://github.com/sp-hash/TeamBlackMiner
PVmining
Sr. Member
****
Offline Offline

Activity: 330
Merit: 252



View Profile
July 23, 2014, 07:15:35 AM
 #18146

Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, result is spectacularly unimpressive.

Thanks for trying it!
Bombadil
Hero Member
*****
Offline Offline

Activity: 644
Merit: 500



View Profile
July 23, 2014, 07:52:03 AM
 #18147

Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, and the result is spectacularly unimpressive. The best I've come up with breaks even with the current single-thread-per-hash implementation. Well, almost. It's actually a percent slower AND loses compute 2.0 compatibility due to using shuffle. On the other hand, it performs a lot more reasonably with various launch configurations: 15 blocks of 32 threads works out equally well as the original 8x60 magic bullet for the 750 Ti.

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something else than a 750 Ti in would be willing to take it for a spin, I'd appreciate it. I've added the number for SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

Wolf0 also started on modding your ccminer-mod Cheesy https://bitcointalk.org/index.php?topic=701910.0
DrAlco
Newbie
*
Offline Offline

Activity: 43
Merit: 0


View Profile
July 23, 2014, 11:23:21 AM
 #18148

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something else than a 750 Ti in would be willing to take it for a spin, I'd appreciate it. I've added the number for SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

Hashrate improved by about 70 H/s on a 780 Ti: up from 320 to about 390 (using 8x60). It also doesn't seem to hang and bring the system to its knees when using all GFX cards.
djm34
July 23, 2014, 11:32:15 AM
 #18149

Replace SBOX with sbox_pipelined

In the code:

SBOX(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18); \
SBOX(hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
SBOX(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A); \
SBOX(hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
SBOX(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C); \
SBOX(hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
SBOX(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E); \
SBOX(hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \


------>

   sbox_pipelined(hamsi_s00, hamsi_s08, hamsi_s10, hamsi_s18,hamsi_s01, hamsi_s09, hamsi_s11, hamsi_s19); \
   sbox_pipelined(hamsi_s02, hamsi_s0A, hamsi_s12, hamsi_s1A,hamsi_s03, hamsi_s0B, hamsi_s13, hamsi_s1B); \
   sbox_pipelined(hamsi_s04, hamsi_s0C, hamsi_s14, hamsi_s1C,hamsi_s05, hamsi_s0D, hamsi_s15, hamsi_s1D); \
   sbox_pipelined(hamsi_s06, hamsi_s0E, hamsi_s16, hamsi_s1E,hamsi_s07, hamsi_s0F, hamsi_s17, hamsi_s1F); \

ok I tried, but again it doesn't make a difference.

But it does when you convert the data structure to 64-bit. Put hamsi_s00 in the upper 32 bits of the register and hamsi_s01 in the lower 32 bits. Then you process twice the data with the same assembly instructions that you had previously (but in 64-bit).


// Note: 64-bit operands use the "l" constraint ("r" is for 32-bit registers).
uint64_t t;
t = a;
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(c));
asm("xor.b64 %0,%0,%1;" : "+l"(a) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(c) : "l"(a));
asm( "or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(c));
b=d;
asm( "or.b64 %0,%0,%1;" : "+l"(d) : "l"(t));
asm("xor.b64 %0,%0,%1;" : "+l"(d) : "l"(a));
asm("and.b64 %0,%0,%1;" : "+l"(a) : "l"(b));
asm("xor.b64 %0,%0,%1;" : "+l"(t) : "l"(a));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(d));
asm("xor.b64 %0,%0,%1;" : "+l"(b) : "l"(t));
a=c;
c=b;
b=d;
asm("not.b64 %0,%1;" : "=l"(d) : "l"(t));....


In x13 / cuda_x13_hamsi512.cu, the macro that starts with

#define ROUND_BIG(rc, alpha) {

should be rewritten to operate on 64-bit integers.


The problem is that it would be necessary to convert the entire algo to 64-bit, as conversions from 32-bit to 64-bit are rather slow...
(won't happen this week)

tsiv
July 23, 2014, 12:19:07 PM
 #18150

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something else than a 750 Ti in would be willing to take it for a spin, I'd appreciate it. I've added the number for SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

Hashrate improved by about 70 H/s on a 780 Ti: up from 320 to about 390 (using 8x60). It also doesn't seem to hang and bring the system to its knees when using all GFX cards.

Seems to be in line with the ~18% improvements I saw when benchmarking only the AES part of the kernel. Have you tried other configs? 390 is still pretty low for a 780 Ti, I think people were getting best results with 4x120 on the 780 Ti.
cayars
Full Member
***
Offline Offline

Activity: 168
Merit: 100


View Profile
July 23, 2014, 12:37:45 PM
 #18151

tsiv,

Wouldn't you want to have the block size a multiple of 32?  I.e. 32, 64, 96, 128
cayars
July 23, 2014, 12:50:54 PM
 #18152

Hey Christian,

You taking a siesta?  Grin
Bombadil
July 23, 2014, 01:01:04 PM
 #18153

Hey Christian,

You taking a siesta?  Grin

Christian is our Satoshi Nakamoto, if you know what I mean Cheesy
cayars
July 23, 2014, 01:09:36 PM
Last edit: July 23, 2014, 01:22:58 PM by cayars
 #18154

Yea, lately that is true. Smiley

I think djm34 has as many algos in ccminer as Christian does now, if not more.

Carlo

EDIT:
CCMiner algos:
anime (C&C)
cryptonight (tsiv)
dmd-gr (Bombadil)
fresh (djm34)
fugue256 (C&C)
groestl (C&C)
heavy (C&C-based off reorder's cgminer code)
jackpot (C&C)
mjollnir (C&C-based off reorder's cgminer code)
myr-gr (C&C)
nist5 (C&C)
quark (C&C)
qubit (djm34)
Whirlcoin (djm34)
x11 (C&C)
x13 (C&C)
x14 (djm34)
x15 (djm34)

1 Bombadil
1 tsiv
5 djm34
11 C&C

Soon:
boolberry - C&C???
ppl - djm34???
sp_
July 23, 2014, 01:12:18 PM
 #18155

The problem is that it would be necessary to convert the entire algo to 64-bit, as conversions from 32-bit to 64-bit are rather slow...
(won't happen this week)

But the reward could be significant. Smiley


From  your previous comment:

Things which need improvement:
on 750ti: echo , groestl, whirlpool, hamsi (13%, 12.1%, 10.4%, 9.9% respectively)
on 780ti: hamsi, groestl, echo, fugue (15.9%; 12.5%; 12.1%; 7% resp.) whirlpool only 6.9%

Keep up the good work.

djm34
July 23, 2014, 01:40:40 PM
 #18156

x17 added to my github repository.
https://github.com/djm34/ccminer

windows binaries here: https://mega.co.nz/#!EEEElQ7Z!J77zXN1d6pTgHgGIhsJ1BzUkuE8IPyqS4_QyP7lm3Wk
(compiled with CUDA 6.5)

ccminer -a x17

donation: XjPqpkCPoYJJYdQRrVByU7ySpVyeqJmSGU


djm34
July 23, 2014, 01:50:07 PM
 #18157

The problem is that it would be necessary to convert the entire algo to 64-bit, as conversions from 32-bit to 64-bit are rather slow...
(won't happen this week)

But the reward could be significant. Smiley


From  your previous comment:

Things which need improvement:
on 750ti: echo , groestl, whirlpool, hamsi (13%, 12.1%, 10.4%, 9.9% respectively)
on 780ti: hamsi, groestl, echo, fugue (15.9%; 12.5%; 12.1%; 7% resp.) whirlpool only 6.9%

Keep up the good work.
Yes, but there is also a new algo coming too...  Grin

cayars
July 23, 2014, 02:06:44 PM
 #18158

x17 added to my github repository.
https://github.com/djm34/ccminer

windows binaries here: https://mega.co.nz/#!EEEElQ7Z!J77zXN1d6pTgHgGIhsJ1BzUkuE8IPyqS4_QyP7lm3Wk
(compiled with CUDA 6.5)

ccminer -a x17

donation: XjPqpkCPoYJJYdQRrVByU7ySpVyeqJmSGU



CCMiner algos:
anime (C&C)
cryptonight (tsiv)
dmd-gr (Bombadil)
fresh (djm34)
fugue256 (C&C)
groestl (C&C)
heavy (C&C-based off reorder's cgminer code)
jackpot (C&C)
mjollnir (C&C-based off reorder's cgminer code)
myr-gr (C&C)
nist5 (C&C)
quark (C&C)
qubit (djm34)
Whirlcoin (djm34)
x11 (C&C)
x13 (C&C)
x14 (djm34)
x15 (djm34)
x17 (djm34)

1 Bombadil
1 tsiv
6 djm34
11 C&C

djm34 is on a massive roll!
tsiv
July 23, 2014, 02:17:03 PM
 #18159

tsiv,

Wouldn't you want to have the block size a multiple of 32?  I.e. 32, 64, 96, 128

Ye, full warps do sound tasty. We're starting to get there too. The launch config isn't exactly about threads per block anymore, the kernels are starting to use more than one thread per hash and the launch config is actually hashes per block and blocks per grid. For example the kernels I modified earlier are now running eight threads per hash, so they're actually already at full warp size at four hashes per block. The latest experimental build takes the slowest kernel that is running only a single thread per hash on the latest committed source and spreads it out between four threads per hash. Again, full warp at eight hashes per block while four hashes per block remains kinda iffy.
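
To put numbers on that, here is a small host-side sketch of the arithmetic, assuming the usual NVIDIA warp size of 32. The helper names are invented for illustration and are not from the miner's source.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp on NVIDIA GPUs

// Threads actually launched per block when each hash is split across
// several cooperating threads.
constexpr int threads_per_block(int hashes_per_block, int threads_per_hash) {
    return hashes_per_block * threads_per_hash;
}

// True when a block fills whole warps, leaving no idle lanes.
constexpr bool fills_whole_warps(int hashes_per_block, int threads_per_hash) {
    return threads_per_block(hashes_per_block, threads_per_hash) % kWarpSize == 0;
}
```

So fills_whole_warps(4, 8) and fills_whole_warps(8, 4) both hold (32 threads each), while four hashes per block at four threads per hash gives only 16 threads, half a warp, which matches the "kinda iffy" case above.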
tarzanbigcity
Sr. Member
****
Offline Offline

Activity: 602
Merit: 250



View Profile
July 23, 2014, 02:29:02 PM
 #18160

x17 added to my github repository.
https://github.com/djm34/ccminer

windows binaries here: https://mega.co.nz/#!EEEElQ7Z!J77zXN1d6pTgHgGIhsJ1BzUkuE8IPyqS4_QyP7lm3Wk
(compiled with CUDA 6.5)

ccminer -a x17

donation: XjPqpkCPoYJJYdQRrVByU7ySpVyeqJmSGU



When I run this build I get "Unable to query number of CUDA device! Is an nVidia driver installed?" Working with 2x EVGA 750TI SC on driver 337.88

Ideas?
