Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347501 times)
tbearhere
November 15, 2014, 01:00:15 PM
 #301

Still very new at compiling. Where can I get curl 7.38.0 for Windows 8.1 from a safe site, please?
From the site of the author.
Thanks djm34, going to fetch it :)
tbearhere
November 15, 2014, 01:13:46 PM
Last edit: November 15, 2014, 01:30:28 PM by tbearhere
 #302

Still very new at compiling. Where can I get curl 7.38.0 for Windows 8.1 from a safe site, please?
From the site of the author.
Thanks djm34, going to fetch it :)
I can't find it.
djm34
November 15, 2014, 01:33:44 PM
 #303

Still very new at compiling. Where can I get curl 7.38.0 for Windows 8.1 from a safe site, please?
From the site of the author.
Thanks djm34, going to fetch it :)
I can't find it.
Google libcurl (no, it isn't on a Microsoft page). It is open source.

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
Epsylon3
November 15, 2014, 02:39:38 PM
 #304

I should probably merge the latest changes in the main branch (1.4.9), but I'm too lazy. My focus is on the kernels, and not the rest.

The work on the echo is not done. There is more to remove.

I have hardly changed the .cu files recently, so you can maybe re-fork my project ;)

BTC: 1FhDPLPpw18X4srecguG3MxJYe4a1JsZnd - My Projects: ccminer - cpuminer-multi - yiimp - Forum threads : ccminer - cpuminer-multi - yiimp
tbearhere
November 15, 2014, 04:15:30 PM
 #305

Still very new at compiling. Where can I get curl 7.38.0 for Windows 8.1 from a safe site, please?
From the site of the author.
Thanks djm34, going to fetch it :)
I can't find it.
Google libcurl (no, it isn't on a Microsoft page). It is open source.
I got it, but it needs to be compiled. I'm used to doing it bigjme's way, so I can't do it. 6 hours trying 1.4.9 with no luck.
djm34
November 15, 2014, 04:28:04 PM
 #306

Still very new at compiling. Where can I get curl 7.38.0 for Windows 8.1 from a safe site, please?
From the site of the author.
Thanks djm34, going to fetch it :)
I can't find it.
Google libcurl (no, it isn't on a Microsoft page). It is open source.
I got it, but it needs to be compiled. I'm used to doing it bigjme's way, so I can't do it. 6 hours trying 1.4.9 with no luck.
Huh? Still haven't compiled it?
You know that you don't have to do anything (as Epsylon3 told you), since libcurl has been put into the compat folder.

sp_ (OP)
November 15, 2014, 04:37:26 PM
 #307

You need to install the latest cuda 6.5 and visual studio 2013.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
tbearhere
November 15, 2014, 05:09:37 PM
 #308

You need to install the latest cuda 6.5 and visual studio 2013.

I got it, thank you djm34, sp and all. Yes, curl was built into it. It was: unzip to a folder, open the .sln (it opens VS 2013 automatically), build x64 Release, and done... 3 minutes :)
sp_ (OP)
November 15, 2014, 05:15:43 PM
 #309

On windows, ccminer runs faster when compiled for x86

tbearhere
November 15, 2014, 05:20:01 PM
Last edit: November 15, 2014, 05:45:32 PM by tbearhere
 #310

On windows, ccminer runs faster when compiled for x86
i get less hash on x11 750ti and no improvement on quark.
EDIT: I should say no improvement on x11
djm34
November 15, 2014, 05:31:06 PM
 #311

On windows, ccminer runs faster when compiled for x86
i get less hash on x11 750ti and no improvement on quark.
compiled with x64 or x86 ?

tbearhere
November 15, 2014, 05:41:08 PM
 #312

On windows, ccminer runs faster when compiled for x86
i get less hash on x11 750ti and no improvement on quark.
compiled with x64 or x86 ?
x86
EDIT: I should say no improvement on x11
sp_ (OP)
November 15, 2014, 05:57:29 PM
 #313

The problem is that the throughput is tuned for the 980 in the source version. Download the 1.4.9 source and replace this file with the version from my fork:

cuda_x11_echo.cu

You should get a small boost in x11. On quark I have optimized bmw and blake a tiny bit.

1.4.9 with the intensity parameter is found here:

 https://github.com/tpruvot/ccminer

tbearhere
November 15, 2014, 06:13:47 PM
 #314



sp your older ccminer, and I'm really pushing it. Looking forward to your new one :)
The fastest hashing is 1.4.6
Epsylon3
November 16, 2014, 01:41:34 AM
Last edit: November 16, 2014, 03:11:03 AM by Epsylon3
 #315

The problem is that the throughput is tuned for the 980 in the source version. Download the 1.4.9 source and replace this file with the version from my fork:

cuda_x11_echo.cu

You should get a small boost in x11. On quark I have optimized bmw and blake a tiny bit.

1.4.9 with the intensity parameter is found here:

 https://github.com/tpruvot/ccminer

Indeed, +15 kH/s on the 750 Ti (2 ms improvement on your repo; it's the biggest optimisation you have made on a single algo, it was 0.5 before; I will pick it for 1.5.0).

On mine, I get +9 kH/s (2791 vs 2800 kH/s in benchmark mode), but I didn't take the launch bounds change for the moment...

39.171 ms before, 38.522 ms after = 0.65 ms on mine, enough for me (but not fully comparable).

EDIT: but on Windows :/ it seems to be lowered, investigating...

Schleicher
November 16, 2014, 06:36:00 AM
 #316

Possible small optimization at the end of cuda_echo_round:
Code:
for (int i = 0; i < 15; i += 4)
{
    W[i] ^= W[32 + i] ^ 512;
    W[i + 1] ^= W[32 + i + 1];
    W[i + 2] ^= W[32 + i + 2];
    W[i + 3] ^= W[32 + i + 3];
}
W[15] ^= W[47] ^ 512;
(we don't need more than 16)

sp_ (OP)
November 16, 2014, 08:49:15 AM
 #317

Indeed, +15 kH/s on the 750 Ti (2 ms improvement on your repo; it's the biggest optimisation you have made on a single algo, it was 0.5 before; I will pick it for 1.5.0).
On mine, I get +9 kH/s (2791 vs 2800 kH/s in benchmark mode), but I didn't take the launch bounds change for the moment...
39.171 ms before, 38.522 ms after = 0.65 ms on mine, enough for me (but not fully comparable).
EDIT: but on Windows :/ it seems to be lowered, investigating...

In addition to the launch bounds change, did you remember to go from 256 to 320 threads when calling the kernel? The launch bounds will force the compiler to use 64 registers. We get more spills to memory, but it seems to run faster.
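
For anyone following along, here is a rough back-of-the-envelope sketch of how a launch bound can end up pinning a kernel at 64 registers per thread. The numbers are assumptions, not taken from the source in this thread: 65536 registers per multiprocessor, allocation in steps of 8 registers per thread, a 255-register ceiling, and a hypothetical minimum of 3 resident blocks. The names `register_budget` and `clamp_budget` are made up for illustration, and the real compiler heuristics may differ.

```cpp
// Rough per-thread register budget implied by CUDA launch bounds.
// All constants are assumptions for illustration, not values from the thread.
constexpr int kRegsPerSM = 65536;       // assumed registers per multiprocessor
constexpr int kGranularity = 8;         // assumed per-thread allocation step
constexpr int kMaxRegsPerThread = 255;  // assumed hardware ceiling

// Round down to the allocation step, then clamp to the ceiling.
constexpr int clamp_budget(int raw) {
    return (raw / kGranularity) * kGranularity > kMaxRegsPerThread
               ? kMaxRegsPerThread
               : (raw / kGranularity) * kGranularity;
}

// Registers each thread may use if `min_blocks` blocks of
// `threads_per_block` threads must all be resident on one multiprocessor.
constexpr int register_budget(int threads_per_block, int min_blocks) {
    return clamp_budget(kRegsPerSM / (threads_per_block * min_blocks));
}

// 320 threads x 3 blocks = 960 resident threads; 65536 / 960 = 68,
// which rounds down to 64 under the assumed allocation step.
static_assert(register_budget(320, 3) == 64, "320 threads x 3 blocks");
```

Under the same assumptions, 256 threads would need 4 resident blocks to land on the same budget (65536 / 1024 = 64), while 320 threads with only 2 blocks would allow 96 registers. A tighter budget means the compiler has to spill more values to memory, which matches the trade-off described above.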

sp_ (OP)
November 16, 2014, 08:52:08 AM
 #318

Possible small optimization at the end of cuda_echo_round:
Code:
for (int i = 0; i < 15; i += 4)
{
    W[i] ^= W[32 + i] ^ 512;
    W[i + 1] ^= W[32 + i + 1];
    W[i + 2] ^= W[32 + i + 2];
    W[i + 3] ^= W[32 + i + 3];
}
W[15] ^= W[47] ^ 512;
(we don't need more than 16)

Thanks, it works.

   for (int i = 0; i<15; i += 4)
   {
      W[i] ^= W[32 + i] ^ 512;
      W[i + 1] ^= W[32 + i + 1];
      W[i + 2] ^= W[32 + i + 2];
      W[i + 3] ^= W[32 + i + 3];
   }

is enough.
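
To double-check the point above, here is a small host-side C++ sketch (plain CPU code, not the actual kernel). It assumes a 64-word state and a hypothetical original loop bound of 32, neither of which is shown in this thread; `fold`, `make_state` and the check functions are illustrative names only. It confirms that the loop with bound 15 already covers W[0..15], and that applying the extra `W[15] ^= W[47] ^ 512` afterwards would XOR W[47] into W[15] a second time.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using State = std::array<std::uint32_t, 64>;  // assumed state size

// Fold the upper words into the lower words, four per iteration,
// mirroring the loop posted above. `limit` is the loop bound.
void fold(State& W, int limit) {
    for (int i = 0; i < limit; i += 4) {
        W[i]     ^= W[32 + i] ^ 512;
        W[i + 1] ^= W[32 + i + 1];
        W[i + 2] ^= W[32 + i + 2];
        W[i + 3] ^= W[32 + i + 3];
    }
}

// Arbitrary deterministic test pattern (not real ECHO data).
State make_state() {
    State W{};
    for (std::uint32_t i = 0; i < 64; ++i) W[i] = 0x9E3779B9u * (i + 1);
    return W;
}

// True when the short loop (bound 15) and a hypothetical full loop
// (bound 32) agree on words 0..15.
bool first16_match() {
    State a = make_state(), b = make_state();
    fold(a, 15);  // writes W[0..15], reads W[32..47]
    fold(b, 32);  // writes W[0..31], reads W[32..63]
    for (int i = 0; i < 16; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

// After the short loop W[15] already holds old W[15] ^ W[47]; the extra
// line XORs W[47] in again, leaving old W[15] ^ 512.
bool extra_line_cancels_w47() {
    State W = make_state();
    const std::uint32_t w15 = W[15];
    fold(W, 15);
    W[15] ^= W[47] ^ 512;
    return W[15] == (w15 ^ 512u);
}

void self_check() {
    assert(first16_match());
    assert(extra_line_cancels_w47());
}
```

Calling `self_check()` from a test program should pass silently. If only the first 16 words feed the final hash, as the posts above suggest, the shorter loop on its own gives the same result for those words.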

sp_ (OP)
November 16, 2014, 11:47:07 AM
Last edit: November 16, 2014, 01:15:15 PM by sp_
 #319

I have checked in some more performance improvements. I moved the precalc table in echo from constmem to the instruction cache. Improved registers/launchbounds on shavite.

The 980 is now around 400 kH/s faster than release 6 at stock clocks (x11).

Here is the link:

http://www.filedropper.com/release7

The sourcecode is available here:

https://github.com/sp-hash/ccminer

jpouza
November 16, 2014, 12:14:46 PM
 #320

I have checked in some more performance improvements. I moved the precalc table in echo from constmem to the instruction cache. Improved registers/launchbounds on shavite.

The 980 is now around 400 kH/s faster than release 6 at stock clocks (x11).

Here is the link:

http://www.filedropper.com/release7

Nice, 9MH/s with 185+ on 980 GPUs.
10MH/s with extreme overclock GPU at 300+ and overvolted.

750Ti boost to 2.9MH/s with 135+ GPU 460+ MEM.

Cheers