Bitcoin Forum
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347573 times)
tbearhere
November 16, 2014, 01:06:29 PM
 #321

On the 750 Ti, 50 kh more on x11. :)
Epsylon3
ccminer/cpuminer developer
November 16, 2014, 02:58:25 PM
 #322

Linux profile of your repo; indeed a big difference:

Code:
sp - before echo (linux x64)
==11174== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 20.76%  2.87625s        53  54.269ms  54.098ms  55.278ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 18.83%  2.60877s        54  48.311ms  48.168ms  53.868ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
 13.02%  1.80384s        54  33.404ms  32.752ms  37.241ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 11.04%  1.52931s        53  28.855ms  28.780ms  30.472ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
  7.25%  1.00414s        54  18.595ms  18.548ms  20.737ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.32%  737.65ms        54  13.660ms  13.589ms  15.234ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.03%  697.42ms        54  12.915ms  12.778ms  14.462ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  3.05%  422.23ms        53  7.9665ms  7.8972ms  8.0252ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.90%  401.89ms        54  7.4425ms  6.9065ms  8.3138ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.90%  401.74ms        54  7.4396ms  7.4077ms  8.2859ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
  2.77%  383.50ms        53  7.2358ms  7.1146ms  7.3789ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.69%  373.04ms        54  6.9082ms  6.8450ms  7.7322ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.55%  353.48ms        54  6.5459ms  6.5278ms  7.2944ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  1.60%  221.22ms        53  4.1741ms  4.1419ms  4.2535ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)

sp - 12d436ae1ecdc5e647a6a1576b98c4803510b13f
==25578== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 20.56%  6.72060s       127  *52.918ms  52.822ms  53.985ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 18.89%  6.17511s       128  48.243ms  48.147ms  53.860ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
 12.65%  4.13517s       128  *32.306ms  32.181ms  36.017ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 11.29%  3.69123s       128  28.838ms  28.787ms  30.680ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
  7.27%  2.37746s       128  18.574ms  18.547ms  20.732ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.35%  1.74723s       128  13.650ms  13.589ms  15.257ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.05%  1.65134s       128  12.901ms  12.699ms  14.372ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  3.12%  1.01978s       128  7.9670ms  7.9183ms  8.0247ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.91%  951.09ms       128  7.4304ms  7.4097ms  8.2771ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
  2.88%  941.80ms       128  7.3578ms  6.9981ms  8.3027ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.83%  926.04ms       128  7.2347ms  7.0956ms  7.3374ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.72%  887.67ms       128  6.9349ms  6.8876ms  7.7173ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.56%  836.91ms       128  6.5384ms  6.5282ms  7.2936ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  1.63%  533.83ms       128  4.1705ms  4.1391ms  4.2481ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)

Are you testing on Linux too, or just on Windows?

Still trying to get the same gains on Windows... but that takes a lot of time.
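
For anyone who wants to compare two such profiles without reading them side by side, here is a minimal Python sketch that parses nvprof summary rows and prints the change in average kernel time. The sample rows are copied from the two listings above; the helper name `avg_ms_by_kernel` and the regex are illustrative assumptions, not part of ccminer or nvprof.

```python
import re

# One nvprof summary row: Time(%)  Time  Calls  Avg  Min  Max  Name(args)
# A leading '*' on the Avg column (used above to flag changed values) is tolerated.
ROW = re.compile(
    r"^\s*[\d.]+%\s+[\d.]+(?:ms|us|ns|s)\s+\d+\s+\*?([\d.]+)(ms|us|ns|s)"
    r"\s+\S+\s+\S+\s+(\w+)\("
)
TO_MS = {"s": 1000.0, "ms": 1.0, "us": 1e-3, "ns": 1e-6}


def avg_ms_by_kernel(report):
    """Map kernel name -> average time per call in milliseconds."""
    result = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            value, unit, name = m.groups()
            result[name] = float(value) * TO_MS[unit]
    return result


BEFORE = """
 20.76%  2.87625s        53  54.269ms  54.098ms  55.278ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 13.02%  1.80384s        54  33.404ms  32.752ms  37.241ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
"""
AFTER = """
 20.56%  6.72060s       127  *52.918ms  52.822ms  53.985ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 12.65%  4.13517s       128  *32.306ms  32.181ms  36.017ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
"""

before = avg_ms_by_kernel(BEFORE)
after = avg_ms_by_kernel(AFTER)
for name in before:
    saved = before[name] - after[name]
    print(f"{name}: {before[name]:.3f} ms -> {after[name]:.3f} ms "
          f"({100.0 * saved / before[name]:.1f}% less time per call)")
```

Feeding it the full listings instead of the two sample rows gives the same comparison for every kernel in the chain.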

BTC: 1FhDPLPpw18X4srecguG3MxJYe4a1JsZnd - My Projects: ccminer - cpuminer-multi - yiimp - Forum threads : ccminer - cpuminer-multi - yiimp
sp_ (OP)
Team Black developer
November 16, 2014, 03:51:05 PM
 #323

I am only testing on Windows. There was a small bug in the exe file I sent out (exe 7). I have fixed it, and now I am preparing another check-in for later today. The next kernel to be checked in is blake.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + triple mining): https://github.com/sp-hash/TeamBlackMiner
sp_ (OP)
November 16, 2014, 06:03:59 PM
 #324

Faster blake (nist5, quark, x11, etc.).

http://www.filedropper.com/release8

Fixed a bug in release 7 (invalid nonces). The bug was only present in the exe, because an old file was linked in instead of the latest.

source:

https://github.com/sp-hash/ccminer

Epsylon3
November 16, 2014, 06:33:25 PM
Last edit: November 16, 2014, 06:53:07 PM by Epsylon3
 #325

Wow :) good game. I haven't checked blake512 yet; I'm trying to fix weird x13 behavior during benchmark.

But I was able to see the improvements on Windows too with the previous commit.

EDIT: +10 KH also with blake on the 750 Ti

Code:
 
 20.56%  4.55172s        86  52.927ms  52.850ms  53.968ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 18.97%  4.19956s        87  48.271ms  48.164ms  53.870ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
 12.70%  2.81199s        87  32.322ms  32.149ms  36.061ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
 11.33%  2.50956s        87  28.846ms  28.786ms  30.704ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
  7.30%  1.61676s        87  18.584ms  18.549ms  20.739ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.36%  1.18770s        87  13.652ms  13.590ms  15.225ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  5.08%  1.12451s        87  12.925ms  12.721ms  14.430ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  3.09%  685.04ms        86  7.9656ms  7.9084ms  8.0212ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.93%  648.66ms        87  7.4559ms  7.1070ms  8.3455ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.84%  629.44ms        87  7.2350ms  7.1123ms  7.3753ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
  2.73%  604.12ms        87  6.9439ms  6.8900ms  7.7449ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  2.62%  579.68ms        87  *6.6630ms  6.6329ms  7.4384ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
  2.57%  569.19ms        87  6.5424ms  6.5284ms  7.2974ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
  1.62%  358.63ms        86  4.1702ms  4.1305ms  4.2341ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)

jpouza
November 16, 2014, 08:44:22 PM
 #326

Targeting 10 MH/s on X11, keep pushing 8)

With SLI disabled, things go higher. I will post a screenshot while trying to hit 10 MH/s.


jpouza
November 16, 2014, 08:59:30 PM
 #327

Maximum on the 980, limited to 1.2500 V by the voltage limit of the reference cards :(


th00ber
November 18, 2014, 03:28:53 PM
 #328

Good job! :)
sp_ (OP)
November 18, 2014, 11:05:32 PM
 #329

Probably a little more; and X15 can be improved a lot.
Bitslice it if you must. That will help you remove the memory issue.

Checked in a small boost from using the perm instruction in whirlpool, but I think I have to rewrite the shared-memory part to get down to 1/8th of the memory reads.
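
For context on the perm remark: CUDA exposes a byte permute as the `__byte_perm(x, y, s)` intrinsic (PTX `prmt`), which assembles a 32-bit word from any four bytes of two input words in one instruction. A rotation by a whole number of bytes can then be a single permute instead of two shifts and an OR. The Python below only emulates the selection rule to show the idea. It is a sketch, not the whirlpool kernel code, and it assumes only the low three bits of each selector nibble are used.

```python
def byte_perm(x, y, s):
    """Emulate CUDA __byte_perm: build a 32-bit word from bytes of x and y.

    Bytes 0-3 come from x (least significant first), bytes 4-7 from y.
    Each of the four low nibbles of s selects one source byte via its
    low three bits.
    """
    source = [(x >> (8 * i)) & 0xFF for i in range(4)]
    source += [(y >> (8 * i)) & 0xFF for i in range(4)]
    result = 0
    for i in range(4):
        selector = (s >> (4 * i)) & 0x7
        result |= source[selector] << (8 * i)
    return result


def rotl32(x, n):
    """Reference rotate-left using shifts, for comparison."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


word = 0x11223344
assert byte_perm(word, word, 0x2103) == rotl32(word, 8)   # rotate left by 8
assert byte_perm(word, word, 0x1032) == rotl32(word, 16)  # rotate by 16
assert byte_perm(word, word, 0x0321) == rotl32(word, 24)  # rotate left by 24
```

Whether this is exactly how the whirlpool kernel uses the instruction is not stated in the post; the sketch only shows why a permute can replace shift-based byte rotations.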

polanskiman
November 19, 2014, 01:04:19 AM
 #330

Probably a little more; and X15 can be improved a lot.
Bitslice it if you must. That will help you remove the memory issue.

Checked in a small boost from using the perm instruction in whirlpool, but I think I have to rewrite the shared-memory part to get down to 1/8th of the memory reads.

Bitslice would mean NO memory reads.

For 750ti:

With ccminer v7 from DJm34 I have an average of:
x11 = 2605 Khash

With ccminer by SP_ release 8 I have an average of:
X11 = 2800 Khash

That's a 195 Khash average difference. It won't make me any richer but it is always welcome :D

ccminer by SP_ release 8 seems to have a few bugs though. When you press Ctrl+C, ccminer crashes most of the time. Also, when you are asked whether to terminate the batch job, answering Y or N yields the same result: ccminer is closed.
polanskiman
November 19, 2014, 01:27:26 AM
 #331

By the way, any intention of including, and perhaps improving, the m7 algo in the ccminer by SP_ releases?
sp_ (OP)
November 19, 2014, 06:34:29 AM
Last edit: November 19, 2014, 07:28:19 AM by sp_
 #332

Some of the bugs have been removed in the Tvpouvet release that I forked. I will probably refork. My focus is on the kernels, and only 50% of the x11 kernels have been modded in the open source.

I just recompiled with yesterday's NVIDIA driver (344.75). There seems to be a hashrate increase on the 750 Ti of around 30 KHASH.

kingscrown
November 19, 2014, 06:38:47 AM
 #333

this mod looks SICK!

sp_ (OP)
November 19, 2014, 07:26:34 AM
 #334

EDIT: Just looked at Echo again. Quite well done, I tip my hat.

Instead of doing 10 rounds of echo, I do 9.25 rounds. This is because most of the first round is done on constant input. But some parts of round 2 can also be precalculated... More boost is expected.
To be continued.
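
A toy illustration of the idea, for readers who have not seen the trick: when part of the state is the same for every nonce, the per-word step of the first round can be computed once for those words and reused. The functions below are hypothetical stand-ins, not the real ECHO round, and the split of 4 varying words out of 16 is an assumption chosen because it reproduces the 9.25 figure (4/16 of round one plus 9 full rounds).

```python
MASK = 0xFFFFFFFF
WORDS = 16          # words in the toy state
N_VAR = 4           # words that change per nonce (assumption)
ROUNDS = 10


def sub_word(w, key):
    # Hypothetical per-word step; stands in for the real substitution.
    w = ((w ^ key) * 0x9E3779B1) & MASK
    return w ^ (w >> 15)


def mix(state):
    # Hypothetical mixing step that makes every word depend on a neighbour.
    return [(state[i] + state[(i + 1) % WORDS]) & MASK for i in range(WORDS)]


def full_round(state, r):
    return mix([sub_word(w, r * WORDS + i) for i, w in enumerate(state)])


def hash_naive(variable, constant):
    state = list(variable) + list(constant)
    for r in range(ROUNDS):
        state = full_round(state, r)
    return state


def precompute(constant):
    # Round-0 per-word step for the constant words, done once up front.
    return [sub_word(w, N_VAR + i) for i, w in enumerate(constant)]


def hash_precomputed(variable, pre):
    head = [sub_word(w, i) for i, w in enumerate(variable)]
    state = mix(head + pre)              # finish round 0
    for r in range(1, ROUNDS):
        state = full_round(state, r)
    return state


CONSTANT = list(range(100, 100 + WORDS - N_VAR))
PRE = precompute(CONSTANT)
for nonce in range(1000):
    block = [nonce, 1, 2, 3]
    assert hash_naive(block, CONSTANT) == hash_precomputed(block, PRE)

# Per-word work left per hash, in rounds: 4/16 of round 0 plus 9 full rounds.
print(N_VAR / WORDS + (ROUNDS - 1))  # prints 9.25
```

Both paths give identical results; only the amount of per-nonce work differs. Precalculating parts of round 2 would need a closer look at which words are still constant after the first mixing step.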

sp_ (OP)
November 19, 2014, 07:35:40 AM
 #335

With ccminer v7 from DJm34 I have an average of:
x11 = 2605 Khash

With ccminer by SP_ release 8 I have an average of:
X11 = 2800 Khash

That's a 195 Khash average difference. It won't make me any richer but it is always welcome :D

You should try the 980: up from 7 MHASH to 9, about 30% faster.
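
To put the two results on the same scale, a quick sketch of the relative gain using only the figures quoted in this thread. The 980 numbers are rounded in the post, so that result is approximate.

```python
def percent_gain(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


gain_750ti = percent_gain(2605, 2800)  # Khash figures quoted above
gain_980 = percent_gain(7.0, 9.0)      # MHASH figures, as rounded in the post

print(f"750 Ti: +{gain_750ti:.1f}%")
print(f"980:    +{gain_980:.1f}%")
```

With these inputs the 750 Ti gain comes out at about 7.5% and the 980 at about 29%, in line with the rough 30% mentioned.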

jjjordan
November 19, 2014, 09:06:16 AM
 #336

No improvement with GTX 970 whatsoever over 1.4.9.
Or maybe none was intended?
sp_ (OP)
November 19, 2014, 09:12:26 AM
 #337

Kernels are being merged up to the head branch by tvpouvet. 1.4.9 and this mod, which is based on 1.4.6, share the same optimizations.
There is a small difference though: 1.4.9 has improved the throughput settings in the kernels. I need to do a refork and redo the launch bounds tweaks for 1.4.9.

Bombadil
November 19, 2014, 09:19:29 AM
 #338

Kernels are being merged up to the head branch by tvpouvet. 1.4.9 and this mod, which is based on 1.4.6, share the same optimizations.
There is a small difference though: 1.4.9 has improved the throughput settings in the kernels. I need to do a refork and redo the launch bounds tweaks for 1.4.9.


So does that mean you'll make use of his API too? I'm busy writing a .NET app that monitors your CUDA rigs through his API, with detailed stats etc., so that would be nice for comparisons :)
sp_ (OP)
November 19, 2014, 09:24:38 AM
 #339

Yes. When I do a refork, only some of the kernels will be different. On GitHub a lot of good changes and bugfixes have been made in 1.4.9, and API support has been added as well.
That makes it worth the upgrade.

polanskiman
November 19, 2014, 03:13:20 PM
 #340

Some of the bugs have been removed in the Tvpouvet release that I forked. I will probably refork. My focus is on the kernels, and only 50% of the x11 kernels have been modded in the open source.

I just recompiled with yesterday's NVIDIA driver (344.75). There seems to be a hashrate increase on the 750 Ti of around 30 KHASH.


You are talking about this: https://github.com/tpruvot/ccminer ?