tbearhere
Legendary
Offline
Activity: 3220
Merit: 1003
|
|
November 16, 2014, 01:06:29 PM |
|
On the 750 Ti, 50 kH/s more on X11.
|
|
|
|
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
|
|
November 16, 2014, 02:58:25 PM |
|
Linux profile of your repo, indeed big difference:

sp - before echo (linux x64)
==11174== Profiling result:
Time(%)  Time  Calls  Avg  Min  Max  Name
20.76%  2.87625s  53  54.269ms  54.098ms  55.278ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
18.83%  2.60877s  54  48.311ms  48.168ms  53.868ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
13.02%  1.80384s  54  33.404ms  32.752ms  37.241ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
11.04%  1.52931s  53  28.855ms  28.780ms  30.472ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
7.25%  1.00414s  54  18.595ms  18.548ms  20.737ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.32%  737.65ms  54  13.660ms  13.589ms  15.234ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.03%  697.42ms  54  12.915ms  12.778ms  14.462ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.05%  422.23ms  53  7.9665ms  7.8972ms  8.0252ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.90%  401.89ms  54  7.4425ms  6.9065ms  8.3138ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.90%  401.74ms  54  7.4396ms  7.4077ms  8.2859ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
2.77%  383.50ms  53  7.2358ms  7.1146ms  7.3789ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.69%  373.04ms  54  6.9082ms  6.8450ms  7.7322ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.55%  353.48ms  54  6.5459ms  6.5278ms  7.2944ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
1.60%  221.22ms  53  4.1741ms  4.1419ms  4.2535ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)

sp - 12d436ae1ecdc5e647a6a1576b98c4803510b13f
==25578== Profiling result:
Time(%)  Time  Calls  Avg  Min  Max  Name
20.56%  6.72060s  127  *52.918ms  52.822ms  53.985ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
18.89%  6.17511s  128  48.243ms  48.147ms  53.860ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
12.65%  4.13517s  128  *32.306ms  32.181ms  36.017ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
11.29%  3.69123s  128  28.838ms  28.787ms  30.680ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
7.27%  2.37746s  128  18.574ms  18.547ms  20.732ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.35%  1.74723s  128  13.650ms  13.589ms  15.257ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.05%  1.65134s  128  12.901ms  12.699ms  14.372ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.12%  1.01978s  128  7.9670ms  7.9183ms  8.0247ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.91%  951.09ms  128  7.4304ms  7.4097ms  8.2771ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
2.88%  941.80ms  128  7.3578ms  6.9981ms  8.3027ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.83%  926.04ms  128  7.2347ms  7.0956ms  7.3374ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.72%  887.67ms  128  6.9349ms  6.8876ms  7.7173ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.56%  836.91ms  128  6.5384ms  6.5282ms  7.2936ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
1.63%  533.83ms  128  4.1705ms  4.1391ms  4.2481ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
Are you testing on Linux too, or just on Windows? I'm still trying to get the same gains on Windows, but that takes a lot of time.
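For anyone who wants to put a number on the difference between the two profiles above, here is a small sketch that compares the Avg column for a few kernels. The values are copied from the two runs in this post; the function names are only illustrative and are not part of ccminer or nvprof.

```python
# Average kernel times in ms, copied from the Avg column of the two profiles.
before = {
    "x11_echo512_gpu_hash_64": 54.269,
    "x11_shavite512_gpu_hash_64": 33.404,
    "quark_groestl512_gpu_hash_64_quad": 48.311,
}
after = {
    "x11_echo512_gpu_hash_64": 52.918,
    "x11_shavite512_gpu_hash_64": 32.306,
    "quark_groestl512_gpu_hash_64_quad": 48.243,
}


def percent_faster(old_ms, new_ms):
    """Relative reduction in average kernel time, in percent."""
    return (old_ms - new_ms) / old_ms * 100.0


def compare(before, after):
    """Per-kernel reduction in average time for kernels present in both runs."""
    return {name: percent_faster(before[name], after[name])
            for name in before if name in after}


for name, gain in compare(before, after).items():
    print(f"{name}: {gain:.2f}% less time per call")
```

Echo and shavite, the two kernels marked with an asterisk in the second run, come out roughly 2.5% and 3.3% faster per call, while groestl is essentially unchanged.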
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
November 16, 2014, 03:51:05 PM |
|
I am only testing on Windows. There was a small bug in the exe file I sent out (exe 7). I have fixed it, and now I am preparing another check-in later today. The next kernel to be checked in is blake.
|
|
|
|
|
Epsylon3
|
|
November 16, 2014, 06:33:25 PM Last edit: November 16, 2014, 06:53:07 PM by Epsylon3 |
|
Wow, good game. I didn't check blake512 for the moment, I'm trying to fix the weird x13 behavior during benchmark, but I was able to see the improvements on Windows too with the previous commit.

EDIT: +10 KH also with blake on the 750 Ti:

20.56%  4.55172s  86  52.927ms  52.850ms  53.968ms  x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
18.97%  4.19956s  87  48.271ms  48.164ms  53.870ms  quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
12.70%  2.81199s  87  32.322ms  32.149ms  36.061ms  x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
11.33%  2.50956s  87  28.846ms  28.786ms  30.704ms  x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
7.30%  1.61676s  87  18.584ms  18.549ms  20.739ms  x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.36%  1.18770s  87  13.652ms  13.590ms  15.225ms  quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.08%  1.12451s  87  12.925ms  12.721ms  14.430ms  x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.09%  685.04ms  86  7.9656ms  7.9084ms  8.0212ms  x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.93%  648.66ms  87  7.4559ms  7.1070ms  8.3455ms  quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.84%  629.44ms  87  7.2350ms  7.1123ms  7.3753ms  x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.73%  604.12ms  87  6.9439ms  6.8900ms  7.7449ms  quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.62%  579.68ms  87  *6.6630ms  6.6329ms  7.4384ms  quark_blake512_gpu_hash_80(int, unsigned int, void*)
2.57%  569.19ms  87  6.5424ms  6.5284ms  7.2974ms  quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
1.62%  358.63ms  86  4.1702ms  4.1305ms  4.2341ms  x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
|
|
|
|
jpouza
Legendary
Offline
Activity: 2870
Merit: 1122
|
|
November 16, 2014, 08:44:22 PM |
|
Targeting 10 MH/s on X11, keep pushing. With SLI disabled things go higher. I will post a screenshot when trying to hit 10 MH/s.
|
|
|
|
jpouza
|
|
November 16, 2014, 08:59:30 PM |
|
Maximum on the 980, limited to 1.2500 V by the voltage limit of the reference cards.
|
|
|
|
th00ber
|
|
November 18, 2014, 03:28:53 PM |
|
Good job!
|
|
|
|
sp_ (OP)
|
|
November 18, 2014, 11:05:32 PM |
|
Probably a little more; and X15 can be improved a lot.
Bitslice it if you must. That will help you remove the memory issue. Checked in a small boost by using the perm instruction in whirlpool, but I think I have to rewrite the shared mem part to get 1/8th the memory reads.
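The "perm instruction" mentioned here is presumably the GPU byte-permute operation that CUDA exposes as `__byte_perm(x, y, s)`. That is an assumption, and the post does not show how it is used in the whirlpool kernel. As a rough sketch of why it helps, the Python model below mimics the default selector mode and shows that a rotation by a whole number of bytes becomes a single permute per 32-bit word. The helper names `rotl32` and `rotl64_by_8` are made up for illustration.

```python
def byte_perm(x, y, selector):
    """Python model of a 32-bit byte permute (CUDA's __byte_perm, default mode).

    Source bytes 0-3 come from x, bytes 4-7 from y. Each of the four low
    selector nibbles picks the source byte for one result byte (only the
    low 3 bits of each nibble are used).
    """
    source = [(x >> (8 * i)) & 0xFF for i in range(4)]
    source += [(y >> (8 * i)) & 0xFF for i in range(4)]
    result = 0
    for i in range(4):
        index = (selector >> (4 * i)) & 0x7
        result |= source[index] << (8 * i)
    return result


def rotl32(x, n):
    """Reference 32-bit rotate left using shifts, for comparison."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def rotl64_by_8(hi, lo):
    """Rotate a 64-bit value (given as two 32-bit halves) left by one byte."""
    new_hi = byte_perm(hi, lo, 0x2107)
    new_lo = byte_perm(hi, lo, 0x6543)
    return new_hi, new_lo


x = 0x11223344
print(hex(byte_perm(x, x, 0x2103)), hex(rotl32(x, 8)))
print([hex(v) for v in rotl64_by_8(0x01020304, 0x05060708)])
```

With shifts, a 64-bit byte rotation split over two 32-bit registers needs two shifts and an OR per half. With the permute it is one operation per half, which is the kind of small saving described in the post.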
|
|
|
|
polanskiman
|
|
November 19, 2014, 01:04:19 AM |
|
Probably a little more; and X15 can be improved a lot.
Bitslice it if you must. That will help you remove the memory issue. Checked in a small boost by using the perm instruction in whirlpool, but I think I have to rewrite the shared mem part to get 1/8th the memory reads.
Bitslice would mean NO memory reads.
For 750ti:
With ccminer v7 from DJm34 I have an average of: X11 = 2605 Khash
With ccminer by SP_ release 8 I have an average of: X11 = 2800 Khash
That's a 195 Khash average difference (about 7.5%). It won't make me any richer but it is always welcome.
ccminer by SP_ release 8 seems to have a few bugs though. When you press ctrl+c, ccminer will crash most of the time. Also, when you are asked whether to terminate the batch job, answering Y or N yields the same result: ccminer is closed.
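On the "bitslice would mean NO memory reads" point quoted in this post: bitslicing replaces a table lookup with plain AND/OR/NOT logic on bit planes, so many inputs are processed at once and nothing is indexed by the data. The sketch below uses a made-up 4-bit S-box (not the whirlpool one) and a naive sum-of-minterms circuit. A real kernel would use a hand-minimized circuit, so treat this only as an illustration of the idea.

```python
# Toy 4-bit S-box, hypothetical, chosen only for the demonstration.
SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]

LANES = 32                 # number of inputs processed in parallel
MASK = (1 << LANES) - 1


def to_planes(values):
    """Transpose up to 32 four-bit values into 4 bit planes (one bit per lane)."""
    planes = [0, 0, 0, 0]
    for lane, v in enumerate(values):
        for bit in range(4):
            planes[bit] |= ((v >> bit) & 1) << lane
    return planes


def from_planes(planes, count):
    """Inverse of to_planes for the first `count` lanes."""
    return [sum(((planes[bit] >> lane) & 1) << bit for bit in range(4))
            for lane in range(count)]


def bitsliced_sbox(planes):
    """Apply SBOX to every lane at once using only AND, OR and NOT.

    The table is consulted only to build the circuit (a compile-time step
    in a real kernel); no lookup is indexed by the data being hashed.
    """
    out = [0, 0, 0, 0]
    for x in range(16):
        # term has a 1 in every lane whose input value equals x
        term = MASK
        for bit in range(4):
            p = planes[bit]
            term &= p if (x >> bit) & 1 else (~p & MASK)
        for bit in range(4):
            if (SBOX[x] >> bit) & 1:
                out[bit] |= term
    return out


values = list(range(16)) * 2
result = from_planes(bitsliced_sbox(to_planes(values)), len(values))
print(result == [SBOX[v] for v in values])
```

The cost is a larger instruction count per S-box, which is why it only pays off when the lookups are the bottleneck.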
|
|
|
|
polanskiman
|
|
November 19, 2014, 01:27:26 AM |
|
By the way, any plans to include, and perhaps improve, the m7 algo in the ccminer by SP_ releases?
|
|
|
|
sp_ (OP)
|
|
November 19, 2014, 06:34:29 AM Last edit: November 19, 2014, 07:28:19 AM by sp_ |
|
Some of the bugs have been removed in the Tvpouvet release that I forked. I will probably refork. My focus is on the kernels, and only 50% of the x11 kernels have been modded in the open source.
I just recompiled with yesterday's NVIDIA driver (344.75). There seems to be a hash increase on the 750 Ti of around 30 KHASH.
|
|
|
|
kingscrown
|
|
November 19, 2014, 06:38:47 AM |
|
this mod looks SICK!
|
|
|
|
sp_ (OP)
|
|
November 19, 2014, 07:26:34 AM |
|
EDIT: Just looked at Echo again. Quite well done, I tip my hat.
Instead of doing 10 rounds of echo, I do 9.25 rounds. This is because most of the first round is done on constant input. But some parts of round 2 can also be precalculated... More boost is expected. To be continued.
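The 9.25-round figure can be illustrated with a toy model. Assume, as a guess consistent with the 0.75-round saving and not something stated in the post, that 12 of the 16 state words entering round 1 are fixed (IV and padding) and only 4 carry the per-nonce data. The per-word transform on the fixed words can then be computed once and reused for every nonce. `sub_word` and `CONSTANT_WORDS` below are hypothetical stand-ins, not the real ECHO transform or constants.

```python
def sub_word(word, round_key):
    """Toy stand-in for the per-word transform (real ECHO uses AES rounds)."""
    x = (word ^ round_key) & 0xFFFFFFFF
    x = (x * 0x9E3779B1) & 0xFFFFFFFF
    return x ^ (x >> 15)


# Hypothetical fixed words (IV + padding); identical for every nonce.
CONSTANT_WORDS = [0x1000 + i for i in range(12)]


def first_round_full(message_words):
    """Naive version: transforms all 16 words for every nonce."""
    state = CONSTANT_WORDS + list(message_words)
    return [sub_word(w, i) for i, w in enumerate(state)]


# Computed once, outside the per-nonce loop.
PRECOMPUTED = [sub_word(w, i) for i, w in enumerate(CONSTANT_WORDS)]


def first_round_hoisted(message_words):
    """Hoisted version: only the 4 variable words are transformed per nonce."""
    variable = [sub_word(w, 12 + i) for i, w in enumerate(message_words)]
    return PRECOMPUTED + variable


nonce_words = [0xDEADBEEF, 1, 2, 3]
print(first_round_full(nonce_words) == first_round_hoisted(nonce_words))
```

In this model the first round costs 4/16 of a full round per nonce, so 10 rounds become 9.25. Precalculating parts of round 2 would need the same kind of analysis of which intermediate values still depend only on constants.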
|
|
|
|
sp_ (OP)
|
|
November 19, 2014, 07:35:40 AM |
|
With ccminer v7 from DJm34 I have an average of: X11 = 2605 Khash
With ccminer by SP_ release 8 I have an average of: X11 = 2800 Khash
That's a 195 Khash average difference. It won't make me any richer but it is always welcome.
You should try the 980. Up from 7 MHASH to 9. Almost 30% faster.
|
|
|
|
jjjordan
|
|
November 19, 2014, 09:06:16 AM |
|
No improvement whatsoever with the GTX 970 over 1.4.9. Or maybe none was intended?
|
|
|
|
sp_ (OP)
|
|
November 19, 2014, 09:12:26 AM |
|
Kernels are being merged up to the head branch by tvpouvet. 1.4.9 and this mod, which is based on 1.4.6, share the same optimizations. There is a small difference though: 1.4.9 has improved the throughput settings in the kernels. I need to do a refork and redo the launch bounds tweaks for 1.4.9.
|
|
|
|
Bombadil
|
|
November 19, 2014, 09:19:29 AM |
|
Kernels are being merged up to the head branch by tvpouvet. 1.4.9 and this mod, which is based on 1.4.6, share the same optimizations. There is a small difference though: 1.4.9 has improved the throughput settings in the kernels. I need to do a refork and redo the launch bounds tweaks for 1.4.9.
So it means you'll make use of his API too? I'm busy writing a .NET app that monitors your cudarigs with his API, with detailed stats etc., so that would be nice for comparisons.
|
|
|
|
sp_ (OP)
|
|
November 19, 2014, 09:24:38 AM |
|
Yes. When I do a refork, only some of the kernels will be different. On GitHub a lot of good changes and bugfixes have been done in 1.4.9, API support as well. That makes it worth the upgrade.
|
|
|
|
polanskiman
|
|
November 19, 2014, 03:13:20 PM |
|
Some of the bugs have been removed in the Tvpouvet release that I forked. I will probably refork. My focus is on the kernels, and only 50% of the x11 kernels have been modded in the open source.
I just recompiled with yesterday's NVIDIA driver (344.75). There seems to be a hash increase on the 750 Ti of around 30 KHASH.
Are you talking about this: https://github.com/tpruvot/ccminer ?
|
|
|
|
|