Bitcoin Forum
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347588 times)
sp_ (OP)
October 24, 2014, 10:09:33 PM
Last edit: October 24, 2014, 10:35:59 PM by sp_
 #121

I rewrote two hashing algorithms yesterday. They still have some bugs, so the numbers are not final, but it looks like 3.5 MHASH on the 750 Ti on x11 with 38 watts at the wall per card. 9.2 watts per MHASH.
Impressive! :)

But I can't get it to work properly :(. The idea was to reduce the number of shared-memory accesses in echo and skein (AES). ccminer currently does a lookup for each byte, but I want to use the larger, improved shared memory on Maxwell to look up more bits at a time. The 1 KB table can become 32 KB or 48 KB. Block latency should stay low, as the probability of hitting two equal addresses is lower with 48 KB of combinations.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
djm34
October 25, 2014, 02:22:37 AM
 #122

I rewrote two hashing algorithms yesterday. They still have some bugs, so the numbers are not final, but it looks like 3.5 MHASH on the 750 Ti on x11 with 38 watts at the wall per card. 9.2 watts per MHASH.
Impressive! :)

But I can't get it to work properly :(. The idea was to reduce the number of shared-memory accesses in echo and skein (AES). ccminer currently does a lookup for each byte, but I want to use the larger, improved shared memory on Maxwell to look up more bits at a time. The 1 KB table can become 32 KB or 48 KB. Block latency should stay low, as the probability of hitting two equal addresses is lower with 48 KB of combinations.
Hmm, not sure what shared memory would bring on skein... There is no big lookup table; everything is just calculated for every nonce and thread, so there is nothing to share between threads in the first place.
Strangely, I tried it while working on skein-1024 and the performance was just terrible, a major slowdown compared to the version not using it...

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
sp_ (OP)
October 25, 2014, 06:38:17 AM
 #123

Not Skein, I meant Shavite. I was mixing up the X's here. Echo and Shavite both share the implementation in cuda_x11_aes.cu, in the method aes_round().

I want to merge two table reads into one, but this would require 256 KB of shared memory, so I need to split the bits and write separate code for the upper-bit combinations.
 
sharedMemory[__byte_perm(x0, 0, 0x4440)]^sharedMemory[__byte_perm(x1, 0, 0x4441) + 256],
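
For reference, with a zero second operand the two selectors in that line each pick a single byte out of a word: `__byte_perm(x, 0, 0x4440)` is `x & 0xFF` and `__byte_perm(x, 0, 0x4441)` is `(x >> 8) & 0xFF`. The host-side C sketch below is not code from ccminer; `byte_perm_emul` and `table_bytes` are made-up names. It emulates the intrinsic for plain selector nibbles 0..7 and works out why a table indexed by two bytes at once needs 256 KB.

```c
#include <stdint.h>
#include <stddef.h>

/* Host-side emulation of CUDA's __byte_perm(x, y, s) for selector
   nibbles 0..7 (hypothetical helper, for illustration only).
   Source bytes 0..3 come from x, bytes 4..7 from y; nibble i of s
   picks the source byte for result byte i. */
static uint32_t byte_perm_emul(uint32_t x, uint32_t y, uint32_t s)
{
    uint64_t pool = ((uint64_t)y << 32) | x;
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t sel = (s >> (4 * i)) & 0x7u;
        uint32_t b = (uint32_t)(pool >> (8 * sel)) & 0xFFu;
        r |= b << (8 * i);
    }
    return r;
}

/* Size in bytes of a table of 32-bit words indexed by index_bits bits. */
static size_t table_bytes(unsigned index_bits)
{
    return ((size_t)1 << index_bits) * sizeof(uint32_t);
}
```

With these, `table_bytes(8)` is 1024 (the existing 1 KB table), `table_bytes(13)` is 32768 (32 KB), and `table_bytes(16)` is 262144 (256 KB), which is why a full two-byte index does not fit and the bits have to be split.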


jorneyflair
October 25, 2014, 06:39:18 PM
 #124

x11: 2680khash (ccminer-djm34)
x13: 2030khash (ccminer-djm34)
x15: 1800khash (ccminer-djm34)

zotac 750ti stock

x11: 2820khash (ccminer-sp)
x13: 2090khash (ccminer-sp)
x15: 1880khash (ccminer-sp)


I will test 970 and 980 tomorrow.
Sadly my Suarez lost the game in his first show. Sad , upset
sp_ (OP)
October 25, 2014, 09:16:25 PM
 #125

The shared-memory tweaks were a dead end, but I managed to squeeze out another 1.5% over the Schleicher implementation in the echo hash.

This code can be optimized:

         uint32_t t;
         t = ((ab & 0x80808080) >> 7);
         uint32_t abx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((bc & 0x80808080) >> 7);
         uint32_t bcx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((cd & 0x80808080) >> 7);
         uint32_t cdx = t<<4 ^ t<<3 ^ t<<1 ^ t;

         abx ^= ((ab & 0x7F7F7F7F) << 1);
         bcx ^= ((bc & 0x7F7F7F7F) << 1);
         cdx ^= ((cd & 0x7F7F7F7F) << 1);

because

(ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080)

This saves a register, some moves and dead code, and with the proper configuration gives 1.5% more hash rate in ECHO.
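
To make the identity concrete: the snippet is the AES "xtime" step (multiply by 2 in GF(2^8)) applied to four packed bytes at once, and the masked high bits are already computed in its first line. The C sketch below is an assumption about how the rewrite could look, not sp_'s actual patch; `xtime4_ref` and `xtime4_alt` are hypothetical names. The second version keeps `ab & 0x80808080` in one variable and reuses it instead of masking again with `0x7F7F7F7F`.

```c
#include <stdint.h>

/* Original form: two independent masks on the input word. */
static uint32_t xtime4_ref(uint32_t ab)
{
    /* t has a 1 in the low bit of every byte whose top bit was set */
    uint32_t t = (ab & 0x80808080u) >> 7;
    /* expands each such 1 to 0x1B, the AES reduction constant */
    uint32_t abx = (t << 4) ^ (t << 3) ^ (t << 1) ^ t;
    abx ^= (ab & 0x7F7F7F7Fu) << 1;
    return abx;
}

/* Variant using (ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080):
   the high-bit mask is computed once and reused. */
static uint32_t xtime4_alt(uint32_t ab)
{
    uint32_t hi = ab & 0x80808080u;
    uint32_t t = hi >> 7;
    return (t << 4) ^ (t << 3) ^ (t << 1) ^ t ^ ((ab ^ hi) << 1);
}
```

Both functions return 0x078E47AE for the input 0x8E47AE57 (per byte: 57 -> AE, AE -> 47, 47 -> 8E, 8E -> 07). Whether the compiler really saves a register this way depends on the surrounding kernel and launch configuration.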

rygamble
October 25, 2014, 10:46:53 PM
 #126

Would be very interested in this, but I'd be going to buy some 970s from Micro Center to do so. I checked a few pages back and didn't see any hard numbers on the 970s (only 980s) in terms of kh/s and wattage. I have a bunch of rigs with various 270Xs but I'm looking to downscale and convert to nvidia cards due to their increased efficiency.
SS2006
October 26, 2014, 03:42:17 AM
 #127

I have the 970, posted the numbers
SS2006
October 26, 2014, 08:32:40 AM
 #128

I have hit 7000 kH/s with a 970!! Using a 250 MHz clock OC with 37 mV overvoltage.
This is using SP's first release. I feel that with his latest release it can go even higher :D

price per performance ratio for the 970 is unmatched right now

Good work!

Epsylon3
October 26, 2014, 10:27:34 AM
 #129

Before you take all the credit, I released a new version, 1.4.6:



https://github.com/tpruvot/ccminer/releases/

BTC: 1FhDPLPpw18X4srecguG3MxJYe4a1JsZnd - My Projects: ccminer - cpuminer-multi - yiimp - Forum threads : ccminer - cpuminer-multi - yiimp
tbearhere
October 26, 2014, 10:57:15 AM
Last edit: October 26, 2014, 12:47:08 PM by tbearhere
 #130

Before you take all the credit, I released a new version, 1.4.6:



https://github.com/tpruvot/ccminer/releases/






From https://github.com/tsiv/ccminer/releases: s3 algo, ~9 MH/s on a 750 Ti, overclocked some

EDIT: at 60 watts per
jpouza
October 26, 2014, 11:04:19 AM
 #131

980 doing 8.2MH/s X11.
But miner crashes when closing.
go6ooo1212
October 26, 2014, 11:13:10 AM
 #132

Good improvement, I'll test the 900s later :)
jpouza
October 26, 2014, 11:18:11 AM
 #133

Just a note:

ccminer-52.exe from tpruvot 1.4.6 => fastest on the 900 series so far (8.2-8.3 MH/s X11 per card)

ccminermod.exe by sp_ => fastest on the 750 Ti (2.85 MH/s X11 per card)

Cheers
Amph
October 26, 2014, 11:50:41 AM
 #134

980 doing 8.2MH/s X11.
But miner crashes when closing.

consumption?
jpouza
October 26, 2014, 12:03:02 PM
 #135

980 doing 8.2MH/s X11.
But miner crashes when closing.

consumption?

About 140w per 980.
Note that my cards are limited to 83 degrees Celsius due to 3-way SLI heat, so a single, well-cooled 980 can get about 8.5 MH/s when overclocked.
Amph
October 26, 2014, 02:28:45 PM
 #136

980 doing 8.2MH/s X11.
But miner crashes when closing.

consumption?

About 140w per 980.
Note that my cards are limited to 83 degrees Celsius due to 3-way SLI heat, so a single, well-cooled 980 can get about 8.5 MH/s when overclocked.

140 W for 8200 kH/s is really good: 3x the 750 Ti hash rate with less than 3x the consumption
alexbg21
October 26, 2014, 04:58:51 PM
 #137

Latest ccMiner release 1.4.6-tpruvot (Oct 26th 2014), overclocked Palit GTX 970: X11 7400 kH/s
jpouza
October 26, 2014, 07:46:38 PM
 #138

Proof:

jpouza
October 26, 2014, 08:20:34 PM
Last edit: October 26, 2014, 08:50:04 PM by jpouza
 #139

Overclocked more; this is the maximum: 1450 MHz on 3x 980, almost 9 MH/s in X11. It can do 9 MH/s if watercooled for sure.


jorneyflair
October 26, 2014, 08:36:26 PM
Last edit: October 26, 2014, 09:22:52 PM by jorneyflair
 #140

x11: 5570khash (ccminer-djm34)
x13: 4090khash (ccminer-djm34)
x15: 3670khash (ccminer-djm34)

Gigabyte 970 G1 stock, 65% TDP, nearly 120 W

x11: 5710khash (ccminer-sp)
x13: 4160khash (ccminer-sp)
x15: 3770khash (ccminer-sp)

overclock +200 MHz, nearly 160 W consumption
x11: 6720khash (ccminer-sp)
x13: 5250khash (ccminer-sp)
x15: 4660khash (ccminer-sp)