sp_ (OP)
Legendary
Activity: 2954
Merit: 1087
Team Black developer
October 24, 2014, 10:09:33 PM (Last edit: October 24, 2014, 10:35:59 PM by sp_)
I rewrote two hashing algorithms yesterday. They still have some bugs, so the numbers are not final yet, but it looks like 3.5 MH/s on the 750 Ti in X11 at 38 W at the wall per card, i.e. about 10.9 W per MH/s (roughly 92 kH/s per watt).
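A quick arithmetic check of the efficiency for 3.5 MH/s at 38 W at the wall: dividing power by hashrate gives watts per MH/s, and the inverse gives hashrate per watt. A minimal C sketch (the helper names are hypothetical):

```c
/* Watts needed per MH/s of hashrate. */
static double watts_per_mhs(double watts, double mhs)
{
    return watts / mhs;
}

/* kH/s of hashrate delivered per watt (the inverse metric, scaled to kH/s). */
static double khs_per_watt(double watts, double mhs)
{
    return mhs * 1000.0 / watts;
}
```

With the figures from the post, watts_per_mhs(38.0, 3.5) is about 10.86 and khs_per_watt(38.0, 3.5) is about 92.1.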
Impressive! But I can't get it to work properly :(. The idea was to reduce the number of shared-memory accesses in Echo and Skein (AES). ccminer currently does a table lookup for each byte, but I want to use the larger, improved shared memory on Maxwell to look up more bits at once. The 1 KB table can become 32 KB or 48 KB. Block latency should stay low, since the probability of two lookups hitting the same address is lower with 48 KB of combinations.
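For context, a byte-wise table lookup of the kind described here can be sketched on the host in C: each 256-entry table of 32-bit words takes 1 KB, and one output column of an AES round is four lookups (one per input byte) XORed together. This follows the common T-table AES formulation and is not ccminer's actual code; the table layout (T[1]..T[3] as byte rotations of T[0]) and all names are assumptions.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box: multiplicative inverse (0 maps to 0), then the affine transform. */
static uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 0;
    if (x) {
        inv = 1;
        for (int i = 0; i < 254; i++)   /* x^254 == x^-1 in GF(2^8) */
            inv = gf_mul(inv, x);
    }
    uint8_t s = inv, r = inv;
    for (int i = 0; i < 4; i++) {
        r = (uint8_t)((r << 1) | (r >> 7));  /* rotate left by one bit */
        s ^= r;
    }
    return (uint8_t)(s ^ 0x63);
}

/* Four 256-entry tables of 32-bit words: 4 x 1 KB. */
static uint32_t T[4][256];

/* Rotate left; only called with n in {8, 16, 24}. */
static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

static void build_tables(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t s = aes_sbox((uint8_t)i);
        /* Column (2s, s, s, 3s) packed little-endian into one word. */
        uint32_t t0 = ((uint32_t)gf_mul(s, 3) << 24) | ((uint32_t)s << 16)
                    | ((uint32_t)s << 8) | (uint32_t)gf_mul(s, 2);
        T[0][i] = t0;
        T[1][i] = rotl32(t0, 8);
        T[2][i] = rotl32(t0, 16);
        T[3][i] = rotl32(t0, 24);
    }
}

/* One output column: one table lookup per input byte, XORed with a round-key word. */
static uint32_t aes_column(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, uint32_t k)
{
    return T[0][x0 & 0xFF] ^ T[1][(x1 >> 8) & 0xFF]
         ^ T[2][(x2 >> 16) & 0xFF] ^ T[3][x3 >> 24] ^ k;
}
```

After build_tables(), aes_column(0, 0, 0, 0, 0) returns 0x63636363: every input byte maps to S(0) = 0x63, and the MixColumns coefficients 2, 3, 1, 1 XOR to 1.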
|
|
|
|
djm34
Legendary
Activity: 1400
Merit: 1050
October 25, 2014, 02:22:37 AM
Hmm, not sure what shared memory would bring to Skein... There is no big lookup table; everything is just calculated for every nonce and thread, so there is nothing to share between threads in the first place. Strangely, I tried it while working on Skein-1024 and the performance was just terrible, a major slowdown compared to the version not using it...
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
sp_ (OP)
Legendary
Activity: 2954
Merit: 1087
Team Black developer
October 25, 2014, 06:38:17 AM
Not Skein, I meant Shavite; I was mixing up the X's here. Both share the implementation in cuda_x11_aes.cu, in the method aes_round().
I want to merge two table reads into one, but this will require 256 KB of shared memory, so I need to split the bits and write separate code for the upper-bit combinations. The two reads in question:
sharedMemory[__byte_perm(x0, 0, 0x4440)] ^ sharedMemory[__byte_perm(x1, 0, 0x4441) + 256],
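The merge described here can be illustrated on the host in C. The two __byte_perm selectors in the quoted line extract byte 0 of x0 (0x4440) and byte 1 of x1 (0x4441), so one table indexed by both bytes together needs 65,536 entries of 4 bytes, i.e. 256 KB, well over the 48 KB of shared memory a Maxwell block can use. The tables are filled with placeholder values (the XOR merge does not depend on their contents), and all names are hypothetical.

```c
#include <stdint.h>

/* Placeholder 256-entry tables standing in for the two AES tables. */
static uint32_t T0[256], T1[256];

/* Hypothetical merged table: one entry per (byte 1 of x1, byte 0 of x0) pair.
   65536 entries * 4 bytes = 256 KB. */
static uint32_t T01[65536];

static void fill_placeholder_tables(void)
{
    uint32_t s = 0x12345678u;   /* xorshift32 pseudo-random generator */
    for (int i = 0; i < 256; i++) {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5; T0[i] = s;
        s ^= s << 13; s ^= s >> 17; s ^= s << 5; T1[i] = s;
    }
}

/* Precompute the XOR of every pair of entries. */
static void build_merged_table(void)
{
    for (uint32_t hi = 0; hi < 256; hi++)
        for (uint32_t lo = 0; lo < 256; lo++)
            T01[(hi << 8) | lo] = T0[lo] ^ T1[hi];
}

/* Two reads, as in the quoted line: byte 0 of x0 and byte 1 of x1. */
static uint32_t two_lookups(uint32_t x0, uint32_t x1)
{
    return T0[x0 & 0xFFu] ^ T1[(x1 >> 8) & 0xFFu];
}

/* One read with a 16-bit index: byte 1 of x1 is already in position. */
static uint32_t one_lookup(uint32_t x0, uint32_t x1)
{
    return T01[(x1 & 0xFF00u) | (x0 & 0xFFu)];
}
```

For the split, note that a 48 KB budget holds only 12,288 of the 65,536 merged entries at once.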
|
|
|
|
jorneyflair
October 25, 2014, 06:39:18 PM
Zotac 750 Ti, stock:
x11: 2680 kH/s (ccminer-djm34), 2820 kH/s (ccminer-sp)
x13: 2030 kH/s (ccminer-djm34), 2090 kH/s (ccminer-sp)
x15: 1800 kH/s (ccminer-djm34), 1880 kH/s (ccminer-sp)
I will test the 970 and 980 tomorrow. Sadly, my Suarez lost the game in his first match. Upset.
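The relative gain of ccminer-sp over ccminer-djm34 in these figures is a one-line calculation; a small C sketch with the numbers above (struct and function names are hypothetical):

```c
/* One benchmark row: kH/s for each miner build on the same card. */
struct bench {
    const char *algo;
    double djm34_khs;
    double sp_khs;
};

/* Figures from the Zotac 750 Ti (stock) results above. */
static const struct bench zotac_750ti[] = {
    { "x11", 2680.0, 2820.0 },
    { "x13", 2030.0, 2090.0 },
    { "x15", 1800.0, 1880.0 },
};

/* Relative gain of candidate over baseline, in percent. */
static double gain_percent(double baseline_khs, double candidate_khs)
{
    return (candidate_khs - baseline_khs) / baseline_khs * 100.0;
}
```

This gives about 5.2% (x11), 3.0% (x13) and 4.4% (x15) in favour of ccminer-sp on this card.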
|
|
|
|
sp_ (OP)
Legendary
Activity: 2954
Merit: 1087
Team Black developer
October 25, 2014, 09:16:25 PM
The shared-memory tweaks were a dead end, but I managed to squeeze out another 1.5% over the Schleicher implementation of the Echo hash.
This code can be optimized:
uint32_t t;
t = ((ab & 0x80808080) >> 7);
uint32_t abx = t<<4 ^ t<<3 ^ t<<1 ^ t;
t = ((bc & 0x80808080) >> 7);
uint32_t bcx = t<<4 ^ t<<3 ^ t<<1 ^ t;
t = ((cd & 0x80808080) >> 7);
uint32_t cdx = t<<4 ^ t<<3 ^ t<<1 ^ t;
abx ^= ((ab & 0x7F7F7F7F) << 1);
bcx ^= ((bc & 0x7F7F7F7F) << 1);
cdx ^= ((cd & 0x7F7F7F7F) << 1);
because
(ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080)
Reusing the already-computed (ab & 0x80808080) saves a register, some moves and dead code, and with the proper configuration gives 1.5% more hash in Echo.
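The identity can be checked on the host in C. Both functions below double each of the four bytes packed in a 32-bit word in GF(2^8) (the AES xtime step): t<<4 ^ t<<3 ^ t<<1 ^ t multiplies each per-byte carry bit by 0x1B. The second version computes the high-bit mask once and reuses it instead of masking with 0x7F7F7F7F. This is a sketch with hypothetical names, not sp_'s actual Echo kernel; whether it saves a register in practice depends on the code nvcc generates.

```c
#include <stdint.h>

/* Form from the post: per-byte carry bits times 0x1B, XOR the low 7 bits shifted up. */
static uint32_t xtime4_original(uint32_t ab)
{
    uint32_t t = (ab & 0x80808080u) >> 7;
    uint32_t abx = t << 4 ^ t << 3 ^ t << 1 ^ t;   /* shifts bind tighter than ^ */
    abx ^= (ab & 0x7F7F7F7Fu) << 1;
    return abx;
}

/* Rewritten with (ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080):
   the high-bit mask is computed once and reused. */
static uint32_t xtime4_reuse_mask(uint32_t ab)
{
    uint32_t hi = ab & 0x80808080u;
    uint32_t t = hi >> 7;
    return ((t << 4) ^ (t << 3) ^ (t << 1) ^ t) ^ ((ab ^ hi) << 1);
}
```

For example, both functions map 0x57575757 to 0xAEAEAEAE and 0x80808080 to 0x1B1B1B1B.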
|
|
|
|
rygamble
October 25, 2014, 10:46:53 PM
I would be very interested in this, but I'd have to buy some 970s from Micro Center to do so. I checked a few pages back and didn't see any hard numbers on the 970s (only the 980s) in terms of kH/s and wattage. I have a bunch of rigs with various 270Xs, but I'm looking to downscale and switch to Nvidia cards because of their better efficiency.
|
|
|
|
SS2006
October 26, 2014, 03:42:17 AM
I have the 970 and posted the numbers.
|
|
|
|
SS2006
October 26, 2014, 08:32:40 AM
I have hit 7000 kH/s with a 970!! Using a +250 MHz clock overclock with +37 mV overvoltage. This is with sp_'s first release; I feel that with his latest release it can go even higher. The price/performance ratio of the 970 is unmatched right now. Good work!
|
|
|
|
|
|
jpouza
Legendary
Activity: 2884
Merit: 1123
October 26, 2014, 11:04:19 AM
980 doing 8.2 MH/s in X11, but the miner crashes when closing.
|
|
|
|
go6ooo1212
Legendary
Activity: 1512
Merit: 1000
quarkchain.io
October 26, 2014, 11:13:10 AM
Good improvement, I'll test the 900 series later.
|
|
|
|
jpouza
Legendary
Activity: 2884
Merit: 1123
October 26, 2014, 11:18:11 AM
Just a note:
ccminer-52.exe by tpruvot (1.4.6), for the 900 series => fastest on the 900 series so far (8.2-8.3 MH/s X11 per card)
ccminermod.exe by sp_ => fastest on the 750 Ti => 2.85 MH/s X11 per card
Cheers
|
|
|
|
Amph
Legendary
Activity: 3248
Merit: 1070
October 26, 2014, 11:50:41 AM
Quote from jpouza: "980 doing 8.2MH/s X11. But miner crashes when closing."
Consumption?
|
|
|
|
jpouza
Legendary
Activity: 2884
Merit: 1123
October 26, 2014, 12:03:02 PM
About 140 W per 980. Note that my cards are limited to 83 degrees Celsius because of the heat in a 3-way SLI setup, so a single, well-cooled 980 can reach about 8.5 MH/s when overclocked.
|
|
|
|
Amph
Legendary
Activity: 3248
Merit: 1070
October 26, 2014, 02:28:45 PM
140 W for 8200 kH/s is really good: about 3x the hashrate of a 750 Ti with less than 3x the power consumption.
|
|
|
|
alexbg21
Newbie
Activity: 24
Merit: 0
October 26, 2014, 04:58:51 PM
Latest ccMiner release 1.4.6-tpruvot (Oct 26th, 2014), overclocked Palit GTX 970: X11 at 7400 kH/s.
|
|
|
|
jpouza
Legendary
Activity: 2884
Merit: 1123
October 26, 2014, 07:46:38 PM
|
|
|
|
jpouza
Legendary
Activity: 2884
Merit: 1123
October 26, 2014, 08:20:34 PM (Last edit: October 26, 2014, 08:50:04 PM by jpouza)
Overclocked further: this is the maximum, 1450 MHz on 3x 980, almost 9 MH/s in X11. It could surely do 9 MH/s if watercooled.
|
|
|
|
jorneyflair
October 26, 2014, 08:36:26 PM (Last edit: October 26, 2014, 09:22:52 PM by jorneyflair)
Gigabyte 970 G1, stock, 65% TDP, nearly 120 W:
x11: 5570 kH/s (ccminer-djm34), 5710 kH/s (ccminer-sp)
x13: 4090 kH/s (ccminer-djm34), 4160 kH/s (ccminer-sp)
x15: 3670 kH/s (ccminer-djm34), 3770 kH/s (ccminer-sp)
Overclocked +200 MHz, nearly 160 W consumption:
x11: 6720 kH/s (ccminer-sp)
x13: 5250 kH/s (ccminer-sp)
x15: 4660 kH/s (ccminer-sp)
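Comparing the stock and overclocked x11 runs on this 970 in terms of hashrate per watt takes two divisions. A small C sketch, using the poster's approximate power readings ("nearly 120 W", "nearly 160 W"), so the results are only as precise as those readings; the function names are hypothetical:

```c
/* Hashrate per watt, in kH/s per W. */
static double khs_per_watt(double khs, double watts)
{
    return khs / watts;
}

/* Relative change from before to after, in percent. */
static double change_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For x11 this gives about 47.6 kH/s per W at stock versus 42.0 kH/s per W overclocked: the overclock adds about 17.7% hashrate for about 33.3% more power.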
|
|
|
|
|