CCminer(SP-MOD) Modded GPU kernels.

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: 10MHASH CCminer modded NVIDIA Maxwell kernals by SP.

February 16, 2015, 11:04:01 PM

#1621

After the last change, AES is slower on the 750 Ti. I am working to improve it.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner

rednoW

Legendary

Offline

Activity: 1510
Merit: 1003

February 17, 2015, 08:16:44 AM

#1622

Quote from: sp_ on February 16, 2015, 11:04:01 PM

After the last change, AES is slower on the 750 Ti. I am working to improve it.

I rolled back the change in cuda_x11_aes.cu and saw no difference. Also, the commit "Faster shabal" https://github.com/sp-hash/ccminer/commit/c7eef5275ab77f02d3d86601092774fae8a29cd7 doesn't change the hash rates on my setup.

sp_ (OP)

February 17, 2015, 09:33:41 AM

#1623

Quote from: rednoW on February 17, 2015, 08:16:44 AM

Quote from: sp_ on February 16, 2015, 11:04:01 PM

After the last change, AES is slower on the 750 Ti. I am working to improve it.

I rolled back the change in cuda_x11_aes.cu and saw no difference. Also, the commit "Faster shabal" https://github.com/sp-hash/ccminer/commit/c7eef5275ab77f02d3d86601092774fae8a29cd7 doesn't change the hash rates on my setup.

cuda_x11_aes.cu is excluded from the project file, so a change to it is not picked up unless you also re-save the echo or shavite source or do a full rebuild. To measure the effect you can use Fresh, because it chains fewer hashing algos.
The differences are small, but the PTX (assembly) code shows fewer instructions with my changes. Different cards also have different timings.
A small speedup of 1 kH/s will not be noticeable, but 100 small speedups will.

sp_ (OP)

February 17, 2015, 09:36:47 AM

#1624

Quote from: rednoW on February 17, 2015, 08:16:44 AM

Quote from: sp_ on February 16, 2015, 11:04:01 PM

After the last change, AES is slower on the 750 Ti. I am working to improve it.

I rolled back the change in cuda_x11_aes.cu and saw no difference. Also, the commit "Faster shabal" https://github.com/sp-hash/ccminer/commit/c7eef5275ab77f02d3d86601092774fae8a29cd7 doesn't change the hash rates on my setup.

x14 has 14 chained algos. If I optimize shabal by 2%, the increase in total hashrate is very small, something like 0.005%. This is because groestl, echo, simd and others are much slower and take most of the time.
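
The effect described in this post is Amdahl's law. A minimal C++ sketch follows, assuming purely hypothetical time shares, since the thread gives no per-algo timings. The helper name `overall_speedup` and the 1% share in the comment are illustrations, not measurements.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when one stage that takes `share` of the
// total runtime becomes `stage_speedup` times faster.
double overall_speedup(double share, double stage_speedup) {
    assert(share >= 0.0 && share <= 1.0 && stage_speedup > 0.0);
    return 1.0 / ((1.0 - share) + share / stage_speedup);
}

// Hypothetical example: if shabal took 1% of the x14 chain (an assumed
// figure), making it 2% faster gives overall_speedup(0.01, 1.02), which is
// roughly 1.0002, i.e. about a 0.02% gain in total hashrate.
```

The smaller shabal's real share of the chain, the closer the total gain gets to zero, which is why a per-algo improvement can be invisible in the overall rate.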

sp_ (OP)

February 17, 2015, 09:45:26 AM

#1625

Another problem is 64-bit vs. 32-bit, and Windows vs. Linux.

I optimize for Windows and 32-bit. Linux builds are normally 64-bit, and some optimizations are faster in a 32-bit build.

sp_ (OP)

February 17, 2015, 09:49:47 AM

#1626

Quote from: scotte on February 16, 2015, 12:36:12 PM

qubit has slowed from 3609 kH/s on the previous version to 3571 kH/s on the current one - GTX 850M Linux, using "git pull" to update, build.sh has -O3 passed to configure

The fresh algo uses 2 rounds of shavite, 2 of simd and 1 echo.

shavite512
simd512
shavite512
simd512
echo512

Shavite and echo are both affected by my AES change, which is faster on compute 5.2 cards. I am rewriting the AES code now, so the next commit will hopefully improve the performance of fresh.

rednoW

February 17, 2015, 11:09:57 AM

#1627

Quote from: sp_ on February 17, 2015, 09:33:41 AM

cuda_x11_aes.cu is excluded from the project file, so a change to it is not picked up unless you also re-save the echo or shavite source or do a full rebuild. To measure the effect you can use Fresh, because it chains fewer hashing algos.

I checked the VS build log after the rollback. cuda_x11_aes.cu is #included in 2 other .cu files, and VS rebuilt both of them. So I think I did it right.

sp_ (OP)

February 17, 2015, 09:33:38 PM

#1628

Any coders who can help me?

I am trying to reduce the number of instructions per shared memory access in cuda_x11_aes.cu from 4 to 2.

I am trying to force the pointer to be correct with this code:

__device__ __forceinline__
uint32_t SPRead(uint32_t *const __restrict__ x, uint32_t low)
{
   uint32_t tmp,res;
   asm("bfi.b32 %0, %1, %2, 10 , 8;" : "=r"(tmp) : "r"(low), "r"(x));
   asm("ld.shared.u32 %0,[%1];" : "=r"(res) : "r"(tmp));
   return res;
}

The PTX looks good:

cvta.shared.u32    %r393, _Z23x11_echo512_gpu_hash_64jjPy$__cuda_local_var_204566_57_non_const_sharedMemory;
// inline asm
bfi.b32 %r232, %r18, %r393, 10 , 8;
// inline asm
// inline asm
ld.shared.u32 %r235,[%r232];

But the program crashes with an illegal memory access. The shared memory pointer is aligned to a 1024-byte boundary.

Without the pointer hack I get code like this:

   bfe.u32 %r243, %r4, %r440, 8;
   // inline asm
   shl.b32    %r468, %r243, 2;
   add.s32    %r469, %r468, %r451;
   ld.shared.u32    %r470, [%r469+1024];

4 instructions.
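
One thing worth checking in the hack above is the operand order of bfi. Per the PTX ISA, `bfi.b32 d, a, b, pos, len` inserts the low `len` bits of `a` into `b` starting at bit `pos`, so `10, 8` places the index at bit 10 rather than bit 2. The host-side C++ sketch below emulates that and compares it with the address the 4-instruction path computes. The base address 0x400 and the helper names are made up for illustration, and this does not rule out other causes (for example, the address produced by cvta.shared possibly being a generic address rather than a shared-window one).

```cpp
#include <cassert>
#include <cstdint>

// Emulates PTX `bfi.b32 d, a, b, pos, len` for in-range operands:
// the low `len` bits of `a` replace bits [pos, pos+len) of `b`.
uint32_t bfi32(uint32_t a, uint32_t b, uint32_t pos, uint32_t len) {
    assert(len >= 1 && pos < 32 && len <= 32 - pos);
    uint32_t field = (len == 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
    uint32_t mask = field << pos;
    return (b & ~mask) | ((a << pos) & mask);
}

// What the compiler-generated path computes: base + (index * 4).
uint32_t plain_address(uint32_t base, uint32_t index) {
    return base + ((index & 0xFFu) << 2);
}

// With a hypothetical 1024-aligned base of 0x400 and index 0x12:
//   plain_address(0x400, 0x12) == 0x448
//   bfi32(0x12, 0x400, 2, 8)   == 0x448   (index inserted at bit 2)
//   bfi32(0x12, 0x400, 10, 8)  == 0x4800  (index inserted at bit 10)
```

If that reading is right, `bfi.b32 %0, %1, %2, 2, 8;` would be the form that matches a 256-entry table of 32-bit words, but this is untested on hardware.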

djm34

Legendary

Offline

Activity: 1400
Merit: 1050

February 17, 2015, 09:46:25 PM

#1629

Quote from: sp_ on February 17, 2015, 09:33:38 PM

Any coders who can help me?

I am trying to reduce the number of instructions per shared memory access in cuda_x11_aes.cu from 4 to 2.

I am trying to force the pointer to be correct with this code:

__device__ __forceinline__
uint32_t SPRead(uint32_t *const __restrict__ x, uint32_t low)
{
   uint32_t tmp,res;
   asm("bfi.b32 %0, %1, %2, 10 , 8;" : "=r"(tmp) : "r"(low), "r"(x));
   asm("ld.shared.u32 %0,[%1];" : "=r"(res) : "r"(tmp));
   return res;
}

tmp in the second instruction should be a pointer

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw

sp_ (OP)

February 18, 2015, 06:49:08 AM

#1630

Didn't work.

The strange thing is that this doesn't work either:

__device__ __forceinline__
uint32_t SPRead(uint32_t *const __restrict__ x, uint32_t low)
{
uint32_t *tmp;
tmp=x;
return tmp[low];
}
...
tmp=SPRead(sharedmemory,x0 &0xff);

but this works:

tmp= sharedmemory[x0 &0xff];

Namsbreh

Sr. Member

Offline

Activity: 318
Merit: 250

February 18, 2015, 10:39:55 AM

#1631

How is the next version of the spreadminer coming along?

sp_ (OP)

February 18, 2015, 11:25:37 AM

#1632

Quote from: Namsbreh on February 18, 2015, 10:39:55 AM

How is the next version of the spreadminer coming along?

I'm rewriting the SHA part. Perhaps I'll be done this weekend.

scryptr

Legendary

Offline

Activity: 1793
Merit: 1028

February 19, 2015, 07:42:44 PM
Last edit: February 19, 2015, 07:57:52 PM by scryptr

#1633

QUBIT --

I've been mining qubit on Yaamp with no accepts. No errors, maybe one or two accepts, but a long chain of block change messages, one after another. My 960 is getting about 7880kh/s, but no productive work. I am using version 39, on Windows 7.

sp-ccminer v39: [screenshot not preserved]

tpruvot ccminer v1.5.3: [screenshot not preserved]

--scryptr

TIPS: BTC - 1Fs4uZ6a9ABYBTaHGUfqcwCQmeBRxkKRQT DASH - XrK81tW31SLsVvZ2WX9VhTjpT6GXJPLdbQ
SCRYPTR'S NOTEBOOK: https://bitcointalk.org/index.php?topic=5035515.msg46035530#msg46035530
GITHUB: "github.com/scryptr" MERIT is appreciated, also. Thanks!

sp_ (OP)

February 19, 2015, 09:04:13 PM

#1634

Must be a linux issue, or perhaps an issue with the 960 card.

51% of the qubit hashrate is now with my latest version of the mod @yaamp

ccminer/1.5.39-git(SP-MOD) 14 51% 161.1 mh/s 1%

I will be testing on the 960 this weekend.

The latest version on GitHub has a rather high default intensity. I sometimes get out-of-memory errors on my test rig, but the hashrate is 30-50 kH/s higher. The GPU has plenty of memory left, so I think this is an issue with memory on the mainboard. When I try to allocate a total of 1.2 GB on the graphics card, it fails; 700 MB is OK. (2 GB 750 Tis)

sp_ (OP)

February 19, 2015, 09:10:44 PM

#1635

More free hash is coming in x13/x14/x15 (fugue).
I can precalculate most of the first round of the FUGUE512_3 function, since it starts with a lot of constant inputs. It seems to give another 20-30 kH/s (750 Ti), but more work is needed before I can know for sure.

scryptr

February 19, 2015, 09:14:02 PM

#1636

Quote from: sp_ on February 19, 2015, 09:04:13 PM

Must be a linux issue, or perhaps an issue with the 960 card.

51% of the qubit hashrate is now with my latest version of the mod @yaamp

ccminer/1.5.39-git(SP-MOD) 14 51% 161.1 mh/s 1%

I will be testing on the 960 this weekend.

The latest version on GitHub has a rather high default intensity. I sometimes get out-of-memory errors on my test rig, but the hashrate is 30-50 kH/s higher. The GPU has plenty of memory left, so I think this is an issue with memory on the mainboard. When I try to allocate a total of 1.2 GB on the graphics card, it fails; 700 MB is OK. (2 GB 750 Tis)

Thanks for the quick reply! This card is running on Win7 x64. I also posted on tpruvot's thread. --scryptr

tbearhere

Legendary

Offline

Activity: 3136
Merit: 1003

February 19, 2015, 09:25:31 PM

#1637

Quote from: StuffOfInterest on February 04, 2015, 03:13:54 AM

Quote from: sp_ on February 03, 2015, 07:20:01 PM

Quote from: StuffOfInterest on February 03, 2015, 11:01:33 AM

Interesting to see bitcoin (sha256) support added. No chance of profit but fun to have most of the algos supported in one miner. Of note, this doesn't seem to be a very optimal implementation. With cgminer (opencl) I was getting about 405,000 kh/s on my 2 x 750 Ti setup. With release 34 ccminer I'm down to about 345,000 kh/s. Still, it sits at the very bottom of the profitability list with either hash rate so I'm not too concerned. It burns many more times in power than it earns.

Klaus_t's implementation was pretty good, but the kernel launch configuration was way off. I submitted a 17.5% speed increase to GitHub (750 Ti).

Hah! A 17.5% increase puts it right at the 405 MH/s I had with the OpenCL miner.

If all the ASIC miners out there suddenly brick themselves we'll have a nice GPU Bitcoin miner here.

@StuffOfInterest or anyone: what does your bat file look like to GPU-mine bitcoin to your wallet? I can't get mine working. Did you add a conf file, and if so, what addnode and port do you use? I have to try it just to say I mined BTC directly. Hey... I may hit a block.

flipclip

Member

Offline

Activity: 111
Merit: 10

February 20, 2015, 05:56:30 PM

#1638

Quote from: sp_ on February 19, 2015, 09:04:13 PM

Must be a linux issue, or perhaps an issue with the 960 card.

51% of the qubit hashrate is now with my latest version of the mod @yaamp

ccminer/1.5.39-git(SP-MOD) 14 51% 161.1 mh/s 1%

I will be testing on the 960 this weekend.

The latest version on GitHub has a rather high default intensity. I sometimes get out-of-memory errors on my test rig, but the hashrate is 30-50 kH/s higher. The GPU has plenty of memory left, so I think this is an issue with memory on the mainboard. When I try to allocate a total of 1.2 GB on the graphics card, it fails; 700 MB is OK. (2 GB 750 Tis)

Probably a 960 issue... it works fine on Linux with (2) 750 Ti.

rednoW

February 20, 2015, 06:43:00 PM
Last edit: February 20, 2015, 07:28:47 PM by rednoW

#1639

Commit "Bether default throughput qubit(+30khash 750ti)" gives the error

"Cuda error in func 'x11_simd512_cpu_init' at line 634 : out of memory."

on my poor GTX 750 non-Ti with only 1 GB of memory.

I will try to lower "int intensity = 256 * 256 * 14;"

I tried 256 * 256 * 12: it crashes the driver when a monitor is attached, and works, but slowly, without a monitor.
256 * 256 * 10 gives me a 2-4 kH/s gain compared to the default 1U << 19 (256 * 256 * 8) from the previous version.

So I tried the -i command line parameter and found that -i 19.3 is the maximum for qubit on my card (681472 CUDA threads).
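
The thread counts in this post can be reproduced with a little arithmetic. The C++ sketch below assumes that a fractional -i value adds the fractional part of 2^floor(i), rounded down to a multiple of 256. That rule is an assumption chosen because it matches the 681472 figure reported here, not something confirmed from the miner's source, and `threads_for_intensity` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed mapping from a (possibly fractional) intensity to a thread count:
// 2^floor(i) plus the fractional share, in whole blocks of 256 threads.
uint32_t threads_for_intensity(double intensity) {
    assert(intensity >= 8.0 && intensity < 32.0);
    uint32_t whole = static_cast<uint32_t>(std::floor(intensity));
    double frac = intensity - whole;
    uint32_t base = 1u << whole;
    uint32_t blocks = static_cast<uint32_t>(std::floor(frac * (1u << (whole - 8))));
    return base + blocks * 256u;
}

// 256 * 256 * 8  == 524288 == 1u << 19  (default in the previous version)
// 256 * 256 * 14 == 917504              (new default from the commit)
// threads_for_intensity(19.3) == 681472 (the maximum reported for this card)
```

Under that assumption, -i 19.3 sits at roughly 74% of the new default's thread count, which fits the out-of-memory error on a 1 GB card.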

scryptr

February 22, 2015, 04:32:59 AM
Last edit: February 22, 2015, 06:09:41 PM by scryptr

#1640

Quote from: flipclip on February 20, 2015, 05:56:30 PM

Quote from: sp_ on February 19, 2015, 09:04:13 PM

Must be a linux issue, or perhaps an issue with the 960 card.

51% of the qubit hashrate is now with my latest version of the mod @yaamp

ccminer/1.5.39-git(SP-MOD) 14 51% 161.1 mh/s 1%

I will be testing on the 960 this weekend.

The latest version on GitHub has a rather high default intensity. I sometimes get out-of-memory errors on my test rig, but the hashrate is 30-50 kH/s higher. The GPU has plenty of memory left, so I think this is an issue with memory on the mainboard. When I try to allocate a total of 1.2 GB on the graphics card, it fails; 700 MB is OK. (2 GB 750 Tis)

Probably a 960 issue... it works fine on Linux with (2) 750 Ti.

QUBIT ISSUE:

I have two 6x750 Ti rigs: one runs Linux, the other Win 8 (now). I also have a GTX 960 on Win 7. Qubit has not behaved well recently. Both my 750 Ti cards and my GTX 960 card have periods where no shares are accepted or submitted, although hashing goes on. No rejects, just hashing speed and new block notifications. Essentially, they burn electricity and do no useful work.

Also, I get strange "doubled-up" behavior, where every other share is rejected. This happens with tpruvot's ccminer v1.5.3. Apparently, it happens at about the same time as the trouble with SP_'s ccminer.

Later, they may run qubit properly. I don't know why. --scryptr

EDIT: I suppose the reason I am posting about qubit is that qubit is a top-ranking earner just now. If I let my Windows rigs mine on the round-robin batch file, they get stuck on qubit and actually earn nothing. --scryptr
