sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
February 16, 2015, 11:04:01 PM
The last change to the AES code is slower on the 750 Ti. I am working to improve it.

sp_ (OP)
February 17, 2015, 09:33:41 AM
cuda_x11_aes.cu is excluded from the project file, so if you change it, it will not be rebuilt unless you also save echo or shavite, or do a full build. To measure, you can use Fresh, because it has fewer chained hashing algos. The differences are small, but the PTX code (assembly code) shows fewer instructions with my changes. The different cards also have different timings. A small speedup of 1 kH/s will not be noticeable, but 100 small speedups will.

sp_ (OP)
February 17, 2015, 09:36:47 AM
x14 has 14 chained algos. If I optimize shabal by 2%, the increase in the total hashrate is very small, something like 0.005% faster. This is because groestl, echo, simd etc. are much slower and take most of the time.
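The effect described here is the usual Amdahl's-law arithmetic, and a small host-side sketch makes it concrete. The function name `overall_speedup` and the time shares used with it are illustrative assumptions, not measurements from this thread. For the total to move by only about 0.005% after a 2% shabal gain, shabal would have to account for roughly 0.25% of the chain's runtime.

```cpp
// Amdahl's-law sketch for a chained hash (hypothetical helper, not ccminer code).
// fraction:      share of the total runtime spent in the stage being optimized (0..1)
// stage_speedup: how many times faster that stage becomes (1.02 means 2% faster)
// Returns the speedup factor of the whole chain.
inline double overall_speedup(double fraction, double stage_speedup) {
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup);
}
```

With an assumed share of 0.0025, `overall_speedup(0.0025, 1.02)` comes out just under 1.00005, which matches the order of magnitude quoted above. The same 2% gain on a stage that takes 30% of the time would give roughly 0.6% overall.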

sp_ (OP)
February 17, 2015, 09:45:26 AM
Another problem is 64-bit vs 32-bit, and Windows vs Linux.
I optimize for Windows and 32-bit. On Linux, builds are normally 64-bit. (Some optimizations are faster when building 32-bit.)

sp_ (OP)
February 17, 2015, 09:49:47 AM
Quote: "qubit has slowed from 3609 kH/s on the previous version to 3571 kH/s on the current one - GTX 850M Linux, using "git pull" to update, build.sh has -O3 passed to configure"
The fresh algo uses 2 rounds of shavite, 2 of simd and 1 of echo: shavite512, simd512, shavite512, simd512, echo512. Shavite and echo are affected by my AES change, which is faster on the compute 5.2 cards. I am rewriting the AES code now, so the next commit will hopefully increase the performance of fresh.

rednoW
Legendary
Offline
Activity: 1510
Merit: 1003
February 17, 2015, 11:09:57 AM
Quote from sp_: "The cuda_x11_aes.cu is excluded from the project file, so if you change it it will not build unless you save echo or shavite or take a full build. To measure you can use Fresh, because this has fewer chained hashing algos."
I checked a VS build log after rollback. cuda_x11_aes.cu was #included in 2 other .cu files that were rebuilt by VS. So I think I made it right.

sp_ (OP)
February 17, 2015, 09:33:38 PM
Any coders who can help me?
I am trying to reduce the number of instructions per shared memory access in cuda_x11_aes.cu from 4 to 2.
I try to force the pointer to be correct with this code:
    __device__ __forceinline__ uint32_t SPRead(uint32_t *const __restrict__ x, uint32_t low)
    {
        uint32_t tmp, res;
        asm("bfi.b32 %0, %1, %2, 10 , 8;" : "=r"(tmp) : "r"(low), "r"(x));
        asm("ld.shared.u32 %0,[%1];" : "=r"(res) : "r"(tmp));
        return res;
    }
The PTX looks good:
    cvta.shared.u32 %r393, _Z23x11_echo512_gpu_hash_64jjPy$__cuda_local_var_204566_57_non_const_sharedMemory;
    // inline asm
    bfi.b32 %r232, %r18, %r393, 10 , 8;
    // inline asm
    // inline asm
    ld.shared.u32 %r235,[%r232];
But the program crashes with an illegal memory access. The shared memory pointer is aligned to a 1024-byte boundary.
Without the pointer hack I get code like this:
    bfe.u32 %r243, %r4, %r440, 8;
    // inline asm
    shl.b32 %r468, %r243, 2;
    add.s32 %r469, %r468, %r451;
    ld.shared.u32 %r470, [%r469+1024];
That is 4 instructions.
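One thing that can be checked on the host is the operand order of `bfi`. In the PTX ISA the form is `bfi.b32 d, a, b, pos, len`, so `10, 8` inserts 8 bits of `low` starting at bit 10 of the base, whereas a byte offset of `4 * low` inside a 1024-byte-aligned table would sit in bits 2 to 9. The sketch below models that documented behaviour in plain C++ and compares both variants with the ordinary indexed address. The helper names and the base address `0x400` are assumptions made for illustration, and this is only one possible reading of the posted snippet, not a confirmed diagnosis of the crash.

```cpp
#include <cstdint>

// Host-side model of PTX `bfi.b32 d, a, b, pos, len`:
// the low `len` bits of `a` replace bits [pos, pos+len) of `b`.
// This model assumes 0 < len < 32 and pos + len <= 32.
inline uint32_t bfi_b32(uint32_t a, uint32_t b, uint32_t pos, uint32_t len) {
    const uint32_t field = ((1u << len) - 1u) << pos;  // mask of the target bits
    return (b & ~field) | ((a << pos) & field);
}

// Byte address that the plain `table[index & 0xff]` load uses for a uint32_t table.
inline uint32_t indexed_address(uint32_t base, uint32_t index) {
    return base + 4u * (index & 0xffu);
}
```

With the assumed base `0x400`, `bfi_b32(5, 0x400, 2, 8)` equals `indexed_address(0x400, 5)`, while `bfi_b32(5, 0x400, 10, 8)` overwrites bits 10 to 17 of the base and lands on a different address. If the real shared-memory layout matches these assumptions, position 2 would be the value that reproduces the 4-instruction path.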

djm34
Legendary
Offline
Activity: 1400
Merit: 1050
February 17, 2015, 09:46:25 PM
In the SPRead code above, tmp in the second instruction should be a pointer.

sp_ (OP)
February 18, 2015, 06:49:08 AM
Didn't work.
The strange thing is that this doesn't work either:
    __device__ __forceinline__ uint32_t SPRead(uint32_t *const __restrict__ x, uint32_t low)
    {
        uint32_t *tmp;
        tmp = x;
        return tmp[low];
    }
    ...
    tmp = SPRead(sharedmemory, x0 & 0xff);
but this works:
    tmp = sharedmemory[x0 & 0xff];

Namsbreh
February 18, 2015, 10:39:55 AM
How is the next version of the spreadminer coming along?

sp_ (OP)
February 18, 2015, 11:25:37 AM
I'm rewriting the SHA part. Perhaps I'll be done by the weekend.

scryptr
Legendary
Offline
Activity: 1793
Merit: 1028
February 19, 2015, 07:42:44 PM (Last edit: February 19, 2015, 07:57:52 PM by scryptr)
QUBIT -- I've been mining qubit on Yaamp with no accepts: no errors and maybe one or two accepts, then a long chain of block change messages, one after another. My 960 is getting about 7880 kH/s, but does no productive work. I am using version 39 on Windows 7.
sp-ccminer v39: [scryptr image]
tpruvot ccminer v1.5.3: [scryptr image]
--scryptr

sp_ (OP)
February 19, 2015, 09:04:13 PM
Must be a Linux issue, or perhaps an issue with the 960 card.
51% of the qubit hashrate @yaamp is now on the latest version of my mod:
ccminer/1.5.39-git(SP-MOD) 14 51% 161.1 mh/s 1%
I will be testing on the 960 this weekend.
The latest version on GitHub sets the intensity a bit high. I sometimes get out-of-memory errors on my test rig, but the hashrate is 30-50 kH/s higher. The GPU has plenty of memory left, so I think this is an issue with memory on the mainboard. When I try to allocate a total of 1.2 GB on the graphics card, it fails; 700 MB is OK. (2 GB 750 Tis)

sp_ (OP)
February 19, 2015, 09:10:44 PM
More free hash is coming in x13/x14/x15 (fugue). I can precalculate most of the first round of the FUGUE512_3 function, since it starts with a lot of constant inputs. It seems to give another 20-30 kH/s (750 Ti), but more work is needed before I can know for sure.

scryptr
February 19, 2015, 09:14:02 PM
Thanks for the quick reply! This card is running on Win7 x64. I also posted on tpruvot's thread.
--scryptr

tbearhere
Legendary
Offline
Activity: 3136
Merit: 1003
February 19, 2015, 09:25:31 PM
Quote: "Interesting to see bitcoin (sha256) support added. No chance of profit but fun to have most of the algos supported in one miner. Of note, this doesn't seem to be a very optimal implementation. With cgminer (opencl) I was getting about 405,000 kh/s on my 2 x 750 Ti setup. With release 34 ccminer I'm down to about 345,000 kh/s. Still, it sits at the very bottom of the profitability list with either hash rate so I'm not too concerned. It burns many times more in power than it earns."
Quote: "Klaus_t's implementation was pretty good, but the launch configuration of the kernel was way off. I submitted a 17.5% speed increase to github. (750ti)"
Quote: "Hah! 17.5% increase puts it right at the 405 mh/s I had with the OpenCL miner. If all the ASIC miners out there suddenly brick themselves we'll have a nice GPU Bitcoin miner here."
@StuffOfInterest or anyone: what does your bat file look like to GPU mine bitcoin to your wallet? I can't get mine working. Did you add a conf file, and if so, what addnode and port do you use? I have to try it just to say I mined BTC directly. Hey... I may hit a block.

flipclip
Member
Offline
Activity: 111
Merit: 10
February 20, 2015, 05:56:30 PM
Probably a 960 issue... it works fine on Linux with two 750 Tis.

rednoW
February 20, 2015, 06:43:00 PM (Last edit: February 20, 2015, 07:28:47 PM by rednoW)
The commit "Bether default throughput qubit(+30khash 750ti)" gives the error
"Cuda error in func 'x11_simd512_cpu_init' at line 634 : out of memory."
on my poor GTX 750 non-Ti with only 1 GB of memory.
I will try lowering "int intensity = 256 * 256 * 14;"
I tried 256 * 256 * 12: it crashes the driver when a monitor is attached, and works, but slowly, without a monitor. 256 * 256 * 10 gives me a 2-4 kH/s benefit compared to the default 1U << 19 (256 * 256 * 8) from the previous version.
So I tried the -i command-line parameter and figured out that for my card -i 19.3 is the maximum for qubit (681472 CUDA threads).
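The numbers in this post can be cross-checked with a little arithmetic. The sketch below is a guess at how a fractional intensity maps to a thread count: it assumes the integer part gives a power of two and the fractional part is added in steps of 256 threads. That rule is an assumption chosen because it reproduces the 681472 figure quoted above; the helper name `threads_for_intensity` is hypothetical and is not taken from the ccminer source.

```cpp
#include <cstdint>

// Hypothetical mapping from a fractional -i value to a CUDA thread count.
// Assumption: 2^floor(v) threads, plus the fractional part applied in
// 256-thread steps. This is not copied from ccminer.
inline uint32_t threads_for_intensity(double v) {
    const uint32_t whole = static_cast<uint32_t>(v);   // e.g. 19
    uint32_t threads = 1u << whole;                    // e.g. 524288
    const double frac = v - whole;                     // e.g. about 0.3
    if (frac > 0.0 && whole >= 8) {
        // Number of extra 256-thread steps, truncated toward zero.
        const uint32_t steps = static_cast<uint32_t>((1u << (whole - 8)) * frac);
        threads += steps << 8;
    }
    return threads;
}
```

Under that assumption `threads_for_intensity(19.3)` gives 681472 and `threads_for_intensity(19.0)` gives 524288, the old default of 256 * 256 * 8. For comparison, 256 * 256 * 10 is 655360, 256 * 256 * 12 is 786432 and 256 * 256 * 14 is 917504, so the -i 19.3 limit falls between the multiplier of 10 that worked and the 12 that crashed the driver.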

scryptr
February 22, 2015, 04:32:59 AM (Last edit: February 22, 2015, 06:09:41 PM by scryptr)
Quote from flipclip: "Probably a 960 issue... works fine Linux (2) 750 Ti."
QUBIT ISSUE: I have two 6x750ti rigs; one runs Linux, the other runs Win 8 (now). I also have a GTX 960 on Win 7. Qubit has not behaved well recently. Both my 750ti cards and my GTX 960 card have periods where no shares are accepted or submitted, although hashing goes on. There are no rejects, just hashing speed and new block notifications. Essentially, they burn electricity and do no useful work. I also get strange "doubled-up" behavior, where every other share is rejected. This happens with tpruvot's ccminer v1.5.3. Apparently, it happens at about the same time as the trouble with SP_'s ccminer. Later, they may run qubit properly. I don't know why.
--scryptr
EDIT: I suppose the reason I am posting about qubit is that qubit is a top-ranking earner just now. If I let my Windows rigs mine on the round-robin batch file, they get stuck on qubit and actually earn nothing.
--scryptr