pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 14, 2016, 09:19:00 AM |
|
I can test on linux with recent amd cpu and older intel without aes-ni.
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
|
|
January 14, 2016, 09:49:20 AM |
|
im here and commenting so that i get updates on the thread ... great work mate ... #crysx
|
|
|
|
MaxDZ8
|
|
January 14, 2016, 09:54:11 AM |
|
Hello Joblo, I've seen on the other thread that you asked for info about AES-NI on AMD. AMD might be hard to buy but at least they have a somewhat coherent feature set. AES-NI is available on all recent processors AFAIK, including low-power Kabini, as well as some low-power options that are half a decade old. In general, every chip above 30 bucks has it, including those in SoHo NAS. Thank you Intel for market segmentation!
Are people CPU mining qubit these days? A radeon 7750 will do twice the qubit hash at 10% less power. By contrast the x11 performance looks cool.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
January 14, 2016, 04:22:34 PM |
|
Thanks Pallas. The first release won't support CPUs without AES_NI and will just exit. However, the test build will still try to run. My only goal is to look for false positives and false negatives that indicate the AES_NI check isn't 100% accurate. I'm also curious to see what will happen if an older CPU tries to run AES_NI code. I've got all that covered for the test cycle but I'll keep your offer in mind for the second release, which will support non-AES_NI CPUs.
Also thanks for all your work, I'm sure there's much more than I'm aware of.
I'm still somewhat puzzled about the different performance profile for your neoscrypt ccminer kernel vs DJM34's. Yours works better on Maxwell but DJM34's hashes 12% faster than yours on my 780ti. Considering the nature of neoscrypt it's likely the HW configuration is the reason for the difference, but I can't help thinking that it is worth understanding why this performance reversal happened. I did some analysis when your kernel was released, and tried mixing up parts of each kernel to try to affect performance but had no success. As someone who is intimately familiar with both versions I was hoping to jog your mind and maybe spark an idea for further optimization.
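For readers wondering what an AES_NI check like the one mentioned above can look like, here is a minimal sketch in C. It is not taken from cpuminer: the function names are hypothetical, and it assumes an x86 CPU and a GCC or Clang toolchain that ships `<cpuid.h>`. CPUID leaf 1 reports AES-NI in ECX bit 25 and SSE2 in EDX bit 26.

```c
#include <cpuid.h>    /* GCC/Clang helper for the x86 CPUID instruction */
#include <stdbool.h>

/* Feature bits reported by CPUID leaf 1. */
#define CPUID_ECX_AESNI (1u << 25)
#define CPUID_EDX_SSE2  (1u << 26)

/* Pure helpers so the bit tests can be checked without special hardware. */
static bool ecx_has_aes_ni(unsigned int ecx) { return (ecx & CPUID_ECX_AESNI) != 0; }
static bool edx_has_sse2(unsigned int edx)   { return (edx & CPUID_EDX_SSE2) != 0; }

/* Hypothetical run-time check: true if the CPU executing this code reports AES-NI. */
static bool cpu_has_aes_ni(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;              /* CPUID leaf 1 not available */
    return ecx_has_aes_ni(ecx);
}
```

A miner could call `cpu_has_aes_ni()` once at startup and exit, or fall back to an sse2 path, when it returns false. On a CPU without the feature, executing an AES-NI instruction anyway normally ends in an illegal-instruction fault rather than a wrong hash.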
|
|
|
|
pallas
|
|
January 14, 2016, 04:37:23 PM |
|
By "recent amd cpu" I mean one that supports aes-ni, an FX processor.
Let's move the neoscrypt kernel discussion to the ccminer thread.
|
|
|
|
sp_
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
January 14, 2016, 05:01:36 PM |
|
The neoscrypt performance difference is because DJM34's version has a separate kernel for compute 3.5 devices. It uses a memshift variable of 4 since a cache line on the 780ti is bigger than on the Maxwells.
|
|
|
|
joblo (OP)
|
|
January 14, 2016, 05:30:37 PM |
|
Bigger cache line and a memory intensive algo. Makes sense.
Pallas suggested moving this to your thread but discussions about kepler aren't really on topic for SP_MOD. Well, it's my thread. I don't think there is much more to discuss, but if it takes off we can start its own thread.
I'm satisfied with this explanation so the only solution^h^h^h^h^h workaround is a form of hybrid. I have built such a hybrid but it is bloated because of a 2 dimensional growth in neoscrypt code. Because I make the kernel selection at run time, both versions of neoscrypt get built into all three versions of cuda. Only the 3.5 cuda will select the DJM34 neo and the maxwell code will only select the Pallas neo. I'd like to move the check to compile time but haven't been motivated enough to implement it without a ccminer fork willing to host it. If there is interest I can take another look once things settle down a bit with cpuminer.
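The run-time versus compile-time trade-off described above can be sketched in plain C. This is only an illustration: the enum, the function names and the `TARGET_CC` build macro are all hypothetical and do not come from any ccminer fork, and the fallback for devices other than compute 3.5 is an assumption. In real CUDA device code the `__CUDA_ARCH__` macro is the usual compile-time switch.

```c
/* Hypothetical identifiers, not taken from any ccminer fork. */
typedef enum { NEO_KERNEL_DJM34, NEO_KERNEL_PALLAS } neo_kernel_t;

/* Run-time selection: both kernels are compiled in, the device decides. */
static neo_kernel_t select_neo_kernel_runtime(int cc_major, int cc_minor)
{
    /* Compute 3.5 (e.g. a 780ti) picks the DJM34 kernel. */
    if (cc_major == 3 && cc_minor == 5)
        return NEO_KERNEL_DJM34;
    /* Assumption: everything else, including Maxwell, uses the Pallas kernel. */
    return NEO_KERNEL_PALLAS;
}

/* Compile-time selection: TARGET_CC is an assumed build flag, e.g. -DTARGET_CC=35. */
#ifndef TARGET_CC
#define TARGET_CC 52
#endif

static neo_kernel_t select_neo_kernel_compiletime(void)
{
#if TARGET_CC == 35
    return NEO_KERNEL_DJM34;    /* only this kernel needs to be built in */
#else
    return NEO_KERNEL_PALLAS;
#endif
}
```

With the compile-time form each build carries a single neoscrypt kernel, which is the bloat reduction the post is after, at the cost of one binary per target architecture.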
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 02:31:36 AM |
|
It seems my very first decision on this project was the wrong one. I forked from ccminer-multi-1.2pre instead of 1.1. I also failed to confirm 1.2pre would compile in windows.
Now I'm in a bind and it will delay windows support.
Hopefully the conflicts are only in files I haven't touched, if so I should be able to release with windows support without too much delay. Otherwise I will go ahead and release with only linux support.
|
|
|
|
chrysophylax
|
|
January 15, 2016, 02:37:25 AM |
|
you are doing a superb job as it stands joblo ... windows support ( as much as i dont like the os ) is important and can be sorted ... the main thing is to get the core working and stable ... the port can be done shortly after ... keep up the good work mate ... #crysx
|
|
|
|
theLosers106
|
|
January 15, 2016, 02:44:51 AM Last edit: January 15, 2016, 04:15:43 AM by theLosers106 |
|
Hi joblo,
Currently testing the cpuminer on an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz .. Running using Fedora release 20 64bit .. a bit outdated as I'm too lazy to upgrade it as there are a lot of things running on it.. So far it is doing quite well but I keep getting "Result does not validate on CPU!" ..
EDIT2: Tested with x11, X13, X15, Quark and Qubit algos.
Will keep you updated as each test takes a while to get any accepted..
theLosers106.
|
BTC: 1KnLUyFTyqrMzcNrgACHFEoUtbqQUs8X1Q XRE: 15RjuCT6T8sF1KkD2MmT4pQvHU8UtSoYXG
|
|
|
joblo (OP)
|
|
January 15, 2016, 03:04:28 AM |
|
Fortunately the Windows build problem was just another noob mistake: I tried to compile a debug win32 build. Back on track.
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 03:11:45 AM Last edit: January 15, 2016, 04:14:14 AM by joblo |
|
The invalid nonces ("Result does not validate on CPU!") also occur in the parent applications so I'm not too worried about them. I included these messages only in the RC build just to make sure they don't get out of control. They do represent wasted hash (no one likes to waste hash) but the hash rate at the pool seems in line with the client. It's hard to tell with share submits so infrequent due to the low hash rate. I've observed that the pool's reported rate varies between -50% and 200% of the stable rate in cpuminer. To me this seems in line.
I recommend you read the release notes and the info displayed on startup. It might answer some other questions that may arise.
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 03:13:31 AM |
|
Thanks for the encouragement.
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 10:21:05 AM |
|
Progress update
I've worked through some of the windows compile errors, but I was getting frustrated so I changed gears and started working on sse2 support.
sse2 qubit works and will be included in the first release.
I can get sse2 working on one algo that uses groestl at a time. If I include the sse2 groestl files in two algos at the same time I get multiple definition linker errors. The included files are full of macros so that kind of explains it. I may try turning them into functions so the files that include them don't pull the code into themselves. If that works it will probably have a performance impact, hopefully not too big.
This affects all the x algos and quark.
Here are some updated hash rates from my i7-4790K 4 GHZ showing the sse2 performance. This shows the difference between the aes_ni optimized kernels and sse2 on the same cpu. Actually running it on an older cpu will probably have even lower performance. I can't test sse2 on a real sse2 limited cpu because my core2 pc runs windows.
        aes_ni   sse2   sse2/aes_ni
x11     707k     529    .75
x13     320
x14
x15     280
quark   1080     907    .84
qubit   1045     755    .72
|
|
|
|
zTheWolfz
|
|
January 15, 2016, 01:53:02 PM |
|
I spent the last few hours working on getting this compiled. I'm using a VM within windows 7. I tried 4 different flavors of Linux and either it doesn't like Linux Mint 32/64 bit or Ubuntu 64, or I'm missing some dependencies it needs to compile correctly, and it never makes the cpuminer executable. What OS and version are you using? I'll give it another go once I get that info from you. My Linux skills are pretty rusty ("never was all that great to start with") but I'm fairly sure I was doing everything correctly.
|
|
|
|
theLosers106
|
|
January 15, 2016, 01:57:14 PM |
|
Hey, I used Fedora release 20 64bit with GCC v4.8.3 to compile and it worked fine. The release version and the GCC version are outdated but hey, it works. (:
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 04:33:51 PM Last edit: January 15, 2016, 04:44:01 PM by joblo |
|
I also use Fedora 20 x64. It has to be 64 bit but other than that Mint should work. A VM is not the issue; I build the windows version in a VM. You'll need libcurl-devel and some form of ssl development package (F20 has openssl-devel). That's about all I can offer without more info. Make sure you follow the instructions in the README.md:
    ./autogen.sh # only needed if building from git repo
    ./configure CFLAGS="-O3 -march=native" --with-crypto --with-curl
    make
Take care, -O3 is an upper case O, not a zero. If you can compile cpuminer-multi-1.1 you should be able to compile cpuminer-1.9-RC. If you can't compile either, it's something at your end. Post your error messages for more help.
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 06:24:41 PM |
|
I have a tricky compile error that I'm sure experienced C coders can easily solve but has me stumped. The functions involved are declared in grso-nomacro.h and defined in grso-nomacro.c. They are called by x11_sse2.c. Data follows.

algo/x11_sse2.c:124:36: error: expected expression before ‘BitSequence’
    update_grso( &ctx.groestl, BitSequence* hashbuf, (const BitSequence*)hash, 512 );
algo/x11_sse2.c:124:36: error: too few arguments to function ‘update_grso’
In file included from algo/x11_sse2.c:35:0:
algo/sse2/groestl/grso-nomacro/grso-nomacro.h:10:6: note: declared here
    void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen );
algo/x11_sse2.c:125:35: error: expected expression before ‘BitSequence’
    final_grso( &ctx.groestl, BitSequence* hashbuf,(const BitSequence*)hash );
algo/x11_sse2.c:125:35: error: too few arguments to function ‘final_grso’
In file included from algo/x11_sse2.c:35:0:
algo/sse2/groestl/grso-nomacro/grso-nomacro.h:12:6: note: declared here
    void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash );

grso-nomacro.h:
    typedef unsigned char BitSequence_gr;
    typedef unsigned long long DataLength_gr;
    void init_grso ( grsoState* sts_grs );
    void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen );
    void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash );

grso-nomacro.c:
    void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen ) { /* function code */ }
    void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash ) { /* function code */ }

x11_sse2.c:
    update_grso( &ctx.groestl, BitSequence* hashbuf, (const BitSequence*)hash, 512 );
    final_grso( &ctx.groestl, BitSequence* hashbuf,(const BitSequence*)hash );
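A likely reading of the errors above is that `BitSequence* hashbuf` inside the call is written like a parameter declaration rather than an expression, so the compiler rejects it and then miscounts the arguments. The sketch below shows the call with a parenthesized cast instead. The `grsoState` layout, the function body and `call_update` are stand-ins invented for this example, not the real Groestl code. Note also that the call sites name `BitSequence` while the header declares `BitSequence_gr`; whether `BitSequence` is defined elsewhere is not shown in the post.

```c
#include <string.h>

/* Types as posted in grso-nomacro.h; the grsoState layout is a hypothetical stand-in. */
typedef unsigned char      BitSequence_gr;
typedef unsigned long long DataLength_gr;
typedef struct { DataLength_gr bits_seen; } grsoState;

/* Stub with the posted signature; the body is a placeholder, not Groestl. */
void update_grso(grsoState *sts_grs, BitSequence_gr *hashbuf,
                 const BitSequence_gr *hash, DataLength_gr databitlen)
{
    memcpy(hashbuf, hash, (size_t)(databitlen / 8));
    sts_grs->bits_seen += databitlen;
}

/* Hypothetical call site standing in for algo/x11_sse2.c line 124. */
void call_update(grsoState *st, void *hashbuf, const void *hash)
{
    /* Rejected:  update_grso(st, BitSequence* hashbuf, ...);
       a type name followed by an identifier is not an expression.       */
    /* Accepted:  pass the variable, with a parenthesized cast if needed. */
    update_grso(st, (BitSequence_gr *)hashbuf, (const BitSequence_gr *)hash, 512);
}
```

If `hashbuf` already has the right pointer type, the cast can be dropped and the variable passed as is.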
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 07:32:19 PM |
|
I'm going to change my approach slightly. Instead of cloning the macros into a function I'll just wrap them in a function. I'm more optimistic about the performance impact of this. There will be a cost in the overhead of the function call/return but this design opens up other optimization opportunities. It remains to be seen what the net effect will be. If there is a net gain this change could be applied to other macro based sse2 sub-algos, which are still used by the aes_ni kernels, ultimately speeding up aes_ni kernels as well. These changes are what caused the compile errors above; hopefully they'll magically disappear. It's exciting times for a cpu miner.
|
|
|
|
joblo (OP)
|
|
January 15, 2016, 07:51:40 PM |
|
Well, the compile errors didn't magically disappear. It's still a better design so I'll keep it. I just declared and defined wrappers in the same files where the macros were defined, eliminating the need for the grso-nomacro files. The wrappers do nothing else but call the macros. When I can get it to compile it should work.
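The wrapper idea can be illustrated with a small C sketch. `MIX_ROUND` and `mix_round` are made up for this example and are far simpler than the real groestl macros. Marking the wrapper `static inline` is one common way to let several .c files include the same file without multiple definition linker errors; that choice is an assumption here, not something the post confirms.

```c
#include <stdint.h>

/* Stand-in for one of the macro-based sse2 routines; the real macros are much larger. */
#define MIX_ROUND(state, word)                          \
    do {                                                \
        (state) ^= (word);                              \
        (state) = ((state) << 7) | ((state) >> 25);     \
    } while (0)

/* Wrapper that does nothing but invoke the macro. static inline keeps the
   definition private to each file that includes it, so the linker never
   sees two external symbols with the same name. */
static inline uint32_t mix_round(uint32_t state, uint32_t word)
{
    MIX_ROUND(state, word);
    return state;
}
```

Callers then use `mix_round(state, word)` like any other function, and the compiler is free to inline it, which may recover most of the call/return overhead.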
|
|
|
|
|