djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
July 17, 2014, 09:00:57 AM |
|
Can anyone send me a link to the latest NVminer source code (merged)?
It seems the funnel shift is missing in some of the 11 algorithms. (With a ROL, compute 5.0+ devices will get a boost.)
From GitHub (ccminer 1.2):
cuda_x11_luffa512.cu:
#define TWEAK(a0,a1,a2,a3,j)\
    a0 = (a0<<(j))|(a0>>(32-j));\
    a1 = (a1<<(j))|(a1>>(32-j));\
    a2 = (a2<<(j))|(a2>>(32-j));\
    a3 = (a3<<(j))|(a3>>(32-j));
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = (a0<<2) | (a0>>(30));\
    a0 ^= a4;\
    a4 = (a4<<14) | (a4>>(18));\
    a4 ^= a0;\
    a0 = (a0<<10) | (a0>>(22));\
    a0 ^= a4;\
    a4 = (a4<<1) | (a4>>(31));
cuda_x11_cubehash512.cu:
#define ROTATEUPWARDS7(a) (((a) << 7) | ((a) >> 25))
#define ROTATEUPWARDS11(a) (((a) << 11) | ((a) >> 21))
etc.
By rewriting these macros, the shift+shift+or CUDA instructions can be replaced with a single rol (3 times faster) on Maxwell (compute 5.0+).
Which CUDA version?
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
sp_
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
July 17, 2014, 09:21:35 AM Last edit: July 17, 2014, 09:42:02 AM by sp_ |
|
CUDA 6.0.
Check "Version features and specifications": the funnel shift is available for compute 3.5 and higher.
http://en.wikipedia.org/wiki/CUDA
How to inline CUDA assembly:
http://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#axzz37iRLSsMj
Instruction set:
docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions
8.7.5.1. Logic and Shift Instructions (8.7.5.6. Logic and Shift Instructions: shf)
So the following macro should be converted to something like:
a0 = (a0<<(j))|(a0>>(32-j)); --> shf.l.wrap.b32 a0, a0, a0, j;
...
I have some time this weekend to do the full implementation. Just give me the latest branch to work on...
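For reference, the shf conversion described in this post could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name rotl32_ptx and the HD macro are made up for this example and are not taken from ccminer, and the host fallback exists only so the same file also builds without nvcc. The PTX form is shf.l.wrap.b32 d, a, b, c, so a rotate passes the same register as both a and b.

```cpp
#include <cstdint>

// Lets the same source build as a .cu file with nvcc or as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper name (not from ccminer): rotate x left by n bits.
HD inline uint32_t rotl32_ptx(uint32_t x, uint32_t n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    uint32_t r;
    // shf.l.wrap.b32 d, a, b, c: shift the 64-bit pair {b:a} left by (c & 31)
    // and keep the upper 32 bits. With a == b this is a 32-bit rotate left.
    asm("shf.l.wrap.b32 %0, %1, %2, %3;" : "=r"(r) : "r"(x), "r"(x), "r"(n));
    return r;
#else
    // Portable fallback for older devices and for host code.
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
#endif
}
```

With such a helper, a line like a0 = (a0<<(j))|(a0>>(32-j)); would become a0 = rotl32_ptx(a0, j);. Whether the compiler already emits shf for the shift/or pattern on a given toolkit is something to check in the generated PTX rather than assume.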
|
|
|
|
miner256
Newbie
Offline
Activity: 23
Merit: 0
|
|
July 17, 2014, 09:50:59 AM |
|
Not that I can see either. All closed, and binaries for Windows only :-( Looking forward to trying something when (if?) the source does get released though. This weekend I am going to try KopiemTu 1.4 and see what that is like - reading good things about it, and I like the idea of trying some tweaking of the card on linux.
https://litecointalk.org/index.php?topic=16800.0
No sourcecode available in nvminer.zip.
|
|
|
|
yellowduck2
|
|
July 17, 2014, 10:00:44 AM |
|
Hope you succeed in optimizing the code.
|
|
|
|
djm34
|
|
July 17, 2014, 10:25:23 AM |
|
The problem is that it may definitively break compatibility with other versions.
|
|
|
|
cayars
|
|
July 17, 2014, 10:40:43 AM |
|
I've updated to the beta drivers 340.88 and the hash rate increased slightly. Now getting 7.1 Mh or so, but still not the 7.8/7.9 Mh you are getting. I also tried the minep.it pool but, as I was expecting, it didn't change anything.
Any more ideas?
I have not followed the whole conversation, but I am wondering if one of you is running on risers and the other not?
That's an interesting theory. I read somewhere that it could indeed affect the hash rate:
http://cryptomining-blog.com/1276-first-impressions-from-a-6-card-mining-rig-using-geforce-gtx-750-ti-gpus/
I, on my side, am using risers. Powered, for that matter. The difference is rather significant: ~800/900 Kh. I am also using ASUS, not Gigabyte. Perhaps that could also be the reason.
Good observation. No risers of any kind for me.
|
|
|
|
sp_
|
|
July 17, 2014, 10:46:49 AM |
|
Christian has already broken compatibility for any hardware below compute 3.0. In the Killer Groestl implementation he uses compute 3.0+ instructions, like perm:
static __device__ uint32_t cuda_swab32(uint32_t x) { return __byte_perm(x, 0, 0x0123); }
I will implement the changes behind a compiler flag check like this:
#if __CUDA_ARCH__ >= 130
return (uint32_t)__double2hiint(__longlong_as_double(x));
#else
return (uint32_t)(x >> 32);
#endif
So if you compile for compute 5.0 you will get the Maxwell funnel shift instead.
The current ccminer runs at the same speed for compute 3.0, 3.5 and 5.0. This is about to change.
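As a rough sketch of that pattern, the two snippets from this post can be wrapped into complete functions that pick an implementation at compile time. The names swab32_guarded and hiword_guarded and the HD macro are assumptions made for this example, not ccminer identifiers, and the host branches are there only so the file also builds without nvcc.

```cpp
#include <cstdint>

// Lets the same source build as a .cu file with nvcc or as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical name: reverse the byte order of a 32-bit word.
HD inline uint32_t swab32_guarded(uint32_t x)
{
#if defined(__CUDA_ARCH__)
    // Device code: selector 0x0123 picks the bytes of x in reverse order.
    return __byte_perm(x, 0, 0x0123);
#else
    // Host code: plain shifts and masks.
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

// Hypothetical name: upper 32 bits of a 64-bit word, following the
// #if / #else fragment quoted in the post.
HD inline uint32_t hiword_guarded(uint64_t x)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 130
    return (uint32_t)__double2hiint(__longlong_as_double((long long)x));
#else
    return (uint32_t)(x >> 32);
#endif
}
```

The same guard with a higher threshold (for example __CUDA_ARCH__ >= 350) would be the place to switch a rotate over to the funnel shift, while older targets keep the existing code path.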
|
|
|
|
cayars
|
|
July 17, 2014, 10:54:16 AM |
|
I won't have the source available until after the weekend, as I want to compile and test on Linux first and need to get the makefiles set up properly. However, if you want to start playing now, pull down the latest from djm34 and use this to test with:
https://github.com/djm34/ccminer
If I remember correctly, I tried to compile for compute 5.0 under CUDA 6 and it broke something in the FRESH algo. Because of this I went back to compute 3/3.5 on CUDA 5.5 for the last release. So the point is: after trying your changes, please test every algo to make sure nothing else breaks under CUDA 6 and compute 5 (if you try this).
BTW, are you doing your testing on Windows or Linux?
|
|
|
|
NeuroticFish
Legendary
Offline
Activity: 3850
Merit: 6583
Looking for campaign manager? Contact icopress!
|
|
July 17, 2014, 11:00:22 AM |
|
I am already bound to ancient versions of ccminer, because I still have compute 2.1... Can't such areas be handled with #ifdef, keeping a not-so-good version for older GPUs? But maybe I wish for too much. Obviously everybody mostly improves the versions for their own needs...
|
|
|
|
sp_
|
|
July 17, 2014, 11:08:25 AM |
|
OK. I'll start with the djm34 branch. But in order to test all the algos I need the unified source code. I will be developing on Windows.
|
|
|
|
yellowduck2
|
|
July 17, 2014, 11:37:05 AM |
|
5.0 is the future. We need to get started and pave the way for the 800 series. By the time the 800 series is out, there will be perfectly optimized 5.0 code.
|
|
|
|
djm34
|
|
July 17, 2014, 11:54:27 AM |
|
If possible, try to put the changes in cuda_helper.h rather than breaking everybody else's code...
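One way this could look, as a sketch only: a single rotate helper that would live in a shared header such as cuda_helper.h, with the luffa and cubehash macros quoted earlier in the thread rewritten to call it. The helper name rotl32 and the HD macro are chosen for illustration and are not claimed to match what ccminer ships. The funnel shift is used only when compiling for compute 3.5 or higher, and everything else keeps the shift/or form.

```cpp
#include <cstdint>

// Lets the same source build as a .cu file with nvcc or as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Illustrative shared helper: rotate x left by n bits.
HD inline uint32_t rotl32(uint32_t x, uint32_t n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    // Funnel shift of the pair {x:x} is a rotate; one instruction on 3.5+.
    return __funnelshift_l(x, x, n);
#else
    // Older devices and host code keep the shift+shift+or form.
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
#endif
}

// MIXWORD from cuda_x11_luffa512.cu with each rotation routed through rotl32.
#define MIXWORD(a0,a4)\
    a4 ^= a0;\
    a0 = rotl32(a0, 2);\
    a0 ^= a4;\
    a4 = rotl32(a4, 14);\
    a4 ^= a0;\
    a0 = rotl32(a0, 10);\
    a0 ^= a4;\
    a4 = rotl32(a4, 1);

// The cubehash rotations from cuda_x11_cubehash512.cu, same idea.
#define ROTATEUPWARDS7(a)  rotl32((a), 7)
#define ROTATEUPWARDS11(a) rotl32((a), 11)
```

Because the algorithm files only see the helper, the architecture check stays in one place and the per-algorithm code does not need its own #if blocks.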
|
|
|
|
polanskiman
|
|
July 17, 2014, 03:07:41 PM |
|
Well, I don't think it's a riser issue. I've connected the 2 GPUs directly to the slots of the mobo and I get the same hashrate as with risers... back to square one.
|
|
|
|
Schleicher
|
|
July 17, 2014, 04:29:17 PM |
|
Sure, that's possible with #if. Example for the rotate function in sha256:
#if __CUDA_ARCH__ < 350
#define rrot(x, bits) (((x) >> (bits)) | ((x) << (32 - (bits))))
#else
#define rrot(x, bits) __funnelshift_r((x), (x), (bits))
#endif
But usually there are other reasons for not supporting older cards.
|
|
|
|
d33_man
Member
Offline
Activity: 65
Merit: 10
|
|
July 17, 2014, 08:44:11 PM |
|
Hey, I'm using ccminer 2.1 and keep getting the error "abnormal hashes, exiting with code 211!" when mining x11 on 4 750tis. Any advice as to what may be causing this? CCminer just shuts down after the error which occurs after a few hours mining. Thanks
|
|
|
|
Newwsr
|
|
July 17, 2014, 11:38:36 PM |
|
Friend, use ccminer v1.1. I use it here and have no problems.
|
|
|
|
Waldozaur12
Legendary
Offline
Activity: 1223
Merit: 1000
|
|
July 18, 2014, 01:01:42 AM |
|
Are there any viruses or trojans inside the Cudaminer software?
|
|
|
|
sp_
|
|
July 18, 2014, 01:05:49 AM Last edit: July 18, 2014, 01:25:03 AM by sp_ |
|
Claymore made 100 000$ on the monero miner with 5% tip...
|
|
|
|
Poena
Newbie
Offline
Activity: 48
Merit: 0
|
|
July 18, 2014, 02:26:13 AM |
|
Hidden tip or a legit fee for using his apps?
|
|
|
|
tarzanbigcity
|
|
July 18, 2014, 03:47:47 AM |
|
Hey guys, perhaps we can rally some people here to support PIMP on the nvidia platform. You can donate to the dev here:
http://www.getpimp.org/features.html
3) develop and release pimp nvidia 17FEj7UEwH32PadCtWnyAS5hGYv7f99Lki Jun 14th 0.23 2.00 1.77 11%
They are currently only 11% funded. But with our support perhaps we can complete that goal. I for one love the pimp platform and I think bringing it to nvidia would be a godsend.
|
|
|
|
|