sp_ (OP)
Legendary
Offline
Activity: 2926
Merit: 1087
Team Black developer
|
 |
January 15, 2015, 07:47:07 PM Last edit: January 16, 2015, 05:34:37 AM by sp_ |
|
Checked in the groestl speedup.
Faster groestl, part 1: quark +150 kH/s, x11 +60 kH/s (750 Ti).
I managed to shrink the method "to_bitslice_quad" from around 800 asm instructions to around 80, and "from_bitslice_quad" from around 400 instructions to around 200. With some more work, I will shrink that one to around 80 as well.
Instead of calculating one bit at a time, I operate on the whole register at once, similar to a chunky-to-planar (chunky2planar) conversion.
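For readers who have not seen the trick, here is a minimal sketch of the whole-register idea. It is not the code from the commit: the actual layout used by "to_bitslice_quad" is not shown in this thread, so the example assumes a plain 8x8 bit matrix (8 bytes in, 8 bit planes out), and both function names are made up for illustration. The first function moves one bit at a time; the second gets the same result with three masked swap steps on a single 64-bit word.

```python
def to_planes_bitwise(chunky: bytes) -> bytes:
    """Reference version: move one bit at a time (64 tiny steps)."""
    assert len(chunky) == 8
    planes = [0] * 8
    for r in range(8):              # r = index of the input byte
        for k in range(8):          # k = bit position inside that byte
            bit = (chunky[r] >> k) & 1
            planes[k] |= bit << r   # bit k of byte r -> bit r of plane k
    return bytes(planes)


def to_planes_wide(chunky: bytes) -> bytes:
    """Same result using three masked swap steps on one 64-bit word."""
    assert len(chunky) == 8
    x = int.from_bytes(chunky, "little")
    # Swap the off-diagonal bits of every 2x2 block.
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA
    x ^= t ^ (t << 7)
    # Swap the off-diagonal 2x2 blocks of every 4x4 block.
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC
    x ^= t ^ (t << 14)
    # Swap the off-diagonal 4x4 blocks of the 8x8 matrix.
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0
    x ^= t ^ (t << 28)
    return x.to_bytes(8, "little")


sample = bytes([0x80, 0, 0, 0, 0, 0, 0, 0])
print(to_planes_wide(sample) == to_planes_bitwise(sample))
```

The wide version touches all 64 bits in each step, which is where the drop in instruction count would come from in a real kernel.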
|
|
|
|
|
sp_ (OP)
January 15, 2015, 08:55:30 PM |
|
I will be taking a break now to work on the private spreadcoin miner. If you want more hash, please donate some BTC. Thanks.
|
|
|
|
tbearhere
Legendary
Offline
Activity: 3276
Merit: 1003
|
 |
January 15, 2015, 09:54:49 PM |
|
Checked in the groestl speedup.
Faster groestl, part 1: quark +250 kH/s, x11 +50 kH/s (750 Ti).
I managed to shrink the method "to_bitslice_quad" from around 800 asm instructions to around 80, and "from_bitslice_quad" from around 400 instructions to around 200. With some more work, I will shrink that one to around 80 as well.
Instead of calculating one bit at a time, I operate on the whole register at once, similar to a chunky-to-planar (chunky2planar) conversion.
Good one, sp. Getting about +100 to +150 on quark; don't have time yet to check the other algos.
|
|
|
|
flipclip
Member

Offline
Activity: 111
Merit: 10
|
 |
January 16, 2015, 01:04:17 AM Last edit: January 16, 2015, 03:45:55 AM by flipclip |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
(2x 750 Ti, no overclock)
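To put these numbers next to the per-card figures in the release note, a quick bit of arithmetic helps. The sketch below assumes the gain is split evenly across the two cards, which the post does not state; the helper name is made up.

```python
# Rates in kH/s as posted above: (v28, v30), measured on a 2-card rig.
rates = {
    "quark": (11160, 11441),
    "x11": (5730, 5850),
    "lyra2": (1350, 1350),
}
CARDS = 2  # ASSUMPTION: both cards contribute equally to the gain


def gain(old, new, cards=CARDS):
    """Return (percent gain, absolute kH/s gain per card)."""
    delta = new - old
    return 100.0 * delta / old, delta / cards


for algo, (v28, v30) in rates.items():
    pct, per_card = gain(v28, v30)
    print(f"{algo}: +{pct:.2f}% overall, +{per_card:.1f} kH/s per card")
```

That works out to roughly +2.5% on quark (about 140 kH/s per card) and +2.1% on x11 (60 kH/s per card), which is in line with the +150 and +60 quoted for a single 750 Ti.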
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 3080
Merit: 1093
--- ChainWorks Industries ---
|
 |
January 16, 2015, 02:03:39 AM |
|
I will be taking a break now to work on the private spreadcoin miner. If you want more hash, please donate some BTC. Thanks.
Thanks, sp. I'm pulling the latest from git and compiling, and as of tomorrow (Adelaide, Australia time) I'll mine to your address as a donation for 48 hours with the upgraded miners on ccminer. Any algo you want me to mine with? On yaamp?
By the way, how can I be included in the private spreadcoin project, even if it's just for testing? I would really like to see how it runs so far. Thanks. #crysx
|
|
|
|
flipclip
January 16, 2015, 02:26:34 AM |
|
... I'll mine to your address as a donation for 48 hours with the upgraded miners on ccminer ...
Any algo you want me to mine with? On yaamp? ...
#crysx
I'd hazard a guess: the most profitable one.
|
|
|
|
tbearhere
January 16, 2015, 02:38:29 AM Last edit: January 16, 2015, 10:31:19 AM by tbearhere |
|
Looks like pool rejects are higher, making #30 less efficient than #29 (quark).
Wrong, it's good.
|
|
|
|
jpouza
Legendary
Offline
Activity: 3080
Merit: 1131
|
 |
January 16, 2015, 03:40:10 AM |
|
I will be taking a break now to work on the private spreadcoin miner. If you want more hash, please donate some BTC. Thanks.
Nice, I've sent you a PM. Cheers
|
|
|
|
flipclip
January 16, 2015, 03:40:48 AM |
|
Looks like pool rejects are higher, making #30 less efficient than #29 (quark).
How many rejects, and which pool? I was on yaamp for 1.5 hours with one reject ("reject reason: Job not found" right after a block change), which seemed fine to me.
|
|
|
|
sp_ (OP)
January 16, 2015, 05:55:46 AM Last edit: January 16, 2015, 06:18:50 AM by sp_ |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
(2x 750 Ti, no overclock)
Try groestl or diamondgroestl. You can see in the commit on GitHub that this is groestl speedup part 1. Part 2 is almost ready for check-in, but it currently mixes up the bits and produces wrong results.
Lyra2 does not use the bitslice groestl (killer groestl), so my improvements have no effect there. But I guess if I swap the implementation, Lyra2 will get a boost as well. Did you try that, DJM34?
About the spreadcoin miner: I will integrate and optimize TSIV's spreadcoin implementation into the latest fork of ccminer. I will send out beta versions for a fee of 0.1 BTC. The beta will be a Windows executable. I might send out more than one exe during the testing phase if I manage to optimize more. I will publish the source code after one month of beta testing. (Once I publish the source code, spreadcoin will spread better, which helps secure the coin.)
The current speed will be announced when the exe file is released, hopefully next weekend, 25-26 January. The estimated speed increase is 30-40% on the 980 cards.
If you want an early seat, you can start donating to the BTC or DRK address in my signature.
|
|
|
|
tbearhere
January 16, 2015, 10:32:14 AM |
|
Looks like pool rejects are higher, making #30 less efficient than #29 (quark).
How many rejects, and which pool? I was on yaamp for 1.5 hours with one reject ("reject reason: Job not found" right after a block change), which seemed fine to me.
Wrong, it's good... great. Sorry.
|
|
|
|
chrysophylax
January 16, 2015, 11:10:07 AM |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
(2x 750 Ti, no overclock)
Try groestl or diamondgroestl. You can see in the commit on GitHub that this is groestl speedup part 1. Part 2 is almost ready for check-in, but it currently mixes up the bits and produces wrong results.
Lyra2 does not use the bitslice groestl (killer groestl), so my improvements have no effect there. But I guess if I swap the implementation, Lyra2 will get a boost as well. Did you try that, DJM34?
About the spreadcoin miner: I will integrate and optimize TSIV's spreadcoin implementation into the latest fork of ccminer. I will send out beta versions for a fee of 0.1 BTC. The beta will be a Windows executable. I might send out more than one exe during the testing phase if I manage to optimize more. I will publish the source code after one month of beta testing. (Once I publish the source code, spreadcoin will spread better, which helps secure the coin.)
The current speed will be announced when the exe file is released, hopefully next weekend, 25-26 January. The estimated speed increase is 30-40% on the 980 cards.
If you want an early seat, you can start donating to the BTC or DRK address in my signature.
Any possibility of a Linux x64 version for the beta as well, sp? I'll donate (and I assume most will) for the cause of improving the spreadcoin miner, but a Windows version is useless to me. Any way around that, or are you keeping it to Windows-based systems only? Otherwise it simply means 30 days of no testing on my end. #crysx
|
|
|
|
sp_ (OP)
January 16, 2015, 12:30:56 PM |
|
The new GTX 960 might fail on the default settings because the intensity is set too high for compute 5.2 devices. How much memory are they planning to ship with the new cards?
The fix will be to set the intensity manually with the -i parameter, e.g. -i 19
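To get a feel for why a lower -i can help on a card with less memory, here is a rough sketch. It rests on two assumptions that do not come from this post: that an intensity of N means about 2^N hashes per kernel launch, and a made-up figure of 64 bytes of GPU buffer per hash. Both function names are hypothetical, and the real footprint depends on the algorithm.

```python
def hashes_per_launch(intensity: int) -> int:
    # ASSUMPTION: intensity N is treated as 2**N hashes per kernel launch.
    return 1 << intensity


def buffer_bytes(intensity: int, bytes_per_hash: int = 64) -> int:
    # ASSUMPTION: bytes_per_hash is a hypothetical per-hash buffer size.
    return hashes_per_launch(intensity) * bytes_per_hash


for i in (19, 22, 25):
    mib = buffer_bytes(i) / (1024 * 1024)
    print(f"-i {i}: {hashes_per_launch(i):>10,} hashes per launch, about {mib:,.0f} MiB")
```

Under those assumptions each extra intensity step doubles the buffer, so -i 19 stays small (32 MiB) while values in the mid-20s reach the gigabyte range.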
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
 |
January 16, 2015, 01:28:40 PM |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
(2x 750 Ti, no overclock)
Try groestl or diamondgroestl. You can see in the commit on GitHub that this is groestl speedup part 1. Part 2 is almost ready for check-in, but it currently mixes up the bits and produces wrong results.
Lyra2 does not use the bitslice groestl (killer groestl), so my improvements have no effect there. But I guess if I swap the implementation, Lyra2 will get a boost as well. Did you try that, DJM34?
I tried a bit (I was on a rather tight schedule) but without success. The problem is that the quad implementation was written for groestl512, and using it for groestl256 isn't really straightforward...
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
flipclip
January 16, 2015, 02:40:29 PM |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
(2x 750 Ti, no overclock)
Try groestl or diamondgroestl. You can see in the commit on GitHub that this is groestl speedup part 1. Part 2 is almost ready for check-in, but it currently mixes up the bits and produces wrong results.
Lyra2 does not use the bitslice groestl (killer groestl), so my improvements have no effect there. But I guess if I swap the implementation, Lyra2 will get a boost as well. Did you try that, DJM34?
I realized Lyra2 wasn't part of the speedup; it just happened that the algo was profitable for about five minutes on yaamp, and I happened to be in front of my computer at the time. Since I hadn't posted any Lyra2 rates in a while (ever?), I thought I would, just in case someone was interested.
|
|
|
|
flipclip
January 16, 2015, 03:08:24 PM |
|
The new GTX 960 might fail on the default settings because the intensity is set too high for compute 5.2 devices. How much memory are they planning to ship with the new cards?
The fix will be to set the intensity manually with the -i parameter, e.g. -i 19
For people with mixed cards in their rigs (or thinking about mixing in a 960), the -i parameter is not a per-card setting, so they'll either need to run separate ccminer instances or hope the same -i value works across all cards. Just something for people to think about.
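The separate-instances route could look something like the sketch below, which only builds the command lines and does not launch anything. It assumes one ccminer process per card, pinned with the -d device option; the pool URL, wallet placeholder, device indices and the helper name are all hypothetical.

```python
def ccminer_command(device, intensity=None, algo="quark",
                    url="stratum+tcp://pool.example.com:3333",  # hypothetical pool
                    user="WALLET_ADDRESS", password="x"):
    """Build the argument list for one ccminer instance pinned to one GPU."""
    cmd = ["ccminer", "-a", algo, "-o", url, "-u", user, "-p", password,
           "-d", str(device)]
    if intensity is not None:
        # Only override the default intensity where it is needed.
        cmd += ["-i", str(intensity)]
    return cmd


# ASSUMPTION: device 0 is a 750 Ti left on defaults, device 1 is a 960 at -i 19.
cards = {0: None, 1: 19}
for device, intensity in cards.items():
    print(" ".join(ccminer_command(device, intensity)))
```

Each printed line would be started as its own process, so every card gets its own -i value.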
|
|
|
|
flipclip
January 16, 2015, 03:20:26 PM |
|
Quark: v28 = ~11,160 kH/s, v30 = ~11,441 kH/s
x11: v28 = ~5,730 kH/s, v30 = ~5,850 kH/s
lyra2: v28 = ~1,350 kH/s, v30 = ~1,350 kH/s
Mjollnir: v30 = ~21,000 MH/s
(2x 750 Ti, no overclock)
|
|
|
|
djm34
January 16, 2015, 03:21:02 PM |
|
Does anyone know how, with Visual Studio, to get the "release" directory with less crap in it? (I still need the ptx, though...)
|
|
|
Bombadil
|
 |
January 16, 2015, 03:28:47 PM |
|
Does anyone know how, with Visual Studio, to get the "release" directory with less crap in it? (I still need the ptx, though...)
Shift-delete the crap 
|
|
|
|
|