psw (OP)
|
|
December 13, 2013, 03:58:14 PM Last edit: December 13, 2013, 04:23:15 PM by psw |
|
Hello,
I spent several days optimizing the Scrypt OpenCL code. It was quite a challenge for me, because my primary work and hobby is low-level code optimization, especially of cryptographic code. SHA-256 is very familiar to me; in particular, I contributed to the SHA-256 assembler optimization in OpenSSL. As for the current Scrypt OpenCL code, it is nearly perfect: my first implementation was 10 times slower than it! Anyway, I finally achieved a small speed-up. I have tested it on a few AMD GPUs under Windows with the latest AMD drivers (13.11), and the results are as follows:
HD 6770 and HD 7950 - 2-3%
HD 7770 - no change
R9 280x - no change in -g 2 mode, but again 2% in -g 1
Here is the new OpenCL code: http://www.crark.net/download/scrypt130511.zip
Instructions:
0) Save (back up) your existing scrypt130511.cl file.
1) Unzip the archive and copy the new file (overwriting) to the cgminer folder. If your cgminer uses another filename (like scrypt130302.cl), rename the file to that name. All cgminer 3.x versions should be supported.
2) Delete all *.bin files (like scrypt130511Tahitiglg2tc8192w256l4.bin).
3) Restart cgminer and enjoy.
All your previous settings, such as lookup-gap and thread-concurrency, should be left unchanged.
Please let me know what results you get, including your hardware and driver version. If you like my work, please donate BTC or LTC.
SY, Pavel Semjanov.
|
|
|
|
digicoin
|
|
December 13, 2013, 04:10:56 PM |
|
Virus warning
|
|
|
|
psw (OP)
|
|
December 13, 2013, 04:15:14 PM |
|
Virus? In a plain-text OpenCL file? Are you kidding?
|
|
|
|
3gghead
|
|
December 15, 2013, 02:01:21 AM |
|
You can take the programmer out of C but you can't take C out of the programmer. Nice catch. Further improvements are possible though if you don't mind coding for specific GPUs with AMD-specific optimizations. Thanks for sharin' though.
|
|
|
|
emdje
|
|
December 21, 2013, 11:18:18 AM |
|
The code seems to speed up my 7950 by about 2%. However, after quitting cgminer and restarting it, it tends to hang at around 12 kh/s, and I have to restart the PC to get it working again.
On the topic of improvements: has anyone ever implemented uint8 in the code? I'm not a coder, but I read on the net that scrypt would benefit from this...
Gr, Maarten
|
|
|
|
Eastwind
|
|
December 21, 2013, 12:01:40 PM |
|
There are optimized scrypt kernel files for 7950/7970/7990/R9 280x: https://litecointalk.org/index.php?topic=6058.0;topicseen
It can increase the speed. If we combine your code with that one, can we increase the speed further? According to your instructions, we have to delete the existing .bin files.
|
|
|
|
emdje
|
|
December 22, 2013, 05:35:46 PM Last edit: December 22, 2013, 05:57:39 PM by emdje |
|
I tried to optimize the code even further, but I have limited coding skills. Below is the part that I tried to rewrite so that it can be executed in parallel. But I get this error:
line 469: error: expected an identifier for(uint k=0; k<8; k++);
Does anybody have a clue what goes wrong here?
void SHA256_fixed(uint4*restrict state0,uint4*restrict state1)
{
    uint4 S0 = *state0;
    uint4 S1 = *state1;
#define A S0.x
#define B S0.y
#define C S0.z
#define D S0.w
#define E S1.x
#define F S1.y
#define G S1.z
#define H S1.w
#define k 0
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(A,B,C,D,E,F,G,H, fixedW[(8*k)+0]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(H,A,B,C,D,E,F,G, fixedW[(8*k)+1]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(G,H,A,B,C,D,E,F, fixedW[(8*k)+2]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(F,G,H,A,B,C,D,E, fixedW[(8*k)+3]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(E,F,G,H,A,B,C,D, fixedW[(8*k)+4]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(D,E,F,G,H,A,B,C, fixedW[(8*k)+5]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(C,D,E,F,G,H,A,B, fixedW[(8*k)+6]);
#pragma unroll
    for(uint k=0; k<8; k++);
    RND(B,C,D,E,F,G,H,A, fixedW[(8*k)+7]);
#undef A
#undef B
#undef C
#undef D
#undef E
#undef F
#undef G
#undef H
#undef k
    *state0 += S0;
    *state1 += S1;
}
|
|
|
|
pmconrad
|
|
December 22, 2013, 06:39:40 PM |
|
just a wild guess... remove the semicolon from the for... line?
|
|
|
|
emdje
|
|
December 22, 2013, 07:43:21 PM |
|
Removing the semicolons at the end of the for statements does nothing for the current error. But you are right, they don't belong there, so I removed them. Thanks.
|
|
|
|
pmconrad
|
|
December 22, 2013, 11:14:03 PM |
|
Argh... don't #define k if you want to use it as a loop var. :-)
#define k 0
#pragma unroll
for(uint k=0; k<8; k++);
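To make the two fixes concrete, here is a minimal sketch in plain C rather than OpenCL, so it can be compiled on the host. Everything in it is an assumption for illustration: the RND macro is a toy, order-sensitive stand-in for the kernel's real round macro, the state is a uint32_t array instead of the S0/S1 vectors, and rounds_reference, rounds_looped and loop_matches_reference are hypothetical names. It shows a loop with no #define k, no semicolon after the for header, and all eight rotated RND calls inside one braced body, so the rounds still run in the order 0..63. Splitting the work into eight separate loops, as in the post above, would instead run rounds 0, 8, 16, ... before round 1, which presumably changes the hash result. In the kernel itself, #pragma unroll would sit on its own line directly above the for.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the kernel's RND macro (hypothetical): it only does
   order-sensitive mixing, not the real SHA-256 round arithmetic. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define RND(a, b, c, d, e, f, g, h, w)                \
    do {                                              \
        uint32_t t_ = (h) + ROTL32((e), 7) + (w);     \
        (d) += t_;                                    \
        (h) = t_ ^ ROTL32((a), 13);                   \
    } while (0)

/* Reference: 64 rounds in order, rotating the register roles each round. */
void rounds_reference(uint32_t s[8], const uint32_t w[64])
{
    for (uint32_t i = 0; i < 64; i++) {
        uint32_t r = (8 - i % 8) % 8;   /* index that plays the role of A */
        RND(s[r], s[(r + 1) % 8], s[(r + 2) % 8], s[(r + 3) % 8],
            s[(r + 4) % 8], s[(r + 5) % 8], s[(r + 6) % 8], s[(r + 7) % 8],
            w[i]);
    }
}

/* Loop form: k is an ordinary variable (no "#define k 0" anywhere), the
   for header has no trailing ';', and the body is one braced block. */
void rounds_looped(uint32_t s[8], const uint32_t w[64])
{
    assert(s && w);
    for (uint32_t k = 0; k < 8; k++) {
        RND(s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], w[8 * k + 0]);
        RND(s[7], s[0], s[1], s[2], s[3], s[4], s[5], s[6], w[8 * k + 1]);
        RND(s[6], s[7], s[0], s[1], s[2], s[3], s[4], s[5], w[8 * k + 2]);
        RND(s[5], s[6], s[7], s[0], s[1], s[2], s[3], s[4], w[8 * k + 3]);
        RND(s[4], s[5], s[6], s[7], s[0], s[1], s[2], s[3], w[8 * k + 4]);
        RND(s[3], s[4], s[5], s[6], s[7], s[0], s[1], s[2], w[8 * k + 5]);
        RND(s[2], s[3], s[4], s[5], s[6], s[7], s[0], s[1], w[8 * k + 6]);
        RND(s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[0], w[8 * k + 7]);
    }
}

/* Returns 1 when both versions leave the state identical. */
int loop_matches_reference(void)
{
    uint32_t w[64], a[8], b[8];
    for (uint32_t i = 0; i < 64; i++)
        w[i] = 0x9E3779B9u * (i + 1);   /* arbitrary test words */
    for (uint32_t i = 0; i < 8; i++)
        a[i] = b[i] = 0x01010101u * i;  /* arbitrary start state */
    rounds_reference(a, w);
    rounds_looped(b, w);
    for (uint32_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling loop_matches_reference() from a small host-side test is a cheap way to check that a rewritten loop still performs the same rounds in the same order before trying it in the real kernel.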
|
|
|
|
datguyian
|
|
December 23, 2013, 06:32:50 AM |
|
The modifications seem to be working well for me. After some config tweaks, I went from 640 KH/s at best to 670 KH/s w/Sapphire Vapor-X 7950 w/boost. My efficiency and shares/min seem to have noticeably improved as well.
Here's my cgminer config for anyone interested:
"api-allow" : "W:127.0.0.1",
"api-listen" : true,
"expiry" : "3",
"log" : "5",
"queue" : "2",
"scan-time" : "1",
"scrypt" : true,
"kernel" : "scrypt",
"auto-fan" : true,
"gpu-threads" : "1",
"gpu-engine" : "1150",
"gpu-memclock" : "1500",
"intensity" : "19",
"temp-target" : "70",
"temp-overheat" : "85",
"temp-cutoff" : "95",
"temp-hysteresis" : "3",
"gpu-powertune" : "20",
"gpu-vddc" : "1.25",
"worksize" : "256",
"lookup-gap" : "2",
"shaders" : "1792",
"vectors" : "1",
"thread-concurrency" : "21712"
I'm still playing around with the gpu-vddc setting, but something about these GPUs doesn't seem to like anything less than 1.25, regardless of what I set my clocks to (I've set them much lower, hoping to be able to lower the voltage, but they always end up crashing). One day when I'm feeling more ambitious I may try to flash the GPU BIOS. I have also played around with thread-concurrency quite a bit, but 21712 seems to be the sweet spot.
To the OP: Unfortunately I don't have any LTC or BTC atm - I had to sell most of them a couple of months ago to cover other expenses and just recently started mining again. What little funds I have right now are invested in other coins. Happen to have a TAG, WDC or NXT address? I would be happy to send a few over to you if so.
|
|
|
|
suchcoins
|
|
January 02, 2014, 08:54:35 AM Last edit: January 16, 2014, 06:40:12 PM by suchcoins |
|
Please let me know of the results you were able to get, including your hardware and drivers version.
This definitely worked on my 7950s under BAMT/Ubuntu with Catalyst 12.6 (8.98). But strangely, it did not affect my 7970 in the slightest.
Here are the previous attempts to tweak scrypt.cl:
https://litecointalk.org/index.php/topic,4082.0.html
https://litecointalk.org/index.php/topic,6020.0.html
which didn't work for me at all, but yours did. Note how they had to make different versions for 13.4+ and pre-13.4.
The 7950s went from 610-611 to 620-625, so roughly a 2% performance increase. On rare occasions I've noticed the rate bounce down to 610 but climb back up. I'm watching the longer-term WU to see what happens. I'm curious why the 7970 is not affected at all; I only have a single thread running.
If you'd like some dogecoin as a thank you, let me know your address.
|
|
|
|
noedelx
|
|
January 02, 2014, 09:51:39 AM |
|
I'll give it a shot on my 6970.
|
|
|
|
Davidbc
|
|
January 02, 2014, 10:01:49 AM |
|
Thanks, gave me a 2.5% increase in khash/s with a 7950.
|
|
|
|
jomay
|
|
January 04, 2014, 12:47:32 PM |
|
I actually lost 20% hashing performance on my 7950. But I'm on an old version of drivers (13.4) and use settings on my 7950 that do not max MHz on mem and gpu (=>lower power). I'd guess it's the driver.
|
|
|
|
kopam
|
|
January 16, 2014, 09:26:58 PM |
|
I just found this thread. I have no idea how to optimize like that, so I would like to ask for help. I am waiting for a lot of 7950s to arrive very soon, so a 2-3% difference will be a lot. If anyone manages to help me, I will make a donation!
|
|
|
|
demonserbia
|
|
January 16, 2014, 09:37:37 PM |
|
Just tried it on my 4x7950 rig and can confirm it really hits a sweet spot on the GPUs, so they produce more hash power.
|
|
|
|
emdje
|
|
January 19, 2014, 06:45:24 PM |
|
Hi Kopam, I can send you the cl file that I use for my 7950, which fluctuates between 680 and 710 kh/s (usually lower, because it is in the computer that I use). I have a file with the settings in it, or a file without the settings, where the settings are in a .conf file.
Just PM me your email address.
Greetings Maarten
|
|
|
|
BitcoinEXpress
|
|
January 19, 2014, 07:05:29 PM |
|
Virus warning
Retard warning ~BCX~
|
|
|
|
suchcoins
|
|
January 28, 2014, 02:02:47 PM |
|
Something I've noticed after testing this on and off for quite a while now:
While it definitely increases raw kh/s and makes the GPU work harder (based on GPU temps at the same clock rate), the reality is that it does nothing for WU.
Long-term WU does not change compared to running without these tweaks.
I am not knowledgeable enough to explain why, but I can pretty much report this as fact.
So if it makes you feel better to see higher kh/s, use more power and run the GPU hotter, this helps.
But for actual real-world improvement, this does not seem to do much.
If someone has proof otherwise, please share your WU improvements.
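One way to compare the two numbers is to convert WU into the hash rate it implies. The sketch below is plain C and rests on an assumption that is not stated anywhere in this thread: that cgminer's WU counts difficulty-1 shares per minute and that, for scrypt, one such share stands for 2^16 hashes on average. Under that assumption WU should sit at roughly 0.92 times the displayed kh/s, and a ratio that stays well below that after a tweak would suggest the extra kh/s are not turning into useful work. The function names are made up for illustration.

```c
#include <assert.h>

/* ASSUMPTION (not from this thread): one difficulty-1 scrypt share
   represents 2^16 hashes on average. */
#define HASHES_PER_DIFF1_SHARE 65536.0

/* WU (diff-1 shares per minute) that a given hash rate in kh/s should yield. */
double expected_wu(double khs)
{
    assert(khs >= 0.0);
    return khs * 1000.0 * 60.0 / HASHES_PER_DIFF1_SHARE;
}

/* Hash rate in kh/s implied by a measured long-term WU. */
double effective_khs(double wu)
{
    assert(wu >= 0.0);
    return wu * HASHES_PER_DIFF1_SHARE / (60.0 * 1000.0);
}

/* Fraction of the displayed hash rate that shows up as work (1.0 = all). */
double wu_efficiency(double wu, double displayed_khs)
{
    assert(displayed_khs > 0.0);
    return effective_khs(wu) / displayed_khs;
}
```

For example, at the 620 kh/s reported earlier in the thread, expected_wu(620.0) comes to about 568 under this assumption. Comparing wu_efficiency() before and after swapping the kernel, over a long enough run, would show whether the gain is real.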
|
|
|
|
|