Bitcoin Forum
Topic: Minor Scrypt OpenCL optimization (Read 11338 times)
psw (OP)
December 13, 2013, 03:58:14 PM
Last edit: December 13, 2013, 04:23:15 PM by psw
 #1

Hello,

I spent several days optimizing the Scrypt OpenCL code. It was quite a challenge for me, because my primary work and hobby is low-level code optimization, especially of cryptographic code. SHA-256 is very familiar to me; in particular, I contributed to the SHA-256 assembler optimization in OpenSSL.

The current Scrypt OpenCL code is already very good: my first implementation was 10 times slower! Still, I eventually achieved a small speed-up. I tested it on a few AMD GPUs under Windows with the latest AMD drivers (13.11), and the results are as follows:
HD 6770 and HD 7950 - 2-3%
HD 7770 - no change
R9 280X - no change in -g 2 mode, but again 2% in -g 1.

Here is the new OpenCL code: http://www.crark.net/download/scrypt130511.zip

Instructions:
0) Back up your existing scrypt130511.cl file.
1) Unzip the new file and copy it to the cgminer folder, overwriting the old one. If your cgminer uses another filename (like scrypt130302.cl), rename the new file to that name.
All cgminer 3.x versions should be supported.
2) Delete all *.bin files (like scrypt130511Tahitiglg2tc8192w256l4.bin).
3) Restart cgminer and enjoy. All your previous settings, like lookup-gap and thread-concurrency, should stay unchanged.

Please let me know what results you get, including your hardware and driver version.
If you like my work, please donate BTC or LTC.

SY, Pavel Semjanov.
digicoin
December 13, 2013, 04:10:56 PM
 #2

Virus warning
psw (OP)
December 13, 2013, 04:15:14 PM
 #3

Virus? In text OpenCL file? Are you kidding?
3gghead
December 15, 2013, 02:01:21 AM
 #4

You can take the programmer out of C but you can't take C out of the programmer. Nice catch. Further improvements are possible, though, if you don't mind coding for specific GPUs with AMD-specific optimizations. Thanks for sharin' though.
emdje
December 21, 2013, 11:18:18 AM
 #5

The code seems to speed up my 7950 by about 2%.
However, when I quit cgminer and restart it, it tends to hang at around 12 kh/s and I have to restart the PC to get it working again.

On the topic of improvements: has anyone ever implemented uint8 in the code? I'm not a coder, but I read online that scrypt would benefit from it...

Gr, Maarten
Eastwind
December 21, 2013, 12:01:40 PM
 #6

There are optimized scrypt kernel files for the 7950/7970/7990/R9 280X:
https://litecointalk.org/index.php?topic=6058.0;topicseen

They can increase the speed.

If we combine your code with that one, can we increase the speed further?

According to your instructions, we have to delete the existing .bin files.
emdje
December 22, 2013, 05:35:46 PM
Last edit: December 22, 2013, 05:57:39 PM by emdje
 #7

I tried to optimize the code even further, but I have limited coding skills.

Below is the part that I tried to restructure so that it can be executed in parallel. But I get the error: line 469: error: expected an identifier
for(uint k=0; k<8; k++);

Does anybody have a clue what goes wrong here?

Code:
void SHA256_fixed(uint4*restrict state0,uint4*restrict state1)
{
uint4 S0 = *state0;
uint4 S1 = *state1;

#define A S0.x
#define B S0.y
#define C S0.z
#define D S0.w
#define E S1.x
#define F S1.y
#define G S1.z
#define H S1.w
#define k 0

#pragma unroll
for(uint k=0; k<8; k++);
RND(A,B,C,D,E,F,G,H, fixedW[(8*k)+0]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(H,A,B,C,D,E,F,G, fixedW[(8*k)+1]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(G,H,A,B,C,D,E,F, fixedW[(8*k)+2]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(F,G,H,A,B,C,D,E, fixedW[(8*k)+3]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(E,F,G,H,A,B,C,D, fixedW[(8*k)+4]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(D,E,F,G,H,A,B,C, fixedW[(8*k)+5]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(C,D,E,F,G,H,A,B, fixedW[(8*k)+6]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(B,C,D,E,F,G,H,A, fixedW[(8*k)+7]);

#undef A
#undef B
#undef C
#undef D
#undef E
#undef F
#undef G
#undef H
#undef k

*state0 += S0;
*state1 += S1;
}
pmconrad
December 22, 2013, 06:39:40 PM
 #8

just a wild guess... remove the semicolon from the for... line?

emdje
December 22, 2013, 07:43:21 PM
 #9

Removing the semicolons at the end of the for statements does nothing for the current error.
But you are right, they don't belong there, so I removed them. Thanks.
pmconrad
December 22, 2013, 11:14:03 PM
 #10

Argh... don't #define k if you want to use it as a loop var. :-)

#define k 0

#pragma unroll
   for(uint k=0; k<8; k++);
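
To make both diagnoses concrete, here is a minimal sketch in plain C (not the actual cgminer kernel). With `#define k 0` in effect, the preprocessor rewrites `for(uint k=0; ...)` into `for(uint 0=0; ...)`, which is where "expected an identifier" comes from, and a `;` right after the `for(...)` header leaves the loop with an empty body. The names rounds_rolled, rounds_split, rounds_reference and same_as_reference are invented for this illustration, w[] stands in for fixedW[], and the input values are arbitrary. It assumes the original unrolled kernel rotates the register order on every round, as the argument lists in the posted code suggest. Under that assumption, eight separate loops change the order of the rounds even once they compile, while a single loop with eight rounds in its body keeps it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

#define CH(e, f, g)  (((e) & (f)) ^ (~(e) & (g)))
#define MAJ(a, b, c) (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c)))
#define BSIG0(a)     (rotr32((a), 2) ^ rotr32((a), 13) ^ rotr32((a), 22))
#define BSIG1(e)     (rotr32((e), 6) ^ rotr32((e), 11) ^ rotr32((e), 25))

/* One SHA-256 round where the argument order performs the register rotation.
   w stands in for a precomputed fixedW[] entry (round constant + message word). */
#define RND(a, b, c, d, e, f, g, h, w) do {  \
        (h) += BSIG1(e) + CH(e, f, g) + (w); \
        (d) += (h);                          \
        (h) += BSIG0(a) + MAJ(a, b, c);      \
    } while (0)

#define LOAD(s)  uint32_t A = s[0], B = s[1], C = s[2], D = s[3], \
                          E = s[4], F = s[5], G = s[6], H = s[7]
#define STORE(s) do { s[0] = A; s[1] = B; s[2] = C; s[3] = D; \
                      s[4] = E; s[5] = F; s[6] = G; s[7] = H; } while (0)

/* Rolled form: ONE loop, eight rotated rounds per iteration. The counter is a
   real variable (no macro shares its name) and no ';' follows the for header. */
void rounds_rolled(uint32_t s[8], const uint32_t w[64])
{
    LOAD(s);
    for (unsigned i = 0; i < 8; i++) {
        RND(A, B, C, D, E, F, G, H, w[8 * i + 0]);
        RND(H, A, B, C, D, E, F, G, w[8 * i + 1]);
        RND(G, H, A, B, C, D, E, F, w[8 * i + 2]);
        RND(F, G, H, A, B, C, D, E, w[8 * i + 3]);
        RND(E, F, G, H, A, B, C, D, w[8 * i + 4]);
        RND(D, E, F, G, H, A, B, C, w[8 * i + 5]);
        RND(C, D, E, F, G, H, A, B, w[8 * i + 6]);
        RND(B, C, D, E, F, G, H, A, w[8 * i + 7]);
    }
    STORE(s);
}

/* Eight separate loops (the posted shape minus its syntax errors).
   This compiles, but the rounds now run in a different order. */
void rounds_split(uint32_t s[8], const uint32_t w[64])
{
    LOAD(s);
    unsigned i;
    for (i = 0; i < 8; i++) RND(A, B, C, D, E, F, G, H, w[8 * i + 0]);
    for (i = 0; i < 8; i++) RND(H, A, B, C, D, E, F, G, w[8 * i + 1]);
    for (i = 0; i < 8; i++) RND(G, H, A, B, C, D, E, F, w[8 * i + 2]);
    for (i = 0; i < 8; i++) RND(F, G, H, A, B, C, D, E, w[8 * i + 3]);
    for (i = 0; i < 8; i++) RND(E, F, G, H, A, B, C, D, w[8 * i + 4]);
    for (i = 0; i < 8; i++) RND(D, E, F, G, H, A, B, C, w[8 * i + 5]);
    for (i = 0; i < 8; i++) RND(C, D, E, F, G, H, A, B, w[8 * i + 6]);
    for (i = 0; i < 8; i++) RND(B, C, D, E, F, G, H, A, w[8 * i + 7]);
    STORE(s);
}

/* Textbook form: 64 rounds with explicit register shifting. */
void rounds_reference(uint32_t s[8], const uint32_t w[64])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    for (unsigned i = 0; i < 64; i++) {
        uint32_t t1 = h + BSIG1(e) + CH(e, f, g) + w[i];
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

typedef void (*rounds_fn)(uint32_t *, const uint32_t *);

/* Runs fn and the reference on the same arbitrary inputs; returns 1 if equal. */
int same_as_reference(rounds_fn fn)
{
    uint32_t s1[8], s2[8], w[64];
    for (unsigned i = 0; i < 8; i++)  s1[i] = s2[i] = i + 1u;
    for (unsigned i = 0; i < 64; i++) w[i] = i * 0x9E3779B9u + 1u;
    rounds_reference(s1, w);
    fn(s2, w);
    return memcmp(s1, s2, sizeof s1) == 0;
}

void self_check(void)
{
    assert(same_as_reference(rounds_rolled));   /* same rounds, same order */
    assert(!same_as_reference(rounds_split));   /* order changed, result differs */
}
```

Calling self_check() from a test program exercises both variants. Whether rolling the loop helps at all is a separate question: with #pragma unroll the compiler may well expand it back into the same 64 rounds.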

datguyian
December 23, 2013, 06:32:50 AM
 #11

The modifications seem to be working well for me. After some config tweaks, I went from 640 KH/s at best to 670 KH/s w/Sapphire Vapor-X 7950 w/boost. My efficiency and shares/min seem to have noticeably improved as well. Here's my cgminer config for anyone interested:

"api-allow" : "W:127.0.0.1",
"api-listen" : true,
"expiry" : "3",
"log" : "5",
"queue" : "2",
"scan-time" : "1",
"scrypt" : true,
"kernel" : "scrypt",
"auto-fan" : true,
"gpu-threads" : "1",
"gpu-engine" : "1150",
"gpu-memclock" : "1500",
"intensity" : "19",
"temp-target" : "70",
"temp-overheat" : "85",
"temp-cutoff" : "95",
"temp-hysteresis" : "3",
"gpu-powertune" : "20",
"gpu-vddc" : "1.25",
"worksize" : "256",
"lookup-gap" : "2",
"shaders" : "1792",
"vectors" : "1",
"thread-concurrency" : "21712"

Still playing around with the gpu-vddc setting, but something about these GPUs doesn't seem to like anything less than 1.25, regardless of what I set my clocks at (I've set them way lower hoping to be able to set the voltage lower, but they always end up crashing). One day when I'm feeling more ambitious I may try to flash the GPU BIOS. I have also played around with the thread-concurrency quite a bit, but 21712 seems to be the sweet spot.
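
For anyone tuning thread-concurrency, a rough sketch of the memory it implies may help. It assumes Litecoin-style scrypt parameters (N = 1024, 128-byte entries) and that the miner keeps ceil(N / lookup-gap) entries per thread, which is my understanding of how cgminer 3.x sizes its scratchpad buffer; check your cgminer sources before relying on it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCRYPT_N            1024u  /* assumed: Litecoin-style scrypt, N = 1024 */
#define SCRYPT_ENTRY_BYTES  128u   /* assumed: r = 1, so 128 bytes per entry   */

/* Scratchpad entries stored per thread: ceil(N / lookup_gap).
   A lookup-gap of 2 stores every second entry and recomputes the rest. */
uint64_t entries_per_thread(unsigned lookup_gap)
{
    assert(lookup_gap > 0);
    return (SCRYPT_N + lookup_gap - 1u) / lookup_gap;
}

/* Estimated total scratchpad size in bytes for the given settings. */
uint64_t scratchpad_bytes(unsigned thread_concurrency, unsigned lookup_gap)
{
    return (uint64_t)SCRYPT_ENTRY_BYTES
         * entries_per_thread(lookup_gap)
         * (uint64_t)thread_concurrency;
}
```

Under these assumptions, the config above (thread-concurrency 21712, lookup-gap 2) gives scratchpad_bytes(21712, 2) = 1422917632 bytes, which is 1357 MiB.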

To the OP: Unfortunately I don't have any LTC or BTC atm - I had to sell most of them a couple of months ago to cover other expenses and just recently started mining again. What little funds I have right now are invested in other coins. Happen to have a TAG, WDC or NXT address? I would be happy to send a few over to you if so.

suchcoins
January 02, 2014, 08:54:35 AM
Last edit: January 16, 2014, 06:40:12 PM by suchcoins
 #12

Please let me know what results you get, including your hardware and driver version.

This definitely worked on my 7950s under BAMT/Ubuntu with catalyst 12.6 (8.98)

But strangely it did not affect my 7970 in the slightest.

Here are the previous attempts to tweak scrypt.cl

https://litecointalk.org/index.php/topic,4082.0.html

https://litecointalk.org/index.php/topic,6020.0.html

which didn't work for me at all - but yours did

Note how they had to make a different version for 13.4+ and pre-13.4

The 7950s went from 610-611 to 620-625, so roughly a 2% performance increase.
On rare occasion I've noticed the rate bounce down to 610 but climb back up.
Watching the longer term WU to see what happens.

I'm curious why the 7970 is not affected at all, I only have a single thread running.

If you'd like some dogecoin as a thank you, let me know your address.
noedelx
January 02, 2014, 09:51:39 AM
 #13

I'll give it a shot on my 6970.
Davidbc
January 02, 2014, 10:01:49 AM
 #14

Thanks, gave me a 2.5% increase in khash/s with a 7950.
jomay
January 04, 2014, 12:47:32 PM
 #15

I actually lost 20% hashing performance on my 7950.
But I'm on an old version of drivers (13.4) and use settings on my 7950 that do not max MHz on mem and gpu (=>lower power). I'd guess it's the driver.

BTC 1NoV8NFSB7eiuK2aABFtBTdUdXhbEdG7Ss
LTC LaFyWSfzKY7CKwwmbxhyf8S2iJvfT7JFtL YAC YKKwR5B64Z9ww971J42vEGVPaema623Tz6
kopam
January 16, 2014, 09:26:58 PM
 #16

I just found this thread.
I have no idea how to optimize like that, so I would like to ask for help :)
I am waiting for a lot of 7950s to arrive very soon, so a 2-3% difference will be a lot.
If anyone manages to help me, I will make a donation!

demonserbia
January 16, 2014, 09:37:37 PM
 #17

Just tried it on my 4x7950 rig and can confirm it really hits a sweet spot on the GPUs, so they produce more hash power.
emdje
January 19, 2014, 06:45:24 PM
 #18

Hi Kopam, I can send you the cl file that I use for my 7950, which fluctuates between 680 and 710 kh/s (usually lower because it is in the computer that I use).
I have a file with the settings in it, or a file without the settings, where the settings go in a .conf file.

Just PM me your email address.

Greetings Maarten
BitcoinEXpress
January 19, 2014, 07:05:29 PM
 #19

Virus warning


Retard warning





~BCX~
suchcoins
January 28, 2014, 02:02:47 PM
 #20

Something I've noticed after testing this on and off for quite a while now.

While it definitely increases raw kh/s and makes the GPU work harder (based on GPU temps at the same clock rate), the reality is that it does nothing for WU.

Long term WU does not change compared to without these tweaks.

I am not knowledgeable enough to explain why but I can pretty much report this as fact.

So if it makes you feel better to see higher kh/s and use more power and run gpu hotter, this helps.

But for actual real-world improvement, this does not seem to do much.

If someone has proof otherwise, please share your WU improvements?
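
For anyone who wants to make that comparison, here is a rough sketch of how WU relates to the displayed hashrate. It assumes that cgminer's WU counts difficulty-1 shares per minute and that one difficulty-1 scrypt share corresponds to about 2^16 hashes on average; both are assumptions worth checking against your miner's documentation. The function names are made up for this illustration.

```c
#include <assert.h>

/* Assumed average number of hashes behind one difficulty-1 scrypt share. */
#define HASHES_PER_DIFF1_SHARE 65536.0

/* Hashrate in kh/s implied by a WU value (difficulty-1 shares per minute). */
double khs_from_wu(double wu)
{
    return wu * HASHES_PER_DIFF1_SHARE / 60.0 / 1000.0;
}

/* WU a card would show if every displayed hash turned into useful work. */
double ideal_wu_from_khs(double khs)
{
    return khs * 1000.0 * 60.0 / HASHES_PER_DIFF1_SHARE;
}

/* Fraction of the displayed hashrate that shows up as work (1.0 = all of it). */
double wu_efficiency(double wu, double khs)
{
    assert(khs > 0.0);
    return wu / ideal_wu_from_khs(khs);
}
```

Under these assumptions, a card displaying 620 kh/s would ideally settle at a WU of about 568. WU is a noisy number, so it only means something when averaged over many hours with the same pool and settings before and after swapping the kernel.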

