Bitcoin Forum
Pages: « 1 2 3 4 5 6 7 [8] 9 10 11 12 13 14 15 16 17 18 19 20 »  All
Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61242 times)
realhet
January 08, 2015, 07:36:35 AM
 #141

And you have the first/last round optimizations so it must be faster!
If it's as fast as the asm version, then I don't have to deal with the kernel parameters, which is boring/painful. My asm was only needed to encourage you to shrink the code/regs. :D

Can you share the new source?
pallas (OP)
January 08, 2015, 09:28:52 AM
 #142

Unfortunately it's only about 25% faster, but we should compare apples to apples: could you try your code on a Hawaii chip so we have a constant testbed?
Now I'm working on further first-round optimizations; they bring only a small improvement, but it's still worth it imho.

realhet
January 08, 2015, 09:59:22 AM
 #143

25% seems like just the loss from 14.9 coming back. I really thought you had it 3.5x faster.

Are you sure it only uses 123 VGPRs AND the code size is only 28KB? Or did it start using scratch registers (those are terribly slow)?

Unfortunately I can't try it on anything other than an HD7770. But I'd also like to see how it runs on faster systems; I uploaded it to the download area of my blog if someone wishes to try it. I'm not familiar with the latest GCN chips (I think AMD only improves the instruction set from time to time, and maybe cuts down double-precision performance), but with this particular program I'm pretty sure it will bring the 3.48x speedup on the R9 290X too, because every CU can work on its own using its own LDS, L1 cache and instruction cache. So if the current OCL code runs at 20 MH/s on the R9 290X, the latest asm code should run at about 70 MH/s.
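The VGPR count matters because, on GCN, it caps how many wavefronts each SIMD can keep in flight to hide memory latency. A rough sketch of that limit, assuming GCN 1.x figures (256 VGPRs per lane per SIMD, allocation in blocks of 4, at most 10 waves per SIMD); these constants are my assumption rather than something stated in the thread, and the sketch ignores SGPR and LDS limits:

```python
# Rough GCN occupancy limit from a kernel's VGPR count.
# Assumed GCN 1.x figures (not from the thread): 256 VGPRs per lane per SIMD,
# VGPRs allocated in blocks of 4, at most 10 wavefronts resident per SIMD.
VGPRS_PER_LANE = 256
VGPR_GRANULARITY = 4
MAX_WAVES_PER_SIMD = 10


def waves_per_simd(vgprs_used: int) -> int:
    """Wavefronts one SIMD can hold, limited only by VGPR usage."""
    if not 0 < vgprs_used <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_used // VGPR_GRANULARITY) * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


for vgprs in (84, 123, 128, 129):
    print(f"{vgprs} VGPRs -> {waves_per_simd(vgprs)} wave(s) per SIMD")
```

Under these assumptions, 123 VGPRs still allows 2 waves per SIMD, while going past 128 would drop it to 1.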
pallas (OP)
January 08, 2015, 10:03:12 AM
 #144

25% faster compared to 14.6; it's 43% compared to 14.9.
No scratch register use (when I triggered it a couple of times, it slowed down to less than 1 MH/s).
I'd like to try your asm code myself, but I'd need a Linux version of the assembler.
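Taking "N% faster" to mean new rate = (1 + N/100) × old rate (an assumption about how the figures were meant), the two percentages imply how much the 14.9 driver alone cost the old kernel, and the 70 MH/s estimate is a plain multiplication:

```python
def baseline_ratio(gain_vs_a: float, gain_vs_b: float) -> float:
    """Speed of baseline B relative to baseline A, given the new kernel's
    fractional speedup over each (0.25 means "25% faster")."""
    # new = (1 + gain_vs_a) * A = (1 + gain_vs_b) * B
    # => B / A = (1 + gain_vs_a) / (1 + gain_vs_b)
    return (1 + gain_vs_a) / (1 + gain_vs_b)


# A = old kernel on Catalyst 14.6, B = old kernel on Catalyst 14.9.
ratio = baseline_ratio(0.25, 0.43)
print(f"14.9 baseline = {ratio:.3f} x 14.6 baseline")  # ratio ≈ 0.874

# realhet's projection: 20 MH/s OpenCL rate times the 3.48x asm speedup.
projected = 20 * 3.48
print(f"projected asm rate: {projected:.1f} MH/s")  # 69.6
```

So, on these numbers, the old kernel ran roughly 12.6% slower on 14.9 than on 14.6.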

realhet
January 08, 2015, 12:55:13 PM
 #145

I've now managed to build sgminer 5.1 on my system. I still have to make my kernel work with it.

Does sgminer have an offline 'diagnostic' mode, just for testing whether the kernel runs and how fast it runs?

"I'd need the linux version of the assembler."
Sorry, that's impossible. It's not even written in C++, which would at least make it possible to compile on a system other than Windows.

And to make things more complicated :D you have to compile with it for every type of GCN card, multiplied by every Catalyst driver that AMD's developers have altered. My compiler only patches the binary into the .elf; the actual ELF file is generated by the current Catalyst driver for the currently selected graphics card.
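realhet's assembler is not public in this thread, so the following is only a sketch of the general idea of patching hand-assembled code into a driver-generated binary. It assumes the kernel was compiled with a unique marker byte sequence where the real code should go and overwrites it in place; the helper name, the marker and the toy image are all hypothetical:

```python
def patch_placeholder(image: bytes, marker: bytes, payload: bytes) -> bytes:
    """Return a copy of `image` with its single `marker` overwritten by `payload`.

    Hypothetical helper: assumes a unique marker was compiled into the kernel
    binary as a stand-in for the hand-assembled code.
    """
    if len(payload) != len(marker):
        # Same length keeps every other offset in the ELF image valid.
        raise ValueError("payload must be exactly as long as the marker")
    start = image.find(marker)
    if start < 0:
        raise ValueError("marker not found")
    if image.find(marker, start + 1) >= 0:
        raise ValueError("marker occurs more than once")
    return image[:start] + payload + image[start + len(marker):]


# Toy stand-in for a driver-generated ELF image (not a real kernel binary).
elf_image = b"\x7fELF" + bytes(12) + b"PLACEHOLDER!" + bytes(4)
patched = patch_placeholder(elf_image, b"PLACEHOLDER!", bytes(range(12)))
print(patched[16:28].hex())  # 000102030405060708090a0b
```

Keeping the payload exactly as long as the marker is the key design constraint: every other offset in the file (section headers, symbol table) stays valid, so nothing else has to be rewritten.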
pallas (OP)
January 08, 2015, 01:39:04 PM
 #146

There is a simple "benchmark" option:

--benchmark         Run sgminer in benchmark mode - produces no shares

realhet
January 08, 2015, 03:43:38 PM
 #147

Unfortunately there is no --benchmark parameter. I checked in the source code too, but found nothing similar: https://github.com/sgminer-dev/sgminer/blob/master/sgminer.c.
Is there a simple way to run it? I now have a Groestl wallet, but where do I get a username from? What parameters should I use other than -k groestl and -d 1?
pallas (OP)
January 08, 2015, 03:57:32 PM
 #148

They probably removed it; I'm using an older version.
I run it like this for solo mining:

sgminer -k groestlcoin --difficulty-multiplier 0.0039062500 -o http://localhost:GROESTLCOIN_RPC_PORT -u YOURUSER -p YOURPASSWORD

Then you have to find and add your best intensity and worksize (my OS kernel only works with 256).
The username and password are set in groestlcoin.conf; you can easily find the port in their thread (or via netstat).
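The --difficulty-multiplier value in the command above is exactly 1/256 (2^-8). A quick check of that arithmetic; what sgminer does with the multiplier internally is not covered here:

```python
from fractions import Fraction

# Value passed to sgminer as --difficulty-multiplier in the command above.
multiplier = Fraction("0.0039062500")
print(multiplier)                       # 1/256
print(multiplier == Fraction(1, 2**8))  # True
# 2**-8 is exactly representable as a double, so if the value is parsed
# as a double, nothing is lost.
print(float(multiplier) == 2.0**-8)     # True
```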

realhet
January 08, 2015, 08:08:22 PM
 #149

Thanks for the help. Now it runs, and I found that this command produces the best results:
sgminer -d 1 -k groestlcoin --difficulty-multiplier 0.0039062500 -o http://localhost:1441 -u u -p p --shaders 1280 --worksize 256 -g 1 --intensity 24

It produces 2 MH/s on average, which is half of the 4 MH/s I calculated earlier.
Does sgminer divide the Groestl hash count by 2? That would actually be more reasonable.
Or is something really wrong, so that it runs at half speed (exactly half)?
utahjohn
January 08, 2015, 08:30:59 PM
 #150

Are you sure it is running your kernel? Look in your sgminer dir for a .bin file generated by OCL; it may be running the default groestlcoin OCL kernel. Delete that .bin and replace it with your own generated one of the same name: it will not be regenerated if it already exists in the dir. You must delete the .bin whenever you change configs to force an OCL recompile... but you don't want that, you want to run your asm kernel... so you will have to figure out the parameter passing from sgminer...
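The caching rule described in this post (an existing .bin is reused and never regenerated, so edits to the .cl source are ignored until the .bin is deleted) can be sketched as follows. This models only the decision, not sgminer's actual code, and the file name is hypothetical; the real one may differ:

```python
import os
import tempfile


def kernel_binary_action(bin_path: str) -> str:
    """Model of the cache rule: an existing .bin is reused as-is;
    only a missing .bin triggers a rebuild from the .cl source."""
    if os.path.isfile(bin_path):
        return "load cached binary"
    return "compile .cl and write .bin"


with tempfile.TemporaryDirectory() as workdir:
    cached = os.path.join(workdir, "groestlcoin.bin")  # hypothetical name
    first_run = kernel_binary_action(cached)
    open(cached, "wb").close()  # simulate the .bin written by the first run
    second_run = kernel_binary_action(cached)

print(first_run)   # compile .cl and write .bin
print(second_run)  # load cached binary
```

This is also why dropping a hand-made binary in under the expected name works: the miner never looks at the .cl again while that file exists.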
pallas (OP)
January 08, 2015, 08:41:59 PM
 #151

(Replying to realhet's post #149.)

Intensity 24 is too much; I'd stay between 20 and 22, otherwise you'll produce a lot of rejected shares (or orphans if solo mining).
The shaders option is ignored for groestl.
The hashrate should be calculated on the full computation, i.e. 2 chained hashes.
What kernel are you using?
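One way to see why high intensity hurts: assuming a single kernel launch covers 2^intensity nonces (an assumption about how sgminer maps intensity to work size), each launch at intensity 24 takes 16 times longer than at 20, and results from a long launch are more likely to be based on outdated work. At the 2 MH/s reported in the previous post:

```python
def launch_seconds(intensity: int, hashrate_mhs: float) -> float:
    """Approximate wall time of one kernel launch.

    Assumes one launch covers 2**intensity nonces and that hashrate_mhs is
    the miner's reported rate in MH/s.
    """
    return 2 ** intensity / (hashrate_mhs * 1e6)


for intensity in (20, 22, 24):
    print(f"intensity {intensity}: {launch_seconds(intensity, 2.0):.2f} s per launch")
# intensity 20: 0.52 s per launch
# intensity 22: 2.10 s per launch
# intensity 24: 8.39 s per launch
```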

realhet
January 08, 2015, 09:48:35 PM
 #152

I'm using your kernel: groestlcoin.cl.

Now I disassembled a dummy kernel with the appropriate parameters, and I realized I had forgotten about the T buffers: OpenCL uploads them automatically in an extra buffer. I don't even want to know how the driver sends that extra buffer, and, most importantly, I can't make an automatic skeleton kernel that yields a binary with a placeholder for the constant data, which my program could then patch with the output of the assembler.

So the easiest way would be to modify sgminer to handle my kernel. I have found the 'queue_sph_kernel()' function, which I can start from.
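queue_sph_kernel() is where sgminer sets up the kernel's inputs, so any replacement kernel has to accept the same ones. My reading (an assumption, not confirmed in this thread; check the sgminer source before relying on it) is that sph-style kernels receive the 80-byte block header with each 32-bit word byte-swapped, an output buffer, and a 64-bit target taken from the top 8 bytes of the 256-bit little-endian target. A host-side sketch of preparing the header and target; the helper names are hypothetical:

```python
import struct


def swap_header_words(header: bytes) -> bytes:
    """Byte-swap each of the 20 32-bit words of an 80-byte block header."""
    if len(header) != 80:
        raise ValueError("block header must be 80 bytes")
    words = struct.unpack("<20I", header)  # read as little-endian words
    return struct.pack(">20I", *words)     # write back big-endian


def target_top_u64(target: bytes) -> int:
    """Read bytes 24..31 of a 32-byte little-endian target as a uint64."""
    if len(target) != 32:
        raise ValueError("target must be 32 bytes")
    (value,) = struct.unpack_from("<Q", target, 24)
    return value


header = bytes(range(80))
print(swap_header_words(header)[:8].hex())  # 0302010007060504

target = bytes(24) + (0x0000FFFF00000000).to_bytes(8, "little")
print(hex(target_top_u64(target)))          # 0xffff00000000
```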
pallas (OP)
January 08, 2015, 09:57:06 PM
 #153

I never tested my kernel with cards smaller than Tahiti, and I have no reports of it running on Pitcairn or smaller either: other groestlcoin kernels might be faster in that case.

pallas (OP)
January 09, 2015, 03:55:25 PM
 #154

I see there is very little interest in mining Groestl coins with GPUs: very few users joined the recent discussion (2/3),
let alone contributed to the code (2) or donated (2) in the whole life of this thread.

utahjohn
January 09, 2015, 04:08:09 PM
 #155

Well, I still prefer GPU mining while the block reward is 1.0, and will see what happens to diff when the reward drops to 0.1... So count me in on the new kernel; I donated a bit last time you did a new kernel and will donate again for the new super-super asm kernel :)
I expect diff will drop remarkably when the reward drops, and solo mining might still be attractive even after...
I have one 280X solo mining DMD (Pallas Diamond) at approx. 18.6 MH/s (2-4 coins per day)
and a 7950 solo mining FTC (neoscrypt) at 278 KH/s (it would be sweet if these optimizations could be applied to Neoscrypt too... wolf0, where are you?)

@realhet
It would be great if you could add a kernel setting parameter (perhaps "realhet") that selects your kernel, and supply a Windows x64 build of your sgminer... I'd donate for that :)
realhet
January 10, 2015, 01:41:26 AM
 #156

Well, I decided it's better not to alter sgminer, since I'm totally unfamiliar with it; instead I started making my kernel look exactly like groestlcoin.cl from the outside. It will take half a page of additional code to deal with the kernel parameters. It already works with a small dummy kernel, but I'm just too tired to continue now. :D
pallas (OP)
January 10, 2015, 03:48:37 AM
 #157

Well, that's easier for people to use.
Looking forward to seeing your progress! :-)

realhet
January 11, 2015, 03:01:34 AM
 #158

Do I need better proof than this? ;D
http://x.pgy.hu/~worm/het/my_first_grs.png
I'm the proud owner of my first 19 GRS coins, haha. I guess I was super lucky to get an 'accepted' just 10 minutes after I started mining.

The speed increase in sgminer is the same as I measured in my 'workbench': it rose from 2 MH/s to 7 MH/s (or, counted in Groestl hashes/s, from 4 MH/s to 14 MH/s).

If anyone is willing to help me test this, please tell me! You'll need Windows with Catalyst 14.9, and you also have to be brave enough to run my IDE (HetPas.exe) on that system.

I can't wait to see your reports on how fast it is on the big cards. :D
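Since sgminer reports one hash per full Groestlcoin computation, i.e. two chained Groestl hashes (as pallas noted earlier in the thread), the two ways of quoting the rate differ by a factor of 2, and the speedup is the same either way:

```python
GROESTL_PASSES_PER_NONCE = 2  # Groestlcoin chains two Groestl hashes per nonce


def groestl_rate(reported_mhs: float) -> float:
    """Single-Groestl evaluations per second (M/s) behind a reported MH/s."""
    return reported_mhs * GROESTL_PASSES_PER_NONCE


before, after = 2.0, 7.0  # MH/s reported by sgminer, from the post above
print(groestl_rate(before), groestl_rate(after))  # 4.0 14.0
print(f"speedup: {after / before:.1f}x")          # speedup: 3.5x
```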
pallas (OP)
January 11, 2015, 03:24:35 AM
 #159

The compiled bin file should work regardless of Catalyst version or operating system, so could you please post a link to the bin file? Thanks.
There is something weird about the sgminer screenshot; are you sure it's working correctly? It shows a single, disabled GPU with id 0, yet the accepted share came from GPU id 1. The diff numbers are also kinda weird.

qwep1
January 11, 2015, 09:21:06 AM
 #160

I would also test it.
