Bitcoin Forum
Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61211 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 13, 2015, 10:41:51 AM
 #221

I am at stock core 1070 and mem 1100. this is the difference maybe.
Not to be condescending but have u tried on sgminer command line
--gpu-clock 1100 --mem-clock 150
physixz
Newbie
*
Offline Offline

Activity: 13
Merit: 0


View Profile
January 13, 2015, 10:45:29 AM
 #222

Well, the best I can run at without crashing is 1040 core / 1250 memory, as it won't go lower with a -0.055 core volt drop

1 card: the rig pulls 510 W at 28.5 MH/s, so 0.056 MH/s per watt
2 cards: the rig pulls 740 W at 57 MH/s, so 0.077 MH/s per watt
3 cards: the rig pulls 990 W at 85.5 MH/s, so 0.086 MH/s per watt

which is about 230 - 250 W per added card, or 0.114 - 0.124 MH/s per watt excluding the system use
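The per-watt figures can be re-derived from the wall readings. A minimal Python sketch, using only the numbers quoted in this post (the function names are made up for illustration and do not come from any mining tool):

```python
# Wall-power readings and hash rates quoted in the post:
# (number of cards, rig watts at the wall, MH/s)
readings = [(1, 510, 28.5), (2, 740, 57.0), (3, 990, 85.5)]

def efficiency(watts, mhs):
    """Whole-rig efficiency in MH/s per watt."""
    return mhs / watts

def marginal(readings):
    """Extra watts and MH/s per watt contributed by each added card."""
    out = []
    for (_, w0, h0), (_, w1, h1) in zip(readings, readings[1:]):
        extra_watts = w1 - w0
        out.append((extra_watts, (h1 - h0) / extra_watts))
    return out

for cards, watts, mhs in readings:
    print(f"{cards} card(s): {efficiency(watts, mhs):.3f} MH/s per W")
for extra_watts, eff in marginal(readings):
    print(f"added card: {extra_watts} W, {eff:.3f} MH/s per W")
```

The marginal numbers (second card vs. first, third vs. second) are what "excluding the system use" refers to: the base system draw cancels out when you subtract consecutive readings.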

if anybody can get a higher hash per watt then let me know

EDIT

Even at that rate, with my electricity costs I can't make a profit...
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 13, 2015, 10:45:34 AM
 #223

Wow I got Best share: 702K
Was it a block? :-D

Now let's get serious: I finally have a little time to write some considerations on the ocl and asm kernels.
I believe we should pursue the asm path for a number of reasons:

- currently the OCL kernel is a little faster on hawaii but not on all other cards and I don't think it can be improved in this respect
- the OCL kernel has been tweaked and optimized for months, while the asm one is new so there is probably much more room for improvement
- just by applying the first and last round optimization the asm kernel will probably be faster on hawaii as well; I'm sure that Realhet will find other asm tricks to apply
- with all these catalyst version problems, the best way to share kernels for the people to mine is by bin files, making the asm version and ocl equivalent (for distribution purposes); better yet would be a miner with all the bundled bin files (takes time)
- asm is cooler than ocl ;-)

what do you guys think?
I'm all for sticking with the asm route ... u need to feed your ocl tweaks to realhet and let's maximize the asm kernel.
As I already suggested to realhet, "cross-compiling" to generate bins for all the archs we support is possible; he needs our bins created on each arch to dig out the minor diffs between them.
mitache365
Hero Member
*****
Offline Offline

Activity: 731
Merit: 500


View Profile
January 13, 2015, 02:59:31 PM
 #224

Can I ask what software you are using to change the values, as I'm using MSI Afterburner 4.1 but it won't change the memory clock and the core clock is always lower than what I set?

These changes are really pushing my cards now, as they normally sat at 50°C but are now over 60°C (they are watercooled). I will put up power usage when I next reboot and plug the power meter in.

also afterburner but 14.4 driver for me

BTC
mitache365
Hero Member
*****
Offline Offline

Activity: 731
Merit: 500


View Profile
January 13, 2015, 03:00:28 PM
 #225

I am at stock core 1070 and mem 1100. this is the difference maybe.
Not to be condescending but have u tried on sgminer command line
--gpu-clock 1100 --mem-clock 150

I will stay at core 1070 (don't like to overclock) but will set mem at 150 to see the result.

edit. hm, again 23.4 but lower temps. that's fine enough I think.

BTC
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 13, 2015, 03:04:19 PM
 #226

I am at stock core 1070 and mem 1100. this is the difference maybe.
Not to be condescending but have u tried on sgminer command line
--gpu-clock 1100 --mem-clock 150

I will stay at core 1070 (don't like to overclock) but will set mem at 150 to see the result.

edit. hm, again 23.4 but lower temps. that's fine enough I think.

lower mem clock = less power usage and bigger core overclock potential.
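For anyone scripting this, here is a minimal Python sketch that builds the sgminer command line for the clocks discussed here. It assumes the option names `--gpu-engine` and `--gpu-memclock` as listed in the cgminer/sgminer README (the `--gpu-clock` / `--mem-clock` spelling quoted in this thread looks like shorthand). The pool URL, user and password are placeholders, and the helper name is made up:

```python
import shlex

def sgminer_args(engine_mhz, mem_mhz, pool_url, user, password):
    """Build an argv list for sgminer with fixed core and memory clocks.

    Assumption: --gpu-engine / --gpu-memclock are the clock options,
    as documented in the cgminer/sgminer README.
    """
    return [
        "sgminer",
        "-o", pool_url,
        "-u", user,
        "-p", password,
        "--gpu-engine", str(engine_mhz),
        "--gpu-memclock", str(mem_mhz),
    ]

# Placeholder pool and credentials, for illustration only.
cmd = sgminer_args(1100, 150, "stratum+tcp://pool.example:3333", "worker", "x")
print(shlex.join(cmd))
```

Whether the driver actually applies the requested memory clock depends on the card and vbios, so it is worth checking the clocks the miner reports after start-up.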

mitache365
Hero Member
*****
Offline Offline

Activity: 731
Merit: 500


View Profile
January 13, 2015, 03:33:17 PM
 #227

hmm, now I see the miner still showing the stock memclock 1550 for the 280x vapor. why is it not changed? driver 14.4
I changed it from the batch file and later from the miner as well.

BTC
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 13, 2015, 06:03:39 PM
Last edit: January 13, 2015, 06:16:26 PM by utahjohn
 #228

hmm, now I see the miner still showing the stock memclock 1550 for the 280x vapor. why is it not changed? driver 14.4
I changed it from the batch file and later from the miner as well.
Why can't ppl read the thread ... u will have best performance with driver 14.7RC3 ...

Many 280x cards are locked to a small range of clock adjustment (PowerColor 280x being one of them, which is why I had to low-level vbios mod mine). Also, many 280x cards will throttle the gpu clock at temps above 72C.
mitache365
Hero Member
*****
Offline Offline

Activity: 731
Merit: 500


View Profile
January 13, 2015, 06:19:24 PM
 #229

hmm, now I see the miner still showing the stock memclock 1550 for the 280x vapor. why is it not changed? driver 14.4
I changed it from the batch file and later from the miner as well.
Why can't ppl read the thread ... u will have best performance with driver 14.7RC3 ...

Many 280x cards are locked to a small range of clock adjustment (PowerColor 280x being one of them, which is why I had to low-level vbios mod mine). Also, many 280x cards will throttle the gpu clock at temps above 72C.

thread read. tried 14.4, 14.6, 14.7, 14.9
can't put the cards at a lower memclock than the stock 1550 :( maybe they really are locked!

BTC
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 13, 2015, 06:22:54 PM
Last edit: January 13, 2015, 06:43:59 PM by utahjohn
 #230

Trust me 14.7RC3 is best.
Then u are unlucky enough to have a "Locked" card.
Only recourse for u is vbios modding your card if u want to lower memclock to 150 and be able to do a higher overclock on the gpu.
Do the research on vbios modding ... there are pointers in this thread by myself, I hate repeating myself a hundred times, that's why the info is in the thread.
https://bitcointalk.org/index.php?topic=779598.msg9043545#msg9043545

BTW I can still clock mem at 1625 via sgminer setting when I mine X11 or Neoscrypt with it ... just have to set it manually for them ...

Welcome to Extreme Diamond Mining LOL
mitache365
Hero Member
*****
Offline Offline

Activity: 731
Merit: 500


View Profile
January 13, 2015, 07:37:37 PM
 #231

Trust me 14.7RC3 is best.
Then u are unlucky enough to have a "Locked" card.
Only recourse for u is vbios modding your card if u want to lower memclock to 150 and be able to do a higher overclock on the gpu.
Do the research on vbios modding ... there are pointers in this thread by myself, I hate repeating myself a hundred times, that's why the info is in the thread.
https://bitcointalk.org/index.php?topic=779598.msg9043545#msg9043545

BTW I can still clock mem at 1625 via sgminer setting when I mine X11 or Neoscrypt with it ... just have to set it manually for them ...

Welcome to Extreme Diamond Mining LOL

I have 3 different 280x cards (dual, vapor, toxic). Strange to see that all are locked. The common thing is that all are Sapphire. Will try this vbios modding these days.
Thank you for the support. I will leave the rigs for now at 22.3 MH/s (1020/1500) / 23.4 MH/s (1070/1550) / 24 MH/s (1100/1600). A little tired after the last 2 days of trying to configure everything :) maybe I am wrong somewhere.
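A quick sanity check on those three rigs: dividing hash rate by core clock gives almost the same figure for each, which suggests the kernel scales roughly linearly with core clock. A minimal Python sketch using only the numbers in this post (memory clock also differs between the rigs, so this does not isolate its effect):

```python
# (MH/s, core MHz, mem MHz) for the three rigs listed in the post
rigs = [(22.3, 1020, 1500), (23.4, 1070, 1550), (24.0, 1100, 1600)]

def mhs_per_core_mhz(mhs, core_mhz):
    """Hash rate normalised by core clock."""
    return mhs / core_mhz

ratios = [mhs_per_core_mhz(mhs, core) for mhs, core, _ in rigs]
for (mhs, core, mem), ratio in zip(rigs, ratios):
    print(f"{core}/{mem}: {mhs} MH/s -> {ratio:.5f} MH/s per core MHz")

# Spread between the best and worst ratio, as a fraction of the mean
spread = (max(ratios) - min(ratios)) / (sum(ratios) / len(ratios))
print(f"relative spread: {spread:.2%}")
```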

BTC
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 13, 2015, 09:02:58 PM
 #232

For those of you with 290/290x cards that are locked:
http://www.overclock.net/t/1443242/the-r9-290-290x-unlock-thread

That thread is about transforming a 290 into 290x by means of firmware flashing.
But some are hardware locked, like mine... :-(

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 13, 2015, 09:10:54 PM
 #233

For those of you with 290/290x cards that are locked:
http://www.overclock.net/t/1443242/the-r9-290-290x-unlock-thread

That thread is about transforming a 290 into 290x by means of firmware flashing.
But some are hardware locked, like mine... :-(
Still may be useful for someone ... 7% gain from 290 to 290x ... :)
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 13, 2015, 09:17:19 PM
 #234

For those of you with 290/290x cards that are locked:
http://www.overclock.net/t/1443242/the-r9-290-290x-unlock-thread

That thread is about transforming a 290 into 290x by means of firmware flashing.
But some are hardware locked, like mine... :-(
Still may be useful for someone ... 7% gain from 290 to 290x ... :)

I have 30 on 290 and 33 on 290x, same clock, so it's 10% :-)

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 14, 2015, 07:07:58 PM
Last edit: January 14, 2015, 07:28:04 PM by utahjohn
 #235

Quote
* "Guys! We do not need more optimization!"
I've thought about this too. But I think if everyone uses better kernels, then everyone will use the same power to get the same profit, as difficulty will be harder, but mining will require less power.
But what if not everyone uses the faster kernel? I think my compiler/IDE is helping in this a lot, as it is kinda user-unfriendly.
Just as I thought also, there has not been a widespread migration to the new kernel :) Difficulty for newbs to set up properly, combined with falling BTC value, makes direct mining less attractive to the "Dumpers" ... Diff is back to a reasonable range again as miners drop out of the game ...
This is good for those of us who stick with it :) BTC will recover eventually, I am a long-term DMD holder anyway :)
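The argument about kernel adoption can be put in numbers with a toy model. This is a simplified Python sketch, assuming a miner's share of block rewards equals its share of total network hash rate (difficulty tracking total hash rate); the 3000 MH/s network and the 20% speed-up are hypothetical figures, not taken from the thread:

```python
def reward_share(my_rate, others_rate):
    """Fraction of block rewards earned, assuming rewards are
    proportional to the share of total network hash rate."""
    return my_rate / (my_rate + others_rate)

MY_RATE = 30.0        # MH/s, hypothetical miner
OTHERS_RATE = 2970.0  # MH/s, hypothetical rest of the network
SPEEDUP = 1.2         # hypothetical 20% faster kernel

baseline = reward_share(MY_RATE, OTHERS_RATE)
everyone_faster = reward_share(MY_RATE * SPEEDUP, OTHERS_RATE * SPEEDUP)
only_me_faster = reward_share(MY_RATE * SPEEDUP, OTHERS_RATE)

print(f"baseline share:         {baseline:.4%}")
print(f"everyone upgrades:      {everyone_faster:.4%}")
print(f"only this miner does:   {only_me_faster:.4%}")
```

If everyone upgrades, the share is unchanged; the gain only shows up while adoption is partial, which is the point being made about the kernel being hard to set up.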
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 14, 2015, 09:13:04 PM
 #236

Utahjohn: true!
Realhet: still there?

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 14, 2015, 09:31:58 PM
Last edit: January 15, 2015, 12:31:55 AM by utahjohn
 #237

Realhet has given us an impressive tool to work with ... time to learn ASM coding ...
I don't really have a grasp on parallel processing and all the nuances of register usage on GPU ... looks like the ball is in your court Pallas.  Hopefully Realhet returns to continue on this project.

Found this, looking thru it now ...
http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf

Wish I had a printed book of this reference ... wonder if one could order it online, I have no printer ...

This is what I was looking for :) Wow, a lot to grasp :)

Nice, even gives opcodes, I bet this is the reference realhet used to build hetpas :)

In his dev thread he mentions u can inline an instruction opcode for any that are not supported by the hetpas assembler.  The opcode tables for generating instructions manually are in the ref manual :)
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 14, 2015, 11:47:44 PM
 #238

@pallas
will you be able to do your best first/last pass implementation in ASM?
Looks like realhet has moved on to another project ...
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 15, 2015, 02:28:11 AM
 #239

Hi,

"- asm is cooler than ocl ;-)"  Haha, yes!
And ocl needs black magic to optimize, asm just does what you tell it to.

"Nice, even gives opcodes, I bet this is the reference realhet used to build hetpas :)"
I remember, I had work at the time when the 7970 came out. I got one in December 2011. There was no manual for more than half a year, but the disassembler worked well. So I decoded the instruction set using the disassembler. I even found some undocumented ones that way. It was fun.
But for some unknown reason this approach has been broken for the last 1-2 years: the disassembler just does nothing when the .elf is a binary-only .elf (this is the case when you use my assembler).

Some tips:
- Use Ctrl+Space in the IDE! It's like Intellisense/codeInsight. (Just start typing v_something!)
- Press F1 on any instruction, it will show a mini help.
- You can DD anything that isn't implemented. (eg. "dd $12345678, 0x74732921, 1234" emits 3 uints into code)
- Disassembling small opencl programs is a good source of knowledge. Also, this is the 'documentation' on how to pass a specific set of kernel parameters.
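As a rough illustration of the DD tip, here is a minimal Python sketch of what emitting those three uints amounts to at the byte level. It assumes `$` is a hexadecimal prefix (the Pascal convention) and that each dword is stored little-endian; neither detail is confirmed in this post:

```python
import struct

# The three values from the "dd" example; $12345678 is read as hex
# (assumption: Pascal-style prefix), 1234 as decimal.
words = [0x12345678, 0x74732921, 1234]

# Pack as three unsigned 32-bit integers, little-endian (assumption).
blob = struct.pack("<3I", *words)

print(len(blob))   # 12 bytes: three 4-byte dwords
print(blob.hex())  # 7856341221297374d2040000
```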

"Looks like realhet has moved on to another project ..."
Yea, I have to continue my job soon, as my free time runs out. I'm only planning to experiment with a bit of 2D rope physics. But whatever, now I'm in a Red Alert 2 'project', haha :D
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 15, 2015, 09:18:27 AM
 #240

Oh, that's a pity you moved away...
Not sure I like the idea of learning another asm, even if it's very cool!
I understand the first and last round optimizations are boring to do, but could you please, before leaving us, fix the problem with multiple cards? Where card 0 doesn't provide any work unit while card 1 works fine? Thanks!
