Bitcoin Forum
Author Topic: Modified Kernel for Phoenix 1.5  (Read 92140 times)
ssateneth
Legendary
*
Offline Offline

Activity: 1288



View Profile
August 04, 2011, 09:08:03 AM
 #181

Since I really liked the graph on the front page but thought it lacked granularity, I'm going to take a shot at making a graph too. I'll be doing tests on a 5830 instead of a 5870 though (My 5830 seems a LOT more stable when it comes to memory speeds compared to my 5870).
They'll be based on...
GUIMiner v2011-07-01
Built-in Phoenix miner
11.7 Catalyst
2.5 SDK
phatk 2.1 kernel with..
BFI_INT FASTLOOP=false AGGRESSION=14 and varying worksizes, memory speeds, and VECTORS vs VECTORS4.

Stay tuned Smiley

Edit: Here's a work-in-progress spreadsheet. It's updated as I test more combos (I need to test each combo and update the spreadsheet manually).
https://spreadsheets.google.com/spreadsheet/ccc?key=0AjXdY6gpvmJ4dEo4OXhwdTlyeS1Vc1hDWV94akJHZFE&hl=en_US

I was planning to put in worksizes of 192, 96, and 48 too, but phatk 2.1 doesn't seem to support it. Less work for me though Tongue
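For anyone who wants to script a similar sweep, here is a minimal sketch that enumerates the combinations described above. The flag names are the ones quoted in this post; the worksize list (64, 128, 256) and the memory clock range are assumptions for illustration only, and the memory clock itself is set with an overclocking tool, not with a phoenix flag.

```python
from itertools import product

# Fixed flags quoted in the post above.
FIXED_FLAGS = "BFI_INT FASTLOOP=false AGGRESSION=14"

# Assumed values for illustration only; the post does not list them.
WORKSIZES = (64, 128, 256)
VECTOR_MODES = ("VECTORS", "VECTORS4")
MEMORY_CLOCKS_MHZ = range(200, 401, 50)  # set with an overclocking tool, not a flag


def test_matrix():
    """Yield (memory_clock_mhz, kernel_flags) for every combination to test."""
    for mem, vec, ws in product(MEMORY_CLOCKS_MHZ, VECTOR_MODES, WORKSIZES):
        yield mem, f"-k phatk {vec} {FIXED_FLAGS} WORKSIZE={ws}"


if __name__ == "__main__":
    for mem, flags in test_matrix():
        print(f"{mem} MHz: {flags}")
```

Each combination still has to be run long enough for the hashrate to settle before it is recorded.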

Diapolo
Hero Member
*****
Offline Offline

Activity: 769



View Profile WWW
August 04, 2011, 12:32:11 PM
 #182

@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
ssateneth
Legendary
*
Offline Offline

Activity: 1288



View Profile
August 04, 2011, 01:45:53 PM
 #183

So far tests indicate the first "sweet spot" is ~220 MHz with VECTORS WORKSIZE=128. The next "sweet spot" (and fastest one yet) is ~370-380MHz with VECTORS WORKSIZE=256. Will keep you guys posted as I run through more combos.

dishwara
Legendary
*
Offline Offline

Activity: 1372


Truth may get delay, but NEVER fails


View Profile
August 04, 2011, 03:33:08 PM
 #184

My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325
CanaryInTheMine
Donator
Legendary
*
Offline Offline

Activity: 1512


between a rock and a block!


View Profile
August 04, 2011, 04:16:57 PM
 #185

My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?

| In Default we Trust | Need gold/silver for btc? | Buy bitcoins |
mike678
Full Member
***
Offline Offline

Activity: 168


View Profile
August 04, 2011, 04:33:01 PM
 #186

My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830's; I get my best megahash at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.
CanaryInTheMine
Donator
Legendary
*
Offline Offline

Activity: 1512


between a rock and a block!


View Profile
August 04, 2011, 04:35:46 PM
 #187

My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830's; I get my best megahash at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.

Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about same issue I had...

| In Default we Trust | Need gold/silver for btc? | Buy bitcoins |
mike678
Full Member
***
Offline Offline

Activity: 168


View Profile
August 04, 2011, 04:58:13 PM
 #188

Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about same issue I had...
Which thread are you talking about? I know I made a thread the other day in support about Afterburner freezing when I hit apply for my 5850's, but I can't remember what your thread was. If you're talking about the freezing, I haven't had a chance to test any further with the 5850's because I literally spent from the time I got out of work to like 1 am working on a skeleton case and trying to figure out why the PSU was making a clicking noise.

Also, I know you got the ncixus 5850's as well; what's your top speed on those so far? I can get up to 395ish with stock voltage.
Phateus
Jr. Member
*
Offline Offline

Activity: 52


View Profile
August 04, 2011, 05:18:10 PM
 #189

@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then i'm pretty sure they never will be 00000.... because the rejection comes because of a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the Client, are you running a modified version of phoenix?  I don't think the stock phoenix logs that information.  If you are, could you post details, so I can look into the bug.

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?

http://deepbit.net/userbar/4dcec4d1816197e144000002_bfe143123a.png

Feeling Generous?
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
Phateus
Jr. Member
*
Offline Offline

Activity: 52


View Profile
August 04, 2011, 05:33:39 PM
 #190

@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention to it in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s became more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...

http://deepbit.net/userbar/4dcec4d1816197e144000002_bfe143123a.png

Feeling Generous?
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
dishwara
Legendary
*
Offline Offline

Activity: 1372


Truth may get delay, but NEVER fails


View Profile
August 04, 2011, 06:17:14 PM
 #191

My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
Only trial & error will tell.
My sweet spot for 6870 is mem clk = (core clk/3) + 14.
I haven't tested the sweet spot for the 6970 yet, since my motherboard has been in repair for the past 7 days, and when I was mining on Linux it didn't allow me to underclock the memory below core clock minus 125 MHz.
I hope Windows 11.8 will give the correct sweet spot for the 6970, which I'll know once I get my motherboard back.
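
The two rules of thumb quoted in this thread (mem = core/3 for the 5870, and mem = core/3 + 14 for the 6870) can be written as one small helper. This is only a sketch of the arithmetic; the function name and the 900 MHz core clock in the second call are made up for illustration, and as noted above only trial and error will confirm the result for a given card.

```python
def sweet_spot_mem_clock(core_mhz, offset_mhz=0):
    """Memory clock suggested by the core/3 rule of thumb, plus an optional offset."""
    return core_mhz / 3 + offset_mhz


# 5870 example from the thread: 975 / 3 = 325
print(sweet_spot_mem_clock(975))

# 6870 rule from the thread: (core / 3) + 14; the 900 MHz core clock is an assumed example
print(sweet_spot_mem_clock(900, 14))
```

For comparison, the 1030/350 setting reported for the 5830 earlier in the thread sits within about 7 MHz of the plain core/3 value (1030/3 is roughly 343).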
Diapolo
Hero Member
*****
Offline Offline

Activity: 769



View Profile WWW
August 04, 2011, 07:43:07 PM
 #192

@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention to it in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s became more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...

In terms of efficiency, one has to consider whether a higher RAM frequency is worth it, because the card draws much more power with a higher mem clock :-/. The sweet spot for my 5870 and 5830 seems to be @ 350 MHz Mem.

Hope you get a new card soon Smiley!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476


moOo


View Profile
August 04, 2011, 07:57:53 PM
 #193

Yeah Phatk i hate to say it but I am having similar issues as deepceleron.

I started to notice an uptick in stales, I thought it was due to our proxy as we had problems before and we update it a lot.
about 3-5% across the board

i reverted back to dia 7-17 for the past 10 hours, and I have less than 1% stales.. which is normal for me.
Using a 5830, 2.4 11.6 win7 32

phoenix,  guiminer  VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2

mooo for rent
Phateus
Jr. Member
*
Offline Offline

Activity: 52


View Profile
August 04, 2011, 08:30:41 PM
 #194

@joulesbeef

Hmm... this might be a really hard bug to find.  If anyone has any ideas...
At first I was thinking it was because I compare the nonce to 0, but that would only give false negatives (1 in every 4 billion nonces will not be found)
The main difference between mine and diapolo's init file is that I pack 2 bases together and send them to the kernel.  I may try to get rid of the Base variable altogether and just use the offset parameter of the EnqueueKernel() command (I think you can do that in pyopencl)... 
Basically just thinking out loud... Undecided
If i didn't love low level programming so much, I think I would shoot myself  Tongue
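
To illustrate why comparing the nonce to 0 can only produce false negatives, here is a minimal host-side sketch. It is not the phatk or phoenix code: the buffer layout, `NOT_FOUND`, and `collect_nonces` are hypothetical names chosen for this example.

```python
NOT_FOUND = 0  # sentinel: an output slot holding 0 is treated as "nothing found"


def collect_nonces(output_buffer):
    """Return nonces reported by the kernel, skipping slots equal to the sentinel."""
    return [n for n in output_buffer if n != NOT_FOUND]


# Probability that a uniformly distributed 32-bit nonce collides with the sentinel.
MISS_PROBABILITY = 1 / 2**32

if __name__ == "__main__":
    # A genuine solution at nonce 0 is silently dropped (a false negative),
    # but this check cannot invent an invalid nonce (no false positives).
    print(collect_nonces([0, 0, 123456789, 0]))
    print(collect_nonces([0]))
    print(MISS_PROBABILITY)
```

Since 2**32 is about 4.29 billion, this matches the "1 in every 4 billion" figure, and it cannot explain results that fail the difficulty check.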

@Diapolo

Yeah, the unreleased version I am working on uses 20 registers (It performs about the same as a configuration which uses 19 but has 2 more ALU OPs)
Also, are you getting increased number of stales now that you have implemented some of the optimizations from phatk?

http://deepbit.net/userbar/4dcec4d1816197e144000002_bfe143123a.png

Feeling Generous?
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
deepceleron
Legendary
*
Offline Offline

Activity: 1470



View Profile WWW
August 04, 2011, 09:04:00 PM
 #195

@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then i'm pretty sure they never will be 00000.... because the rejection comes because of a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the Client, are you running a modified version of phoenix?  I don't think the stock phoenix logs that information.  If you are, could you post details, so I can look into the bug.

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?


The output that I pastebinned is the standard console output of phoenix in -v verbose mode, I just highlighted the screen output on my console (with a 3000 line buffer) and copy-pasted it. It includes the first eight bytes of the hash in the results as you can see.

Actually when I said that it was unmodified phoenix that I was running, I lied, by forgetting I had done this modification at line 236 in KernelInterface.py (because of a difficulty bug in a namecoin pool I was previously using):

Original:
        if self.checkTarget(hash, nr.unit.target):
            formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
            d = self.miner.connection.sendResult(formattedResult)
            def callback(accepted):
                self.miner.logger.reportFound(hash, accepted)
            d.addCallback(callback)
            return True
        else:
            self.miner.logger.reportDebug("Result didn't meet full "
                   "difficulty, not sending")
            return False

Mine:
        formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
        d = self.miner.connection.sendResult(formattedResult)
        def callback(accepted):
            self.miner.logger.reportFound(hash, accepted)
        d.addCallback(callback)
        return True


All I've done is remove the second difficulty check in phoenix, and trust that the kernel is returning only valid difficulty 1 shares. Now, instead of spitting out an error "Result didn't meet full difficulty, not sending", phoenix sends on all results returned by the kernel to the pool. Without this mod, logs of your kernel would just show a "didn't meet full difficulty" error message instead of rejects from the pool, which would still be a problem (but the helpful hash value wouldn't be printed for debugging). We can see from the hash value that the bad results are nowhere near a valid share.
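
For readers unfamiliar with what that second check does, here is a rough sketch of a difficulty test on an 80-byte block header. It is not phoenix's checkTarget: `build_header`, `block_hash`, and `meets_target` are hypothetical names, and the sketch assumes the header bytes are already in standard Bitcoin serialization order, which may differ from the byte order phoenix receives from the pool.

```python
import hashlib
from struct import pack

# Difficulty-1 target: 0x00000000FFFF followed by 52 hex zeros.
DIFF1_TARGET = 0xFFFF << 208


def build_header(data, nonce):
    """Join the first 76 header bytes with a 32-bit little-endian nonce."""
    return pack('<76sI', data[:76], nonce)


def block_hash(header):
    """Double SHA-256 of the 80-byte header."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def meets_target(header, target=DIFF1_TARGET):
    """True if the hash, read as a little-endian 256-bit integer, is <= target."""
    return int.from_bytes(block_hash(header), 'little') <= target
```

With a check like this in place, a bad nonce from the kernel is dropped locally; with it removed, the same nonce is submitted and shows up as a pool reject instead.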

This code mod only exposes a problem in the kernel optimization code, that sometimes wild hashes are being returned by the kernel from some bad math (or by the kernel code being vulnerable to some overclocking glitch that no other kernel activates.) Are these just "extra" hashes that are leaking through, or is the number of valid shares being returned by the kernel lower too? It's hard to tell without a very long statistics run.

I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

My python lib versions are documented here.

Joulesbeef:
I don't like the word 'stales' for rejected shares unless it specifically refers to shares rejected at a block change because they were obsolete when submitted to a pool, as logged by pushpool. The results I have above are not stale work, they are invalid hashes.

BOARBEAR
Member
**
Offline Offline

Activity: 77


View Profile
August 04, 2011, 09:14:48 PM
 #196

Do you think VLIW4 is a step backward from VLIW5?

VLIW4 is slower than VLIW5 in many computational tasks
Phateus
Jr. Member
*
Offline Offline

Activity: 52


View Profile
August 04, 2011, 10:15:06 PM
 #197

Quote
The output that I pastebinned is the standard console output of phoenix in -v verbose mode
Oh, thanks, I didn't even know you could do that... I'll do some testing with that

I think I'm going to have to download the source code for Phoenix and see what is actually happening...

Quote
I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

I agree totally, go with what works.  I am just trying to figure all this out.  Thanks for all your help.

-Phateus

http://deepbit.net/userbar/4dcec4d1816197e144000002_bfe143123a.png

Feeling Generous?
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
ssateneth
Legendary
*
Offline Offline

Activity: 1288



View Profile
August 05, 2011, 01:51:10 AM
 #198

If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830's using these extra flags in GUI miner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14

deepceleron
Legendary
*
Offline Offline

Activity: 1470



View Profile WWW
August 05, 2011, 02:15:00 AM
 #199

If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830's using these extra flags in GUI miner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14

Unless you use the -v flag for verbose logging in phoenix, set your console window so it has a log of thousands of lines you can scroll back through, and look for the "Result didn't meet full difficulty, not sending" error message, you wouldn't see any difference.

joulesbeef
Sr. Member
****
Offline Offline

Activity: 476


moOo


View Profile
August 05, 2011, 02:39:40 AM
 #200

I'll give it another try.. and use the verbose tag to see what is going on.
right now i have 2 rejects over 360 shares on diablos newest 8-4 version.
3 different pools, both rejects at the same pool, all 3 have over 100 shares.
30 shares with yours 2.1 and no rejects which looks good so far.. I'll let you know when i get up over 300, maybe it was a fluke as some of my pools had connection issues.
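
As a rough guide to how many shares are needed before a clean run means anything, the sketch below computes the chance of seeing zero rejects purely by luck. It assumes each share is rejected independently, and it borrows the 3% figure from the low end of the 3-5% range reported earlier in the thread; the function names are made up for this example.

```python
def observed_rate(rejects, shares):
    """Fraction of submitted shares that were rejected."""
    return rejects / shares


def prob_zero_rejects(shares, reject_rate):
    """Chance of seeing no rejects in `shares` independent submissions."""
    return (1 - reject_rate) ** shares


if __name__ == "__main__":
    print(f"{observed_rate(2, 360):.2%}")         # 2 rejects over 360 shares
    print(f"{prob_zero_rejects(30, 0.03):.2f}")   # clean run of 30 shares
    print(f"{prob_zero_rejects(300, 0.03):.4f}")  # clean run of 300 shares
```

Under these assumptions a clean run of 30 shares would happen roughly 40% of the time even at a 3% reject rate, while a clean run of 300 shares would be very unlikely, so waiting for 300+ shares is a reasonable threshold.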

mooo for rent