Bitcoin Forum
December 11, 2016, 06:30:26 AM *
Pages: « 1 2 3 4 5 6 7 8 [9] 10 11 12 13 14 15 16 »  All
Author Topic: Modified Kernel for Phoenix 1.5  (Read 92316 times)
iopq
Hero Member
Activity: 644
August 02, 2011, 03:14:14 PM
 #161

Quote from: iopq
using poclbm fork with phatk2.1 and it's the fastest kernel so far
I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)

Quote from: Clipse
Yeah, I must say, the poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed better on phoenix 1.5 for me)

quite interesting.

ps. iopq, can you post the changes made to run phatk2.1 on the poclbm mod by fpgaminer? I assume you are using that. Also, what arg is added to use vectors4? I've replaced phatk2.cl with phatk2.1 cl but I get ~11 MH/s less with phatk2.1, so I am wondering if there are other changes required. I am using SDK 2.4

I'm using that, just replaced the phatk2 kernel with phatk2.1 and that's it
vectors4 should be slower, why would you want to use it? I use -v only

Clipse
Hero Member
Activity: 504
August 02, 2011, 03:21:02 PM
 #162

Just wanted to test vectors4 with default memory, not high priority.

Still, phatk2.1 is much slower than phatk2 for me, as I said: ~11 MH/s less per card on an ATI HD 5850. I wonder why o_0

...In the land of the stale, the man with one share is king... >> Clipse

We pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
Tx2000
Full Member
Activity: 182
August 02, 2011, 03:52:39 PM
 #163

I have a 3 MH/s average improvement over Diapolo's last kernel update (393 -> 396)

Setup is as follows:

Reference 5850, 1.100v, 920 core / 350 mem.  11.4 preview / SDK 2.4.  Latest GUIMiner / phoenix 1.50


Going to run it for a day to check its stability and report back if anything arises.
Phateus
Jr. Member
Activity: 52
August 02, 2011, 05:34:57 PM
 #164

Quote from: Diapolo
@Phat:

I don't understand how you achieve that base is always a uint as a kernel parameter, now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init file. If I try to do this with my mod it just crashes Phoenix; if I use const u base instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4.  The init file packs either 1, 2 or 4 uints into each base entry, and therefore the init file always produces the same size variable as the kernel needs.  So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints.  Does that answer your question?
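
A rough sketch of the size matching described here, written in Python because the Phoenix init file is Python. The helper name `pack_base`, the use of `struct`, and the layout (consecutive nonces per work item) are assumptions made for illustration; this is not the actual phatk init code.

```python
import struct

def pack_base(first_nonce, vector_width):
    """Pack `vector_width` consecutive nonces as little-endian uint32s.

    A width of 1, 2 or 4 corresponds to the kernel being compiled with
    `u` as uint, uint2 or uint4 (assumed layout, illustration only).
    """
    if vector_width not in (1, 2, 4):
        raise ValueError("vector_width must be 1, 2 or 4")
    nonces = [(first_nonce + lane) & 0xFFFFFFFF for lane in range(vector_width)]
    return struct.pack("<%dI" % vector_width, *nonces)

# The host-side entry is always exactly as large as the kernel's `u` type:
# 4 bytes for uint, 8 for uint2, 16 for uint4.
for width in (1, 2, 4):
    print(width, len(pack_base(0, width)))
```

The point of the sketch is only that the host and the kernel have to agree on the width; declaring the parameter as plain uint while the host sends 2 or 4 values per entry is the kind of mismatch that could plausibly explain the crash Dia mentions.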

http://deepbit.net/userbar/4dcec4d1816197e144000002_bfe143123a.png

Feeling Generous?
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
joulesbeef
Sr. Member
Activity: 476
moOo
August 02, 2011, 05:42:57 PM
 #165

Quote
Woooo!, found the bug... it is in my kernel...


you rock sir phateus... all is working here and nice speed up.. especially over the stock phatk 1.0

but yeah, faster than diablo 7-17 for me.. on a 5830 sdk 2.4 11.6 cat guiminer..

mooo for rent
ssateneth
Legendary
Activity: 1288
August 02, 2011, 07:15:09 PM
 #166

in case some feedback was wanted for VECTORS4, I got about 20 mhash improvement on my 5870 when I have it set to stock speeds (850/1200) when using the computer normally (360 -> 380 mhash)
I will continue to use VECTORS when I am AFK (1015/355) for 470.1 mhash.

Diapolo
Hero Member
Activity: 769
August 02, 2011, 07:20:40 PM
 #167

I understand what you're saying and it makes sense, but it's not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Phateus
Jr. Member
Activity: 52
August 02, 2011, 09:26:38 PM
 #168

yes, it is declared as u (it was uint2 in 2.0, but I have made it variable for efficiency)

Code:
#ifdef VECTORS4
typedef uint4 u;
#else
#ifdef VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif
#endif

u is uint2 when VECTORS is defined

Bah, I know all of this scattered code is confusing

BTC_Junkie
Member
Activity: 97
August 03, 2011, 01:49:08 AM
 #169

Thanks, getting +1-3% on my cards... better improvement on 5800 series than 6900 series.

12jAZVfnCjKmPUXTszwmoji9S4NmY26Qvu
fpgaminer
Hero Member
Activity: 546
August 03, 2011, 02:07:11 AM
 #170

Quote
Hey fpgaminer, I really like this poclbm version of phatk2, but could you update the same version with a --phatk2_1 switch or something so we could test-drive both versions with ease :)

Sure thing. All updated. Added --phatk2_1 option, and --vectors4 (which can only be used in combo with phatk2_1).

https://github.com/progranism/poclbm

Let me know how it works. I tested it on my 5850s. I tested with no vectors, vectors, and vectors4 and they all seemed to work.

For my own sake, I also added a special feature where you can use "-e -1" to force the hashing estimation algorithm to estimate hashing speed over the entire run-time of the miner, and include both accepted and rejected shares. I'm using it to check that the code is actually hashing at the reported rate; no duplicate nonces or other bugs.
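
The idea behind a whole-run estimate can be sketched as follows. This is not poclbm's code: the function name is made up, and it assumes every share (accepted or rejected) is a difficulty-1 share, which takes about 2^32 hashes on average to find.

```python
HASHES_PER_SHARE = 2 ** 32  # approximate expected hashes per difficulty-1 share

def estimated_mhash(accepted, rejected, elapsed_seconds):
    """Average MH/s implied by all shares found over the whole run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    shares = accepted + rejected
    return shares * HASHES_PER_SHARE / elapsed_seconds / 1e6

# Example: 350 accepted and 10 rejected shares in one hour.
print("%.1f MH/s" % estimated_mhash(350, 10, 3600))
```

Because share discovery is random, an estimate like this only settles down over long runs, which fits with using it as a sanity check against the rate the miner reports.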

what@3
Jr. Member
Activity: 45
August 03, 2011, 04:44:24 AM
 #171

my 6950 took a hit from 390 to 356 Mh/s

however, my 6870s all got a 7 MH/s bump!

5830 up by 7 also to 327.9 mhs

Thanks!

https://imcex.com// a totally worthless exchange, trade orders are fake
lagmo
Member
Activity: 67
August 03, 2011, 03:51:42 PM
 #172

Awesome!
The v2.1 kernel works flawlessly on my Linuxcoin 2.0 rigs (SDK 2.4 + 11.5 Catalyst, HD5850/5830), generally +3-4 MH/s across the board compared to Diapolo 17-07
Excellent job!  :D
phelix
Legendary
Activity: 1680
nmc:id/phelix
August 03, 2011, 03:57:03 PM
 #173

the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.

blockchained.com ■ bitcointalk top posts
Phateus
Jr. Member
Activity: 52
August 03, 2011, 06:25:18 PM
 #174

Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.

iopq
Hero Member
Activity: 644
August 04, 2011, 03:57:33 AM
 #175

225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU)
try 200, it's the best performance on my card with worksize 256, vectors 2

Phateus
Jr. Member
Activity: 52
August 04, 2011, 04:24:32 AM
 #176

Alright, new version 2.2 is coming out in the next couple days.

As the front page says, 1354 ALU Ops for the 5xxx series vs. 1359 for 2.1

Changes I've made in 2.2 are:
  • added a rotC function for constant values, since the compiler apparently does not know how to perform rotate() on constants
Code:
#define rotC(x,n) (x<<n | x >> (32-n))
  • Small tweaks to the order of certain functions and other random things that shouldn't really have done anything :o
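
For anyone following along without an OpenCL toolchain, the rotation that the rotC macro spells out can be sketched in Python. Python integers are unbounded, so the explicit 32-bit mask stands in for the wraparound a uint gets automatically in the kernel; the name `rotl32` is made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (0 < n < 32)."""
    x &= MASK32
    # Bits shifted out on the left re-enter on the right.
    return ((x << n) | (x >> (32 - n))) & MASK32

print(hex(rotl32(0x12345678, 8)))
```

One thing to watch if reusing the macro as posted: its arguments are not parenthesized, so passing a compound expression built with ^ or | would expand with the wrong precedence. With plain variables or constants, which seems to be the intended use, that does not matter.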

I will add anything else I think of the next couple days... Also, keep the bug reports coming, so I know if I need to fix anything.


-Phateus

Phateus
Jr. Member
Activity: 52
August 04, 2011, 04:25:03 AM
 #177

Awesome, thanks for the info, I'll definitely try it out.

joulesbeef
Sr. Member
Activity: 476
moOo
August 04, 2011, 04:34:42 AM
 #178

so i copied
Code:
#define rotC(x,n) (x<<n | x >> (32-n))

and pasted it in my kernel file and nothing blew up

didn't really get any speed increases, but of course I am flying blind here so perhaps that wasn't the right thing to do, but hey, i did it anyways.

My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function

mooo for rent
dishwara
Legendary
Activity: 1386
Truth may get delay, but NEVER fails
August 04, 2011, 04:57:05 AM
 #179

My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
LOL
deepceleron
Legendary
Activity: 1470
August 04, 2011, 06:23:13 AM
 #180

I have bad news to report - phatk 2.1 sends bad shares.

On pool mining hardware that consistently gets <2% rejects (and those only are stales within 5 seconds of a new block), I have only changed the phatk kernel:

2956/190 = 6.0% rejected
1944/290 = 13.0% rejected
2656/116 = 4.2% rejected
2615/184 = 6.6% rejected
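
The percentages above are reproduced if each pair is read as accepted/rejected and the rate is taken as rejected divided by all submitted shares. That reading is an inference from the numbers, and the helper name below is made up for this sketch.

```python
def reject_percent(accepted, rejected):
    """Rejected shares as a percentage of all submitted shares."""
    total = accepted + rejected
    if total == 0:
        raise ValueError("no shares submitted")
    return 100.0 * rejected / total

# (accepted, rejected) pairs from the four runs listed above
runs = [(2956, 190), (1944, 290), (2656, 116), (2615, 184)]
for accepted, rejected in runs:
    print("%d/%d = %.1f%% rejected"
          % (accepted, rejected, reject_percent(accepted, rejected)))
```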

Here's a log from this new kernel showing the atypical random rejects:
(old links)

We can see on the result line that the hashes are bad, by not starting with 00000000:
[03/08/2011 22:17:48] Result c877f46db0d6ab44... rejected
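
The check being described can be sketched like this, assuming the log prints the hash most significant bytes first, so that a valid difficulty-1 share shows eight leading zero hex digits (32 zero bits). The function name is hypothetical.

```python
def looks_like_valid_share(result_hex):
    """True if the displayed hash starts with 32 zero bits (8 hex zeros)."""
    return result_hex.lower().startswith("00000000")

# The rejected result from the log line above fails the check.
print(looks_like_valid_share("c877f46db0d6ab44"))
```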

These do not give an "OpenCL error, hardware problem?", or a "didn't meet minimum difficulty, not sending", they are sent and rejected.

For an improvement in hashrate of 1% (333.58->336.53 typical) over Diapolo's 07-17 kernel, I get a 5% increase in rejects. I will have to revert. This is on WinXP/5830/11.6/SDK2.4 running phoenix.py 1.50 unmodified source on Python 2.6.6/numpy-1.6.0/... Two miner instances per GPU.
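
To put the trade-off in numbers, here is a small sketch of useful hashrate after rejects. The two hashrates are the ones quoted above; the reject fractions (2% before, 6% after) are illustrative picks taken from the "<2%" baseline and the first run listed, not measured averages.

```python
def effective_mhash(raw_mhash, reject_fraction):
    """Hashrate that actually earns credit once rejected shares are discounted."""
    if not 0.0 <= reject_fraction <= 1.0:
        raise ValueError("reject_fraction must be between 0 and 1")
    return raw_mhash * (1.0 - reject_fraction)

old = effective_mhash(333.58, 0.02)  # Diapolo 07-17 kernel, assumed 2% rejects
new = effective_mhash(336.53, 0.06)  # phatk 2.1, assumed 6% rejects
print("old: %.1f MH/s, new: %.1f MH/s" % (old, new))
```

Under these assumptions the roughly 1% gain in raw speed is outweighed several times over by the extra rejects, which is consistent with the decision to revert.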

Command line is:
python phoenix.py -v -u http://xxx/ -k phatk VECTORS AGGRESSION=13 BFI_INT WORKSIZE=256 PLATFORM=0 DEVICE=0
