further improved phatk_dia kernel for Phoenix + SDK 2.6

Diapolo (OP)

Hero Member

Offline

Activity: 773
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 06, 2011, 11:02:36 AM

#261

Quote from: ovidiusoft on August 06, 2011, 10:57:41 AM

It says otherwise in the first post :P

You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo

joulesbeef

Sr. Member

Offline

Activity: 476
Merit: 250

moOo

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 06, 2011, 08:31:26 PM

#262

Quote

Phoenix 1.5 has the bfipatcher.py included, so I never included it in any kernel package

Strange, it must not have been in my GUIMiner's version of Phoenix 1.5, which does seem different.

Quote

You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post

Just to be clear though: VECTORS VECTORS2 doesn't hurt anything, it is just extraneous?

I preferred to leave it as it was: less typing and deleting when testing the versions.

mooo for rent

Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 06, 2011, 09:31:21 PM

#263

Quote from: joulesbeef on August 06, 2011, 08:31:26 PM

Quote

Phoenix 1.5 has the bfipatcher.py included, so I never included it in any kernel package

Strange, it must not have been in my GUIMiner's version of Phoenix 1.5, which does seem different.

Quote

You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post

Just to be clear though: VECTORS VECTORS2 doesn't hurt anything, it is just extraneous?

I preferred to leave it as it was: less typing and deleting when testing the versions.

Yeah, it doesn't hurt, it's just ignored.

Dia


bcforum

Full Member

Offline

Activity: 140
Merit: 100

Re: further improved phatk OpenCL Kernel (> 3% increase) for Phoenix - 2011-08-04

August 07, 2011, 01:39:19 AM

#264

Quote from: Diapolo on August 04, 2011, 07:32:46 PM

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards' temperatures; I got a report that they may be lower, which would be great.

Regards,
Dia

I get 0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM

dishwara

Legendary

Offline

Activity: 1855
Merit: 1016

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 03:52:36 AM

#265

Quote from: joulesbeef on August 06, 2011, 08:31:26 PM

Just to be clear though: VECTORS VECTORS2 doesn't hurt anything, it is just extraneous?
I preferred to leave it as it was: less typing and deleting when testing the versions.

But if there are two vector switches, like "VECTORS VECTORS2", which one is taken into account: the first one or the last one on the command line?
VECTORS2 and VECTORS give different performance.

jedi95

Full Member

Offline

Activity: 219
Merit: 120

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 06:37:46 AM

#266

The latest version (2011-08-04) has a major problem that I can see.

The assumption that there won't be more than 1 valid nonce per kernel execution is very wrong. At aggression 14, for example, each kernel execution tests 2^30 nonces. The chance that there will be more than 1 valid nonce in any given kernel execution in this case is about 2.5% (if I did the math right). This effectively causes a net loss in performance compared to the previous version at high aggression. At lower aggression values (10 and below) this is less of a problem, since the performance loss in these cases will be much less than 1%.
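
A rough check of the 2.5% figure, as a sketch under two assumptions that the post does not state: each nonce independently satisfies H == 0 with probability $p = 2^{-32}$ (H treated as a uniformly distributed 32-bit word), and the number of valid nonces $K$ among the $N$ tested is approximately Poisson with mean $\lambda = Np$.

```latex
\begin{align*}
\lambda &= N p = 2^{30} \cdot 2^{-32} = 0.25 \\
P(K \ge 2) &= 1 - e^{-\lambda}(1 + \lambda) = 1 - 1.25\,e^{-0.25} \approx 0.0265
\end{align*}
```

That is roughly 2.6%, in line with the estimate. If each aggression step doubles the nonces per execution (also an assumption), aggression 10 gives $N = 2^{26}$, $\lambda = 2^{-6}$ and $P(K \ge 2) \approx \lambda^2 / 2 \approx 1.2 \times 10^{-4}$, about 0.012%.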

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU

Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 07:21:39 AM

#267

Quote from: jedi95 on August 07, 2011, 06:37:46 AM

You have to compare the loss of valid nonces to the higher efficiency because of the removed control flow in the kernel (all current GPUs dislike if/else and so on). I thought this tradeoff would be well worth it, but you could prove me wrong. I was thinking about a better way of writing the positive nonces into output, but that didn't work.

Any good ideas for that part of the kernel will be a big plus!

Dia


Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 03:13:05 PM

#268

Updated 1st post kernel performance data with SDK 2.5 and KernelAnalyzer 1.9 Cal 11.7 profile.


Beta-coiner1

Hero Member

Offline

Activity: 532
Merit: 500

Re: further improved phatk OpenCL Kernel (> 3% increase) for Phoenix - 2011-08-04

August 07, 2011, 05:55:18 PM

#269

Quote from: bcforum on August 07, 2011, 01:39:19 AM

Quote from: Diapolo on August 04, 2011, 07:32:46 PM

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128


Regards,
Dia

I get 0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3

I can confirm the temperature difference, which I thought was strange. Using Catalyst 11.6B / SDK 2.5 on a 6950 @ 867/1250 with V 4 W64 F3, temps are 3 C lower in GUIMiner. Hash rate has also increased by 3 MH/s with those settings, and invalids are definitely much lower vs. Phateus.

BITMIXER.IO High Volume Bitcoin MIXER

Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 3% increase) for Phoenix - 2011-08-04

August 07, 2011, 07:15:22 PM

#270

Quote from: Beta-coiner1 on August 07, 2011, 05:55:18 PM

Quote from: bcforum on August 07, 2011, 01:39:19 AM

Quote from: Diapolo on August 04, 2011, 07:32:46 PM

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128


Regards,
Dia

I get 0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3

I have to ask to understand you ... you say that my current pre-release version generates 3°C less heat for your card and invalid share rate is lower in comparison to the latest Phateus phatk?

Dia


Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 07:27:31 PM

#271

To all happy new kernel users, there is one thing you should know ... there have been NO donations since 2011-07-31, which makes me a bit sad.

It's my free time that I put in here (many hours so far), and the motivation is not only to get a "Thank you!". Remember, you guys generate more BTC with the kernel mods. It doesn't matter if it's my mod, Phateus' mod or anyone else's mod ... just be a little thankful and you keep a free and fast kernel + a motivated kernel mixer Diapolo ;)

No offense to all the great people who already donated a few bitcents or even more, who helped me test this, who helped me fix bugs or who added great ideas to this work!

Regards,
Diapolo


Beta-coiner1


Re: further improved phatk OpenCL Kernel (> 3% increase) for Phoenix - 2011-08-04

August 07, 2011, 08:07:03 PM

#272

Quote from: Diapolo on August 07, 2011, 07:15:22 PM

I have to ask to understand you ... you say that my current pre-release version generates 3°C less heat for your card and invalid share rate is lower in comparison to the latest Phateus phatk?

Dia

Yes, that would be correct. I also sent a bitcent your way to help out, even though it might not be much. Here's hoping for more development for the 69xx architecture. ;)


drlatino999

Sr. Member

Offline

Activity: 335
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 10:59:38 PM
Last edit: August 08, 2011, 01:00:16 AM by drlatino999

#273

Using the recommended settings -

Code:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

My 6950 dropped 3C, 5830 stayed the same.

Sappers clear the way

joulesbeef


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 07, 2011, 11:49:40 PM

#274

Quote

WORKSIZE=128p

Typo, or something new I don't know about?


drlatino999


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 01:00:03 AM

#275

Typo, let me edit that to reflect.


Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 04:31:25 AM

#276

Quote from: joulesbeef on August 07, 2011, 11:49:40 PM

Quote

WORKSIZE=128p

Typo, or something new I don't know about?

It's only a typo there ...


RedLine888

Full Member

Offline

Activity: 236
Merit: 109

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 09:19:31 AM

#277

Hi! Dunno whether the info I provide would be of any use but nevertheless...

Installed the 2011-08-04 kernel version and gained about 4 MH/s on my 6950, lost about 3 MH/s on my 5870, and my 5870 became unstable!

It works at 990 core and 360 mem with the previous version of your kernel and is perfectly stable, but with this new version the driver crashes after a few seconds even at 980 core. The temps are perfect and stay below 78 C.

Thanx though for your work!

ssateneth

Legendary

Offline

Activity: 1344
Merit: 1004

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 05:14:29 PM

#278

I still don't know why people are doing "VECTORS VECTORS2". VECTORS has been an invalid argument for Diapolo's phatk since the 2011-08-04 version. The only valid arguments are VECTORS2 and VECTORS4.

Quote

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

I am a long time trusted user: Bitcointalk forum trust ratings, Bitcoin-OTC Ratings, eBay Feedback, and Localbitcoins public profile.

jedi95


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 06:57:58 PM
Last edit: August 09, 2011, 02:55:18 AM by jedi95

#279

Quote from: Diapolo on August 07, 2011, 07:21:39 AM

After looking at the code more carefully your method is only problematic if more than 1 vector component returns a valid nonce. The odds of this happening are EXTREMELY small, since you would have to find more than 1 valid hash in a range of only 2 or 4 hashes.
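
To put a number on "EXTREMELY small", here is a sketch under one assumption the post does not state: each hash independently meets H == 0 with probability $p = 2^{-32}$. The chance that at least two of the $n$ components of one vector are valid is then approximately:

```latex
\begin{align*}
P(\text{2 or more of } n) &\approx \binom{n}{2} p^{2} \\
n = 2: &\quad 2^{-64} \approx 5.4 \times 10^{-20} \\
n = 4: &\quad 6 \cdot 2^{-64} \approx 3.3 \times 10^{-19}
\end{align*}
```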

That said, I have devised a way to remove the if(nonce) control structure entirely. This makes a couple assumptions:

1. Control flow instructions have a large clock cycle penalty regardless of the branch taken (so you get 44 cycle penalty on Cypress and Cayman regardless of if H == 0)
2. Writing values to output[] for every nonce even if the nonce is invalid does not incur a significant clock cycle cost relative to the control flow instructions. (ideally <10 clocks, but if it's below ~30 the code below will still be faster than the current code)

The steps:

1. OR the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = output[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 (with the bit widened to an all-ones or all-zeros 32-bit mask) you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (mask AND X = X)

Steps 7-8 produce an 8-bit index that is 0 for all invalid nonces and hopefully unique for each valid nonce, assuming there are a small number of valid nonces. Even in the worst case (more than 1 hash found in a single execution) at least 1 will be returned, and if 3 or fewer nonces are found per execution, all of them should be returned in most cases.

output[0] will be overwritten constantly by invalid nonces (since the 1-bit number from step 5 will be 0 unless the hash satisfies H == 0, the resulting 8-bit number will also be 0). output[>0] will contain valid nonces with a small chance of collisions.

Cypress and Cayman (58xx and 69xx respectively) have a 44 cycle latency for control flow instructions

Steps 1 - 8 should execute in 1 clock each (however they can't be vectorized, so this won't exploit any ILP)

Step 9 takes no longer than the current code for valid nonces, but this will now also apply to invalid nonces.

Overall this should be fast, return only valid nonces, and retain the capability to return more than one nonce if the assumptions above are true.

An example of how even a single 1 in the input will cause the output of steps 1-5 to be 0:
--------------------------------------------------------------------------------------

H = 0000000000000001 0000000000000000

00000000 00000001
00000000 00000000
-------------------OR
00000000 00000001

0000 0000
0000 0001
----------OR
0000 0001

00 00
00 01
------OR
00 01

0 0
0 1
---OR
0 1

0
1
-NOR
0
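
The steps above can be sketched in plain C as a scalar emulation. This is not the kernel code: the function names are hypothetical, the output buffer is assumed to have 256 entries because the index is 8 bits wide, and step 6 widens the 1-bit flag to a full 32-bit mask with `0u - flag`, since a literal AND with a 1-bit value would keep only the lowest bit of the nonce. The extra write to output[OUTPUT_SIZE] from step 9 is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Steps 1-5: fold the 32-bit word h down to one bit that is 1 only if h == 0. */
static inline uint32_t is_zero_flag(uint32_t h)
{
    uint32_t v = (h | (h >> 16)) & 0xFFFFu; /* step 1: OR low 16 with high 16 */
    v = (v | (v >> 8)) & 0xFFu;             /* step 2: OR low 8 with high 8   */
    v = (v | (v >> 4)) & 0xFu;              /* step 3: OR low 4 with high 4   */
    v = (v | (v >> 2)) & 0x3u;              /* step 4: OR low 2 with high 2   */
    return ~(v | (v >> 1)) & 0x1u;          /* step 5: NOR of the last 2 bits */
}

/* Step 6: keep the nonce if h == 0, otherwise produce 0.
 * Assumption: the flag is widened to 0x00000000 or 0xFFFFFFFF first. */
static inline uint32_t mask_nonce(uint32_t h, uint32_t nonce)
{
    uint32_t mask = 0u - is_zero_flag(h);
    return nonce & mask;
}

/* Steps 7-8: derive an 8-bit slot index from the masked nonce (0 if invalid). */
static inline uint32_t output_index(uint32_t masked)
{
    uint32_t v = (masked ^ (masked >> 16)) & 0xFFFFu; /* step 7: XOR halves */
    return (v | (v >> 8)) & 0xFFu;                    /* step 8: OR bytes   */
}

/* Step 9 (partial): unconditional store, no branch on the hash value.
 * output must have at least 256 entries. */
static inline void store_nonce(uint32_t *output, uint32_t h, uint32_t nonce)
{
    assert(output != NULL);
    uint32_t masked = mask_nonce(h, nonce);
    output[output_index(masked)] = masked;
}
```

Note that a valid nonce whose index happens to fold to 0 lands in output[0] together with the invalid ones, which is one source of the "small chance of collisions" mentioned above.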


Diapolo (OP)


Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-04

August 08, 2011, 07:38:59 PM

#280

Quote from: jedi95 on August 08, 2011, 06:57:58 PM

Quote from: Diapolo on August 07, 2011, 07:21:39 AM

After looking at the code more carefully your method is only problematic if more than 1 vector component returns a valid nonce. The odds of this happening are EXTREMELY small, since you would have to find more than 1 valid hash in a range of only 2 or 4 hashes.

That said, I have devised a way to remove the if(nonce) control structure entirely. This makes a couple assumptions:

1. Control flow instructions have a large clock cycle penalty regardless of the branch taken (so you get 44 cycle penalty on Cypress and Cayman regardless of if H == 0)
2. Writing values to output[] for every nonce even if the nonce is invalid does not incur a significant clock cycle cost relative to the control flow instructions. (ideally <10 clocks, but if it's below ~30 the code below will still be faster than the current code)

The steps:

1. OR the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = output[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 (with the bit widened to an all-ones or all-zeros 32-bit mask) you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (mask AND X = X)

Steps 7-8 produce an 8-bit index that is 0 for all invalid nonces and hopefully unique for each valid nonce, assuming there are a small number of valid nonces. Even in the worst case (more than 1 hash found in a single execution) at least 1 will be returned, and if 3 or fewer nonces are found per execution, all of them should be returned in most cases.

output[0] will be overwritten constantly by invalid nonces (since the 1-bit number from step 5 will be 0 unless the hash satisfies H == 0, the resulting 8-bit number will also be 0). output[>0] will contain valid nonces with a small chance of collisions.

Cypress and Cayman (58xx and 69xx respectively) have a 44 cycle latency for control flow instructions

Steps 1 - 8 should execute in 1 clock each (however they can't be vectorized, so this won't exploit any ILP)

Step 9 takes no longer than the current code for valid nonces, but this will now also apply to invalid nonces.

Overall this should be fast, return only valid nonces, and retain the capability to return more than one nonce if the assumptions above are true.

An example of how even a single 1 in the input will cause the output of steps 1-5 to be 0:
--------------------------------------------------------------------------------------

H = 0000000000000001 0000000000000000

00000000 00000001
00000000 00000000
-------------------OR
00000000 00000001

0000 0000
0000 0001
----------OR
0000 0001

00 00
00 01
------OR
00 01

0 0
0 1
---OR
0 1

0
1
-NOR
0

Thanks Jedi, I will look into this tomorrow. The last thing I tried was the following (which means looking into every piece of the output buffer afterwards):

Code:

const uint2 nonce = (uint2){((Vals[7].x == -H[7]) * W_3.x), ((Vals[7].y == -H[7]) * W_3.y)};

output[OUTPUT_MASK & (nonce.x >> 2)] = nonce.x;
output[OUTPUT_MASK & (nonce.y >> 2)] = nonce.y;
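
For comparison, the snippet above can be emulated for a single vector component in plain C. This is only a sketch: the helper names are hypothetical, and the value of `OUTPUT_MASK` is an assumption because the thread does not show it. It relies on a scalar comparison yielding 1 or 0, so the multiplication keeps the candidate nonce on a match and zeroes it otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; the real kernel's mask is not shown in the thread. */
#define OUTPUT_MASK 0xFFu

/* Branch-free select: candidate if val == target, else 0. */
static inline uint32_t select_nonce(uint32_t val, uint32_t target,
                                    uint32_t candidate)
{
    return (uint32_t)(val == target) * candidate;
}

/* Unconditional store: invalid results (0) always land in slot 0.
 * output must have at least OUTPUT_MASK + 1 entries. */
static inline void write_nonce(uint32_t *output, uint32_t nonce)
{
    assert(output != NULL);
    output[OUTPUT_MASK & (nonce >> 2)] = nonce;
}
```

Because any slot may end up holding a nonce, the host side has to scan the whole buffer, and valid nonces 0 to 3 share slot 0 with the invalid results.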

Dia

