Bitcoin Forum
Author Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13  (Read 51194 times)
zmcgrew
July 07, 2011, 12:54:13 AM
 #61

Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960 MHz core and 300 MHz RAM.
swivel
July 07, 2011, 05:15:47 AM
 #62

Nice work! Plugged in the 2011-07-06 kernel to phoenix and saw my 5850 jump from 348 Mhash/s to 354 Mhash/s.


Debian sid 64-bit
Catalyst 11.6 and AMD APP 2.4 SDK
phoenix 1.50 with VECTORS BFI_INT WORKSIZE=256 AGGRESSION=12
XFX 5850 BE 860 core 300 memory stock voltage fan speed at 55% temp at 61C

Diapolo (OP)
July 07, 2011, 05:59:43 AM
 #63

Quote from: zmcgrew on July 07, 2011, 12:54:13 AM
Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960 MHz core and 300 MHz RAM.

Could you raise your mem clock to ~350 MHz and report back? What about a WORKSIZE of 256? For 5830 cards this helps a lot.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
gominoa
July 07, 2011, 07:16:09 AM
 #64

This doesn't compile when VECTORS is defined.

Quote
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

I can't post on the mining thread, but this is the same error reported there.
Works fine without VECTORS defined.
Vrekk
July 07, 2011, 08:01:12 AM
 #65

Got an increase from 425 to 435 :-) Thanks a bunch!! Sent a little something something your way.
Diapolo (OP)
July 07, 2011, 09:50:37 AM
 #66

Quote from: gominoa on July 07, 2011, 07:16:09 AM
This doesn't compile when VECTORS is defined.

Quote
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

I can't post on the mining thread, but this is the same error reported there.
Works fine without VECTORS defined.

I'm looking into this; it seems to happen only with SDK 2.1!
We are trying to nail it down in the other thread. If I find a solution, a fixed version will be uploaded.
If you don't mind fiddling a bit with the code, you can try changing the following lines by adding an explicit (u) cast to the scalar operands.

Code:
W[19] = P4(19) + (u)0x11002000 + P1(19);

W[30] = P3(30) + (u)0xA00055 + P1(30);

Vals[3] = (u)L + W[64];

W[81] = P4(81) + P2(81) + (u)0xA00000;

W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);

W[94] = P3(94) + (u)0x400022 + P1(94);

Dia

SeriousWorm
July 07, 2011, 10:32:27 AM
 #67

Wow, I got a nice increase when I upped my memory to 350 MHz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.
Diapolo (OP)
July 07, 2011, 11:03:11 AM
 #68

Quote from: SeriousWorm on July 07, 2011, 10:32:27 AM
Wow, I got a nice increase when I upped my memory to 350 MHz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.


The latest kernel seems to be sensitive to the mem clock and benefits from a higher one; thanks for verifying.

Dia

Diapolo (OP)
July 07, 2011, 03:20:02 PM
 #69

New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I cannot guarantee that this really works for 2.1, because I didn't test it. If you are unsure, wait for other users to test it and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM :).

conspirosphere.tk
July 07, 2011, 04:19:36 PM
Last edit: July 08, 2011, 12:55:35 AM by conspirosphere.tk
 #70

This causes Phoenix miner 1.50 to crash and close immediately for me, so I'm reverting to your previous patch.
Donation sent.

update: it was my -f flag. Without it, it now works.

BTW: How do you get accurate measurements of your Mh/s? My Phoenix miner oscillates between 170 and 190 Mh/s.
gominoa
July 07, 2011, 10:58:10 PM
 #71

New version 2011-07-07 works on SDK 2.1 w/ VECTORS.

Thanks
Bert
July 08, 2011, 02:31:17 AM
 #72

Quote from: conspirosphere.tk on July 07, 2011, 04:19:36 PM
... snip ...
BTW: How do you get accurate measurements of your Mh/s? My Phoenix miner oscillates between 170 and 190 Mh/s.

I add "-a 50" to average the Mhash/sec over 50 samples. This overrides the default value of 10 and smooths out the jumps, but it is slower to converge to the real hash rate. With 5 times as many samples the jumps get smaller (about √5 ≈ 2.2 times smaller if the samples are independent).


$ ./phoenix.py --help
Usage: phoenix.py -u URL [-k kernel] [kernel params]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         show debug messages
  -k KERNEL, --kernel=KERNEL
                        the name of the kernel to use
  -u URL, --url=URL     the URL of the mining server to work for [REQUIRED]
  -q QUEUESIZE, --queuesize=QUEUESIZE
                        how many work units to keep queued at all times
  -a AVGSAMPLES, --avgsamples=AVGSAMPLES
                        how many samples to use for hashrate average
$
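
To illustrate what averaging over more samples does, here is a minimal Python sketch. It is not Phoenix's actual code: it assumes the displayed rate is a plain mean of the last N samples, and the names HashrateAverager and displayed_spread, the 180 Mhash/s mean and the 10 Mhash/s noise are all made up for the example.

```python
from collections import deque
import random
import statistics


class HashrateAverager:
    """Keeps the last `avg_samples` rate samples and reports their mean.

    Hypothetical helper: assumes a simple moving average, which may differ
    from what the miner really does internally.
    """

    def __init__(self, avg_samples=10):
        self.samples = deque(maxlen=avg_samples)

    def add(self, mhash_per_sec):
        # Appending to a full deque silently drops the oldest sample.
        self.samples.append(mhash_per_sec)
        return sum(self.samples) / len(self.samples)


def displayed_spread(avg_samples, n=20000, seed=1):
    """Standard deviation of the displayed (averaged) rate for noisy input."""
    rng = random.Random(seed)
    averager = HashrateAverager(avg_samples)
    # Simulated raw samples: 180 Mhash/s with independent Gaussian noise.
    shown = [averager.add(rng.gauss(180.0, 10.0)) for _ in range(n)]
    # Skip the warm-up phase where the window is not yet full.
    return statistics.pstdev(shown[avg_samples:])


if __name__ == "__main__":
    s10 = displayed_spread(10)
    s50 = displayed_spread(50)
    print(f"window 10: spread {s10:.2f} Mhash/s")
    print(f"window 50: spread {s50:.2f} Mhash/s")
    print(f"ratio: {s10 / s50:.2f}")
```

With independent noise the ratio comes out near √5, and the larger window also needs 50 samples before the displayed value settles, which matches the slower convergence.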

Tip jar: 1BW6kXgUjGrFTqEpyP8LpVEPQDLTkbATZ6
Diapolo (OP)
July 08, 2011, 04:57:06 AM
 #73

Quote from: gominoa on July 07, 2011, 10:58:10 PM
New version 2011-07-07 works on SDK 2.1 w/ VECTORS.

Thanks

So how does it work for you? Compared to other kernels? Which cards do you use?

Dia

burningrave101
July 08, 2011, 05:28:39 AM
 #74

Tested the latest 2011-07-07 kernel on my 6990 @ 880 MHz core using the latest 7/1 version of GUIMiner without any additional kernel tweaks and saw roughly a 15 Mh/s increase. Thanks and hope to see further improvements in hash rate to come :).
Diapolo (OP)
July 08, 2011, 05:30:47 AM
 #75

Quote from: burningrave101 on July 08, 2011, 05:28:39 AM
Tested the latest 2011-07-07 kernel on my 6990 @ 880 MHz core using the latest 7/1 version of GUIMiner without any additional kernel tweaks and saw roughly a 15 Mh/s increase. Thanks and hope to see further improvements in hash rate to come :).

It gets harder with each new version, so I guess the next version could take some time :). Any ideas and hints are welcome.

Dia

kr105
July 08, 2011, 06:44:36 AM
 #76

Asus EAH5850, core 840, mem 180, volt 1080:

version 2011-07-01: 338mh/s
version 2011-07-03: 336mh/s
version 2011-07-06: 301mh/s
version 2011-07-07: 301mh/s

I'll try to play with core/mem clocks again, because these values were optimal for the old phatk. Thanks.
Diapolo (OP)
July 08, 2011, 07:45:38 AM
 #77

Quote from: kr105 on July 08, 2011, 06:44:36 AM
Asus EAH5850, core 840, mem 180, volt 1080:

version 2011-07-01: 338mh/s
version 2011-07-03: 336mh/s
version 2011-07-06: 301mh/s
version 2011-07-07: 301mh/s

I'll try to play with core/mem clocks again, because these values were optimal for the old phatk. Thanks.

I bet 0.1 BTC that you will reach higher values with raised mem clocks :D. Deal?

Dia

Bert
July 08, 2011, 08:18:32 AM
Last edit: July 08, 2011, 09:58:23 AM by Bert
 #78

Quote from: Diapolo on July 08, 2011, 05:30:47 AM
... snip ...
Any ideas and hints are welcome.

Dia

I've been toying with an idea, but I don't have the necessary programming skills (or knowledge of the SHA-256 algorithm) to implement anything.

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_APP_SDK_FAQ.pdf
Quote
41. What is the difference between 24-bit and 32-bit integer operations?

    24-bit operations are faster because they use floating point hardware and can execute on all compute units. Many 32-bit integer operations also run on all stream processors, but if both a 24-bit and a 32-bit version exist for the same instruction, the 32-bit instruction executes only one per cycle.

43. Do 24-bit integers exist in hardware?

    No, there are 24-bit instructions, such as MUL24/MAD24, but the smallest integer in hardware registers is 32-bits.

75. Is it possible to use all 256 registers in a thread?

    No, the compiler limits a wavefront to half of the register pool, so there can always be at least two wavefronts executing in parallel.

http://developer.amd.com/sdks/amdappsdk/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
Page 4-62
Quote
24-bit integer MULs and MADs have five times the throughput of 32-bit integer multiplies. 24-bit unsigned integers are natively supported only on the Evergreen family of devices and later. Signed 24-bit integers are supported only on the Northern Island family of devices and later. The use of OpenCL built-in functions for mul24 and mad24 is encouraged. Note that mul24 can be useful for array indexing operations.

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=144722
Quote
On the 5800 series, signed mul24(a,b) is turned into (((a<<8)>>8)*((b<<8)>>8)). This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:
signed a * b: 0.9736s
unsigned mul24(a,b): 0.9734s
signed mul24(a,b): 2.2771s
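
To make the quoted expansion concrete, here is a small Python model of it. This is my own sketch, not AMD's code: the helper names are made up, and the unsigned variant simply masks to the low 24 bits, which is an assumption (OpenCL leaves the result implementation-defined when the inputs do not fit in 24 bits).

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sign_extend24(x):
    """Model ((x << 8) >> 8) on a signed 32-bit int.

    The left shift discards the top 8 bits, and the arithmetic right shift
    copies bit 23 back into them, so only the low 24 bits survive.
    """
    shifted = to_signed32(x << 8)
    return shifted >> 8  # Python's >> on negative ints is arithmetic


def signed_mul24_emulated(a, b):
    """The two-shift-per-operand expansion quoted above, then a 32-bit multiply."""
    return to_signed32(sign_extend24(a) * sign_extend24(b))


def unsigned_mul24(a, b):
    """Assumed model of unsigned mul24: multiply the low 24 bits of each input."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF)) & MASK32


if __name__ == "__main__":
    print(sign_extend24(0x00FFFFFF))         # bit 23 set, so this is negative
    print(signed_mul24_emulated(-3, 7))
    print(unsigned_mul24(0x01000005, 3))     # the upper 8 bits are ignored
```

The signed path needs four extra shifts around the multiply, which is consistent with the slower timing in the quoted benchmark.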

So anyhow, what I was thinking was the following:

Current kernel:  1 * 256-bit hash / 32-bit int =  8 32-bit operations (speed 100%)
Possible kernel: 3 * 256-bit hash / 24-bit int = 32 24-bit operations (speed at most 166% [5 times faster, divided by 3 SHA-256 operations in parallel])*

* It may actually end up being slower than the current kernel.cl if 32bit and 24bit operations are sent as wavefronts at the same time.

There may be some merit in trying to write a new kernel.cl that uses 32 x 24-bit integers to carry out 3 SHA-256 operations in parallel, faster than one SHA-256 operation using 8 x 32-bit integers.

But not everything can be carried out as 24-bit operations, only mul24(a,b) and mad24(a,b,c), so the 166% speed-up would only be achieved if every SHA-256 operation could be expressed with these two functions. The new kernel.cl would be limited to modern ATI hardware (54xx-59xx, 67xx-69xx), which is generally what miners are using.

But to be honest I haven't looked into the SHA-256 algorithm, so I'm not sure if parts of it could ever be rewritten to utilise mad24(a,b,c) or mul24(a,b). But I like thinking outside the box.
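
For reference, here is a minimal Python sketch of the SHA-256 message-schedule step as defined in the SHA-256 standard (FIPS 180-4). The function names are mine and this is not taken from kernel.cl. It suggests why the idea is hard to apply: the step uses only rotates, shifts, XORs and 32-bit additions, so there is no multiply for mul24 or mad24 to replace, and the additions carry across bit 24, so a word cannot simply be split into independent 24-bit pieces.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    """Schedule function sigma0: two rotates and one shift, XORed together."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    """Schedule function sigma1: two rotates and one shift, XORed together."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_schedule(block_words):
    """Expand 16 message words into the 64-word schedule W[0..63]."""
    if len(block_words) != 16:
        raise ValueError("expected 16 32-bit words")
    w = [word & MASK32 for word in block_words]
    for t in range(16, 64):
        # Four terms added modulo 2**32; no multiplication anywhere.
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w


if __name__ == "__main__":
    w = expand_schedule([1] + [0] * 15)
    print(hex(w[16]), hex(w[18]))
    # A 32-bit addition carries out of the low 24 bits:
    print(hex((0x00FFFFFF + 1) & MASK32))
```

The round function of SHA-256 is built from the same kinds of operations (Ch, Maj, the big sigma rotates and modular additions), so the same observation applies there.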

zmcgrew
July 08, 2011, 09:36:38 AM
 #79

Quote from: Diapolo on July 07, 2011, 05:59:43 AM
Could you raise your mem clock to ~350 MHz and report back? What about a WORKSIZE of 256? For 5830 cards this helps a lot.

Played with mem clock speeds. 350 MHz showed no improvement; 600 to 1050 MHz gave a ~0.5 Mh/s improvement, but that is still not enough to get back the 2 Mh/s I lost.

Work size of 256 dropped off another few Mh/s, so that definitely didn't help. It seems like 07/03/2011 is the winner for me! =)
Thanks for your efforts though, I'll definitely keep testing and see if the newer kernels can return to the 07/03/2011 level.
makiet
July 08, 2011, 10:13:27 AM
 #80

Nice work, I'll try it ;)