Bitcoin Forum
Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61214 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
pallas (OP)
January 12, 2015, 09:35:34 PM
 #201

The new version compiles fine, but of the two GPUs only id 1 works; id 0 doesn't produce any valid work units.
Speed: R9 290 30 MH/s, R9 290X 33 MH/s (1100 MHz)
My experimental OpenCL kernel is a couple of percent faster.

utahjohn
January 12, 2015, 09:50:34 PM
Last edit: January 12, 2015, 10:01:06 PM by utahjohn
 #202

Quote from: pallas on January 12, 2015, 09:35:34 PM
The new version compiles fine, but of the two GPUs only id 1 works; id 0 doesn't produce any valid work units.
Speed: R9 290 30 MH/s, R9 290X 33 MH/s (1100 MHz)
My experimental OpenCL kernel is a couple of percent faster.

Care to share the newest incarnation of the OCL kernel? PM me a link, for personal use only :) :)
physixz
January 12, 2015, 11:43:48 PM
Last edit: January 13, 2015, 12:33:16 AM by physixz
 #203

When I run HetPas it doesn't detect the graphics cards, even though I'm running the 14.9 drivers. Anybody know why?

I get either "Runtime error: openCL error: CL_Device_not_found", or "no GCN device found" when I re-enable the Intel integrated graphics. I am running 3 R9 290s and I've tried the 14.9 and 14.12 beta drivers; neither works.
utahjohn
January 13, 2015, 12:46:01 AM
 #204

Quote from: physixz on January 12, 2015, 11:43:48 PM
When I run HetPas it doesn't detect the graphics cards, even though I'm running the 14.9 drivers. Anybody know why?

I get either "Runtime error: openCL error: CL_Device_not_found", or "no GCN device found" when I re-enable the Intel integrated graphics. I am running 3 R9 290s and I've tried the 14.9 and 14.12 beta drivers; neither works.

I had to disable the Intel onboard graphics, uninstall all drivers and reinstall 14.7RC3. What is happening is that your AMD cards are ending up on the wrong gpu-platform: 1 in my case, with Intel on gpu-platform 0.
HetPas appears to look only at gpu-platform 0.
Completely uninstall all display drivers with DDU, then go into the BIOS and disable the onboard Intel. When the AMD cards are redetected they will appear on gpu-platform 0.

AVOID 14.9 like the plague, its OCL compiler is terrible.

physixz
January 13, 2015, 01:27:09 AM
 #205

Thanks, I wasn't disabling the Intel in UEFI, which was my problem. It's working now at 26.5 MH/s per card, which is amazing.
utahjohn
January 13, 2015, 01:35:36 AM
 #206

Quote from: physixz on January 13, 2015, 01:27:09 AM
Thanks, I wasn't disabling the Intel in UEFI, which was my problem. It's working now at 26.5 MH/s per card, which is amazing.

Pallas is getting 30 MH/s on a 290 with the realhet ASM kernel ... so some further tuning now: play with intensity and GPU clock, and drop the mem clock to the lowest possible (150 on my 280X).
JuanHungLo
January 13, 2015, 01:43:17 AM
 #207

Quote from: physixz on January 13, 2015, 01:27:09 AM
Thanks, I wasn't disabling the Intel in UEFI, which was my problem. It's working now at 26.5 MH/s per card, which is amazing.

I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

Bull markets are born on pessimism, grow on skepticism, mature on optimism, and die on euphoria. - John Templeton
utahjohn
January 13, 2015, 01:43:20 AM
 #208

Note that the Intel GPU can be used for other algos such as X11 or neoscrypt.  Now that AMD is on gpu-platform 0, you can try re-enabling Intel and see if it pops up on gpu-platform 1.
Check with sgminer -n in a command prompt window.
To specify which platform to use, add --gpu-platform 0 (or 1 ...) to the sgminer command line.
Device numbering still starts from 0 on each gpu-platform.

I have heard of people also running Nvidia cards in the same box as AMD, yet another gpu-platform selection ... :)
realhet
January 13, 2015, 02:28:13 AM
 #209

Hi All,

Important things to the top:
* I slightly updated the HetPas150111_Groestl.zip -> MH/s values are now the same as in SG.
* I've updated the main page with benchmark data I've collected: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/
* I've uploaded the diamondTahiti binary, so now there are 2 precompiled bins, thank you utahjohn!


* My MH/s misunderstanding.
Thank you all for the investigations, now I see it clearly.
When I tried groestlcoin.cl on my card with 14.9, it ran at 2 MH/s. If I scale the 25 MH/s of the R9 290 down to my HD7770, I should have got 4 MH/s.
And here comes my bad decision:
I didn't believe that the 14.6->14.9 changes were so bad that they slowed the kernel down by more than 2x. Actually it was 2.6x slower than my expectations.
And because the algorithm technically contains 2 hash calculations, I thought that multiplying by 2 would give me the correct MH/s.
But as it turned out, they indeed broke 14.9 that badly.
So if I ever thought about hating OCL, now I hate it more than twice as much. To be precise, I hate it 2.6x more. :D
But on the optimistic side, because 14.9 made such an exceptional quality of cr4p out of the OCL kernel, it gave me the false feeling of success to continue optimizing, haha.
Anyway, I'm happy that it is solved now.


* HetPas and Catalyst version
When you compile an ASM kernel, my compiler generates a pure binary (and some parameters, e.g. LDS size).
In order to make it run, it has to generate a complicated ELF binary image, so it asks the OpenCL compiler for one.
This small skeleton kernel contains the kernel parameters that you request in the assembly source.
For this groestl kernel I supply a special skeleton.cl (see below in this post).
So once OpenCL has compiled this small skeleton kernel, my program patches the binary and other parameters into it. It also cuts out every unwanted part, such as the ocl, llvmir and amd_il sections. There are even a few kilobytes of zeroes in the ELF just to be compatible with terribly old hardware; I cut those out too.
And because I use the currently installed OpenCL system, the produced binary will only be compatible with that kind of hardware.


* Binary kernels and Catalyst versions
AFAIK when a kernel binary is loaded by clBuildProgram, it isn't checked for compatibility with the Catalyst version, or with any other version number.
So the binary is quite transferable between versions.
When an incompatibility does occur, it can be caused by these things:
- driver developers changed the ELF file structure (for example they removed some sections: in 13.4 they removed the amd_il section from the inner ELF image. Yes, it is an ELF inside an ELF. :D) This can cause an error or an access violation when loading the kernel.
- driver developers changed the way/format in which kernel parameters are passed. This kind of incompatibility can cause a crash on the GPU.
So it doesn't matter that you compile with HetPas on 14.7; I just wrote 14.9 on my blog because I was 100% sure that my program works on 14.9.


* "cross compile" option
Yeah, it would be a nice feature. To do it I need binaries from all hardware, so I can 'dissect' them and maybe find out how to produce them manually.
I'm not going to try to understand the complete binary structure, as AMD can change it at any time, and they must do so when they improve things anyway.
I only want to inject the GCN binary into the hardware as simply as I can.
But by analyzing different binaries maybe I can find out how to change a binary to be compatible with a specific hardware.
For example, if there are too many hardware-dependent options that also depend on the kernel's parameters, then it's impossible to do without fully understanding how parameters are exchanged between the driver and the (specific) hardware.


* 32bit/64bit
OK, now I understand. HetPas is all 32-bit, so I hadn't noticed that there can be 64-bit ELFs too.
I can guess that the Linux driver uses an API of the OS to access the ELF contents, and that's why 32/64-bit is important...
Please compile this kernel to a 64-bit binary and send it to me:
__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void search(__global unsigned char* block, volatile __global uint* output, const ulong target)
{
  // Trivial body: the skeleton only has to carry the kernel parameters.
  if (target > 0) output[get_local_id(0)] = block[get_global_id(0)];
}
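A quick way to tell whether a dumped kernel binary is a 32-bit or a 64-bit image is to look at its ELF identification bytes. The following is only a sketch, not part of HetPas: the helper name `elf_class` and the sample header bytes are made up for illustration. The only facts it relies on are the standard ELF layout (magic `\x7fELF` at offset 0, class byte at offset 4, where 1 means 32-bit and 2 means 64-bit).

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS_OFFSET = 4  # e_ident[EI_CLASS] in the standard ELF layout
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}


def elf_class(image: bytes) -> str:
    """Return 'ELF32' or 'ELF64' for an in-memory ELF image."""
    if len(image) <= EI_CLASS_OFFSET or image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    cls = image[EI_CLASS_OFFSET]
    if cls not in CLASS_NAMES:
        raise ValueError(f"unknown ELF class byte: {cls}")
    return CLASS_NAMES[cls]


# Hypothetical 16-byte e_ident blocks, just enough to exercise the check.
sample32 = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)
sample64 = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)
print(elf_class(sample32), elf_class(sample64))
```

On a real dump you would pass the file contents read in binary mode; note that an AMD kernel binary nests a second ELF inside the outer one, and this sketch only looks at the outer header.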


* "neoscrypt kernel"
Is this similar to LiteCoin?
A year ago I played with LiteCoin's salsa. It was fun, but I wasn't able to outperform OpenCL.
But in the future I plan to make a special salsa that will use LDS instead of the slow RAM. This will be an interesting experiment, as I'm going to have to try some assembly-exclusive things in order to outperform the original kernel:
- To be able to use 64KB of LDS for one thread, I'll have to connect wavefront pairs so that they share their 32KB allocations with each other. For this I have to know which compute unit the current wavefront is running on (s_get_hwreg).
- Syncing the two kernels on each CU individually will require some research. (GCN has an awesome global wave sync feature in hardware, so maybe there is something for 'local' too. If not, maybe I can poll GDS.)
- Because only one 'thread' will work actively on a CU, there'll be no latency hiding, so I have to program the kernel in a parallel way (but, no probs, I'll have all the 256 regs...)
- By the textbook, LDS throughput is 64x better (IMO it's not) than MEM throughput on a HD7970. So this would be the benefit.
- Threads in workitems can copy register data from each other. So while I calculate only 1 salsa using the 2*32KB LDS for lookup (lookup_gap=2), I can spread the data across more lanes of the wavefront and do the calculations in parallel.
I've just checked neoscrypt.cl, it's insane :D But if I see it right, half of it is SALSA.
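The 2*32KB figure can be checked with a little arithmetic. The sketch below assumes Litecoin's scrypt parameters (N=1024, r=1), which the post does not state, and `scratchpad_bytes` is a made-up helper name. With a lookup gap the kernel stores only every gap-th scratchpad entry and recomputes the ones in between.

```python
def scratchpad_bytes(n: int = 1024, r: int = 1, lookup_gap: int = 1) -> int:
    """Bytes of scratchpad that one scrypt hash keeps resident.

    Each scratchpad entry is 128*r bytes and there are n of them; with
    lookup_gap only every lookup_gap-th entry is actually stored.
    """
    entry_size = 128 * r
    stored_entries = (n + lookup_gap - 1) // lookup_gap  # ceiling division
    return entry_size * stored_entries


for gap in (1, 2, 4):
    print(f"lookup_gap={gap}: {scratchpad_bytes(lookup_gap=gap) // 1024} KB")
```

Under these assumptions a single hash at lookup_gap=2 needs 64 KB, which is what two paired 32KB LDS allocations would provide.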


* "Guys! We do not need more optimization!"
I've thought about this too. But I think that if everyone uses better kernels, then everyone will use the same power to get the same profit, as the difficulty will be harder but mining will require less power.
But what if not everyone uses the faster kernel? I think my compiler/IDE helps a lot with this, as it is kinda user-unfriendly :D


* Just a question about LiteCoin
Do you know whether it is worth optimizing on GPU? Or are there too many FPGAs/ASICs there too?
I'm just curious. I'd like to play with that salsa algo, but my free time is running out soon.


* @qwep1  ### RESULT IS WRONG ####
Something went totally wrong there.
Tahiti has been tested already, and the 'elapsed' is OK too, but what is in the memory dump is garbage. Which Catalyst are you using? Is the memory clock setting OK?
* The kernel_dump\ folder is in the same folder as the groestl_isa.hpas program that you're running in HetPas.


* "AVOID 14.9 like the plague"
Haha, I'll try 14.12 now.
utahjohn
January 13, 2015, 02:48:04 AM
Last edit: January 13, 2015, 03:24:28 AM by utahjohn
 #210

Forget litecoin and all scrypt coins, they are ASIC territory now and GPU mining is pointless on them.  Wolf0 can explain what he did to optimize neoscrypt; there were some major improvements ... I might be able to dig up a link ...

Here is the last neoscrypt OCL I got from Wolf0
https://mega.co.nz/#!cFEGTBBY!snQhOeLs6E_giKx2rY_i7XNcv95dASkrrRzlDOq7fIE
and some updates from the forum
https://forum.feathercoin.com/index.php?/topic/7780-dev-neoscrypt-gpu-miner-public-beta-test/page-41#entry71777
utahjohn
January 13, 2015, 03:35:12 AM
 #211

Looks better :)  Dev 0 is a 280X, Dev 1 is a 7950

List of opencl devices:
Device #0
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device #1
Target: Tahiti  Series: 7  Core:1150 MHz  CU:28  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Using device:
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

Using new GCN ASM code
Kernel binary saved: C:\Miners\HetPas150111_Groestl\groestl\kernel_dump\kernel.elf

elapsed: 53.609 ms  24.449 MH/s   gain:  12.22x
elapsed: 50.710 ms  25.847 MH/s   gain:  12.92x
elapsed: 50.670 ms  25.868 MH/s   gain:  12.93x
elapsed: 50.707 ms  25.849 MH/s   gain:  12.92x

Functional test: RESULT IS OK

   idx        hi       lo           hi           lo
     0: 16410000 D9080000    373358592   -653787136
     1: 4A820000 D0630000   1250033664   -798818304
     2: C3E00000 EDA60000  -1008730112   -307888128
     3: 1F020100 33FF0000    520225024    872349696
     4: 8A200100 F8F10000  -1977614080   -118423552
     5: 9F3A0100 C22D0100  -1623588608  -1037238016
     6: A6000200 86D40100  -1509948928  -2032926464
     7: C52A0200 A7190200   -987102720  -1491533312
     8: 36610200 F6380200    912327168   -164101632
     9: B8E80200 6BAB0200  -1192754688   1806369280
     A: B72B0300 E9280300  -1221917952   -383253760
     B: 684F0300 B04C0300   1750008576  -1337195776
     C: FA6F0300 F15D0300    -93388032   -245562624
     D: A9B80300 BE8D0300  -1447558400  -1098054912
     E: 06CF0300 5FCF0300    114230016   1607402240
     F: DCF90300 EDF00300   -587660544   -303037696
    10: FF300400 2F0A0400    -13630464    789185536
    11: D2DB0400 D5830400   -757398528   -712834048
    12: 97060500 53CD0400  -1761213184   1405944832
    13: E3100500 77160500   -485489408   1997931776
    14: 0E2E0500 3E1B0500    237896960   1041958144
    15: FA460500 2F490500    -96074496    793314560
    16: 2A860500 D0650500    713426176   -798685952
    17: 4BCC0500 8C950500   1271661824  -1936390912
    18: 860F0600 1DED0500  -2045835776    502072576
    19: 3B810600 3C710600    998311424   1014040064
    1A: E09B0600 E9840600   -526711296   -377223680
    1B: 58FE0600 56AF0600   1493042688   1454310912
    1C: 44160700 CBF90600   1142294272   -872872448
    1D: F9240700 DA1F0700   -115079424   -635500800
    1E: 79910700 64700700   2039547648   1685063424
    1F: 98FD0700 DFA10700  -1728248064   -543095040
    20: 44450800 8E0E0800   1145374720  -1911683072
    21: 1E4D0800 8B570800    508364800  -1957230592
    22: 317D0800 52670800    830277632   1382483968
    23: 20A30800 BE830800    547555328  -1098708992
    24: CFAE0800 FCAA0800   -810678272    -55965696
    25: AED30800 00B40800  -1361901568     11798528
    26: 37150900 1D070900    924125440    487000320
    27: 37570900 EE210900    928450816   -299824896
    28: 8F9C0900 21740900  -1885599488    561252608
    29: 729D0900 38960900   1922894080    949356800
    2A: 8C270A00 10D20900  -1943598592    282200320
    2B: A5460A00 163C0A00  -1522136576    373033472
    2C: 93540A00 2E470A00  -1823208960    776407552
    2D: FF7A0A00 19650A00     -8779264    426052096
    2E: BCEA0A00 A09A0A00  -1125512704  -1600517632
    2F: 94210B00 76F80A00  -1809773824   1995966976
    30: 2E5B0B00 38310B00    777718528    942738176
    31: 0BF70B00 27610B00    200739584    660671232
    32: CB8B0C00 EA5D0C00   -880079872   -363000832
    33: 2AA20C00 D59B0C00    715262976   -711259136
    34: 2AB00C00 38AB0C00    716180480    950733824
    35: 79DB0C00 DFC60C00   2044398592   -540668928
    36: A21B0D00 5D0E0D00  -1575285504   1561201920
    37: 05370D00 84190D00     87493888  -2078733056
    38: 58A90D00 4FAA0D00   1487473920   1336544512
    39: 26EF0D00 BAB10D00    653200640  -1162801920
    3A: EA030E00 E0F50D00   -368898560   -520811264
    3B: 960B0E00 A5090E00  -1777660416  -1526133248
    3C: 12410E00 2F140E00    306253312    789843456
    3D: 785B0E00 47490E00   2019233280   1195970048
    3E: 017C0E00 5D7B0E00     24907264   1568345600
    3F: 87B40E00 4D9A0E00  -2018243072   1301941760
    40: 83E20E00 7ACB0E00  -2082337280   2060127744
    41: 11110F00 85E70E00    286330624  -2048455168
    42: F3270F00 AE130F00   -215544064  -1374482688
    43: 19540F00 E8390F00    424939264   -398913792
    44: F4630F00 2C5D0F00   -194834688    744296192
    45: 66780F00 997A0F00   1719144192  -1720054016
    46: D1AE0F00 0AA50F00   -777122048    178589440
    47: 96C00F00 65AD0F00  -1765798144   1705840384
    48: 4ECB0F00 D8C40F00   1321930496   -658239744
    49: 4CF90F00 F1DF0F00   1291390720   -237039872
    4A: 6A1C1000 44171000   1780224000   1142362112
    4B: 62E21000 80841000   1658982400  -2138828800
    4C: 6F2B1100 26141100   1865093376    638849280
    4D: 3D481100 DB351100   1028133120   -617279232
    4E: A04F1100 F3521100  -1605431040   -212725504
    4F: 02BC1100 06881100     45879552    109580544
    50: 1DD71100 73D41100    500633856   1943277824
    51: 0CE51100 E7DF1100    216338688   -404811520
    52: BEFA1100 8DEA1100  -1090907904  -1914040064
    53: F6341200 DB2F1200   -164359680   -617672192
    54: 99541200 54331200  -1722543616   1412633088
    55: 10931200 01711200    278073856     24187392
    56: 1BAC1200 DBA21200    464261632   -610135552
    57: 7EF11200 37F01200   2129728000    938480128
    58: 7FFB1200 40F51200   2147160576   1089802752
    59: 84811300 38791300  -2071915776    947458816
    5A: 6DCA1300 98B11300   1841959680  -1733225728
    5B: 0A001400 64F01300    167777280   1693455104
    5C: 00000000 00000000            0            0
    5D: 00000000 00000000            0            0
    5E: 00000000 00000000            0            0
    5F: 00000000 00000000            0            0
    60: 00000000 00000000            0            0
    61: 00000000 00000000            0            0
    62: 00000000 00000000            0            0
    63: 00000000 00000000            0            0
    64: 00000000 00000000            0            0
    65: 00000000 00000000            0            0
    66: 00000000 00000000            0            0
    67: 00000000 00000000            0            0
    68: 00000000 00000000            0            0
    69: 00000000 00000000            0            0
    6A: 00000000 00000000            0            0
    6B: 00000000 00000000            0            0
    6C: 00000000 00000000            0            0
    6D: 00000000 00000000            0            0
    6E: 00000000 00000000            0            0
    6F: 00000000 00000000            0            0
    70: 00000000 00000000            0            0
    71: 00000000 00000000            0            0
    72: 00000000 00000000            0            0
    73: 00000000 00000000            0            0
    74: 00000000 00000000            0            0
    75: 00000000 00000000            0            0
    76: 00000000 00000000            0            0
    77: 00000000 00000000            0            0
    78: 00000000 00000000            0            0
    79: 00000000 00000000            0            0
    7A: 00000000 00000000            0            0
    7B: 00000000 00000000            0            0
    7C: 00000000 00000000            0            0
    7D: 00000000 00000000            0            0
    7E: 00000000 00000000            0            0
    7F: 000000B8 00000000          184            0
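For anyone reading the benchmark output above: the MH/s column follows from the elapsed time if one assumes that each timed run covers 0x140000 (1,310,720) hashes. That batch size is inferred from the printed numbers, not stated anywhere in the output; likewise the 'gain' column looks like MH/s divided by a fixed 2.0 MH/s baseline. The function names are made up. The last helper shows that the decimal columns of the dump are simply the hex words read as signed 32-bit integers.

```python
ASSUMED_BATCH = 0x140000    # 1,310,720 hashes per timed run (inferred, not documented)
ASSUMED_BASELINE_MHS = 2.0  # baseline behind the 'gain' column (inferred)


def mhs(elapsed_ms: float, batch: int = ASSUMED_BATCH) -> float:
    """Hash rate in MH/s for one timed run."""
    return batch / (elapsed_ms / 1000.0) / 1e6


def gain(elapsed_ms: float) -> float:
    """Speed-up relative to the assumed baseline."""
    return mhs(elapsed_ms) / ASSUMED_BASELINE_MHS


def as_int32(hex_word: str) -> int:
    """Interpret an 8-digit hex word as a signed 32-bit integer."""
    value = int(hex_word, 16)
    return value - (1 << 32) if value & 0x80000000 else value


for ms in (53.609, 50.710, 50.670, 50.707):
    print(f"{ms:.3f} ms -> {mhs(ms):.3f} MH/s, gain {gain(ms):.2f}x")
print(as_int32("16410000"), as_int32("D9080000"))
```

Small differences in the last digit against the printed log are to be expected, since the elapsed times shown there are themselves rounded.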
realhet
January 13, 2015, 03:57:29 AM
 #212

Well, that neoscrypt is quite complicated. I can't even get it to compile, as I think it needs more defines than just WORKSIZE alone. Some day I'm going to check it from a closer view, as it is interesting...

Now I have Cat 14.12 Omega (whatever that is). The ASM kernel is unchanged; the original OCL kernel is 15% faster than on Cat 14.9, but it is still way too bad.

I've compared your diamondTahiti compilation with my Capeverde one. The differences are not that complicated:
- In the ELF's header the 'archtype' field is 3FF vs. 3FD
- In the small binary info section (outer ELF) 2 bytes are different: 9F vs. 9C
- In the small binary info section (inner ELF) one byte differs: 1C vs. 1A
- In the text ARG section the only difference is in the strings: capeverde vs. tahiti

So if I collect all these constants/strings, I can convert from one to the other. But Capeverde and Tahiti are architecturally near-identical chips (both first-generation GCN). It's possible that the Hawaii binary is much more different.
And even though the two binaries (capeverde and tahiti) are almost the same, clBuildProgram() checks the hardware IDs and refuses to load it.
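The comparison described above can be reproduced with a simple byte-wise diff of two dumped binaries. This is a sketch with a made-up helper `byte_diffs` and made-up sample data, not realhet's tooling: the two short byte strings merely stand in for the two kernel.elf files, and which value belongs to which chip is assigned arbitrarily here.

```python
from typing import List, Tuple


def byte_diffs(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position that differs.

    Only the common length is compared; a length mismatch is left for the
    caller to report.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Stand-in data: two tiny "binaries" that differ in a few bytes and a name string.
blob_a = bytes.fromhex("7f454c46ff039f001c") + b"capeverde"
blob_b = bytes.fromhex("7f454c46fd039c001a") + b"tahiti\x00\x00\x00"

diffs = byte_diffs(blob_a, blob_b)
for offset, x, y in diffs:
    print(f"offset 0x{offset:04X}: {x:02X} vs {y:02X}")
```

Run against two real dumps, a short list of offsets like this is what would have to be collected per chip before one binary could be patched into the other.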
utahjohn
January 13, 2015, 04:12:32 AM
 #213

ROFL u just had to try 14.12 hahahaha ... now back to 14.7RC3 LOL
pallas (OP)
January 13, 2015, 04:19:05 AM
 #214

Quote from: realhet on January 13, 2015, 02:28:13 AM
* I've updated the main page with benchmark data I've collected: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

30 MH/s is for the R9 290; the 290X does 33.

mitache365
January 13, 2015, 08:05:07 AM
Last edit: January 13, 2015, 08:18:29 AM by mitache365
 #215

Quote from: JuanHungLo on January 13, 2015, 01:43:17 AM
Quote from: physixz on January 13, 2015, 01:27:09 AM
Thanks, I wasn't disabling the Intel in UEFI, which was my problem. It's working now at 26.5 MH/s per card, which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

Same here. 23.4
Which is the right miner?

BTC
utahjohn
January 13, 2015, 08:49:56 AM
 #216

Quote from: mitache365 on January 13, 2015, 08:05:07 AM
Quote from: JuanHungLo on January 13, 2015, 01:43:17 AM
Quote from: physixz on January 13, 2015, 01:27:09 AM
Thanks, I wasn't disabling the Intel in UEFI, which was my problem. It's working now at 26.5 MH/s per card, which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

Same here. 23.4
Which is the right miner?

Yer not gonna get 30-33 MH/s right out of the box on a 290/290X; you will have to tune intensity, GPU clock and mem clock (lowest possible).  Pallas can help with these cards if he's in the right mood.
On the 280X (1180/150) I was able to use my tuning from the previous kernel to get 26.0 MH/s, only because it was already maxed out :) Volt modded, vbios modded etc.  Info about these techniques is in the thread if you look about ...
As far as miner ... sgminer 4.1.0 (sph) is what I use ...
mitache365
January 13, 2015, 10:00:59 AM
 #217

I am at stock core 1070 and mem 1100. Maybe this is the difference.

BTC
physixz
January 13, 2015, 10:03:36 AM
 #218

Can I ask what software you are using to change the values? I'm using MSI Afterburner 4.1, but it won't change the memory clock, and the core clock is always lower than what I set.

These changes are really pushing my cards now: they normally sat at 50°C but are now over 60°C (they are watercooled). I will post the power usage when I next reboot and plug the power meter in.
utahjohn
January 13, 2015, 10:26:04 AM
 #219

Quote from: physixz on January 13, 2015, 10:03:36 AM
Can I ask what software you are using to change the values? I'm using MSI Afterburner 4.1, but it won't change the memory clock, and the core clock is always lower than what I set.

These changes are really pushing my cards now: they normally sat at 50°C but are now over 60°C (they are watercooled). I will post the power usage when I next reboot and plug the power meter in.

Not sure if you can do this to a 290/290X card, because the vbios is likely to be quite different.  You will have to do some research before you attempt my method using VBE7.0.0.7b.exe; it is a video BIOS editor you can use to change voltages and clocks at board level.  If I remember correctly it was only for Tahiti cards ... do your research, then flash the vbios with atiwinflash.  There may be programs like MSI Afterburner, but I did my card at low level :) Again, check whether it will work on your card before you do it, or you can "brick" your card, haha
pallas (OP)
January 13, 2015, 10:38:42 AM
 #220

Wow I got Best share: 702K
Was it a block??? :-D

Now let's get serious: I finally have a little time to write some considerations on the ocl and asm kernels.
I believe we should pursue the asm path for a number of reasons:

- currently the OCL kernel is a little faster on hawaii but not on the other cards, and I don't think it can be improved in this respect
- the OCL kernel has been tweaked and optimized for months, while the asm one is new so there is probably much more room for improvement
- just by applying the first and last round optimization the asm kernel will probably be faster on hawaii as well; I'm sure that Realhet will find other asm tricks to apply
- with all these catalyst version problems, the best way to share kernels for the people to mine is by bin files, making the asm version and ocl equivalent (for distribution purposes); better yet would be a miner with all the bundled bin files (takes time)
- asm is cooler than ocl ;-)

what do you guys think?
