Bitcoin Forum
April 24, 2024, 10:12:29 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 ... 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 [200] 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 ... 843 »
  Print  
Author Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.1  (Read 5805212 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (3 posts by 1+ user deleted.)
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 06:39:08 AM
 #3981

Con, maybe we can talk about integrating global offset parameter support into CGMINER?

Take a short look at http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html and the global_work_offset parameter. All that has to be taken into consideration from the kernel-side is in DiaKGCN. OpenCL 1.1 detection is in your code, too, which is needed, but I can't do the other required changes without a compiler.

In short, the nonce-base is not supplied via the base parameter, if GOFFSET is enabled, but instead via the global_work_offset parameter and used via the global work-item ID in the kernel. This saves a few instructions and can give us a small boost.

Dia
I tried writing code to send nonce as the global offset parameter and your code returned duplicate work on 2 or more vectors. Looking at your code:
Code:
u nonce = ((uint)get_global_id(0) << 1) + (u)(0, 1);
This won't be right as global id will now be the global thread id + the global offset parameter so doubling it will give random results.

Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?

Edit: Oh and I have to warn you, yesterday I tested a new AMD driver, which has a new OpenCL runtime and I was unable to use CGMINER on Windows, so perhaps AMD "fixed" the binary generation but this breaks your solution from a few days ago, because the old message that no .bin could be generated was back. I switched back to a former runtime :-(. Seems to suck!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
1713996749
Hero Member
*
Offline Offline

Posts: 1713996749

View Profile Personal Message (Offline)

Ignore
1713996749
Reply with quote  #2

1713996749
Report to moderator
1713996749
Hero Member
*
Offline Offline

Posts: 1713996749

View Profile Personal Message (Offline)

Ignore
1713996749
Reply with quote  #2

1713996749
Report to moderator
You get merit points when someone likes your post enough to give you some. And for every 2 merit points you receive, you can send 1 merit point to someone else!
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713996749
Hero Member
*
Offline Offline

Posts: 1713996749

View Profile Personal Message (Offline)

Ignore
1713996749
Reply with quote  #2

1713996749
Report to moderator
1713996749
Hero Member
*
Offline Offline

Posts: 1713996749

View Profile Personal Message (Offline)

Ignore
1713996749
Reply with quote  #2

1713996749
Report to moderator
blandead
Newbie
*
Offline Offline

Activity: 46
Merit: 0


View Profile
February 17, 2012, 06:49:19 AM
 #3982

I thought the NDRangeKernel went like this...

cl_int clEnqueueNDRangeKernel (
cl_command_queue command_queue,
cl_kernel kernel, cl_uint work_dim,
const size_t *global_work_offset,
const size_t *global_work_size,
const size_t *local_work_size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)

If you tell it the global_work_size and local_work_size, why would the global_work_offset be the starting value for get_global_id(0)? Or am I completely off the ball?

What runtime are you using? I'm using newest 12.2 preview runtime with no problems after adding opencl 1.2 preview drivers

Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"

I think it would be beneficial to start incorporating out of order execution in the kernels, especially if you are moving to the NDRangeKernel. After adding this in and a couple other changes, VECTORS8 is actually running faster than VECTORS4.. Getting very high output on my nonces
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 07:15:30 AM
 #3983

I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 07:15:43 AM
 #3984

I thought the NDRangeKernel went like this...

cl_int clEnqueueNDRangeKernel (
cl_command_queue command_queue,
cl_kernel kernel, cl_uint work_dim,
const size_t *global_work_offset,
const size_t *global_work_size,
const size_t *local_work_size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)

If you tell it the global_work_size and local_work_size, why would the global_work_offset be the starting value for get_global_id(0)? Or am I completely off the ball?

What runtime are you using? I'm using newest 12.2 preview runtime with no problems after adding opencl 1.2 preview drivers

Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"

I think it would be beneficial to start incorporating out of order execution in the kernels, especially if you are moving to the NDRangeKernel. After adding this in and a couple other changes, VECTORS8 is actually running faster than VECTORS4.. Getting very high output on my nonces

OoE is not supported on AMD GPUs afaik. But I'm sure Con has code to try to use it with the command queue, if available.

The driver, which contains the problematic Runtime is this one: http://support.amd.com/us/kbarticles/Pages/hd7700series7support.aspx

Global worksize is the number of global work-items that are processed in one kernel execution and the local worksize is the work-group size of work-items, that are executed in parallel and share __local memory.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 07:25:47 AM
 #3985

I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 07:26:03 AM
 #3986

Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 07:26:46 AM
 #3987

I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?
It is successfully activated on windows and linux, but osx fails. It does not improve throughput with current GPUs but is harmless to enable for if/when they do.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 07:30:05 AM
 #3988

Edit: Oh and I have to warn you, yesterday I tested a new AMD driver, which has a new OpenCL runtime and I was unable to use CGMINER on Windows, so perhaps AMD "fixed" the binary generation but this breaks your solution from a few days ago, because the old message that no .bin could be generated was back. I switched back to a former runtime :-(. Seems to suck!
I don't think this is the old bug but a simple failure to produce a binary because of compilation of the kernel instead. Try running with -T -verbose and it will spew out any kernel compilation errors.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
TheHarbinger
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250


Why is it so damn hot in here?


View Profile
February 17, 2012, 07:42:24 AM
 #3989

Oh well, i guess i was wrong, i read that BAMT didn't work with cgminer, maybe they just mean it's not part of the original download which is fine. it didn't make sense to em either, but i fiugured it had something to do with some of the programming.

So i guess you just download BAMT and then download cgminer and you're all set?

what drivers and sdk does BAMT use?

BAMT already has cgminer installed and it is integrated into BAMT other tools (mgpumon, web monitor, gpumon, etc).

It is only cgminer 2.1.2 I believe but you can install newer copy if you like.  SDK is 2.4, I am not sure the driver.  It doesn't have 100% bug.

So it is write BAMT to flash drive.  Run fixer to grab latest updates, change 2 config files (1 for BAMT, 1 for cgminer) and you are mining.  You can then take that flash drive, record the image, put that image on 8 flash drives, put them in 8 rigs make a few changes via SSH and power up 20 GH/s farm in a few minutes.

For sure on the 2.4  Sad  I'm still trying to get it downgraded back to 2.1, Dropping my memclocks from 330 to 150 saves me a good amount of wattage on my 5@5870 rig.  If you figure out a way to do it, please share the info.

12Um6jfDE7q6crm1s6tSksMvda8s1hZ3Vj
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 07:43:32 AM
 #3990

Edit: Oh and I have to warn you, yesterday I tested a new AMD driver, which has a new OpenCL runtime and I was unable to use CGMINER on Windows, so perhaps AMD "fixed" the binary generation but this breaks your solution from a few days ago, because the old message that no .bin could be generated was back. I switched back to a former runtime :-(. Seems to suck!
I don't think this is the old bug but a simple failure to produce a binary because of compilation of the kernel instead. Try running with -T -verbose and it will spew out any kernel compilation errors.

As the kernel compiles fine with AMDs KernelAnalyzer, I doubt it's an compilation error, I looked through the verbose messages yesterday and only saw clBuildProgram has failed (not the exact message), but got no real hint as to why this happens. Is there a check for a specific runtime version, which could cause that behaviour, because new runtime version is unknown ... we can debug this later, as I don't have access to my PC currently.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 07:48:56 AM
 #3991

Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.

That would prove true for a base of (2147483647 - global worksize) because doubled it's over 2^32. How big are the nonce bases and what would be the global worksize for -I == 14 (as this is the maximum)?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 07:55:39 AM
 #3992

Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.

That would prove true for a base of (2147483647 - global worksize) because doubled it's over 2^32. How big are the nonce bases and what would be the global worksize for -I == 14 (as this is the maximum)?

Dia

global worksize = 2^(15 + intensity) so it's 29 max and it's double that for 2 vectors and so on...

cgminer always tries to test the entire range of nonces up to 2^32 so it will *always* get to a value above 2^31 where it will wrap with a doubling of the global id regardless of what intensity it's at.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 07:59:49 AM
 #3993

Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.

That would prove true for a base of (2147483647 - global worksize) because doubled it's over 2^32. How big are the nonce bases and what would be the global worksize for -I == 14 (as this is the maximum)?

Dia

global worksize = 2^(15 + intensity) so it's 29 max and it's double that for 2 vectors and so on...

cgminer always tries to test the entire range of nonces up to 2^32 so it will *always* get to a value above 2^31 where it will wrap with a doubling of the global id regardless of what intensity it's at.

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 08:05:40 AM
 #3994

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 08:07:06 AM
 #3995

Having said all of that it may just be the value I'm passing since it expects an array and I'm passing a single value...

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 08:09:28 AM
 #3996

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.


Global ID without global offset is: 0 till (global worksize - 1) queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as argument in the enqueue kernel call), because there are 2 nonces processed in each work-item.

Global ID with global offset is: global offset till (global worksize + global offset - 1).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 17, 2012, 08:12:21 AM
 #3997

Having said all of that it may just be the value I'm passing since it expects an array and I'm passing a single value...

Right, should be an array, because the ND-range can be 3-dimensional and we only use 1-dimension.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 08:15:37 AM
 #3998

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.


Global ID without global offset is: 0 till (global worksize - 1) queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as argument in the enqueue kernel call), because there are 2 nonces processed in each work-item.

Global ID with global offset is: global offset till (global worksize + global offset - 1).

Dia
I still think it should be something like:
      u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
blandead
Newbie
*
Offline Offline

Activity: 46
Merit: 0


View Profile
February 17, 2012, 08:40:28 AM
 #3999

I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?
It is successfully activated on windows and linux, but osx fails. It does not improve throughput with current GPUs but is harmless to enable for if/when they do.

I saw a significant increase in average nonces being found and a 3 Mhash/sec higher throughput.


Modified from Dia's code I used the following...

self.commandQueue = cl.CommandQueue(self.context, self.device, cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE)

Then I saw that read buffer was set to this > cl.enqueue_read_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=True)

Since OoE Mode will NOT work if is_blocking=True I set them all to false, and re-enabled self.commandQueue.finish()

Similarly I changed the write buffer

cl.enqueue_write_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=False)

on the cl.output_buf I changed mem flags to cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR | cl.mem_flags.ALLOC_HOST_PTR

For Async to work the 11.11 AMD drivers tell you to add environmental variable to your system. GPU_ASYNC_MEM_COPY=2

Again, this might only be a 69xx feature, but for my 6970 I turn off BFI_INT and GOFFSET and increased my Memory speed and VECTORS8 was running at over 446 MHash/s. Now it'll find between 5-14 nonces per minute without choking up or freezing system. Before it was struggling to find 5 nonces per minute if at all.

Next, I want to add the Async functions

event_t async_work_group_copy (__local T *dst, const __global T *src,
size_t num_gentypes, event_t event)

event_t async_work_group_copy (__global T *dst, const __local T *src,
size_t num_gentypes, event_t event)

One is for global and other is for local work groups

Then create a prefetch for global cache

void prefetch (const __global T *p, size_t num_gentypes)
-ck (OP)
Legendary
*
Offline Offline

Activity: 4088
Merit: 1631


Ruu \o/


View Profile WWW
February 17, 2012, 08:45:30 AM
 #4000

I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?
It is successfully activated on windows and linux, but osx fails. It does not improve throughput with current GPUs but is harmless to enable for if/when they do.

I saw a significant increase in average nonces being found and a 3 Mhash/sec higher throughput.


Modified from Dia's code I used the following...

self.commandQueue = cl.CommandQueue(self.context, self.device, cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE)

Then I saw that read buffer was set to this > cl.enqueue_read_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=True)

Since OoE Mode will NOT work if is_blocking=True I set them all to false, and re-enabled self.commandQueue.finish()

Similarly I changed the write buffer

cl.enqueue_write_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=False)

on the cl.output_buf I changed mem flags to cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR | cl.mem_flags.ALLOC_HOST_PTR

For Async to work the 11.11 AMD drivers tell you to add environmental variable to your system. GPU_ASYNC_MEM_COPY=2

Again, this might only be a 69xx feature, but for my 6970 I turn off BFI_INT and GOFFSET and increased my Memory speed and VECTORS8 was running at over 446 MHash/s. Now it'll find between 5-14 nonces per minute without choking up or freezing system. Before it was struggling to find 5 nonces per minute if at all.

Next, I want to add the Async functions

event_t async_work_group_copy (__local T *dst, const __global T *src,
size_t num_gentypes, event_t event)

event_t async_work_group_copy (__global T *dst, const __local T *src,
size_t num_gentypes, event_t event)

One is for global and other is for local work groups

Then create a prefetch for global cache

void prefetch (const __global T *p, size_t num_gentypes)

Again, you're not remotely talking about cgminer:

Code:
	status = clEnqueueReadBuffer(clState->commandQueue, clState->outputBuffer, CL_FALSE, 0,
BUFFERSIZE, thrdata->res, 0, NULL, NULL);
Please... seriously... I could take your advice if you were talking about how it relates here, but the stuff you're saying is not set is not cgminer...

edit: That's not to say you have nothing useful to add, but the signal to noise ratio gets low when you're talking about other code first and foremost.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
Pages: « 1 ... 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 [200] 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 ... 843 »
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!