Bitcoin Forum
May 31, 2024, 01:56:46 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
  Home Help Search Login Register More  
  Show Posts
Pages: « 1 ... 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 [515] 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 ... 570 »
10281  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 09:03:58 AM
For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.


Global ID without global offset is: 0 till (global worksize - 1) queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as argument in the enqueue kernel call), because there are 2 nonces processed in each work-item.

Global ID with global offset is: global offset till (global worksize + global offset - 1).

Dia
I still think it should be something like:
      u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Anyway I ended up trying it both ways with your nonce code or mine and neither led to any improvement (actually detriment if anything).
10282  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 08:45:30 AM
I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?
It is successfully activated on windows and linux, but osx fails. It does not improve throughput with current GPUs but is harmless to enable for if/when they do.

I saw a significant increase in average nonces being found and a 3 Mhash/sec higher throughput.


Modified from Dia's code I used the following...

self.commandQueue = cl.CommandQueue(self.context, self.device, cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE)

Then I saw that read buffer was set to this > cl.enqueue_read_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=True)

Since OoE Mode will NOT work if is_blocking=True I set them all to false, and re-enabled self.commandQueue.finish()

Similarly I changed the write buffer

cl.enqueue_write_buffer(self.commandQueue, self.output_buf, self.output, is_blocking=False)

on the cl.output_buf I changed mem flags to cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR | cl.mem_flags.ALLOC_HOST_PTR

For Async to work the 11.11 AMD drivers tell you to add environmental variable to your system. GPU_ASYNC_MEM_COPY=2

Again, this might only be a 69xx feature, but for my 6970 I turn off BFI_INT and GOFFSET and increased my Memory speed and VECTORS8 was running at over 446 MHash/s. Now it'll find between 5-14 nonces per minute without choking up or freezing system. Before it was struggling to find 5 nonces per minute if at all.

Next, I want to add the Async functions

event_t async_work_group_copy (__local T *dst, const __global T *src,
size_t num_gentypes, event_t event)

event_t async_work_group_copy (__global T *dst, const __local T *src,
size_t num_gentypes, event_t event)

One is for global and other is for local work groups

Then create a prefetch for global cache

void prefetch (const __global T *p, size_t num_gentypes)

Again, you're not remotely talking about cgminer:

Code:
	status = clEnqueueReadBuffer(clState->commandQueue, clState->outputBuffer, CL_FALSE, 0,
BUFFERSIZE, thrdata->res, 0, NULL, NULL);
Please... seriously... I could take your advice if you were talking about how it relates here, but the stuff you're saying is not set is not cgminer...

edit: That's not to say you have nothing useful to add, but the signal to noise ratio gets low when you're talking about other code first and foremost.
10283  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 08:15:37 AM
For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.


Global ID without global offset is: 0 till (global worksize - 1) queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as argument in the enqueue kernel call), because there are 2 nonces processed in each work-item.

Global ID with global offset is: global offset till (global worksize + global offset - 1).

Dia
I still think it should be something like:
      u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.
10284  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 08:07:06 AM
Having said all of that it may just be the value I'm passing since it expects an array and I'm passing a single value...
10285  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 08:05:40 AM
For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.
10286  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 07:55:39 AM
Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.

That would prove true for a base of (2147483647 - global worksize) because doubled it's over 2^32. How big are the nonce bases and what would be the global worksize for -I == 14 (as this is the maximum)?

Dia

global worksize = 2^(15 + intensity) so it's 29 max and it's double that for 2 vectors and so on...

cgminer always tries to test the entire range of nonces up to 2^32 so it will *always* get to a value above 2^31 where it will wrap with a doubling of the global id regardless of what intensity it's at.
10287  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 07:30:05 AM
Edit: Oh and I have to warn you, yesterday I tested a new AMD driver, which has a new OpenCL runtime and I was unable to use CGMINER on Windows, so perhaps AMD "fixed" the binary generation but this breaks your solution from a few days ago, because the old message that no .bin could be generated was back. I switched back to a former runtime :-(. Seems to suck!
I don't think this is the old bug but a simple failure to produce a binary because of compilation of the kernel instead. Try running with -T -verbose and it will spew out any kernel compilation errors.
10288  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 07:26:46 AM
I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.

As I wrote, I think OoE mode is not supported on AMD GPUs ... is there a debug or vebose message, if that mode was successfully activated?
It is successfully activated on windows and linux, but osx fails. It does not improve throughput with current GPUs but is harmless to enable for if/when they do.
10289  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 07:26:03 AM
Hey Con,

The global_work_offset value you pass to clEnqueueNDRangeKernel is used as the starting value for get_global_id(0). If global_work_offset would be 10 and we had 5 work-items, the nonces generated would be 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 so we loose nonces from 10 to 19 and use ones, that should not be base ... would that create duplicate work? You pass -D GOFFSET to the kernel, right?
I suspect the problem is to do with overflow on 32 bit unsigned integers. Imagine a value close to 2^32. Since the values are doubled, and what happens on overflowing 32 bits is undefined, you may well get repeated ranges of nonces checked.
10290  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 07:15:30 AM
I thought the NDRangeKernel went like this...Edit: Oh and Dia, in your CommandQueue try adding this property, "cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE"
https://github.com/ckolivas/cgminer/blob/master/ocl.c#L710

Code:
clState->commandQueue = clCreateCommandQueue(clState->context, devices[gpu],
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &status);

cgminer has used this for a very long time.
10291  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 06:00:33 AM
Nonce of that code makes much sense, but I think Dia is suggesting that he wants to use the natively built-in global_work_offset parameter instead of the one you reference by using clEnqueueNDRangeKernel
Umm... I actually do understand the code  Wink I was explaining what was wrong with it.
10292  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 05:42:37 AM
Con, maybe we can talk about integrating global offset parameter support into CGMINER?

Take a short look at http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html and the global_work_offset parameter. All that has to be taken into consideration from the kernel-side is in DiaKGCN. OpenCL 1.1 detection is in your code, too, which is needed, but I can't do the other required changes without a compiler.

In short, the nonce-base is not supplied via the base parameter, if GOFFSET is enabled, but instead via the global_work_offset parameter and used via the global work-item ID in the kernel. This saves a few instructions and can give us a small boost.

Dia
I tried writing code to send nonce as the global offset parameter and your code returned duplicate work on 2 or more vectors. Looking at your code:
Code:
u nonce = ((uint)get_global_id(0) << 1) + (u)(0, 1);
This won't be right as global id will now be the global thread id + the global offset parameter so doubling it will give random results.
10293  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 05:24:29 AM
cgminer sets clocks all back to default on exit... if it exits cleanly, and of course on windows it's a miracle when it does.

Sadly this is not the case.

windows machine, pair of 6950's, set to 850/1300 for normal operation, in cgminer they are set to 700-880/300... when cgminer exits it leaves the cards at 880/300.

even updated to 12.1 drivers, both 2.4 and 2.6 SDK.


Instead windows decides to blow your balls off and feed them back to you by collecting up the splatter, mincing it, putting it into a glass and forcing you to drink it through a straw via  your left nostril.
I'll try and code a workaround for this windows fail next time by stopping mining and resetting device values and pausing for a bit before letting windows crash cgminer when it tries to exit.
10294  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 04:05:33 AM
cgminer sets clocks all back to default on exit... if it exits cleanly, and of course on windows it's a miracle when it does.

Sadly this is not the case.

windows machine, pair of 6950's, set to 850/1300 for normal operation, in cgminer they are set to 700-880/300... when cgminer exits it leaves the cards at 880/300.

even updated to 12.1 drivers, both 2.4 and 2.6 SDK.


Instead windows decides to blow your balls off and feed them back to you by collecting up the splatter, mincing it, putting it into a glass and forcing you to drink it through a straw via  your left nostril.
10295  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 17, 2012, 12:15:42 AM
Okay so I tested the fastest diablominer has to offer on 7970 and current cgminer is 1.5 MHash faster with defaults, so I'm pleased Smiley. I guess I should keep working on my kernel Wink
10296  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 16, 2012, 08:52:47 PM
Con, maybe we can talk about integrating global offset parameter support into CGMINER?

Take a short look at http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html and the global_work_offset parameter. All that has to be taken into consideration from the kernel-side is in DiaKGCN. OpenCL 1.1 detection is in your code, too, which is needed, but I can't do the other required changes without a compiler.

In short, the nonce-base is not supplied via the base parameter, if GOFFSET is enabled, but instead via the global_work_offset parameter and used via the global work-item ID in the kernel. This saves a few instructions and can give us a small boost.
Yes I like that idea. But 8 and 16 vectors perform shithouse followed by appalling so it's not worth pursuing those.
10297  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 16, 2012, 11:29:52 AM
You are but 1 Mhash off my current poclbm kernel with that Wink Diakgcn 716.5 versus ck-poclbm 717.5
Sounds not too bad Cheesy. I will try -w 128 and compare results on my machine ... will report back, which one is faster for me at default clocks and on Win7 x64. I'm happy now!
There is something unusual about it running diakgcn and the hashrate appears to be more unstable, rising and falling more so it takes a while to get a reasonable grasp for what the hashrate really is. Since you're running 2 vectors, it effectively makes the hashrate update half as often as running my poclbm kernel since I use no vectors. Either way, the hashrates are really close.
10298  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 16, 2012, 11:20:01 AM
With Diapolo's help we finally got the diakgcn kernel working on cgminer. I've just committed code to the git tree which makes it work. Alas at the same engine and clock speeds on the 7970, diakgcn gives me 699 MHash while my customised kernel gives me 717 MHash. But now that it's working, he may be able to tweak it further...

Did you use -v 2 with DiaKGCN for your test? I'm not sure why, but it has always been faster on Phoenix to use it with -v 2.

Dia
Tried it, much slower.

Did you use the code from the latest commits, here on Windows it's definitely faster with -v 2.

-I 9 -k diakgcn -d 0 -v 1 -w 256: ~524 MH/s
-I 9 -k diakgcn -d 0 -v 2 -w 256: ~539 MH/s
Indeed it is faster now, and with -v 2, it is fastest allow cgminer to choose worksize which is ends up being 128 - it queries the "preferred worksize" and divides that by number of vectors.

You are but 1 Mhash off my current poclbm kernel with that Wink Diakgcn 716.5 versus ck-poclbm 717.5
10299  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 16, 2012, 10:58:16 AM
Thanks for your help with integrating DiaKGCN Con!
And thank you for the code. Now what advantage is there to reordering the variables passed to the kernel? In commit d86a38d1e75090e1ffb9df9e68aa13b1c8dcf9ec you shuffled arguments which appear to be mostly cosmetic. Would that be right?

That's true for the most part when reordering kernel-arguments to be in line with usage of them in the code, but I think it's part of a clean code, don't you think so?

So you created a new DiaKGCN branch, will this last so I can switch my local repo to that one Smiley?

Dia
Yes, please do, and if things go well, we can have a new default kernel for GCN Wink
10300  Bitcoin / Mining software (miners) / Re: CGMINER GPU bitforce overclock monitor fanspeed GCN RPC linux/windows/osx 2.2.6 on: February 16, 2012, 10:28:47 AM
Thanks for your help with integrating DiaKGCN Con!
And thank you for the code. Now what advantage is there to reordering the variables passed to the kernel? In commit d86a38d1e75090e1ffb9df9e68aa13b1c8dcf9ec you shuffled arguments which appear to be mostly cosmetic. Would that be right?
Pages: « 1 ... 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 [515] 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 ... 570 »
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!