Bitcoin Forum
Author Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.1  (Read 5805212 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (3 posts by 1+ user deleted.)
-ck (OP)
February 17, 2012, 09:03:58 AM
 #4001

For -v 2 it should halve the global worksize (global Threads) number, so doubling this in the kernel should be no problem at all, right?
global_id == global_work_offset  + global_worksize

You're doubling the global id, not the worksize.


The global ID without a global offset runs from 0 to (global worksize - 1), where the global worksize can be queried in the kernel via get_global_size(0). So for 2-component vectors we need a global worksize / 2 (as the argument in the enqueue kernel call), because 2 nonces are processed in each work-item.

The global ID with a global offset runs from the global offset to (global worksize + global offset - 1).

Dia
I still think it should be something like:
      u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.

Anyway, I ended up trying it both ways, with your nonce code and with mine, and neither led to any improvement (if anything, a detriment).

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
blandead
February 17, 2012, 09:07:22 AM
 #4002

If you're trying to use vectors then there is a type mismatch. Either stick with putting a (u) in front or use (uint4); the (0, 1, 2, 3) should be outside the parentheses.

Here is a float4 example...

float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f);

Also, in the __kernel void search, if you keep (__global uint * output) then you're not really utilizing vectors correctly.

And sorry, I was just trying to provide some general feedback on Out of Order Execution; I wasn't trying to offend you. I'm just not sure how to edit cgminer directly.
-ck (OP)
February 17, 2012, 09:27:57 AM
 #4003

Thanks.

Are you saying the existing code is losing shares with __global uint * output? 99% of users on cgminer are currently using 2 vectors. Again I doubt that's the case.

Diapolo
February 17, 2012, 09:29:18 AM
 #4004

Quote from: ckolivas
I still think it should be something like:
      u nonce = ((uint)get_global_id(0) + get_global_size(0) * (0,1,2,3));
for vectors4 etc.


Thinking out loud again:

get_global_id(0) == ranges from global_offset for the 1st work-item till (global_offset + (global_worksize - 1)) for the last work-item
get_global_size(0) == global_worksize (constant value)

global_offset == nonce-base, that results in:

nonce.x = nonce-base + global_worksize * 0;
nonce.y = nonce-base + global_worksize * 1;
nonce.z = nonce-base + global_worksize * 2;
nonce.w = nonce-base + global_worksize * 3;

Let's consider 10 as nonce-base and 4 as global_worksize. This leads to the following nonces that get checked during 1 kernel execution:

Work-Item 0:
10 + 4 * 0 = 10
10 + 4 * 1 = 14
10 + 4 * 2 = 18
10 + 4 * 3 = 22

Work-Item 1:
11 + 4 * 0 = 11
11 + 4 * 1 = 15
11 + 4 * 2 = 19
11 + 4 * 3 = 23

Work-Item 2:
12 + 4 * 0 = 12
12 + 4 * 1 = 16
12 + 4 * 2 = 20
12 + 4 * 3 = 24

Work-Item 3:
13 + 4 * 0 = 13
13 + 4 * 1 = 17
13 + 4 * 2 = 21
13 + 4 * 3 = 25

So we have nonces from 10 to 25

Now, if we divide the passed global worksize by 4 (because of the 4-component vector usage in your example) and so use 1 for it, this leads to:

Work-Item 0:
10 + 1 * 0 = 10
10 + 1 * 1 = 11
10 + 1 * 2 = 12
10 + 1 * 3 = 13

So I guess your code works if you divide the global worksize by the vec-size before passing that argument to clEnqueueNDRangeKernel.
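The arithmetic above can be double-checked with a small host-side model. This is only a sketch in Python, not OpenCL C and not cgminer code: the function name strided_nonces is made up for illustration, and it assumes the global offset is used as the nonce base and that values behave as 32-bit unsigned integers.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit uint by masking


def strided_nonces(base, worksize, vec_width):
    """Model of: nonce = get_global_id(0) + get_global_size(0) * (0, 1, ..., vec_width - 1).

    Returns one list of lane values per work-item.
    """
    per_item = []
    for item in range(worksize):
        global_id = (base + item) & MASK32  # global offset acts as the nonce base
        lanes = [(global_id + worksize * lane) & MASK32 for lane in range(vec_width)]
        per_item.append(lanes)
    return per_item


# base 10, global worksize 4, 4-component vectors
table = strided_nonces(base=10, worksize=4, vec_width=4)
for item, lanes in enumerate(table):
    print(f"Work-Item {item}: {lanes}")

# every nonce from 10 to 25 is visited exactly once
covered = sorted(n for lanes in table for n in lanes)
print(covered == list(range(10, 26)))  # True

# global worksize divided by the vector width (4 / 4 = 1)
print(strided_nonces(base=10, worksize=1, vec_width=4))  # [[10, 11, 12, 13]]
```

With worksize 4 the single pass covers 16 nonces; with worksize 1 it covers 4, which is why the worksize passed to the enqueue call has to match what the host expects each pass to consume.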

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
-ck (OP)
February 17, 2012, 09:40:43 AM
 #4005


Thanks. My issue with your code:
Code:
u nonce = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);
is, as I said, that it really won't be testing the nonce range we're asking it to test. If "base" is 2^31 and worksize is 2^24 (intensity 9), then get_global_id(0) will return 2^31 for the very first thread. If we then shift it << 2, it overflows 32 bits and in most implementations will just be zero again. That means we'll be repeating 2^24 operations on nonces 0 - 2^24, which we would have done initially on getting that work item.
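To make the wrap-around concrete, here is a minimal Python model of the shifted version. It is a sketch only: shifted_nonces is a hypothetical helper, and masking with 0xFFFFFFFF is an assumption standing in for how a 32-bit uint drops its high bits on the device.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit uint by masking


def shifted_nonces(global_id, vec_width=4):
    """Model of: nonce = (get_global_id(0) << log2(vec_width)) + (0, 1, ..., vec_width - 1)."""
    shift = vec_width.bit_length() - 1  # 2 for vec4, 1 for vec2
    start = (global_id << shift) & MASK32  # high bits are lost here
    return [(start + lane) & MASK32 for lane in range(vec_width)]


base = 2**31  # global offset passed as the nonce base
print(shifted_nonces(base))      # [0, 1, 2, 3]
print(shifted_nonces(base + 1))  # [4, 5, 6, 7]
```

In this model the first work-items land back on the lowest nonces instead of anywhere near 2^31, which is the repeated range being described.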

Diapolo
February 17, 2012, 09:55:10 AM
Last edit: February 17, 2012, 10:17:36 AM by Diapolo
 #4006

Quote from: ckolivas
Thanks. My issue with your code:
Code:
u nonce = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);
is, as I said, that it really won't be testing the nonce range we're asking it to test. If "base" is 2^31 and worksize is 2^24 (intensity 9), then get_global_id(0) will return 2^31 for the very first thread. If we then shift it << 2, it overflows 32 bits and in most implementations will just be zero again. That means we'll be repeating 2^24 operations on nonces 0 - 2^24, which we would have done initially on getting that work item.

get_global_id(0) for the very first thread is simply base, if base is passed as the global_work_offset parameter. So the range is from "base" to "base + (2^24 - 1)". If base is 2^31 and we shift left by 2 for Vec4, you are right and we overflow here. The code is easy for no vectors, but wrong for vectors in its current form.

Edit: Would that work for Vec2 (base: 10 / global-worksize: 4)?
Code:
u nonce = (uint)get_global_id(0) + (u)(0, (uint)get_global_size(0));

With base 10 the nonces would be: 10, 14, 11, 15, 12, 16, 13, 17
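A quick way to sanity-check the Vec2 variant is to model it on the host. The Python below is a sketch under assumptions: goffset_vec2_nonces is a made-up name, the global offset is taken to be the nonce base, and 32-bit unsigned behaviour is imitated with a mask. The second call uses a small worksize of 8 rather than 2^24 just to keep the list short.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit uint by masking


def goffset_vec2_nonces(base, worksize):
    """Model of: nonce = get_global_id(0) + (0, get_global_size(0)), in work-item order."""
    out = []
    for item in range(worksize):
        gid = (base + item) & MASK32
        out.extend([gid, (gid + worksize) & MASK32])
    return out


print(goffset_vec2_nonces(10, 4))  # [10, 14, 11, 15, 12, 16, 13, 17]

# With a base of 2^31 there is no shift, so nothing wraps back to zero.
high = goffset_vec2_nonces(2**31, 8)
print(min(high) == 2**31, max(high) == 2**31 + 15)  # True True
```

Because the lanes are offset by addition instead of a shift, the tested range stays contiguous from base upward, at least until base plus twice the worksize would itself exceed 32 bits.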

Dia

-ck (OP)
February 17, 2012, 10:00:14 AM
 #4007

Code:
#if defined VECTORS4
#ifdef GOFFSET
u nonce = (uint)get_global_id(0) + (u)(0, get_global_size(0), get_global_size(0) << 1, get_global_size(0) * 3);
#else
u nonce = ((uint)get_group_id(0) * (uint)get_local_size(0) << 2) + ((uint)get_local_id(0) << 2) + base;
#endif
#elif defined VECTORS2
#ifdef GOFFSET
u nonce = (uint)get_global_id(0) + (u)(0, get_global_size(0));
#else
u nonce = ((uint)get_group_id(0) * (uint)get_local_size(0) << 1) + ((uint)get_local_id(0) << 1) + base;
#endif
#else
should do it.

Also, cgminer already takes vectors into account when increasing the nonce value it passes as base on the next pass; this doesn't change that. cgminer effectively sends twice as much work when vectors go from 1 to 2, so the intensity is effectively different at different vector widths.
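For comparison, the non-GOFFSET branches can be modelled the same way. This Python sketch only reproduces the starting value computed by the expression with get_group_id, get_local_size and get_local_id; how the individual vector lanes are filled in afterwards is not shown in the snippet above, so it is left out. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit uint by masking


def start_nonce_no_goffset(group_id, local_size, local_id, base, vec_width):
    """Model of: ((group_id * local_size) << s) + (local_id << s) + base, with s = log2(vec_width)."""
    shift = vec_width.bit_length() - 1  # 1 for VECTORS2, 2 for VECTORS4
    return (((group_id * local_size) << shift) + (local_id << shift) + base) & MASK32


def starts_for_range(num_groups, local_size, base, vec_width):
    """Starting nonce of every work-item, in global order."""
    return [
        start_nonce_no_goffset(g, local_size, l, base, vec_width)
        for g in range(num_groups)
        for l in range(local_size)
    ]


print(starts_for_range(2, 2, 10, 2))  # [10, 12, 14, 16]
print(starts_for_range(2, 2, 10, 4))  # [10, 14, 18, 22]
```

Here base is added after the shift, so a large base is not shifted out of range, and consecutive work-items start vec_width nonces apart.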

-ck (OP)
February 17, 2012, 10:05:02 AM
 #4008

Btw, the performance of it is pretty average, after all that discussion...

Diapolo
February 17, 2012, 10:20:02 AM
 #4009

Quote from: ckolivas
Btw, the performance of it is pretty average, after all that discussion...

Perhaps the changes needed to make it work ate the small benefits the solution offers ... but I had to LOL when I saw we came up with the same solution ^^. I posted and read your version after that and they look equal for VEC2 :D.

Dia

-ck (OP)
February 17, 2012, 10:48:19 AM
 #4010

:D I'd say you're right. Oh well, always other things to try.

-ck (OP)
February 17, 2012, 11:23:20 AM
 #4011

You know, I could make cgminer "skip" nonce ranges when it's using goffset so that the code can work with fewer ops. This would drop efficiency, though, since it would decrease the amount of work a device gets before it needs new work.

-ck (OP)
February 17, 2012, 11:32:44 AM
 #4012

Meh, it ended up offering no advantage in exchange for unnecessary complexity.
* ckolivas forgets all about goffset for now.

Proofer
February 17, 2012, 02:09:17 PM
 #4013

Quote from: ckolivas
... 99% of users on cgminer are currently using 2 vectors. ...

We p2pool users were advised to use -v 1; are we the 1%?
-ck (OP)
February 17, 2012, 02:12:34 PM
 #4014

No, you were advised to use -g 1
I don't recall saying to use one vector for p2pool and if anyone did say that, they're wrong.

-ck (OP)
February 17, 2012, 03:14:54 PM
 #4015

More AMD breakage coming up. As Diapolo hinted earlier, there is a new AMD driver 12.2 with an SDK that claims to be SDK 2.6 but comes up with the version number 898.1. It breaks cgminer completely, making it unable to build any binaries. >:( I have yet to investigate why, but please do not upgrade unless you already have .bin files that work. I'm going to start a collection of .bin files that people may be able to download, and they'll be housed here:

http://ck.kolivas.org/apps/cgminer/bins/

Notably there are Tahiti (7970) .bins for 32 bit (long4) and 64 bit (long8), as these depend on SDK 2.6 and people may well get a nasty surprise if they try to get it working now with the latest drivers.

Diapolo
February 17, 2012, 03:21:28 PM
 #4016


As I said, bad stuff incoming ... here are some version strings from Windows:

platform version: OpenCL 1.1 AMD-APP (898.1)

device infos (verified to be equal on Tahiti and BeaverCreek):
OpenCL software driver version: CAL 1.4.1703 (VM)
supported OpenCL version (FULL_PROFILE): OpenCL 1.1 AMD-APP (898.1)
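If anyone wants to check for this build from a script, the platform version string above is easy to pick apart. The following Python is just a sketch: parse_platform_version is a hypothetical helper, and the pattern is only fitted to the "OpenCL 1.1 AMD-APP (898.1)" form shown here, so other vendors or layouts would simply return None.

```python
import re

# Pattern fitted to strings of the form "OpenCL 1.1 AMD-APP (898.1)"
_AMD_APP_RE = re.compile(r"OpenCL (\d+)\.(\d+) AMD-APP \((\d+(?:\.\d+)*)\)")


def parse_platform_version(text):
    """Split an AMD platform version string into OpenCL version and AMD-APP build."""
    match = _AMD_APP_RE.fullmatch(text.strip())
    if match is None:
        return None  # not in the expected AMD-APP layout
    return {
        "opencl": (int(match.group(1)), int(match.group(2))),
        "amd_app_build": match.group(3),
    }


info = parse_platform_version("OpenCL 1.1 AMD-APP (898.1)")
print(info)  # {'opencl': (1, 1), 'amd_app_build': '898.1'}
```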

Dia

DeathAndTaxes
February 17, 2012, 03:28:05 PM
 #4017


Alternatively, you could install the upgrade but (in Windows) select custom install and UNCHECK the SDK. I'm not sure whether 12.2 has any notable changes compared to 12.1, but if it does, that is a way to get the "improved" (??? with AMD, "deproved") driver and keep the existing SDK installation.

Diapolo
February 17, 2012, 03:38:22 PM
 #4018

Yes, for now one should uncheck OpenCL Runtime during the Catalyst upgrade until CGMINER is fixed.
My first look made me scream about another fact: they did heavy work on their OpenCL compiler, which again tends to behave very differently compared to former versions ... seems like more work in the future :D (it looks like they prefer vector GPRs over scalar GPRs with the new runtime, as there are massive shifts in GPR usage).

Dia

JWU42
February 17, 2012, 03:50:28 PM
 #4019

The latest 12.2 pre has 898.1 SDK (2/16/12 release).  I had to manually remove the .dll files and run the 12.1 installer to get the older 854.1 SDK.

dlasher
February 17, 2012, 03:57:51 PM
 #4020

cgminer sets clocks all back to default on exit... if it exits cleanly, and of course on windows it's a miracle when it does.

Sadly this is not the case.

windows machine, pair of 6950's, set to 850/1300 for normal operation, in cgminer they are set to 700-880/300... when cgminer exits it leaves the cards at 880/300.

even updated to 12.1 drivers, both 2.4 and 2.6 SDK.



Is it exiting or crashing?

I run cgminer from a .bat file in windows, and when I hit the "Q" button, it gives me the 2-page summary of stats, then sits for about 3 seconds and then closes the dos window. I'm not getting any complaints from windows about it crashing/hanging/etc.

ckolivas: I know you have the impossible task of trying to make an app play nice with windows. I'm HAPPY to run any sort of debug load/options you might have to give better/more-detailed information.


