Latest posts of: -ck

Quote from: DutchBrat on February 25, 2012, 02:01:07 AM

Hi Ckolivas,

Downloaded the new version, let it run for a while while I was watching some videos on my pc (Windows XP), running the new version with Dynamic Intensity (1 thread automatically disabled). Then I changed the Intensity to 8 (running a 5800) and the following weird message came up:

G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d R 1: b1e HiWn:g0
re U-:e3n.a7b1l/emd I
8

Yeah the curses interface just scrambles output occasionally. Harmless.

Quote from: Vbs on February 25, 2012, 01:21:58 AM

min(x,y) http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/commonMin.html
gets implemented low-level as

Code:

w: MIN_UINT    R0.w,  R0.x,  PV1350.y

, which *should* (I know, AMD...) be rather stable. The big problem with the alternative (&) is the huge number of false positives, since it's bitwise: for example, 01010011 & 10101100 = 00000000, which is bad for the branch predictor. I'm testing now with a conservative approach (just this one change from default),

Code:

#elif defined VECTORS2
	bool result = min(W[117].x,W[117].y);
	if (!result) {
		if (!W[117].x)
			output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
		if (!W[117].y)
			output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
	}

and got a slight (3~4MH/s) increase (5850, SDK 2.5 from Cat 11.11).

You can do the maths on false positives. You're greatly exaggerating the "HUGE NUMBER". It's about 1 share for 1 false positive. More so on 4 vectors (but no one uses them). That is not remotely common...

Increase eh?

Call me sceptical to the core.

EDIT: I will look into it, but I'm so terrified of unintentionally breaking shit like I did last time. It was in this code specifically where the slowdown was, so you can imagine why I'm so resistant.
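To make the two tests being argued about concrete, here is a minimal Python sketch (not the kernel code itself). The function names are made up for illustration, and x and y stand in for W[117].x and W[117].y. It only shows what each test accepts, and says nothing about speed on any SDK.

```python
MASK32 = 0xFFFFFFFF  # the kernel works on 32-bit unsigned lanes


def and_test(x: int, y: int) -> bool:
    """Mimics `bool result = x & y; if (!result)`: fires when x & y == 0."""
    return ((x & y) & MASK32) == 0


def min_test(x: int, y: int) -> bool:
    """Mimics `bool result = min(x, y); if (!result)`: fires when the smaller lane is 0."""
    return min(x & MASK32, y & MASK32) == 0


def has_zero_lane(x: int, y: int) -> bool:
    """Ground truth: at least one lane really is zero."""
    return x == 0 or y == 0


# The bit pattern from the quoted post: neither lane is zero, yet the AND is.
x, y = 0b01010011, 0b10101100
print(and_test(x, y), min_test(x, y), has_zero_lane(x, y))
```

With unsigned lanes, min(x, y) is zero exactly when one of the lanes is zero, so the outer `if` is entered only for real hits. The & version also enters it whenever the two lanes share no set bits, and the inner `if (!W[117].x)` / `if (!W[117].y)` checks then reject those cases.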

Quote from: Dyaheon on February 25, 2012, 12:06:31 AM

SDK 2.4:
GPU 1: 51.5C 1569RPM | 375.7/375.7Mh/s | A: 98 R:0 HW:0 U: 4.86/m I:10
GPU 2: 55.0C 1569RPM | 375.7/375.7Mh/s | A: 97 R:0 HW:0 U: 4.81/m I:10

SDK 2.1:
GPU 0: 82.5C 3840RPM | 375.6/375.4Mh/s | A:457 R:2 HW:0 U: 5.27/m I:10
GPU 1: 82.5C 3840RPM | 375.4/375.4Mh/s | A:477 R:0 HW:0 U: 5.50/m I:10
So it seems that 2.4 is very slightly better at 300 memclocks and I 10, 1 thread.

I would say the difference is below noise levels, so I would say they perform identically on that hardware/software combo.
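A rough way to put numbers on "below noise levels", using the figures quoted above. This is a sketch that assumes accepted shares arrive roughly as a Poisson process, so the relative error on a share count N is about 1/sqrt(N). That assumption is mine, not something cgminer reports.

```python
import math


def relative_diff(a: float, b: float) -> float:
    """Difference between two readings as a fraction of their mean."""
    return abs(a - b) / ((a + b) / 2)


def poisson_rel_error(count: int) -> float:
    """Approximate relative standard error of a count, assuming Poisson arrivals."""
    return 1 / math.sqrt(count)


# Hashrate readings from the quoted post (SDK 2.4 vs SDK 2.1).
hash_gap = relative_diff(375.7, 375.4)

# Utility is derived from accepted shares, so its noise follows the share count.
util_gap = relative_diff(4.86, 5.27)
util_noise = poisson_rel_error(98)

print(f"hashrate gap: {hash_gap:.3%}")
print(f"utility gap:  {util_gap:.1%} (expected noise at 98 shares: about {util_noise:.0%})")
```

The hashrate gap is well under 0.1%, and the utility gap is about the same size as the expected scatter for a run with fewer than 100 accepted shares, so neither figure separates the two SDKs.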

Quote from: someone703 on February 24, 2012, 10:08:05 PM

Alright thanks guys, figured it out sorta but it led me to a different question.

GPUs 0/1/2 are well within acceptable VDDC temps at 825/150 @ 1.049v. This is the one (GPU3) that was hitting high VDDC temps and I downclocked that one in particular and lowered the voltage so that the temps are more acceptable as shown here (610/150 @ 0.899v):

So my question is: I downclocked GPU3 so shouldn't it be giving ~250 mhash/s instead of GPU2?

GPU order as returned by the display library doesn't always correlate with the device order and cgminer can't tell so it simply reports what it's told by the ATI display driver. You can try the --gpu-reorder switch with cgminer which makes cgminer try to correlate them based on the PCI BUS ID.
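The idea behind correlating by PCI bus ID can be sketched like this. It is an illustration only, not cgminer's code: the adapter list is invented, and it assumes the mining devices are enumerated in ascending PCI bus order, which is not guaranteed on every system.

```python
from typing import List, Tuple


def reorder_by_bus(adapters: List[Tuple[int, int]]) -> List[int]:
    """Given (display_index, pci_bus_id) pairs, return display indices in bus order.

    Position i in the result is the display-library index assumed to belong
    to mining device i.
    """
    return [index for index, _bus in sorted(adapters, key=lambda a: a[1])]


# Invented example: the display library lists the cards in a different order
# from their PCI slots.
adapters = [(0, 5), (1, 1), (2, 3), (3, 2)]
mapping = reorder_by_bus(adapters)

for device, display_index in enumerate(mapping):
    print(f"mining device {device} -> display adapter {display_index}")
```

If the stats for a downclocked card show up against the wrong GPU number, a mismatch like the one in this invented list is the likely reason.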

Quote from: Vbs on February 24, 2012, 07:40:49 PM

Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), which is infinitesimally close to 1/inf ~= 0. This allows for a further optimization of the code. Using a VECTORS2 example,

Code:

#elif defined VECTORS2
	bool result = min(W[117].x,W[117].y);
	if (!result) {
		if (!W[117].x)
			output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
		else //if (!W[117].y)
			output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
	}

Since min() takes care of the false positives, the 'else' branch is only true when W[117].y==0. The result in the KernelAnalyzer for a 5870 is:

Code:

phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
phatk "new" -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362

This looks okay, but it's in the output path, which is not hit very often, so it's unlikely to make a demonstrable performance change :\
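The 1/(4.3e9*4.3e9) figure quoted above can be checked with exact arithmetic. The sketch assumes each lane of W[117] is an independent, uniformly distributed 32-bit value, which is the assumption the quoted post makes implicitly.

```python
from fractions import Fraction

LANE_VALUES = 2**32                 # 4294967296, i.e. roughly 4.3e9
P_LANE = Fraction(1, LANE_VALUES)   # chance that one lane is exactly zero

p_both = P_LANE * P_LANE                   # both lanes zero in the same vector
p_exactly_one = 2 * P_LANE * (1 - P_LANE)  # the case the 'else' branch relies on

print(LANE_VALUES)
print(float(p_both))
print(float(p_both / p_exactly_one))
```

p_both is 2^-64, roughly 5.4e-20. That is why the 'else' version can treat "x is not zero" as "y must be zero" once min() has confirmed that one lane is zero.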

Quote from: Vbs on February 24, 2012, 07:24:26 PM

Been testing some changes on phatk with the KernelAnalyzer and my own personal testing.

Using a VECTORS2 example,

Code:

bool result = W[117].x & W[117].y;

gives a lot of false positives, changing it to

Code:

bool result = min(W[117].x,W[117].y);

is guaranteed to give yummy results!

(same ALU #ops and fetch, no false positives on the next 'if')

See now this is dangerous. Do you REALLY know how fast the "min" function is on all SDKs? Don't expect AMD to do the right thing and to guarantee it's as fast as &.

Quote from: af_newbie on February 24, 2012, 03:24:08 PM

Quote from: ckolivas on February 24, 2012, 03:17:24 PM

Quote from: af_newbie on February 24, 2012, 03:15:30 PM

I wonder what settings people are running their 7970s with. Not for a few hours, but days :-)

The weather's hot here at the moment, but..

GPU 0: 718.2 / 713.3 Mh/s | A:5180 R:16 HW:0 U:10.00/m I:11
74.0 C F: 79% (4532 RPM) E: 1200 MHz M: 1050 Mhz V: 1.170V A: 99% P: 5%
Last initialised: [2012-02-24 17:38:34]
Intensity: 11
Thread 0: 357.7 Mh/s Enabled ALIVE
Thread 1: 360.4 Mh/s Enabled ALIVE

Running flat out since the day I installed it a couple of weeks back (note the +5% powertune as well).

Thanks. I will try those. Fan is on auto, right?
Which kernel, what options?

You need to confirm your GPU will actually run at those speeds. Every card has different top stability levels.
--auto-gpu --auto-fan -I 11 --gpu-engine 450-1200 --gpu-memdiff -150 --gpu-powertune 5

This is driver 8.921 on Linux 64 bit with GL SYNC enabled. This means it ends up being -k poclbm -w 64 -v 1. On Windows you will not be able to run that high an intensity without running into high CPU usage issues (probably -I 9 is max), and there's no way to enable GL SYNC that anyone's aware of.

Quote from: af_newbie on February 24, 2012, 03:15:30 PM

I wonder what settings people are running their 7970s with. Not for a few hours, but days :-)

Quote from: DeathAndTaxes on February 24, 2012, 02:51:22 PM

* stupidly limiting memclock on 6000 & 7000 series (I mean I understand a limit on overclock but underclock?)

Believe it or not I can actually shed some light on this, and since I'm in a much better mood as someone just kindly donated some BTC, I'll even answer in less than my usual appalling AMD-induced tone of late.

One of the things about the 69xx and now the 79xx architectures is the ability to internally underclock memory relative to the running speed. There is a power-usage war going on between manufacturers, and this is one place where AMD is working very hard (unlike fixing drivers, SDKs and so on). Since the GPUs run the RAM dual channel, if the RAM bandwidth is not in full use, they can shut down one of the channels. This is why power usage does not appear to be directly proportional to the RAM speed. So even though you can only decrease your clock speed to, say, 900, it might be internally running at 450.

This is also why, when you flash the BIOS and underclock the RAM, it might crash at apparently satisfactory rates, and why 6970s are virtually never stable below 300 whereas 5xxx cards can happily run down to 150. Sure, there is more power to be saved if you can actually flash the BIOS and turn them down to 300, since you are then guaranteed never to jump between 300 and 600, but it is not universally half the RAM speed and power consumption. Bear in mind that most people do not touch the clock speed of their memory (except usually to increase it) but they do care about power consumption.

This is also why it's so hard to pin down power usage on these things, as it fluctuates wildly depending on the type of load rather than just the overall load. 100% GPU load could really mean anything and might or might not be high RAM bandwidth.

cgminer is not doing any adjustment of anything. It sends the request to the driver. The driver says it has accepted the value for the profile. The hardware then gladly ignores you and although the profile now says the memory is 300, the GPU goes back to its default speed. This is why I made cgminer report back the actual values to you after you try to make a change. If it doesn't work, it doesn't work. Nothing can make cgminer make it work because it doesn't have access to the special hardware backdoor commands that afterburner and co. can fuck the operating system up the arse with. AMD did not release a public library for anal reaming of GPUs.
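The "request, then read back what actually happened" pattern described above looks roughly like this. FakeDriver is a hypothetical stand-in invented for the example, not the ATI Display Library API, and the clock values are made up.

```python
class FakeDriver:
    """Hypothetical driver: accepts any value into the profile, but the
    hardware silently keeps its clock if the request is below its floor."""

    def __init__(self, actual_mhz: int, floor_mhz: int) -> None:
        self.profile_mhz = actual_mhz
        self.actual_mhz = actual_mhz
        self.floor_mhz = floor_mhz

    def set_memory_clock(self, mhz: int) -> bool:
        self.profile_mhz = mhz          # the profile now shows the new value
        if mhz >= self.floor_mhz:
            self.actual_mhz = mhz       # hardware only follows when it wants to
        return True                     # the driver reports success either way

    def get_memory_clock(self) -> int:
        return self.actual_mhz


def set_and_report(driver: FakeDriver, requested_mhz: int) -> dict:
    """Send the request, then read back the real clock instead of trusting the reply."""
    accepted = driver.set_memory_clock(requested_mhz)
    actual = driver.get_memory_clock()
    return {
        "requested": requested_mhz,
        "accepted": accepted,
        "actual": actual,
        "applied": actual == requested_mhz,
    }


print(set_and_report(FakeDriver(actual_mhz=1375, floor_mhz=900), 300))
```

The point is that "accepted" and "applied" are different things, so the only trustworthy number is the one read back after the change.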

Quote from: af_newbie on February 24, 2012, 12:52:42 PM

ckolivas,

Is there a way to downclock memory on 7970 lower than 900MHz with my setup

diakgcn kernel, -v 2 -w 256

Engine - 1050
Memory - 900 (cgminer cannot set it lower, ignores what is set by afterburner and others)
Fan Auto
Power Tune - 10%

I'm getting about 615 Mh/s, steady at 72 C, 50% fan.

Also I cannot lower voltage, every time I lower voltage, cgminer drops Engine speed to like 300.

Also, what is the best setting for "power tune" in my setup?

Thanks,
af_newbie

BTW, I've run this card all the way to 750 Mh/s (with max everything) but cgminer drops the engine down to 300 MHz after a while.
Good job on the controls.

No. cgminer uses the ATI Display Library to set clock speeds. You need specific Windows-only tools to clock memory lower than the engine -150 limit. I don't have access to those functions to use with cgminer.

Quote from: bitlane on February 24, 2012, 06:46:09 AM

Quote from: ckolivas on February 24, 2012, 06:31:17 AM

Redownload the cgminer-2.3.1-1 version please.

HUGE WIN !

5830, SDK 2.1, CAT 12.1...... only 5 MH/s slower per card than 2.2.7 (was 10+ MH/s slower using 2.3.0 & 2.3.1)

6950, SDK 2.4, CAT 12.1...... only 10 MH/s slower per card than 2.2.7 (was 70 MH/s slower using 2.3.0 & 2.3.1)

This is a huge help,
thanks

Well.. it's exactly the same kernel as 2.2.7 so it MUST be the same performance ;)

Anything else is just reporting hashrate differences. Thanks for testing.

I've upgraded the package to 2.3.1-2, doing the same change to ALL the kernels in case someone else is affected.

Enjoy.

Quote from: Aion2n on February 24, 2012, 06:39:40 AM

Quote from: ckolivas on February 24, 2012, 03:35:24 AM

Well I'm exhausted but hopefully I've undone all the harmful aspects to 2.3.0...

Quick - mostly bugfix - update.

Version 2.3.1 - February 24, 2012

- Revert input and output code on diakgcn and phatk kernels to old style which
worked better for older hardware and SDKs.
- Add a vector*worksize parameter passed to those kernels to avoid one op.
- Increase the speed of hashrate adaptation.
- Only send out extra longpoll requests if we want longpolls.
- API implement addpool command
- API return the untouched Total MH also (API now version 1.3)
- Add enable/disablepool to miner.php example and reduce font size 1pt

EDIT: Note 2.3.1-1 package!

Performance down for 7970 with diakgcn from 550 to 410?

Stop using diakgcn with 7970. -k poclbm is fastest.

Quote from: bitlane on February 24, 2012, 06:04:56 AM

Code:

C:\MINER>cgminer -n
[2012-02-23 22:56:58] CL Platform 0 vendor: Advanced Micro Devices, Inc.
[2012-02-23 22:56:58] CL Platform 0 name: ATI Stream
[2012-02-23 22:56:58] CL Platform 0 version: OpenCL 1.0 ATI-Stream-v2.1 (145)
[2012-02-23 22:56:58] Platform 0 devices: 5
[2012-02-23 22:56:58] GPU 0 ATI Radeon HD 5800 Series   hardware monitoring enabled
[2012-02-23 22:56:58] GPU 1 ATI Radeon HD 5800 Series   hardware monitoring enabled
[2012-02-23 22:56:58] GPU 2 ATI Radeon HD 5800 Series   hardware monitoring enabled
[2012-02-23 22:56:58] GPU 3 ATI Radeon HD 5800 Series   hardware monitoring enabled
[2012-02-23 22:56:58] GPU 4 ATI Radeon HD 5800 Series   hardware monitoring enabled
[2012-02-23 22:56:58] 5 GPU devices max detected

Win7 x64
Driver = 12.1 Cat, 2.1 SDK
5x HD5830 Cards

I start CGMiner with a BAT file and only add CLOCK, NETWORK and INTENSITY ( 8 ) settings. Everything else is default (no kernel specified, work size etc....nothing).
................................................
2.2.7 generates bin = phatk120213Cypressbitalignv2w128long4.BIN
2.3.1 generates bin = phatk120223Cypressv2w128l4.BIN
................................................

Using 2.3.1, performance has been negatively affected compared to 2.2.7 (the same can be seen using 2.3.0-1). The most dramatic performance hit can be seen with my 6950s, a loss of 70+ MH/s per card, the same as with 2.3.0-1 earlier.
Does the omission of 'bitalign' in the BIN file name have something to do with this ?

bitlane.

No it's just a name. If this is true then the thing causing the regression is the most unlikely thing in the universe and AMD is going into another dimension with its fail. I'm going to beat this mother fucking piece of shit if it's the last thing I ever do. Redownload the cgminer-2.3.1-1 version please.

LOL if this hasn't been a polarising release, nothing has. This is one of the reasons it got a minor version upgrade.

Thousands of hours of coding can't guarantee that I can work around every combination of AMD fail that's possible, and trust me, there are more than anyone can imagine. I'll get there in the end...

New cgminer release (2.3.0) supports icarus.

Quote from: The00Dustin on February 23, 2012, 10:05:23 PM

ck, was the change in 2.3.0-1 for phatk? I assumed it was, but noticed the kernel version number didn't change (unless I eyed it wrong). If my observation is correct, then anyone who downloaded 2.3.0 and then 2.3.0-1 would not get a new kernel if the 2.3.0 (not -1) kernel was already there with the same name.

Yes, I wanted to rush out the new package to stop people from downloading the original one ASAP, so I didn't even change the version number on the kernel, which is bad of me, I know.
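Why an unchanged kernel version number matters can be shown with a small sketch of name-based binary caching. The name layout is inferred from the 2.3.1 file name quoted elsewhere in this thread (phatk120223Cypressv2w128l4.BIN). The function names, and the reading of "l4" as the size of a long, are assumptions for the example rather than cgminer's actual code.

```python
def kernel_bin_name(kernel: str, version: str, device: str,
                    vectors: int, worksize: int, long_size: int = 4) -> str:
    """Build a cache file name from the kernel identity and build parameters."""
    return f"{kernel}{version}{device}v{vectors}w{worksize}l{long_size}.BIN"


def needs_rebuild(name: str, cached_files: set) -> bool:
    """A binary is only rebuilt when no cached file carries the same name."""
    return name not in cached_files


cached = {kernel_bin_name("phatk", "120223", "Cypress", 2, 128)}

# Kernel source changed but the version string did not: the stale binary is reused.
same_version = kernel_bin_name("phatk", "120223", "Cypress", 2, 128)
print(same_version, needs_rebuild(same_version, cached))

# Bumping the version string (hypothetical value) changes the name and forces a rebuild.
bumped = kernel_bin_name("phatk", "120224", "Cypress", 2, 128)
print(bumped, needs_rebuild(bumped, cached))
```

Under this scheme, anyone who already has the old .BIN has to delete it by hand to pick up a changed kernel that kept the same version number.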

cgminer reads the profile on startup and saves it. When quitting it's supposed to return it to that saved profile. It works on Linux, but you know how it is with Windows... sometimes it just won't do what you tell it.

Quote from: echris1 on February 23, 2012, 09:03:58 PM

cgminer + diablo + p2pool = awesome

oh, and also ANUBIS, that just makes everything pretty =)

Finally have all the control (fan, clock, etc) without losing any performance or having to fiddle with SDK versions.

Donation sent, keep up the good work.

Thanks, much appreciated.