Bitcoin Forum

Bitcoin => Bitcoin Technical Support => Topic started by: ovidiusoft on July 25, 2011, 03:48:27 PM



Title: cgminer + phatk hardware errors
Post by: ovidiusoft on July 25, 2011, 03:48:27 PM
Posting in newbies, because I can't post in the original cgminer thread at http://forum.bitcoin.org/index.php?topic=28402.440

I am running cgminer 1.4.1 (the binary already compiled by @ckolivas) on a Sapphire HD 5830 Xtreme, overclocked to 1040 MHz (RAM at 325). I see a lot of hardware errors in the stats (HW), as many as 10-15 per 100 accepted shares. The kernel is phatk, and the number of HW errors does not depend on the other settings used (-g -I -w -v).

After some testing, I came to the conclusion that the problem is with the phatk implementation shipped with cgminer-1.4.1 (same thing with 1.4.0). My tests:

1. I assumed that my board was "too overclocked", brought it down to 1000 MHz and ran it for a few hours; there was no significant drop in HW errors. I excluded overclocking. Update: redid the tests, hardware errors do disappear, but only at much lower frequencies. The rate at which they increase or decrease with frequency is very steep.

2. Varied -g, -I, -w and -v parameters, no change in HW errors. I excluded configuration problems.

3. I also came to the same conclusion using phoenix with the modified phatk from here: http://forum.bitcoin.org/index.php?topic=25135.0 . As I commented in post #158 ( http://forum.bitcoin.org/index.php?topic=25135.msg383782#msg383782 ), the phatk version from 07-17 produces a lot of hardware errors, but version 07-11 does not. Other users have the same problem.

So it appears to me that the phatk version used in cgminer 1.4.1 is based on or has the same bug as the modified phatk from 07-17.

Now for the questions:

1. Did anyone else notice the same issue with cgminer+phatk?

2. If @ckolivas and/or @Diapolo are reading, would it be possible to port phatk 07-11 to cgminer-1.4.1 so we can confirm my conclusion?

10q and great miner, btw ;)


Title: Re: cgminer + phatk hardware errors
Post by: Graet on July 26, 2011, 02:29:12 AM
mm have you considered that "hardware errors" might actually be hardware errors, and that having something that works well recompiled just for you is a bit much to realistically ask of a guy that is doing an awesome job coding and supporting cgminer? as well as raising a family and having a full time job?


Title: Re: cgminer + phatk hardware errors
Post by: ovidiusoft on July 26, 2011, 05:22:09 AM
@Graet, if you have the patience to (re)read what I wrote and what other users wrote in the modified phatk thread, you will understand that I did think that the card was too overclocked and I already excluded it.

1. default phatk from phoenix-1.5.0 and all modified phatk kernels up to 07-11 do not produce hardware errors.
2. modified phatk from 07-17 does, as does the version shipped with cgminer-1.4.1 (and 1.4.0).
3. no miner based on poclbm produced hardware errors. I usually test up until 1000 accepted shares and I tested, if I remember correctly, 3 poclbm versions.

With that and what I wrote before, I came to the conclusion that there's a regression in phatk. A developer would usually be interested in such a report, and @ckolivas encouraged testing and bug reporting (please read the original cgminer thread).

Whether @ckolivas or @Diapolo will consider my report relevant and will want to dig in remains to be seen. Worst case, they will ignore me and I will use phoenix with the phatk version that is best for my setup. However, I don't think that it's "a bit much to realistically ask of a guy that is doing an awesome job coding and supporting cgminer, as well as raising a family and having a full time job". I assume that since he is doing this and is asking for feedback, he has the time to do it and is interested in said feedback.

Oh, and FYI, porting phatk 07-11 to cgminer will not require any recompilation, just patching a text file. I would do it myself, and I already tried, but I don't understand the code that well so I failed.


Title: Re: cgminer + phatk hardware errors
Post by: -ck on July 27, 2011, 10:53:38 AM
I think you're grossly underestimating how much effect chance has on this. Even with a pool as large as btcguild with 3000Ghash, they can have days where their luck is down by as much as 40%! You simply cannot place value on the returned shares over 1500 shares in that way. My advice to you would be to do what you already know from advice is the right thing - decrease the frequency till the HW errors go away. I'm absolutely certain any difference you see in accepted rates between the different kernels is pure chance.
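A rough sketch of the size of the chance effect, assuming share finds follow a Poisson process (a standard model for mining, not something stated in this thread). The function names and the 1575 vs 1500 comparison are illustrative only, not measurements from anyone's card.

```python
import math


def relative_noise(shares: int) -> float:
    """Relative standard deviation of a Poisson count with this mean."""
    return 1.0 / math.sqrt(shares)


def z_score(shares_a: int, shares_b: int) -> float:
    """Standard deviations separating two share counts taken over equal
    time windows, assuming both counts are Poisson distributed."""
    return (shares_a - shares_b) / math.sqrt(shares_a + shares_b)


if __name__ == "__main__":
    # Hypothetical example: kernel A returns 5% more shares than kernel B.
    print(f"noise at 1500 shares: {relative_noise(1500):.1%}")
    print(f"z for 1575 vs 1500:   {z_score(1575, 1500):.2f}")
```

Under that assumption a 1500-share sample carries roughly 2.6% noise on its own, and a 5% gap between two such samples is only about 1.35 standard deviations, which is well within what luck alone can produce.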


Title: Re: cgminer + phatk hardware errors
Post by: ovidiusoft on July 27, 2011, 11:24:22 AM
I thought about that too, after I mined 25% more than usual yesterday. Turned out that the pool got lucky (deepbit, now at 5+ thash). I already found the best cgminer settings, so I will run it for a couple of days and report back with what I find.

I don't have 2 identical cards at the moment; I would like to run the tests in parallel to exclude pool luck and other random factors. But I hope to borrow another 5830 soon, and I'll come back with better tests.


Title: Re: cgminer + phatk hardware errors
Post by: ovidiusoft on August 06, 2011, 07:30:05 AM
As promised, I'm back with some more testing. For cgminer, I used the data reported by the interface. For phoenix, I redirected the output to a log file, took the time between the "connected to server" message and the last "accepted" log entry as the sampling period, and used that to determine the accepted shares/minute indicator.
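The phoenix calculation described above could be sketched roughly like this. The log layout is an assumption: the sketch expects every line to start with a timestamp shaped like "[06/08/2011 07:00:00]" and only looks for the words "connected to server" and "accepted"; adjust TIME_FORMAT and STAMP_LEN if your log looks different. The function name and sample lines are made up for illustration.

```python
from datetime import datetime

TIME_FORMAT = "[%d/%m/%Y %H:%M:%S]"  # assumed timestamp layout
STAMP_LEN = 21                       # length of "[dd/mm/yyyy hh:mm:ss]"


def accepted_per_minute(lines):
    """Accepted shares per minute between the first 'connected to server'
    line and the last 'accepted' line. Returns None if not computable."""
    start = None
    last_accept = None
    accepted = 0
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LEN], TIME_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        text = line[STAMP_LEN:].lower()
        if start is None:
            if "connected to server" in text:
                start = stamp
            continue
        if "accepted" in text:
            accepted += 1
            last_accept = stamp
    if start is None or last_accept is None:
        return None
    minutes = (last_accept - start).total_seconds() / 60.0
    return accepted / minutes if minutes > 0 else None


# Made-up sample lines in the assumed layout.
sample = [
    "[06/08/2011 07:00:00] Connected to server",
    "[06/08/2011 07:00:20] Result: 1a2b3c4d accepted",
    "[06/08/2011 07:00:45] Result: 5e6f7a8b rejected",
    "[06/08/2011 07:01:30] Result: 9c0d1e2f accepted",
    "[06/08/2011 07:02:00] Result: 0a1b2c3d accepted",
]

if __name__ == "__main__":
    print(accepted_per_minute(sample))
```

For a real run you would pass the lines of the redirected log file instead of the sample list.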

And all this complicated math and monitoring gave me these conclusions:

* [expected] at the same GPU frequency, cgminer and phatk-0711 hashrates are about the same.

* [expected] as frequency increases, hardware errors increase at a greater rate in cgminer than in phatk-0711.

* [UNexpected] increases in hardware errors don't reduce the hashrate gained from a frequency increase. I would have expected that bumping the frequency in cgminer would bring me *less* hashrate increase than with phatk-0711. The rationale was that more hardware errors would "eat" from the hashrate. But there's no visible effect, so bumping the frequency seems to be preferable, and hardware errors can be ignored (well, unless the board melts :D).

* [expected] also, more hardware errors won't lead to more rejects. Rejects are more related to communication problems with the pool than local conditions. That's why I ignored them in all my calculations.

* [UNexpected] even for samples of 10,000 accepted shares, hashrate/hardware errors/frequency have less impact than expected on accepted shares/minute. Variance is high and depends a lot on network conditions. It's impossible to claim that running the card at +5% frequency/hashrate will increase your accepted shares by 5%. It might even decrease them :)
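For reference, the accepted shares/minute you would expect from a given hashrate can be sketched as below, assuming the pool hands out difficulty-1 shares (one share per 2^32 hashes on average). The 300 MH/s figure is purely illustrative, not a number measured in these tests, and the sketch ignores rejects, stales and network conditions entirely.

```python
HASHES_PER_SHARE = 2 ** 32  # average hashes per difficulty-1 share (assumed)


def expected_shares_per_minute(mhash_per_s: float) -> float:
    """Average difficulty-1 shares found per minute at this hashrate."""
    return mhash_per_s * 1e6 * 60 / HASHES_PER_SHARE


def minutes_for_shares(mhash_per_s: float, shares: int) -> float:
    """Average time in minutes needed to find this many shares."""
    return shares / expected_shares_per_minute(mhash_per_s)


if __name__ == "__main__":
    base = 300.0          # hypothetical hashrate in MH/s
    bumped = base * 1.05  # the same card with +5% hashrate
    print(f"{base:.0f} MH/s -> {expected_shares_per_minute(base):.2f} shares/min")
    print(f"{bumped:.0f} MH/s -> {expected_shares_per_minute(bumped):.2f} shares/min")
    print(f"10,000 shares take about {minutes_for_shares(base, 10000) / 60:.0f} hours")
```

At these illustrative numbers the +5% bump is worth about 0.2 shares/minute on average, and a 10,000-share sample spans well over a day, which leaves plenty of room for pool and network conditions to change during the measurement.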

Soooo.... yeah. Not exactly what I expected, but not unhappy either. It just means that one can run almost any kernel at any frequency and will get about the same results (more influenced by network conditions and pool luck than by hardware settings or kernel optimisations). Some will prefer to keep it cool (the board, I mean) using a lower frequency and consuming less power, others will tweak the last ALU op and last MHz out of the board. Just because they can. You know where I stand :)

On an even happier note, the latest phatk mod by @Diapolo, 08-04, brings the hardware errors back down to 07-11 levels, so my original problem will go away when this kernel is ported to cgminer. :D