Latest posts of: -ck

Updated tree: work submission now spawns a separate thread of its own, which prevents miner threads from stalling when submitting work to slow pools.
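As a rough illustration of the idea (not the actual minerd code, which is written in C), here is a minimal Python sketch. It assumes one short-lived thread per submission; `submit_async` and `submit_fn` are hypothetical names, and `submit_fn` stands in for whatever blocking network call sends the share to the pool.

```python
import threading


def submit_async(work, submit_fn, results):
    """Hand one piece of work to its own thread so the caller never blocks.

    submit_fn is a hypothetical blocking call that sends work to the pool
    and returns True if the share was accepted.
    """
    def worker():
        accepted = submit_fn(work)         # may take a long time on a slow pool
        results.append((work, accepted))   # list.append is atomic in CPython

    thread = threading.Thread(target=worker)
    thread.start()
    return thread                          # caller can join() at shutdown
```

The miner thread calls `submit_async` and goes straight back to hashing; only the helper thread waits on the pool.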

Quote from: m3ta on June 24, 2011, 10:00:48 AM

Any plans for MacOS capabilities? (either binary or compilation-wise)?

The intention is for the code to be portable. That said, I'll need help to ensure it really is, because there are always surprises moving from one OS to another.

Quote from: figvam on June 24, 2011, 08:25:50 AM

Quote from: ckolivas on June 24, 2011, 06:26:06 AM

Further to that thought, I've committed a change to the tree which should prevent 32 bit overflows.

And looks like it fixed the drops in hashrate reporting. Thanks!

Great, thanks for testing. It would be interesting to see how the throughput on yours compares if you leave your machine idle for an hour or so. minerd is very sensitive to user activity.

Quote from: gmaxwell on June 24, 2011, 07:13:44 AM

Quote from: ckolivas on June 23, 2011, 11:20:13 PM

Per-GPU optimisations and bugfixes.

It's still segfaulting for me deep in the ati opencl code during the setup of the second card in a three card system.

Without doubt I'm not setting up multiple cards properly yet. Thanks for testing.

Further to that thought, I've committed a change to the tree which should prevent 32 bit overflows.

Quote from: figvam on June 24, 2011, 05:31:04 AM

Thanks for the quick updates.

With the latest version the hashrate is still not optimal on my GPU, and it still drops to zero periodically when CPU threads are disabled:

Quote

...
[2011-06-24 09:06:39] [17.43 | 16.47 Mhash/s] [0 Accepted] [0 Rejected]
[2011-06-24 09:06:44] [17.41 | 16.49 Mhash/s] [0 Accepted] [0 Rejected]
[2011-06-24 09:06:49] [17.43 | 0.02 Mhash/s] [0 Accepted] [0 Rejected]
[2011-06-24 09:06:54] [17.42 | 0.35 Mhash/s] [0 Accepted] [0 Rejected]
[2011-06-24 09:06:59] [17.22 | 0.67 Mhash/s] [0 Accepted] [0 Rejected]
...

Ah. Question: When it drops to zero, do you ever see it still find blocks despite it reading zero? It may just be a 32 bit overflow because I just remembered you're on 32 bits.
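To show how a 32 bit overflow could produce a reading like that, here is a small Python sketch. It assumes the total hash count is kept in a 32 bit unsigned counter and the all-time average is computed from it; the function name and the 17.4 Mhash/s rate are illustrative, not taken from the minerd source.

```python
def average_mhash(total_hashes, elapsed_s, bits=32):
    """All-time average in Mhash/s when the total lives in a fixed-width counter."""
    counter = total_hashes & ((1 << bits) - 1)   # emulate unsigned wraparound
    return counter / elapsed_s / 1e6


if __name__ == "__main__":
    rate = 17_400_000  # illustrative hashes per second
    for t in (240, 245, 250, 255):
        print(t, round(average_mhash(rate * t, t), 2),
              round(average_mhash(rate * t, t, bits=64), 2))
```

Once the true total passes 2^32 (about 4.29 billion hashes), the 32 bit counter restarts near zero, so the reported average collapses and then climbs slowly while the card keeps working at full speed. A 64 bit counter does not show the drop.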

Updated tree:
Ensure the GPU doesn't keep working on blocks longer than opt_scantime.

This makes for far fewer false blocks on slower GPUs.

I've also limited the max --intensity variable to 10, as higher values returned garbage from the GPU.
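The scantime limit mentioned above can be pictured as a deadline on each unit of work. This is only a sketch under assumptions: `scan_work` and `try_batch` are hypothetical names, and the real check in minerd may be structured differently.

```python
import time


def scan_work(work, try_batch, scantime_s, clock=time.monotonic):
    """Hash one unit of work until a share is found or scantime_s has passed.

    try_batch(work, n) is a hypothetical callback that runs the n-th batch
    of hashes and returns True when it finds a share.
    """
    deadline = clock() + scantime_s
    batches = 0
    while clock() < deadline:
        batches += 1
        if try_batch(work, batches):
            return ("found", batches)
    return ("expired", batches)   # give up and let the caller fetch fresh work
```

Checking the clock between batches means a slow GPU abandons old work at roughly the same age as a fast one, instead of grinding through the whole nonce range.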

Per-GPU optimisations and bugfixes.

I've updated the code now to test every card for its ideal settings and it runs a kernel suitable for each card. This allows you to have different GPUs now and have them all work to their best.

As for the multiple threads question about Diablo's miner, the worker thread that hands out work to the GPU in minerd works asynchronously with very low overhead, so it can keep the GPU busy with just the one thread. Ultimately, the low overhead of this approach and the absence of workload switching on the GPU should make it perform better.

Hmm yeah I still haven't figured out what the bug is there, but I'm still working on it.

Just for the record, this miner is now faster than any other miner for my hardware (1 x 6770) when left idle. I used to get 197 Mhash/s with phoenix+phatk, and adding CPU mining on top gave another 12 (209 in total), but with minerd and default settings I'm getting 216.

I've committed some changes to the kernel based on phatk's kernel's use of arrays instead of individual variables and this has afforded another speedup at least on my machine.

I fixed a bug which would make it segfault occasionally when BFI_INT patching.

I've updated the output log to display both a log-interval average and an all-time average.

I've added an option to configure the log interval in seconds with --log.

I think I also fixed the bug where it would stop doing work after a few minutes.

The output now looks like this:
[2011-06-23 22:26:17] [166.81 | 176.26 Mhash/s] [63 Accepted] [1 Rejected]

First entry is rolling average, 2nd is all time average.
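For anyone curious how the two figures relate, here is a minimal Python sketch. It assumes the first number is simply the hashes done since the previous log line divided by the interval; `HashMeter` is a hypothetical name and minerd's real bookkeeping may differ.

```python
class HashMeter:
    """Tracks a per-log-interval average and an all-time average."""

    def __init__(self, start_s):
        self.start_s = start_s        # when mining began
        self.last_log_s = start_s     # when the previous log line was printed
        self.total = 0                # hashes since start
        self.since_log = 0            # hashes since the previous log line

    def add(self, hashes):
        self.total += hashes
        self.since_log += hashes

    def log_line(self, now_s):
        interval = self.since_log / (now_s - self.last_log_s) / 1e6
        overall = self.total / (now_s - self.start_s) / 1e6
        self.last_log_s = now_s
        self.since_log = 0
        return "[%.2f | %.2f Mhash/s]" % (interval, overall)
```

The interval figure reacts quickly to stalls or bursts, while the all-time figure smooths them out over the whole run.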

Now I'll look some more into the internals and performance.

Quote from: figvam on June 23, 2011, 08:54:45 AM

I run minerd --protocol-dump, and the rate drops to zero immediately after the next getwork.

It could be the threads=0 option. Try allowing the CPUs to run, or at least one thread.

Quote from: figvam on June 23, 2011, 08:50:09 AM

Looks like other miners show the hash rate as an average over the last few seconds. I think showing the average over the whole run time may be confusing. Also, it's strange that the average reported rate would be zero at some point if it's calculated over the whole run time.

It does indeed sound like a bug. Perhaps having a rolling average and a total average would be helpful too.

The hash value returned is the total number of hashes done over the total time. It tends to rise to the real value over a few minutes. If it drops off, it's usually because the server has not responded and minerd is waiting for more work, though it's quite possible there are other bugs there.

Setting "2 way vectors" in phoenix is the same as a vector count of 2 here. minerd detects the optimal count from what the card reports and uses up to 4 way vectors (the most supported by any card currently).

Okay, I've implemented some rudimentary testing of what the GPU reports as its maximum work size and preferred vector width, plus dynamic patching to make the most of those values. In my brief testing this provides the optimal throughput on the 2 cards I've tried it on (ati 6770 and nvidia GT 330). It cannot yet cope with multiple different GPUs in the same machine. Please pull the latest tree and give it a try!
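A rough sketch of that selection step in Python, under stated assumptions: `DeviceInfo` and `choose_settings` are hypothetical names, the two reported values presumably correspond to OpenCL's preferred vector width and maximum work group size queries, and the numbers in the usage lines are made up rather than measured from any card.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Values a card reports about itself (hypothetical container)."""
    name: str
    preferred_vector_width: int
    max_work_group_size: int


def choose_settings(dev, max_vectors=4):
    """Pick kernel settings from the reported values, capped at 4 way vectors."""
    vectors = max(1, min(dev.preferred_vector_width, max_vectors))
    return {"vectors": vectors, "worksize": dev.max_work_group_size}


if __name__ == "__main__":
    # Made-up devices for illustration only.
    print(choose_settings(DeviceInfo("card A", 4, 256)))
    print(choose_settings(DeviceInfo("card B", 1, 512)))
```

The cap of 4 reflects the widest vectors minerd uses; anything the card reports above that is clamped.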

If you grab the latest version, you'll see it reports the preferred vector width. I'm planning on getting all the preferred details back from the cards to automatically set the best options.

Thanks, that's most helpful, especially since this currently uses 4 way vectors. That could explain why it's unhappy.

Thanks for that. I'm not sure why it's not working with 2.4. Do you know what settings (if any) help on that card you have with poclbm? I'm currently at work and won't be able to do anything major right now, but I'll look at implementing more tuneables and detection soon.

Try again please with the new git tree. You should be able to disable the CPU threads. The default optimisations are for 5x and 6x as I have no experience with the other cards. I've yet to implement other command line options for tweaking performance.

EDIT: Yes it does recompile the kernel each time it starts (for now).

Thanks for testing and reporting. I've just uploaded another commit to the git tree which should fix that. Please pull the latest changes.

I've now also added a command line tuneable to allow you to choose how hard to work the GPU:

--intensity
(-I) Intensity of scanning (0 - 16, default 5)

The higher you set it to, the greater the GPU throughput, but the more lag you'll get. Very large numbers can cause massive stalls without a huge improvement to throughput. Try them cautiously!