This was the best answer I've ever read on any of the forums.
1 BTC sent.
Thank you!
|
|
|
Thanks, Joel! So, my understanding is that if we change any of these, then (1) we have to rehash all previous parts of the message (i.e. the block), thus changing the first three words in the share on which a miner is working, and (2) this is what a pool does when all valid values of the nonce have been tried. Correct? Is there anything other than w[3] (which all kernels are currently changing) that we can change on the client side without having to change w[0] through w[2]? (Joel - I'll send you a whole 1 BTC when you answer this, even if the answer is a no ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) Thank you very much!)
|
|
|
Hi,
as I understand it, the miner goes through up to 2^32 values of the nonce field (the fourth word of the second 64-byte chunk of the message), calculates sha(sha(message)), and submits the proof-of-work to the pool if the last word in the hashing result equals 0. Good.
Now the questions: (1) What other fields can we change in the message when mining, other than the nonce? (2) Where is the extraNonce located? (3) Where is the time (and can we change it as we see fit while mining, or are we required to keep it within a certain range)?
I've been reading through the protocol specs on the wiki, but couldn't really figure it out... Thanks!
-B
|
|
|
Well, I hope people will still continue looking into kernel improvements, but also start clearly indicating how they've tested their changes and, most importantly, why they think their changes make a difference. Otherwise it is just guessing...
Also, rethaw, thanks for reposting my change to the proper board when I was a newbie and couldn't post! It made it into git/other kernels quite quickly, thanks for that!
|
|
|
Hi all,
First of all, I'm very glad that there are lots of people out there now trying to improve the kernels. Thanks for doing this!
Second, if I may ask this of you... could you please be very, very careful when introducing these modifications? By careful, I mean test your changes for at least a day or so and see what happens. Also, looking at the disasm results in the kernel analyzer never hurts ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
Third... Most of the changes that I saw deal with removing some adds and such. Normally, any modern compiler should be able to remove all useless adds, such as adding two constants. So I saw no reason for these changes to actually help with the speed. However, some of them were helping, which warranted at least some looking into why they help. So...
Fourth. Here's why I think they help. Most compilers will rip out expressions like (a&b)+c and replace them with a single constant, as long as a, b and c are constants. However, they won't do this if you're using an intrinsic in the expression. For instance, if you do a rotate as x<<(32-n) | x>>n, for constant x and n the whole expression will get replaced with just a single constant. However, if you use amd_bitalign for this, it will not get replaced with a constant (especially if you do it for Ch/Maj, which gets patched afterwards; how could the compiler fold that?). Yet, if you do a rotate using bitshifts but x or n aren't constants, you'll be slowing things down, because now the compiler can't optimize it and you've replaced a bitalign() with a bunch of ops. So, long story short, fewer intrinsics and more constant expressions is probably the reason your changes help. PcChip and I tested out this theory: we replaced the rotates that use intrinsics with rotates that use shifts and ORs (i.e. the stuff the compiler can easily see through) for inputs that are constant. We got a 1-2% improvement, and PcChip posted the kernel here: http://pastebin.com/NPDTfAVd. But we've done it only for the rotates... if somebody could go through the kernel carefully and find other constants and constant expressions that could be removed, it would probably help even more.
Fifth. Feel free to donate coins, but also consider donating to PcChip and the original authors of these kernels; these people put a lot of work into the miners that everybody is using, and they deserve donations more than some dude who happened to notice a 3% improvement to Ma() ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) (Also, I'm not being critical or overly judgmental, just sharing some stuff that I happen to know; it is all my personal opinion, I may be wrong, so feel free to criticize and/or disagree.)
|
|
|
Good, but this
> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()
may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.
|
|
|
You may be right. May I suggest you run the kernel analyzer on both versions and check whether the generated code is any different?
(The generated code may well be identical, because the compiler can see through such things: x=1; x=2; x=3; is easily reduced by the compiler to just x=3;)
|
|
|
Well, I meant I haven't tested my min() for long enough... and I probably won't test it at all, because the difference is not worth the effort (well, I'll test it together with other kernel mods, if I have any). I haven't tried your changes yet. I honestly don't understand why the 'local' change works for anyone, but I like the constant thing you've done ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
|
|
|
Yeah, the min() seems to help, but it helps so little that I can't see the difference without a profiler ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) And since I haven't tested the change nearly enough...
|
|
|
Actually, *with* the min() used like I said earlier, the kernel compiles into something quite a lot shorter... I'm gonna *test* it overnight and claim the donations *if and only if it works*, unless you want to test it yourself ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) For the local, search the board. EDIT - I'll claim the donations for the min anyway, if (and only if) it works and helps anyone ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
|
|
|
I've sent you a small donation for your hard work, but...
Do pools accept any hashes generated with your kernel? Really? For instance, the 'local' optimization was declared invalid (it messes up the calculation, so the thread got locked by the moderator), etc.
EDIT - As to why exit early on the if()-s... well, if you found a solution already, why do you need a second one? Branching on the GPU is very expensive (threads may diverge, etc.), so two branches may, and most likely will, end up being worse than one. May I suggest if(min(x,y)==0) { output the nonce; }? Assuming min can be done without branching, this leaves a single branch, taken only when x or y holds a solution (if min is not 1 instruction, find another function to replace the min...). Then try both the nonce and (nonce+1) on the CPU side to figure out which one of them is the real solution.
|
|
|
I keep hearing that the same .cl kernels compiled with SDK 2.1 and 2.4 are giving different MH/sec results, and 2.4 seems to be giving worse results. So... is this true? Presumably, the same source code compiles into different machine code, but does anybody know what exactly is the difference that's causing this?
I have only 2.4 installed... if nobody knows what the difference is, would it be too much to ask somebody to compile phatk's kernel with 2.1 and post the disasm of the code to pastebin or some such? (Just copy/paste it from AMD's kernel analyzer, and I'll dig through the stuff myself.)
Thanks a lot!
|
|
|
Krypta - it looks like r0() and r1() aren't used; try deleting them if you like.
Note that they aren't used by the phatk kernel that I've got, and even though I can't think of circumstances in which r0() will compile correctly, still, be careful if your kernel is different!
|
|
|
Dude, awesome ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) So this accounts for 3% out of your 6.4% improvement; where is the other 3.4% coming from? I don't have 250 BTC, in fact I only have ~2 that have been donated to me so far. (Please note: if you decide to share this, post it, don't PM me ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) )
I think I deserve the right to keep this remaining ~3% a secret of mine. I have contributed a lot to this community. AFAIK I was the...
* first to show how to down-plug x16 cards in x1 slots by cutting PCIe slots and/or shorting pins A1-B17: http://blog.zorinaq.com/?e=42
* first to use BFI_INT, and document how to patch a kernel with it: http://blog.zorinaq.com/?e=43
* first to document the PCIe 12V rail redirection hack: http://blog.zorinaq.com/?e=44
* first to present a cheap solution to cool 4 double-width cards: http://blog.zorinaq.com/?e=47
Now all the miners are using BFI_INT; shops are selling pre-modified PCIe extenders; etc. Enjoy :-)
Sure, of course you do, dude! It was just a joke ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) Sorry! By the way, I absolutely love your blog; I've read it a lot, just didn't know it was yours.
|
|
|
Is it possible this doesn't work on nVidia cards? With both the OpenCL miner and the CUDA miner I'm still getting exactly the same hashrate, even after restarting GUIMiner.
Or am I missing something? (Using Windows 7, btw.)
Sorry, this only works for ATI cards that support BFI_INT. In general, nVidia cards are much worse for mining, so I'm not sure there's any effort at all going into making nVidia's stuff mine faster...
|
|
|
I've just tried this, didn't help me at all, but I've sent some BTCs your way anyways; thanks for trying to make it better!
|
|
|
I got a ~10 MH/s increase on my 5870s, and surprisingly my stale rate is actually a LOT LOT lower than before the mod.
I will certainly be donating as soon as I have some cleared funds, bitless. Thank you so much for your hard work, and I look forward to the next modification you spoke of previously. (Do you have any more details on that atm?)
Everything that I've heard so far suggests that the stale rate is completely unrelated to the mod - it is somewhat random (esp. given that hdminer uses something that's equivalent to the mod as well). However, I'm glad that it is low for you at the moment. No, I don't have any ETA on other modifications. They are complex and may never materialize into code... in which case I'll just post my thoughts on the subject. Thanks for the donation! ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
|
|
|
Bitless, sent my donation to you as well. Thank you for sharing your findings; giving to the community is what this should be about. There will be plenty of time for profits later; here's to hoping you'll be adequately rewarded for your finding and work.
Thanks a lot! I hope you'll figure out the issues that you're having with your mining after the patch. If I think of anything that can be of help, I'll post it too.
|
|
|
My GPU usage (and thus hashrate) has always fluctuated like a small sine wave between 90% and 99%; after this patch, they all stay at a straight 99%, so that's way more than a 3% increase in hashrate for me.
What's weird is that mine went from 90%-96% to a straight 96%, but the added performance adds 1.5C to the temperature. I might have to clock down a bit to get back to the previous stable 90C line, which means I am forfeiting the benefits of the above optimization. At least I use less energy; the Earth will thank us all...
Same here, actually. I was OK with my 6870 overclocked to 1000 MHz, but after the change it became unstable, so I had to go down to 975, which is kinda sad. Yet I blame it on poor ventilation in my case, so maybe I'll get it running at 1000 again, eventually ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
|
|
|
I was wondering when the community would discover this optimization... nice one bitless :-)
Makes me wonder how many other private modifications are in use right now ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) Also makes me wonder why people don't post them for everyone to see. So, a 3% or even 5% speed increase doesn't really matter all that much if you're just a regular guy with a couple of ATI cards, right? Or am I wrong? PcChip, give me your best advice: should I keep my next optimization to myself or post it? ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)