I was wondering when the community would discover this optimization... nice one bitless :-) For the record, hdminer has implemented this maj() optimization since day 1:
    # ibit_extract patched to BFI_INT at runtime
    $code .= " ixor $tmp0, $a, $b\n".
    " ibit_extract $tmp0, $a, $c, $tmp0\n";
Phoenix is probably very close to hdminer's performance now, on HD 69xx.
Dude, awesome :) So this accounts for 3% of your 6.4% improvement; where does the other 3.4% come from? I don't have 250 BTC; in fact I only have ~2 that have been donated to me so far. (Please note: if you decide to share this, post it, don't PM me :) )
|
|
|
I see some Ma3() function in your kernel (I don't have it), which seems to be almost the same as the original Ma(), and my optimization could be applied to it as well. Why didn't you change this Ma3()? Any particular reason?
|
|
|
Seriously, the Ma() function is so deeply buried that if it were wrong, then by the very nature of a good hash function like SHA256, all the hashes would fail.
So, if it finds some good results, all of them are correct too, with 99.999999999999% probability.
More like a 2^-128 chance of error, if not less :)
|
|
|
There's a u W[128] at the beginning of the Search() function, would those be the same as poclbm's w0-w15 ?
TLDR - Probably yes, maybe not. Try it anyway :)
When mining, you're calculating SHA256( SHA256( w0, w1, ... nonce ..., w15 ) ), where the w's are some constants, most of which depend on the block you're trying to solve. The calculation of each SHA consists of two stages: the expansion (a.k.a. message schedule), which takes your input words w0 through w15 and expands them into 64 words, w0 through w63, and the compression, which runs 64 iterations over these expanded words, with each wi used only once, on iteration i.
So... there are at least two ways to carry out this calculation.
1 - expand the 16 original input words into 64 words, and then use them.
2 - expand them as necessary, i.e. calculate wi on iteration i, when it is needed.
My guess is that one kernel implements #1 and the other implements #2. So W[128] is probably the same as w0, w1, ... It is also hard to tell which way is better; it depends on the compiler and your hardware (register pressure, scheduling and so on).
What I don't know is why they've got 128 words instead of 64; perhaps it is because of calculating the hash twice, or it could be because they're calculating two hashes at once, in parallel. Unfortunately, I don't have access to the source code at the moment, so all I can do is guess :) I hope this helps.
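To make the two approaches concrete, here is a rough sketch in plain C (not OpenCL, and not taken from either kernel). It uses the standard SHA-256 message schedule recurrence; the function names expand_all, next_word and schedules_match are made up for this illustration.

```c
#include <stdint.h>

/* Rotate right, as used by the SHA-256 small sigma functions. */
static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static uint32_t sig0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static uint32_t sig1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

/* Way 1: expand the 16 input words into all 64 words up front. */
void expand_all(const uint32_t in[16], uint32_t w[64])
{
    for (int i = 0; i < 16; i++)
        w[i] = in[i];
    for (int i = 16; i < 64; i++)
        w[i] = sig1(w[i - 2]) + w[i - 7] + sig0(w[i - 15]) + w[i - 16];
}

/* Way 2: keep only a 16-word sliding window and compute w_i
   on iteration i, overwriting the slot that held w_(i-16). */
uint32_t next_word(uint32_t win[16], int i)
{
    if (i >= 16) {
        win[i & 15] = sig1(win[(i - 2) & 15]) + win[(i - 7) & 15]
                    + sig0(win[(i - 15) & 15]) + win[(i - 16) & 15];
    }
    return win[i & 15];
}

/* Returns 1 if both strategies produce the same 64 words. */
int schedules_match(const uint32_t in[16])
{
    uint32_t w[64], win[16];
    expand_all(in, w);
    for (int i = 0; i < 16; i++)
        win[i] = in[i];
    for (int i = 0; i < 64; i++)
        if (next_word(win, i) != w[i])
            return 0;
    return 1;
}
```

Way 1 costs 64 words of storage per hash, way 2 only 16 but with more index arithmetic; which one a GPU compiler handles better is exactly the register pressure question mentioned above.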
|
|
|
Thank you, you are a scholar and a gentleman (for finding and sharing this) :) I'll try it out tonight and report my results, will definitely tip you if it helps!
|
|
|
WoW :o 310 to 380 Mhash (+22.58%)! AWESOME! Thank you! Neat-o-rama! Somehow I still don't believe you (and my eyes) :)
|
|
|
can't do it :(
Yes, you can :) What you've done is the CH function, not MA. Now you can build MA on top of it. I don't think this belongs in this thread though, so you can PM me and we'll figure it out. Alternatively, you can ask someone around you for help; two heads are better than one.
|
|
|
Just tossing my confirmations out there: ... So, the increase appears to also cause additional rejects that should be discounted from total increase gain, cutting it down a full 1% in my case.
This is really not good. I honestly do not know why this would happen; are you sure it is related to the patch and isn't a result of the general randomness when searching for solutions? Perhaps someone else, more familiar with bitcoin mining than me, can chime in?
|
|
|
FYI, You can make the identical change to phoenix+poclbm with the same 2% increase in speed.
Do tell. If there's a simple file to replace that you can upload, I'll definitely tip you. Winx86 GUIminer-poclbm
At the office now, no access to the miners; I'll take a look once I get home. In the meantime, search for .cl files and see if you have any Ma (or, even better, amd_bytealign) strings in them, and take it from there.
|
|
|
So m0mchill (or somebody?) already updated the code repository, and hasn't sent you a "thank you" PM yet ?
No, but it is me who should be thanking them for writing these awesome miners to begin with!
|
|
|
Gigabyte 5850 @ 1000/300 - from 390MHash/s to 400MHash/s with the mod applied. Got a question, which boolean operator is used to sum all the outputs? ((y),(x|z),(z&x))
It is the CH function; given 3 arguments a, b, c, CH returns b where a==1 and c where a==0, bit by bit. This is the same as (a&b)|(~a&c), which is the way BFI_INT is defined in ATI's docs.
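For anyone who wants to see the select behaviour spelled out, here is a small plain C sketch. The function names bitselect_ref and ch_spec are made up for this example; the first follows the (a&b)|(~a&c) formula above, the second is Ch in the XOR form used by the SHA-256 spec. The two agree because the a&b and ~a&c terms never have a bit set in the same position.

```c
#include <stdint.h>

/* Bit-select: for each bit position, take b where a is 1 and c where a is 0.
   This follows the (a&b)|(~a&c) formula quoted in the post. */
uint32_t bitselect_ref(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* SHA-256 Ch in the XOR form from the hash spec. */
uint32_t ch_spec(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}
```

With a = 0xF0F0F0F0, b = 0xCCCCCCCC and c = 0xAAAAAAAA, every one of the 8 input bit combinations shows up somewhere in the word, so a single call covers the whole truth table.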
|
|
|
I HIGHLY encourage one of us to go through the truth table in my original post and the C code that was used to generate it to make sure there's no errors. That's why I posted it to begin with. It would really suck if it didn't work.
You posted the truth table (and modified the code) for the MA function, but your C code is for the CH function. But your truth table for the MA function is correct.
Sorry about the confusion. CH = BFI_INT, since BFI does the same thing as the CH function in the hash spec. MA is built using CH/BFI, so there's a define for CH (copied from the AMD docs) and the MA built on top of CH. Thank you for verifying this stuff.
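For anyone who wants to re-check the MA-on-top-of-CH construction on a CPU, a minimal plain C sketch follows. Here bfi() is only a stand-in for what BFI_INT computes according to the formula given in this thread, and the names bfi, maj_ref, maj_bfi and maj_tables_match are made up for the example. The argument order mirrors the Ma(x, y, z) define posted in this thread, i.e. select with mask z^x between y and x.

```c
#include <stdint.h>

/* Stand-in for BFI_INT: take b where mask is 1, c where mask is 0. */
uint32_t bfi(uint32_t mask, uint32_t b, uint32_t c)
{
    return (mask & b) | (~mask & c);
}

/* Textbook majority: each output bit is set if at least two inputs have it set. */
uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Majority through a single select: where x and z agree, the answer is x;
   where they differ, y casts the deciding vote. */
uint32_t maj_bfi(uint32_t x, uint32_t y, uint32_t z)
{
    return bfi(z ^ x, y, x);
}

/* Walks all 8 rows of the truth table; returns 1 if both versions agree. */
int maj_tables_match(void)
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t x = (i & 4) ? 0xFFFFFFFFu : 0u;
        uint32_t y = (i & 2) ? 0xFFFFFFFFu : 0u;
        uint32_t z = (i & 1) ? 0xFFFFFFFFu : 0u;
        if (maj_bfi(x, y, z) != maj_ref(x, y, z))
            return 0;
    }
    return 1;
}
```

This only checks the boolean identity; it says nothing about how the patched instruction behaves on real hardware.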
|
|
|
Thanks for your code. It's nice to see people get rewarded for their efforts (though it doesn't always happen). Well, I've tried it on my own miner (minerd) and have some very strange findings. https://forum.bitcoin.org/index.php?topic=21275.0 First of all, minerd supports up to 4 vectors, and when I add this change to my kernel, it actually _slows down_ the 4 vector version. But when I override it to set 2 vectors, it speeds it up. However, once it's sped up, I then get runs of rejected shares. I tried it multiple times with and without and it does appear to be just this change that causes it, so I'm not sure what's going on.
Also, a very silly thing here -> I've posted #define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) ), but I really should have put all expressions in parentheses. So, ((z)^(x)) and not (z^x) etc... you know :)
Regardless, you got me worried about rejected shares. I HIGHLY encourage one of us to go through the truth table in my original post and the C code that was used to generate it to make sure there are no errors. That's why I posted it to begin with. It would really suck if it didn't work.
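On the parentheses point, a tiny plain C sketch of what can go wrong. The select3() helper and the MA_LOOSE / MA_SAFE names are invented for the example (select3 stands in for the patched amd_bytealign). The macro only misbehaves when an argument is itself an expression with a lower-precedence operator such as |, so a kernel that always passes plain variables would not hit this.

```c
#include <stdint.h>

/* Stand-in for the patched instruction: b where mask is 1, c where mask is 0. */
uint32_t select3(uint32_t mask, uint32_t b, uint32_t c)
{
    return (mask & b) | (~mask & c);
}

/* As originally posted: macro arguments inside (z ^ x) are not parenthesized. */
#define MA_LOOSE(x, y, z) select3((z ^ x), (y), (x))

/* With every argument parenthesized. */
#define MA_SAFE(x, y, z) select3(((z) ^ (x)), (y), (x))

/* Pass the expression p | q as x. In the loose macro the mask expands to
   z ^ p | q, which C parses as (z ^ p) | q because ^ binds tighter than |. */
uint32_t ma_loose_demo(uint32_t p, uint32_t q, uint32_t y, uint32_t z)
{
    return MA_LOOSE(p | q, y, z);
}

uint32_t ma_safe_demo(uint32_t p, uint32_t q, uint32_t y, uint32_t z)
{
    return MA_SAFE(p | q, y, z);
}
```

For p = 0x0F, q = 0xF0, y = 0, z = 0xFF the safe version gives 0xFF, which is the true majority of (0xFF, 0, 0xFF), while the loose version gives 0x0F.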
|
|
|
Thanks for your code. It's nice to see people get rewarded for their efforts (though it doesn't always happen). Well, I've tried it on my own miner (minerd) and have some very strange findings. https://forum.bitcoin.org/index.php?topic=21275.0 First of all, minerd supports up to 4 vectors, and when I add this change to my kernel, it actually _slows down_ the 4 vector version. But when I override it to set 2 vectors, it speeds it up. However, once it's sped up, I then get runs of rejected shares. I tried it multiple times with and without and it does appear to be just this change that causes it, so I'm not sure what's going on.
I honestly do not know what's up with that; I saw ATI asm yesterday for the first time and can't tell you exactly what's wrong yet. All I know is that the truth tables match for the Ma() function with and without my modifications. Still, here are a couple of ideas:
1 - Glancing through the doc, Radeons are VLIW5 = 4+1, with 4 'normal' pipelines and one transcendental pipe, which can do a restricted set of instructions. I don't know where BFI_INT gets executed, but if it is only in the trans. pipe, then doing too many BFIs can hurt performance by making that pipe a bottleneck. Check the docs and let us know, if you don't mind.
2 - If (z^x) isn't already used in other places in your code, then it may be pushing up the register usage, so you're running fewer threads in parallel. Again, I don't know much about ATI, but it would be the first thing I'd check if we were on nVidia/CUDA.
3 - Something else altogether... Not sure. Sorry. If I think of anything else, I'll post it :)
In the meantime, it would also suck if people started getting more rejected shares... hmmm. I don't; it does work for me, but I encourage everyone to check their results (the actual number of accepted shares that they get).
|
|
|
So bitless, how far are you in Computer Science?
Naukop - Could the drop in the hashrate be due to overheating? I've noticed that my card heats up more as a result of the change (and then either it hangs or I have to overclock it less :( ), so could the same be happening to you?
Opsamk - M.Sc., then asm programming for 10 years or so; I'm not looking to get rich on bitcoins myself, just playing around with these hashes and ATI's hardware :)
|
|
|
Everybody, thanks for trying this out. I guess it really does work for other people. Keep donating, I like it :) Now that we know it works, it would be great to check this change into the source code repository somewhere so that everyone gets this change by default, without having to manually edit any files. How is this done? Also, how do I post into the non-newbie thread? I feel that this post doesn't belong here.
|
|
|
FYI, You can make the identical change to phoenix+poclbm with the same 2% increase in speed.
Cool, thanks for trying it out! I figured the same logic of using BFI_INT to build the MAJ function would apply to other miners and kernels as well, I just haven't tried it, so thanks for trying it out :)
|
|
|
Worked for me - boosted around 2%. Very impressive. Where can I send donation?
Will
Yay! :) Donate to 15igh5HkCXwvvan4aiPYSYZwJZbHxGBYwB if you feel like it. Your BTCs will (or may) help develop an even faster miner.
Exact results, on my 2x5870 (each one overclocked slightly differently due to temperature profiles):
1st - 422.13 -> 430.25 (1.9% increase)
2nd - 411.80 -> 422.13 (2.5% increase)
so around a 2% increase.
Will
Excellent, thanks for the info... now it is back to the drawing board for me. I'll poke around and see if there are any other easy changes that can speed things up. So far everything I see would require a massive rewrite of the miner, but ok, we'll see :)
|
|
|
Worked for me - boosted around 2%. Very impressive. Where can I send donation?
Will
Yay! :) Donate to 15igh5HkCXwvvan4aiPYSYZwJZbHxGBYwB if you feel like it. Your BTCs will (or may) help develop an even faster miner.
|
|
|
|