Bitcoin Forum
May 25, 2024, 02:38:37 AM *
  Show Posts
681  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 16, 2012, 11:56:51 AM
Nice, but it doesn't actually achieve any demonstrable performance advantage.
You know something? You are quite right! We can write off all of this saga with a "human tries to beat compiler again"! Grin

Out of curiosity, I went to dig deeper on how AMD would do branch prediction (it doesn't), but it does branch predication for branches with few ops!
Basically, the instructions inside the if branches are only executed based on a comparison result (they are not done in parallel); each op inside gets a true/false predicate flag for execution. So the most efficient code for this part is simply removing the main "if" (using the phatk v2 example):
Code:
#elif defined VECTORS2
    if (!W[117].x)
        SETFOUND(W[3].x);
    if (!W[117].y)
        SETFOUND(W[3].y);

Each of the if(!...) starts with something like:
1351  x: PREDE_INT   ____,  R0.x,  0.0f      UPDATE_EXEC_MASK UPDATE_PRED
for updating the predicate flag

This means that if no nonce is found, the most common path, the penalty is only the execution of two PREDE_INT ops, with zero misfires for false positives (min path has the same exec time as with bitmasking). This also allows for simultaneous nonces in both vectors and allows for the nonce itself to be all zeros. Check the GPU ISA code, the execution path is clearer there! Smiley
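To make the difference between the two result-handling strategies concrete, here is a small Python model (plain Python, not OpenCL) of the VECTORS2 case. The function names and the returned list standing in for the SETFOUND calls are illustrative assumptions; `w117` and `w3` mirror `W[117]` and `W[3]` from the kernel snippets, and a lane counts as a hit when its `w117` component is zero.

```python
def found_select(w117, w3):
    """Model of the '?:' variant: both lanes are collapsed into one result."""
    result = 0 if w117[0] else w3[0]       # W[117].x ? 0u : W[3].x
    result = result if w117[1] else w3[1]  # W[117].y ? result : W[3].y
    return [result] if result else []      # if (result) SETFOUND(result)


def found_predicated(w117, w3):
    """Model of the two independent 'if (!W[117].lane)' checks."""
    found = []
    if not w117[0]:
        found.append(w3[0])                # SETFOUND(W[3].x)
    if not w117[1]:
        found.append(w3[1])                # SETFOUND(W[3].y)
    return found


# (w117, w3) pairs; the values are made up for illustration only.
cases = {
    "no hit":         ((3, 5), (111, 222)),
    "hit in .x":      ((0, 5), (111, 222)),
    "hit in both":    ((0, 0), (111, 222)),
    "nonce is zero":  ((0, 7), (0, 9)),
}

for name, (w117, w3) in cases.items():
    print(name, found_select(w117, w3), found_predicated(w117, w3))
```

In this model the two variants agree on the common paths (no hit, one hit) and differ only in the rare ones: with hits in both lanes the select variant reports only the `.y` nonce, and a nonce that is exactly zero is dropped by it.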
682  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 16, 2012, 10:57:01 AM
I think I got your point, now. Thanks again. Smiley

You're welcome, it's always nice to have someone for debate! Grin
683  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 16, 2012, 01:22:31 AM
That's still only accounting for the simultaneous nonce problem, not the similar bitmask or zero bitmask cases. You certainly make a good argument for not using atomic ops.

No bitmasking needed, the C "?" operator has a specific GPU ISA op code. Smiley

Quote
OCL: (phatk vectors 2 example)
(...)
#elif defined VECTORS2
    uint result = W[117].x ? 0u : W[3].x;      // if (!W[117].x) result = W[3].x; else result = 0;
    result = W[117].y ? result : W[3].y;       // if (!W[117].y) result = W[3].y; /* else result = result; */
    if (result)                                // result can only be 0 or a nonce: W[3].x or W[3].y
        SETFOUND(result);
(...)

GPU ISA:
(...)
1349    z: ADD_INT     T3.z,  PV1348.x,  T0.x      
          w: ADD_INT     ____,  PV1348.y,  T0.w      
1350  y: CNDE_INT    ____,  PV1349.w,  R18.w,  0.0f      
1351  x: CNDE_INT    R2.x,  T3.z,  R17.z,  PV1350.y
   
1352   x: PREDNE_INT  ____,  R2.x,  0.0f      UPDATE_EXEC_MASK UPDATE_PRED
66 JUMP  POP_CNT(1) ADDR(76)
(...)
684  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 16, 2012, 12:15:37 AM
Don't forget modern GPUs have up to 2048 shaders...
Thanks! Grin So, refactoring...
Code:
Assuming a worst case scenario, with each wavefront composed of 2048 threads with 4 vectors each:

lambda = 1/(2^32)*2048*4
Poisson(k) = (lambda^k)*exp(-lambda)/(k!)

Probability of finding 1 nonce in a wavefront:
P = Poisson(1) = 1.9073e-06
P = Binomial(1, 8192, 1/(2^32)) = 1.9073e-06

Probability of finding 2 nonces in a wavefront:
P = Poisson(2) = 1.8190e-12
P = Binomial(2, 8192, 1/(2^32)) = 1.8188e-12

So, if they all end at the same time (or during the non-atomic mem write op), for every ~1,000,000 found nonces you'll throw one away (overwritten). If using vectors 2, one every 2.1e6, and with vectors 1, one every 4.2e6.

On a 1GH/s card, where finding a nonce takes about 4.3s, it will take around 50 days of 24/7 runtime for the vectors 4 case to happen once. Smiley
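The figures above can be reproduced with a few lines of Python. This is a minimal sketch under the same worst-case assumptions (2048 threads with 4 vectors each, every hash a hit with probability 1/2^32); the helper names are illustrative and not part of cgminer.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


P_HASH = 1.0 / 2**32          # chance that a single hash is a nonce
n = 2048 * 4                  # hashes finishing together in the worst case
lam = n * P_HASH

p1 = poisson_pmf(1, lam)
p2 = poisson_pmf(2, lam)
print(f"Poisson(1)  = {p1:.4e}   Binomial = {binomial_pmf(1, n, P_HASH):.4e}")
print(f"Poisson(2)  = {p2:.4e}   Binomial = {binomial_pmf(2, n, P_HASH):.4e}")

# How many single finds happen, on average, per double find.
nonces_per_double = p1 / p2

# Time between double finds on a 1GH/s card (one nonce every 2^32/1e9 s).
seconds_per_nonce = 2**32 / 1e9
days = nonces_per_double * seconds_per_nonce / 86400
print(f"one double find per {nonces_per_double:.3g} nonces, about {days:.0f} days")
```

Changing `n` to `2048 * 2` or `2048 * 1` gives the vectors 2 and vectors 1 cases.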
685  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 15, 2012, 10:27:14 PM
This is what I don't understand too well (sorry to bother you).

Let's assume, as you wrote, that your device has 1GH/s of computing power.

So what T do you like to choose?

If you choose T=1 sec, then the probability to find one nonce in T is the mean, so is lambda. Lambda is 1/4.3=0.23. 23%.
The probability to find two nonces in T, according to Poisson distribution, is ((0.2325^2)*e^-0.2325)/2!=0.021. 2.1%.
Depends on what question you are trying to solve. The question you answered above is "What is the probability of finding k nonces in a period of 1s? And in a period of 2s?"

That was not the question I was answering. My question was "What is the probability of finding k nonces at the same time?"

If you choose T=1/1E9 sec, AKA the clock tick duration of your device, then I calculate the probability to simultaneously find two or more nonces this way:
a) if your device processes hashes sequentially (one thread), of course there cannot be simultaneous nonces if we consider T=1/1E9 sec =1/(GPU hashes per timeframe);
b) if your device processes more than one hash simultaneously (more threads), there can be simultaneous nonces, but every thread uses just a part of the device computing power. Let's say we have five threads. Each thread is capable of 200MH/s, so it finds a nonce in about 21.47 sec  (that is: 2^32 H / 200MH/s). In 1/1E9 sec each thread finds a mean of 1/21.47G nonces.
The problem is independent of thread execution time, because that is relatively constant. They all end at mostly the same time in a wavefront. The problem is: when they all end, how many have nonces?

The probability that at least N of our five threads find a nonce in the same clock is 1/21.47G^N. We can say zero, and we don't need any Poisson distribution for it.
When n->inf and p->0 (with n*p held fixed) the Binomial converges to the Poisson, so either gives the same results: https://en.wikipedia.org/wiki/Poisson_limit_theorem

I can rewrite everything in another way, using another example:
Code:
Assuming each wavefront composed of 256 threads with 2 vectors each:

lambda = 1/(2^32)*256*2
Poisson(k) = (lambda^k)*exp(-lambda)/(k!)

Probability of finding 1 nonce in a wavefront:
P = Poisson(1) = 1.1921e-07
P = Binomial(1, 512, 1/(2^32)) = 1.1921e-07

Probability of finding 2 nonces in a wavefront:
P = Poisson(2) = 7.1054e-15
P = Binomial(2, 512, 1/(2^32)) = 7.0915e-15
Either way we look at the problem, the answer is always the same: the probability of finding 2 nonces at the same time is about 10^7 times smaller than for finding 1 nonce (the ratio is lambda/2 = ~6e-8)! In other words, a ~0.02 Exahash/s hardware will find 2 nonces simultaneously at about the same rate a 1GH/s card finds 1 nonce right now. Grin
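The ratio between single and double finds, and the hash rate at which double finds would become as frequent as single finds on a 1GH/s card, can be computed directly rather than rounded. A minimal Python sketch under the same assumptions as the example above (256 threads with 2 vectors each); the variable names are illustrative.

```python
import math

P_HASH = 1.0 / 2**32                 # chance that a single hash is a nonce
THREADS, VECTORS = 256, 2
hashes_per_wavefront = THREADS * VECTORS
lam = hashes_per_wavefront * P_HASH

p_single = lam * math.exp(-lam)                 # Poisson(1)
p_double = lam**2 * math.exp(-lam) / 2          # Poisson(2)

# How many times rarer a double find is than a single find (equals 2/lambda).
ratio = p_single / p_double

# Single finds per second on a 1GH/s card.
single_rate_1ghs = 1e9 * P_HASH

# Hash rate H at which double finds occur at that same rate:
# (H / hashes_per_wavefront) wavefronts per second, each with chance p_double.
equivalent_hashrate = single_rate_1ghs * hashes_per_wavefront / p_double

print(f"ratio single/double  = {ratio:.3e}")
print(f"equivalent hash rate = {equivalent_hashrate:.3e} H/s")
```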
686  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 15, 2012, 03:48:01 PM
Sorry, I haven't followed your whole discussion, but as soon as I read this message it looked to me a bit weird.
I apologize in advance if I have misread or misunderstood something.

In Poisson distribution, the period of time taken into account is fixed. Let's call it T.

Lambda is the mean of Poisson distribution (and its variance too). So with lambda=1/2^32 you're stating that you expect to find 1/2^32 nonces in T. That is: you expect to wait (2^32) Ts to find a single nonce.
R U sure? Shocked

And what do you mean with:
Quote
Poisson(2)*1E9
Besides, your calculation looks wrong. With lambda=1/2^32, P(2) is ... almost zero. P(2)*1E9 isn't much more, and we can round it to zero as well. Smiley

That's correct, you are expecting to find 1 nonce out of 2^32 cases in T=1/(GPU hashes per timeframe). In the example above, T=1/(1E9).

Each processed 32-bit hash has a probability of 1/(2^32) of being a nonce. So a card that processes 1.000.000.000 hashes/s (1E9) finds one on average every (2^32)/1E9 = 4.3 seconds.

My above Poisson math is for the case of 2 simultaneous nonce finds (that's why lambda=1/(2^32) and not lambda=1/(2^32)*1E9).
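As a quick check of the timing arithmetic, the mean time between nonce finds is just the inverse of (hash rate × per-hash probability). A minimal Python sketch; the function name is an illustrative assumption.

```python
def seconds_per_nonce(hashrate_hs, p_hash=1.0 / 2**32):
    """Mean time between nonce finds for a device doing hashrate_hs hashes/s."""
    return 1.0 / (hashrate_hs * p_hash)


print(f"1 GH/s card:     {seconds_per_nonce(1e9):.2f} s per nonce")
print(f"200 MH/s thread: {seconds_per_nonce(200e6):.2f} s per nonce")
```

The second line corresponds to the five-threads-at-200MH/s example quoted in the discussion.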
687  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 15, 2012, 01:36:50 PM
With bitmasked ones, if the bitmask is only 127, 7 bits need to be in common with anything out of the 4 billion nonces. Again rare but not 2^64 rare. With non bitmasked, 2 or more nonces need to be found in a "wavefront" concurrently and can race on the array variable in FOUND. They can be completely different nonces in that case since they're just trying to access exactly the same variable at the same time without protection. Instead of there being 2 nonces flagged as existing, it could be 0,1,2 or a much larger number, and the slots used for the nonces could be anywhere. In other words, the only "strictly correct" way to do it without there being any chance of error is with the atomic ops. Now how well implemented the atomic ops are in hardware and how much they depend on software is a totally different equation that I can't answer, and AMD is unlikely to tell us.

Thanks for the donation Smiley

Still, the odds for concurrency are very very low. Smiley

A nonce find can be modeled by a Poisson distribution with lambda=1/(2^32).
Code:
Taking the example of a 1GH/s card (sum of all wavefronts speed/second):

lambda = 1/(2^32)
Poisson(k) = (lambda^k)*exp(-lambda)/(k!)

Probability of finding 2 nonces (at the same time) per second:
P = Poisson(2)*1E9 = 2.7105e-11 = ~.0000000027%
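The number above follows from plugging lambda = 1/(2^32) into the Poisson formula and scaling by 1E9 hashes per second. A minimal Python sketch of exactly that arithmetic (it reproduces this post's model as stated; the variable names are illustrative):

```python
import math

lam = 1.0 / 2**32                                   # lambda used in this post
poisson_2 = lam**2 * math.exp(-lam) / math.factorial(2)

per_second = poisson_2 * 1e9                        # Poisson(2)*1E9
percent = per_second * 100.0

print(f"Poisson(2)     = {poisson_2:.4e}")
print(f"Poisson(2)*1E9 = {per_second:.4e}")
print(f"as a percent   = {percent:.10f}%")
```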
688  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 15, 2012, 09:51:45 AM
Code:
#elif defined VECTORS2
uint result = W[117].x ? 0u : W[3].x;
result = W[117].y ? result : W[3].y;
if (result)
    SETFOUND(result);
No, you're not quite right there btw. There are a few issues that made me use the atomic ops instead.

There is no way to return a nonce value of 0.
Bitmasked nonce values can also be zero meaning they get lost.
It is not just vectors that find nonces at the same time, it's a whole wave front of threads finding nonces at the same time and corrupting both values.
Bitmasked nonce values from results found in the same global worksize can come out the same value and overwrite each other.
It's to consolidate the return values from different kernels and decrease the CPU usage of the return code that checks the nonce values.

Again, very small but far from 2^64. Since bitcoin mining is a game of odds, I didn't see the point of losing that - provided you don't drop the hashrate of course. It's unusual that some devices need higher memory speed just for one atomic op but clearly it's a massively memory intensive operation that affects the whole wave front. Considering increasing ram speed by 15 or 20 would not even register in terms of extra power usage and temperature generated, to me at least it seems a better option.

But the beauty of free software is you can do whatever you like to the code if you don't like the way I do it Wink

Thanks for the detailed explanation! Wink

Some more food for thought: I think the bitmasked stuff was probably the biggest problem (because the fewer 1's the nonce has, the bigger the probability of it being lost IIRC), that's why in the above code there is no bitmasking for checking for nonces, it uses the SETA op code IIRC (the C "?" operator gets a specific GPU ISA op code).

A specific nonce value of 0 also happens at a rate of P = 1/(2^64) [P_finding_nonce = 1/(2^32), P_nonce_is_all_zeros = 1/(2^32)], so that's also fine by me. Grin

About the global worksize bitmasked problem, if not using bitmasks, the only way for overwrites to happen would be if 2 identical bitwise nonces were found, correct?

I will also try the tiny mem o/c to see the hashrate difference when using the atomic_add, I like to try all angles to solve a problem, thanks! Smiley

Btw, since I haven't yet, 1btc donation sent! Smiley I can only imagine the number of hours you spent writing and optimizing cgminer's code!
689  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 15, 2012, 12:16:51 AM
Yes that is interesting. I'm guessing you have underclocked your memory exceptionally low, as that was found to be an issue with use of atomic ops. Some people found a bump of 15 in memory was enough to correct it. Lack of atomic functions there could lead to HW errors and loss of shares. It's a tradeoff either way. The change was put in there to make sure no shares were lost, which can happen with the old opencl code (though it's only a very small number that would be lost).

Ah, ok! Thanks for the info. Yep, I'm at 150MHz mem clock. It's to prevent simultaneous nonce finds on different vectors from overwriting the result at the same address, right?

I prefer the tradeoff tbh, I did the math a while ago on the probability of that happening (P=1/(2^32)*1/(2^32)=1/(2^64). On a 1GH/s card, that will happen on average once every ~585 years)
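The "~585 years" figure is the expected number of hashes for a 1/(2^64) event divided by the hash rate. A minimal Python sketch of that conversion (the year length of 365.25 days is an assumption):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

p_event = (1.0 / 2**32) * (1.0 / 2**32)   # P = 1/(2^64)
expected_hashes = 1.0 / p_event           # mean number of hashes until it happens
hashrate = 1e9                            # 1GH/s card

years = expected_hashes / hashrate / SECONDS_PER_YEAR
print(f"about {years:.0f} years")
```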

I'm still using that optimization tradeoff I posted for more than a year now! Grin
Code:
#elif defined VECTORS2
uint result = W[117].x ? 0u : W[3].x;
result = W[117].y ? result : W[3].y;
if (result)
    SETFOUND(result);
690  Bitcoin / Mining software (miners) / Re: CGMINER GPU FPGA overc monit fanspd RPC stratum linux/windows/osx/mip/r-pi 2.8.3 on: October 14, 2012, 03:02:04 PM
I was running cgminer 2.5.0 till today, as the latest versions had a performance hit on my 5850's (win x64, SDK 2.5 on Cat 12.1), since the last phatk update (~400MH/s to ~320MH/s).

Since I had some free time today I went to check what had changed from phatk120223 to phatk120823 and found what was causing me that performance hit. Commenting the problematic lines fixed it:

Code:
//#if defined(OCL1)
#define SETFOUND(Xnonce) output[output[FOUND]++] = Xnonce
//#else
// #define SETFOUND(Xnonce) output[atomic_add(&output[FOUND], 1)] = Xnonce
//#endif

I still find it strange how the atomic_add could be responsible for that much of a mhash hit, since it will only be called on found nonces. Tongue
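For reference, what the atomic variant protects against can be modeled in a few lines of plain Python. This is only a sketch of one worst-case interleaving, not of how the GPU actually schedules work-items; the buffer layout and the value of `FOUND` here are hypothetical.

```python
FOUND = 15  # hypothetical index of the counter slot inside the output buffer


def non_atomic_worst_case(nonces):
    """output[output[FOUND]++] = nonce, with every work-item reading the
    counter before any of them has written it back."""
    output = [0] * (FOUND + 1)
    indices = [output[FOUND] for _ in nonces]   # all read the same stale index
    for idx, nonce in zip(indices, nonces):
        output[FOUND] = idx + 1                 # counter ends up at 1, not len(nonces)
        output[idx] = nonce                     # later writers overwrite slot 0
    return output


def atomic(nonces):
    """output[atomic_add(&output[FOUND], 1)] = nonce: the read and the
    increment form one indivisible step, so every nonce gets its own slot."""
    output = [0] * (FOUND + 1)
    for nonce in nonces:
        idx = output[FOUND]
        output[FOUND] = idx + 1
        output[idx] = nonce
    return output


lost = non_atomic_worst_case([0xAAAA, 0xBBBB])
kept = atomic([0xAAAA, 0xBBBB])
print("non-atomic:", lost[FOUND], [hex(v) for v in lost[:2]])
print("atomic:    ", kept[FOUND], [hex(v) for v in kept[:2]])
```

In this model the non-atomic path ends with a counter of 1 and only the last nonce stored, while the atomic path keeps both.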

(Running 2.8.3 now, will keep monitoring performance for any issues)
691  Economy / Scam Accusations / Re: Nefario on: October 11, 2012, 05:02:17 PM
"I never attempt to make money on the stock market. I buy on the assumption that they could close the market the next day and not reopen it for five years." - Warren Buffett

Priceless! Grin
692  Economy / Scam Accusations / Re: Nefario on: October 10, 2012, 11:07:08 PM
It's UP! Go Go Go! Grin

EDIT: It seems the "AML" identification screen was removed; it just asks for an updated email, a btc address and a tick to choose whether to share that info with asset issuers. The next screen says the account is closed and that we'll be contacted by email with further info.
693  Other / Off-topic / Re: [Announcement] Butterfly Labs on: September 24, 2012, 12:15:21 PM
My hand is not a BFL product, thank you very much.

But it could be! Ever thought of the huge possibilities? All that protein energy for mining BTC!!! Grin
694  Economy / Scam Accusations / Re: Scammer - Goat on: September 24, 2012, 01:10:24 AM
Don't think he's a scammer "per se", the correct "tag" would be: Level 2 of Morality Development! Grin

Quote
https://en.wikipedia.org/wiki/Lawrence_Kohlberg's_stages_of_moral_development
Stage two (self-interest driven) espouses the "what's in it for me" position, in which right behavior is defined by whatever is in the individual's best interest. Stage two reasoning shows a limited interest in the needs of others, but only to a point where it might further the individual's own interests. As a result, concern for others is not based on loyalty or intrinsic respect, but rather a "You scratch my back, and I'll scratch yours." mentality.[2] The lack of a societal perspective in the pre-conventional level is quite different from the social contract (stage five), as all actions have the purpose of serving the individual's own needs or interests. For the stage two theorist, the world's perspective is often seen as morally relative.
695  Economy / Scam Accusations / Re: Nefario GLBSE on: September 24, 2012, 01:02:33 AM
I don't really understand where the debate is. Because Nefario doesn't have the authority to honor his word means that he is 'all good'?

My understanding:

He acknowledged the shares were fake, but agreed to 'honor' them, making them real GLBSE stock

Goat then buys it, thinking that it will become real stock

Nefario cannot make it stock, because he assumed he had more power than he did. Obviously, he could have known he cannot just 'make more stock'

Now to clean it all up, the fake, but then real, but now fake again stock is being bought back for the initial IPO listing price?


However, at the same time, I feel that if it sounds too good to be true, it probably is. It would be like going to a car dealership and being offered a price that is too low. Being told You are paying $$$$, but the contract says $$$$$. Would the dealership be a 'scammer'?

It's more like: You go to a car dealership and the vendor, by mistake, offered you a price that was too low, let's say 10% of what it was really worth. Later, after seeing he made a mistake, he says he will honor the price. And you say: "Oh, yes? Then I'll buy 10!!!".

So, while he was trying to cover your "losses" out of good-will, you use that good-will to maximize your own profit at his expense. Sad
696  Economy / Scam Accusations / Re: Nefario GLBSE on: September 23, 2012, 02:45:35 PM
(...) Nefario has said the fake GLBSE shares will be removed from everyone's accounts and they will be given the IPO price (0.1 BTC). (...)

Personally, I think paying 0.1 is too much for them, it's like rewarding exploiters. Even after Nefario said those were fake shares, there are still individuals wanting to exploit his good-will for maximum profit. Incredible!
697  Economy / Scam Accusations / Re: Nefario GLBSE on: September 23, 2012, 01:34:52 PM
If Nefario takes my asset I will consider it theft.

So, let's recap:
  • Fake GLBSE shares are not actually real GLBSE shares
  • Nefario doesn't have the individual power to convert the fake shares to real shares without a motion (out of his control), so he cannot honor his word
  • Nefario didn't specify when he would do such conversion

Hmm. So the best solution seems to be: Freeze all fake GLBSE shares indefinitely. Grin
698  Economy / Securities / Re: [PRE IPO] McWhortle Enterprises, Inc. on: September 19, 2012, 11:28:18 PM

Wow! When I saw that picture the fake logo immediately popped up! The Securities and Exchange Commission (SEC) can't even get someone to photoshop that accurately?!?! Geez! Grin
699  Economy / Securities / Re: [GLBSE] Nyancat Financial: Your Friend for Life on: September 18, 2012, 09:29:24 PM
Quote
Why are you paying out to NYAN.B, let alone NYAN.C at all, until the NAV is repaired to 1?

By paying out dividends instead of retaining and rebuilding the NAV you are incentivizing investors to get the return of NYAN.C without bearing the appropriate amount of risk by redistributing the return that should be accruing to A & B in the form of retained earnings contained implicitly in the NYAN.C NAV. Not only does this introduce moral hazard but is also unfair and may constitute a breach of the implied terms of the NYAN group's purpose.

Only NYAN.A should be guaranteed dividends because it is guaranteed by CPA. NYAN.B should not be paying out dividends if its NAV is less than 1.

Well said.

Well said.

Well said! Grin
700  Economy / Securities / Re: [GLBSE] BFLS.RIG - BFL Hardware mining & Sales on: August 15, 2012, 03:01:56 PM
The fee per volume should be logarithmic, so that after a specific volume it "flattens out". Should be very easy to implement, if there's the will for it! Tongue
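One way such a schedule could look, purely as an illustration: the total fee grows with the logarithm of the traded volume, so each additional unit of volume costs less than the previous one. The function name, `base_fee` and `scale` below are hypothetical parameters, not anything GLBSE used.

```python
import math


def log_fee(volume, base_fee=0.01, scale=100.0):
    """Hypothetical fee schedule: proportional to log(1 + volume/scale).

    Roughly linear for volume << scale, flattening out for volume >> scale.
    """
    return base_fee * math.log1p(volume / scale)


for volume in (0, 10, 100, 1000, 10000):
    print(f"volume {volume:>6}: fee {log_fee(volume):.4f}")
```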