Using 2016 blocks to determine each diff change is not highly accurate
If you calculate the probability of a 5% diff increase without an actual 5% hashrate increase over 2016 blocks (in other words, the chance that 2016 blocks take 5% less time to mine than the expected 10 minutes each, with no real hashrate change), then using the exponential CDF as a rough guide, the probability is close to zero. It improves a little if you assume the increase was split, say 2.5% caused by an actual increase in hash power and the other 2.5% by pure randomness, but the probability of even that 2.5% happening by chance is also close to zero.
I agree there is no definite way to measure the network hashrate, nothing you could take to court and present as clear-cut evidence. But generally speaking, a 5% increase in difficulty is most likely the result of an almost 5% increase in hashrate. Sure, it could be 5.1% or 4.9%, nobody knows, but over 2016 blocks 5% is fairly accurate.
Sigh.
So the actual calculation comes from the Erlang distribution:
https://en.wikipedia.org/wiki/Erlang_distribution
It allows you to ask questions like: what is the chance that 2016 events average 1% (or more) higher than the expected value?
i.e. they have a mean > 1.01 (and note that, as with any CDF, the 'greater than' is important: it could be any value more than 1% high)
I specifically display a stat related to this on my pool (alas no other pool does, prolly coz most pools have little understanding of statistics)
Feel free to try using wolfram's site if you want to plug numbers in yourself, but here's the actual answers:
So the CDF[Erl] (as I call it on my pool) for 2016 events with a mean > 1.01 is 0.67542 (in the gsl library this is called the cdf_gamma_P)
This means that there's a (1 - 0.67542) = 32.5% chance it will be wrong by more than 1%.
With 1.03 it's 0.910066 - so it's a 9% chance the value will be wrong by more than 3%.
With 1.05 it's 0.986655 - so it's a 1.3% chance the value will be wrong by more than 5%
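If you'd rather not plug numbers into Wolfram, those values can be reproduced with a short stand-alone sketch (this is not the pool's actual code, just the standard series/continued-fraction evaluation of the regularized lower incomplete gamma, which is what GSL's cdf_gamma_P computes):

```python
import math

def gammap(a, x):
    """Regularized lower incomplete gamma P(a, x), i.e. the CDF of an
    Erlang/Gamma(shape=a, rate=1) variable at x. Uses the series for
    x < a + 1 and a Lentz-style continued fraction otherwise."""
    if a <= 0 or x < 0:
        raise ValueError("require a > 0 and x >= 0")
    if x == 0:
        return 0.0
    # common prefactor x^a * e^-x / Gamma(a), kept in log space
    # because Gamma(2016) overflows a double
    lnpre = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # series: P(a,x) = prefactor * sum_n x^n / (a (a+1) ... (a+n))
        term = 1.0 / a
        total = term
        n = 1
        while abs(term) > abs(total) * 1e-15:
            term *= x / (a + n)
            total += term
            n += 1
        return math.exp(lnpre) * total
    # continued fraction evaluates the upper tail Q(a,x) = 1 - P(a,x)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(lnpre) * h

# 2016 blocks with an observed mean 1%, 3% and 5% above expectation
for m in (1.01, 1.03, 1.05):
    p = gammap(2016, 2016 * m)
    print(f"CDF[Erl](2016, {m}) = {p:.6f} -> {1 - p:.1%} chance of being that far off")
```

Running it gives 0.675420, 0.910066 and 0.986655 for the three means, matching the figures above.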
Now I'm not sure about you, but more than 3% wrong is a really big number - and that happens with 9% of diff changes.
And to clarify that, 9% of diff changes will be wrong by more than 3%, not 'might' be wrong.
And 32.5% - almost a third - of diff changes will be wrong by more than 1%
Edit: I should also point out that the fewer blocks you use to estimate the network hashrate (e.g. a tiny number like 144, one day's worth), the more inaccurate it can be.
To give an example with one day = 144 blocks:
CDF[Erl] 144,144,1.05 is 0.732, i.e. a 26.8% chance of being more than 5% wrong.
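That 26.8% figure needs no special functions to sanity-check. The total time for 144 blocks is a sum of 144 exponentials, i.e. a Gamma(shape=144) variable, so a quick Monte Carlo sketch (plain Python, nothing to do with any pool's code) will land on roughly the same number:

```python
import random

random.seed(7)

BLOCKS = 144        # one day's worth of blocks
TRIALS = 100_000

# Count windows whose total time exceeds expectation by more than 5%.
# random.gammavariate(BLOCKS, 1.0) is the sum of BLOCKS unit-mean
# exponentials, measured in units of the target block time.
slow = sum(
    1 for _ in range(TRIALS)
    if random.gammavariate(BLOCKS, 1.0) > BLOCKS * 1.05
)
print(f"fraction of 144-block windows more than 5% off: {slow / TRIALS:.3f}")
```

With 100,000 trials the estimate comes out within a fraction of a percent of the exact 26.8%.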