ProfMac
Legendary
Offline
Activity: 1092
Merit: 1000


August 24, 2015, 07:45:23 PM 

When the difficulty and network hash rate are in sync, what is the probability that 6 blocks will be found in 40 minutes?

I try to be respectful and informed.







grau


August 24, 2015, 08:36:15 PM 

"In probability theory and statistics, the Poisson distribution (French pronunciation [pwasɔ̃]; in English usually /ˈpwɑːsɒn/), named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event."
From: https://en.wikipedia.org/wiki/Poisson_distribution

Mathematica says that N[PDF[PoissonDistribution[4], 6]] is 10.4%; that is the probability of exactly 6 blocks per 40 min.

The probability of 6 or more blocks in 40 minutes is 1 - N[CDF[PoissonDistribution[4], 5]], or 21%. (My original figure, 1 - N[CDF[PoissonDistribution[4], 6]], or 11%, is struck through: it is the probability of more than 6 blocks.)

[Plot: probability of n blocks per 40 min]
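For readers without Mathematica, the same figures can be reproduced with a short Python sketch. This is not part of the original post; the function names are made up for illustration, and the rate of 4 assumes one block per 10 minutes on average.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

LAM = 40 / 10  # expected blocks in 40 minutes at one block per 10 minutes

p_exactly_6 = poisson_pmf(6, LAM)
p_at_least_6 = 1 - poisson_cdf(5, LAM)   # 6 or more = not (5 or fewer)
p_more_than_6 = 1 - poisson_cdf(6, LAM)  # strictly more than 6

print(f"exactly 6:   {p_exactly_6:.3f}")
print(f"6 or more:   {p_at_least_6:.3f}")
print(f"more than 6: {p_more_than_6:.3f}")
```

The off-by-one in the CDF argument is the whole difference between "6 or more" and "more than 6".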




DannyHamilton
Legendary
Offline
Activity: 2198
Merit: 1371


August 24, 2015, 08:59:28 PM 

Of course, it is important to note that the numbers reported by grau assume that an arbitrary 40 minute period is chosen at random without selection bias.
If on the other hand you start with an already solved block and ask what the probability is that another 5 blocks will be solved within the 40 minutes immediately after seeing the first solved block, I believe the probability is higher. In that case you are essentially asking what the odds are that 5 (or more) blocks will be solved in the next 39 minutes and 59.999... seconds.
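One way to check this intuition is a small simulation. The sketch below is only an illustration under the idealized assumption of exponentially distributed inter-block times with a 10-minute mean; the function name and parameters are hypothetical. It starts a 40-minute window at every simulated block and counts how often at least 5 further blocks arrive inside it.

```python
import random
from bisect import bisect_right

def simulate_after_block(n_blocks=200_000, mean_minutes=10.0,
                         window=40.0, needed=5, seed=1):
    """Estimate P(at least `needed` more blocks within `window` minutes
    of a block), assuming exponential inter-block times."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_blocks):
        t += rng.expovariate(1.0 / mean_minutes)
        times.append(t)
    last_start = times[-1] - window  # skip windows that run past the data
    hits = trials = 0
    for i, start in enumerate(times):
        if start > last_start:
            break
        # number of blocks with start < time <= start + window
        later = bisect_right(times, start + window) - (i + 1)
        trials += 1
        hits += later >= needed
    return hits / trials

estimate = simulate_after_block()
print(f"P(5+ more blocks within 40 min of a block) ~ {estimate:.3f}")
```

Under this model the estimate should land close to the Poisson value for 5 or more events at a mean of 4 (about 37%), which is indeed higher than the figure for 6 or more blocks in an arbitrary window.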




grau


August 24, 2015, 09:08:38 PM 

Yes, as you see on the plot I added in parallel to my first reply, the probability of 5 blocks within next 40 mins is 15%.




ProfMac


August 25, 2015, 12:00:49 AM 

Thanks! I have read some of the WiKi page till my head got full. I'll go read more later.
I don't have access to Mathematica, and the documentation in R is a bit more than I want to tackle in the next short while.
Based on what I have seen, however, I wish I had another plot, this time the Poisson distribution as you have presented it, only for lambda = 18 (3 hour expected network block production), and running out to k = 36.
While I am wishing, I also want a numeric table of the cumulative distribution function for the same distribution.
If anyone has the R to do this, I would be really appreciative to see it. While I myself can code this correctly in R from the WiKi definitions, it will take some time.




johnnewbery
Newbie
Offline
Activity: 4
Merit: 0


August 25, 2015, 12:29:00 AM 

Small pedantic point to add here: the question in the OP and the thread title are slightly different.
OP: What is the probability that 6 blocks will be found in 40 minutes? - The Poisson distribution is appropriate because block discoveries are independent events.
Thread title: Re: What is the probability of a 40 min 6 block streak? - I assume the word "streak" here means chain? 6 independently discovered blocks do not necessarily form a blockchain of height 6, due to orphaned blocks.
So to answer the thread title: the probability of a 6 block chain in 40 minutes is slightly less than (1 - N[CDF[PoissonDistribution[4], 5]]), because one of those discovered blocks may become orphaned (due to block propagation speeds or other reasons).




grau


August 25, 2015, 06:32:56 AM 

Quote from: ProfMac on August 25, 2015, 12:00:49 AM
Based on what I have seen, however, I wish I had another plot, this time the Poisson distribution as you have presented it, only for lambda = 18 (3 hour expected network block production), and running out to k = 36.
While I am wishing, I also want a numeric table of the cumulative distribution function for the same distribution.

Here you are:

1. Probability of exactly n blocks within 3 hours:
[Plot: probability of exactly n blocks within 3 hours]

2. Cumulative, numeric; that is, the probability of having <= n blocks within 3 hours:

 n   P(X <= n)
 0   1.523E-8
 1   2.8937E-7
 2   2.75663E-6
 3   0.0000175602
 4   0.0000841761
 5   0.000323993
 6   0.00104345
 7   0.00289347
 8   0.00705601
 9   0.0153811
10   0.0303663
11   0.0548874
12   0.0916692
13   0.142598
14   0.208077
15   0.286653
16   0.37505
17   0.468648
18   0.562245
19   0.650916
20   0.73072
21   0.799124
22   0.85509
23   0.89889
24   0.93174
25   0.955392
26   0.971766
27   0.982682
28   0.9897
29   0.994056
30   0.996669
31   0.998187
32   0.99904
33   0.999506
34   0.999752
35   0.999879
36   0.999942
37   0.999973
38   0.999988
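A table like this can also be generated without Mathematica or R. The following Python sketch is an illustration only (the function name is invented here); it builds the probabilities iteratively for lambda = 18, which avoids large factorials.

```python
from math import exp

def poisson_table(lam, k_max):
    """Return [(k, pmf, cdf)] for k = 0..k_max, built iteratively."""
    rows = []
    pmf = exp(-lam)  # P(X = 0)
    cdf = pmf
    for k in range(k_max + 1):
        if k > 0:
            pmf *= lam / k  # P(X = k) = P(X = k - 1) * lam / k
            cdf += pmf
        rows.append((k, pmf, cdf))
    return rows

table = poisson_table(lam=18, k_max=36)
for k, pmf, cdf in table:
    print(f"{k:2d}  {pmf:.6g}  {cdf:.6g}")
```

The second column is the probability of exactly k blocks in 3 hours and the third is the cumulative value, so the output can be compared row by row with the figures quoted in this post.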




grau


August 25, 2015, 06:44:18 AM 

Quote
- snip - the probability of 5 blocks within next 40 mins is 15%.
Do I understand it correctly then if I say that the probability of 5 or more blocks within the next 40 minutes (and therefore within the 40 minutes immediately following the broadcast of a block) is 26%?

The probability of 5 or more blocks within the next 40 minutes is 1 - N[CDF[PoissonDistribution[4], 4]] = 37%. I made a correction in my first reply; see the strikethrough.
Since events are independent, it does not matter whether you are just after a block or not.
The probability of 5 or more blocks includes the probability of 5, the probability of 6, and so on. The way to deal with this is to use the cumulative probability function, as I enumerated in the previous post. Using that table you can compute the probability of any range of block counts within 3 hours.
Examples:
The probability of 15 or fewer blocks is 28.6%.
The probability of more than 20 blocks in 3 hours is 1 - 0.73072 = 26.9%.
The probability of 16-20 blocks is 0.73072 - 0.286653 = 44.4%.
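The range calculations can be sketched in Python as differences of cumulative probabilities. This is an illustrative sketch, not the poster's method; `prob_between` is a hypothetical helper and lambda = 18 is the 3-hour expectation used in the thread.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

def prob_between(lo, hi, lam):
    """P(lo <= X <= hi) = CDF(hi) - CDF(lo - 1)."""
    lower = poisson_cdf(lo - 1, lam) if lo > 0 else 0.0
    return poisson_cdf(hi, lam) - lower

LAM = 18  # expected blocks in 3 hours

p_at_most_15 = poisson_cdf(15, LAM)
p_more_than_20 = 1 - poisson_cdf(20, LAM)
p_16_to_20 = prob_between(16, 20, LAM)

print(f"<= 15 blocks:  {p_at_most_15:.3f}")
print(f"> 20 blocks:   {p_more_than_20:.3f}")
print(f"16..20 blocks: {p_16_to_20:.3f}")
```

Note that a range starting at 16 subtracts the cumulative value at 15, not at 16.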





ProfMac


August 25, 2015, 04:44:02 PM 

Is there a tool that can take a block number, and a block count, and return the number of minutes between successive blocks. While I can build this by hand from blockchain.info, there is a certain tediousness to it.
For example, I would like to start at block 370944, the beginning of the current epoch, and continue for some small number, perhaps 18 or 24.




deepceleron
Legendary
Offline
Activity: 1512
Merit: 1000


August 26, 2015, 08:34:41 AM 

Quote from: ProfMac on August 25, 2015, 04:44:02 PM
Is there a tool that can take a block number, and a block count, and return the number of minutes between successive blocks. While I can build this by hand from blockchain.info, there is a certain tediousness to it.
For example, I would like to start at block 370944, the beginning of the current epoch, and continue for some small number, perhaps 18 or 24.

Each bitcoin block has a timestamp, but it is added by the miner (from the local time on the bitcoind machine or on the pool server) when the block was generated to be hashed. There are many blocks that have a negative timestamp offset compared with the previous block, due to differences in computer clocks. If you are not doing historical analysis, it may be more reliable to have a listening node monitor the time that new blocks are published on the network (they propagate everywhere within seconds).

In Bitcoin Core, you can get the timestamps out of your local blockchain, but it requires chaining two RPC commands: one to get the block hash, and one to dump that block using the hash. Here's a post I wrote describing a script to do this. Replace "bitcoind" with "bitcoin-cli" when using the latest Bitcoin software.

Here's a PM I wrote someone else with the details of what to do.

I have a CSV of block times: https://bitcointalk.org/index.php?topic=135982.msg1453722#msg1453722

I dumped them on Windows with this "dumptime.cmd" in the bitcoind directory (and then added some more spreadsheet columns to make epoch time readable time); here it dumps times from blocks 50000-99999:

Code:
@echo off
setlocal enableextensions
set /a height=50000
rem echo start > timeout.txt
:beg
for /f "tokens=* delims=:" %%a in (
'bitcoind getblockhash %height%'
) do (
set hash=%%a
)
for /f "tokens=*" %%a in (
'bitcoind getblock %hash% ^| find "time"'
) do (
set blktim=%%a
)
echo %height%: %blktim%
echo %height%: %blktim% >> timeout.txt
set /a height = height + 1
IF %height% LEQ 99999 goto beg
endlocal

Code:
blocknum,epochtime,blocksec,datetime
0,1231006505,0,2009-01-03T18:15:05Z
1,1231469665,0,2009-01-09T02:54:25Z
2,1231469744,79,2009-01-09T02:55:44Z
3,1231470173,429,2009-01-09T03:02:53Z
4,1231470988,815,2009-01-09T03:16:28Z
5,1231471428,440,2009-01-09T03:23:48Z
6,1231471789,361,2009-01-09T03:29:49Z
7,1231472369,580,2009-01-09T03:39:29Z
8,1231472743,374,2009-01-09T03:45:43Z
9,1231473279,536,2009-01-09T03:54:39Z
10,1231473952,673,2009-01-09T04:05:52Z
11,1231474360,408,2009-01-09T04:12:40Z
12,1231474888,528,2009-01-09T04:21:28Z
13,1231475020,132,2009-01-09T04:23:40Z
14,1231475589,569,2009-01-09T04:33:09Z
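Once the timestamps are in a CSV with block number and epoch time columns, the per-block intervals are just differences between consecutive rows. The Python sketch below is an illustration; it assumes the column names `blocknum` and `epochtime` from the CSV in this post and uses a few of its rows as sample input. The function name is hypothetical.

```python
import csv
import io

# A few rows from the CSV in this post (block number, Unix epoch time).
SAMPLE = """blocknum,epochtime
1,1231469665
2,1231469744
3,1231470173
4,1231470988
5,1231471428
"""

def block_intervals(csv_text):
    """Return [(blocknum, seconds since the previous block's timestamp)]."""
    rows = [(int(r["blocknum"]), int(r["epochtime"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    return [(num, t - prev_t)
            for (_, prev_t), (num, t) in zip(rows, rows[1:])]

intervals = block_intervals(SAMPLE)
for num, secs in intervals:
    print(f"block {num}: {secs} s ({secs / 60:.1f} min)")
```

Because the timestamps are set by miners, some differences in real data will be negative; the sketch reports them as they are rather than correcting them.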




ProfMac


August 26, 2015, 08:19:45 PM 

I'm still looking at some of the blockchain timestamp data. I typed in the block numbers, 370,944 to 371,087, and the block times from blockchain.info and saved it as a .csv file. This is the start of the current difficulty epoch, continuing for approximately 24 hours. I can post the whole file somewhere if someone has a suggestion.

Code:
> temp[ c(1:5,140:144), ]
     block mon day year hr min sec
1   370944   8  22 2015  0  49  43
2   370945   8  22 2015  1   4  59
3   370946   8  22 2015  1  10  29
4   370947   8  22 2015  2   5   5
5   370948   8  22 2015  2  10  31
140 371083   8  22 2015 22   7  41
141 371084   8  22 2015 22  21  10
142 371085   8  22 2015 22  24  50
143 371086   8  22 2015 22  33   5
144 371087   8  22 2015 22  35   2

The blocks following these blocks show a negative time increment. It might be interesting to see if these pairs of blocks are over-represented by any particular miner. I don't know how to find who mined a particular block.

Code:
> blocktimes[ delta[] < 0, ]
     block mon day year hr min sec  time
7   370950   8  22 2015  2  38  33  9513
21  370964   8  22 2015  6  11  29 22289
34  370977   8  22 2015  7  18  53 26333
50  370993   8  22 2015  9  22  24 33744
114 371057   8  22 2015 19   7  28 68848
131 371074   8  22 2015 20  51  29 75089

I manipulated this data in R with commands similar to these. These are from notes, not exactly from the log file... I don't know yet if it is algorithmically possible to "fit" a Poisson to the distribution data.

Code:
temp <- read.csv("Documents/blockchain calctimes.csv", header=T)
# add a "time" column: seconds since midnight
blocktimes <- cbind(temp, time = 60*(60*temp[,"hr"]+temp[,"min"])+temp[,"sec"])
delta <- blocktimes[ 2:144, "time" ] - blocktimes[ 1:143, "time"]
#note: min delta is -711
png(filename="blockchainpoisson.png")
plot( tb1 <- table( cut( delta+711, seq(0, 3276, 300), right=FALSE)), ylim=c(0, 50))
n <- 9; x <- c( 0:n ); y <- dpois( x, 2.0 ); points( 2+x, 136*y, ylim=c(0, 0.5), col="red")
dev.off()

I wasn't able to link to the image. I put it on Google+ as https://plus.google.com/u/0/photos/115426745065196075335/albums/6187408748966855121/6187408753859123554

I think that everyone will agree that two consecutive timestamps that show a negative interval have an incorrect timestamp somewhere. I am pretty sure I can repair the data by modifying one of those two timestamps to give data that is much closer to a realistic Poisson distribution. I try to be very conservative when I repair data. I haven't explored that process yet.
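As a reference curve for a histogram of inter-block intervals, note that in the idealized model it is the number of blocks per fixed window that is Poisson, while the intervals themselves are exponentially distributed. The Python sketch below is only an illustration under that assumption (10-minute mean, 300-second bins, 143 intervals as in this sample); the function name and defaults are hypothetical.

```python
from math import exp

def expected_bin_counts(n_intervals, bin_width=300, n_bins=11, mean=600.0):
    """Expected number of intervals falling in each [a, a + bin_width)
    bin, assuming exponentially distributed intervals with this mean."""
    counts = []
    for i in range(n_bins):
        a, b = i * bin_width, (i + 1) * bin_width
        p = exp(-a / mean) - exp(-b / mean)  # P(a <= T < b)
        counts.append(n_intervals * p)
    return counts

expected = expected_bin_counts(143)
for i, c in enumerate(expected):
    print(f"{i * 300:4d}-{(i + 1) * 300:4d} s: {c:5.1f}")
```

Negative intervals have no place in this model, so they would need to be handled separately before comparing observed bin counts with these expectations.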




DannyHamilton


August 26, 2015, 11:25:10 PM 

Quote from: ProfMac on August 26, 2015, 08:19:45 PM
I think that everyone will agree that two consecutive timestamps that show a negative interval have an incorrect timestamp somewhere.

The timestamps in the blocks are not intended to be completely accurate. I believe they can vary by plus or minus a few hours. I think I've read that some miners (and/or mining pools) will use the block timestamp as an extra nonce so that they don't need to rebuild the merkle root as often.
The timestamp is only intended to be used for calculating the new difficulty every 2016 blocks. A variation of 7200 seconds (2 hours) over the course of 2016 blocks works out to only about 3.6 seconds per block. That's relatively insignificant when compared to the natural variations that will occur due to the random nature of the proof-of-work process.
I'm not sure what you are investigating, or what you are trying to determine, but modifying unreliable data to make it fit some preconceived expectation is typically a bad idea.
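The per-block figure is simple arithmetic, sketched here in Python. The 600-second target spacing is an added assumption (the usual 10-minute block target), used only to express the error as a fraction of the retarget window.

```python
MAX_TIMESTAMP_ERROR_S = 7200  # 2 hours, the figure used in this post
BLOCKS_PER_RETARGET = 2016
TARGET_SPACING_S = 600        # assumed 10-minute target block interval

error_per_block = MAX_TIMESTAMP_ERROR_S / BLOCKS_PER_RETARGET
relative_error = MAX_TIMESTAMP_ERROR_S / (BLOCKS_PER_RETARGET * TARGET_SPACING_S)

print(f"{error_per_block:.2f} s per block")
print(f"{relative_error:.2%} of the retarget window")
```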




organofcorti
Donator
Legendary
Offline
Activity: 2058
Merit: 1000
Poor impulse control.


August 28, 2015, 01:31:13 AM 

As is noted in the thread, the timestamps are inaccurate. Trying to make timestamps accurate requires the assumption that blocks are generated in a particular way - but this is what you're testing, so you can't do that.
I have a source for more accurate "timestamps" (actually the first time a block has been recorded by a well-connected monitor), but this doesn't fix the problem.
The problem is that blocks appear, with respect to time, as a non-homogeneous Poisson process rather than a homogeneous (the usual type of) Poisson process. They are only a homogeneous Poisson process with respect to hashes.
This is not usually an issue unless considered over many days, but if there are sudden changes in hashrate the block rate will be affected in a significantly non-homogeneous way. For example, I've noticed that block durations aren't actually exponentially distributed even if you try to normalise the data to account for the non-homogeneous nature of the process. I *think* this has something to do with miner hashrate changes at the start of a block, but it's hard to prove.
I don't doubt that some of what you see is the effect of the generation process being non-homogeneous. However, the relatively small sample you took might look non-Poisson and still actually be ok. You could use the R package dgof to do some discrete goodness-of-fit tests, or you could find the confidence intervals for the histogram bins and see if the bins are under- or overfilled, or within the expected range (if bins are the same size they should have a binomial distribution where p = 1 / number of bins).
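The bin-count check can be sketched in Python as follows. This is an illustration rather than a definitive test: it assumes the bins were chosen so that each is equally likely under the model (p = 1 / number of bins), the helper names are invented, and the counts in the example are hypothetical, not taken from the thread's data.

```python
from math import comb

def binomial_interval(n, p, level=0.95):
    """Central interval (lo, hi) for a Binomial(n, p) count."""
    alpha = 1 - level
    lo, hi = None, n
    cum = 0.0
    for k in range(n + 1):
        cum += comb(n, k) * p**k * (1 - p)**(n - k)
        if lo is None and cum >= alpha / 2:
            lo = k
        if cum >= 1 - alpha / 2:
            hi = k
            break
    return lo, hi

def check_bins(counts, level=0.95):
    """Flag each bin count as inside or outside the binomial interval,
    assuming every bin is equally likely (p = 1 / number of bins)."""
    n = sum(counts)
    lo, hi = binomial_interval(n, 1 / len(counts), level)
    return lo, hi, [(c, lo <= c <= hi) for c in counts]

# Hypothetical counts for 143 intervals in 5 equally likely bins.
lo, hi, flags = check_bins([31, 27, 30, 25, 30])
print(f"expected range per bin: {lo}..{hi}")
for count, ok in flags:
    print(count, "ok" if ok else "outside range")
```

Each bin is checked on its own here, so with many bins an occasional count outside the range is expected even when the model holds.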




spin


September 03, 2015, 11:55:47 AM 

Great post. I found that quite interesting.
What is your thinking on the hash rate at the "start of a block"? Do you mean the "orphaned hash rate" due to miners working on the old headers before learning of a new block?
Is it homogeneous w.r.t. the hashes? I'm assuming the hash rate is continuously changing?
I also had a look at your site. Great work, b.t.w. If you don't mind answering here (or is there a thread on your site?): your CI for the forecast appears narrower than the CI of the hash rate estimate.

If you liked this post buy me a beer. Beers are quite cheap where I live! 194YjsiwmGm3hcbPcJWWyzRAS9CQLX1fJL



organofcorti


September 09, 2015, 11:12:39 PM 

Quote from: spin on September 03, 2015, 11:55:47 AM
Great post. I found that quite interesting.
What is your thinking on the hash rate at the "start of a block"? Do you mean the "orphaned hash rate" due to miners working on the old headers before learning of a new block?
Is it homogeneous w.r.t. the hashes? I'm assuming the hash rate is continuously changing?

I think yes to both questions, but that's opinion based on old data. It seemed to be the case before stratum; I'm not sure if it's a significant effect now. I hope to get time to look at that again soon.

Quote from: spin on September 03, 2015, 11:55:47 AM
I also had a look at your site. Great work, b.t.w. If you don't mind answering here (or is there a thread on your site?): your CI for the forecast appears narrower than the CI of the hash rate estimate.

Thanks for the kind words. They're about the same, but one is offset with respect to the other. It's annoying, and I think it's because the forecast method makes assumptions about the residuals for the forecast that do not hold. I'm not sure how to fix that.




