Great analysis. Could you factor the timestamp (the time each block was received, as reported by Blockchair) into the graph? It would make the time between each empty block and the block before it a lot clearer.
I could, but there are two problems. The first is scraping the data from the blockchain: it took me forever to extract the data for the empty blocks alone, which are nothing compared to the total number of blocks. If someone has a better way of getting that data, put it in a table format such as Excel and send it to me; I can do this and perhaps even more.
The second problem is that timestamps aren't exactly accurate. They are accurate enough to be used to adjust the difficulty, but not accurate enough for us to draw any conclusions from them. Maybe if we found empty blocks that came a long enough time after the previous block (enough to actually validate its transactions) and were still empty, we could confirm that a certain pool did have enough time but left transactions out on purpose. However, since the timestamp is put in the block header by whoever mines the block, they can manipulate it. Say I find a block 60 seconds after the last block was propagated (nobody knows about it yet). I need 30 seconds to validate the previous transactions and reconstruct the pool, but I decide not to include any transactions. I can conveniently lie and say I found the block in 10 seconds. Who is going to stop me from doing so?
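To make the "who is going to stop me" point concrete, here is a minimal sketch of the two timestamp checks Bitcoin nodes apply to a block header, as I understand them: the timestamp must be later than the median of the previous 11 blocks' timestamps, and no more than two hours ahead of the node's clock. The function and variable names are made up for illustration, and the example times are invented.

```python
MAX_FUTURE_DRIFT = 2 * 60 * 60  # two hours, in seconds
MEDIAN_WINDOW = 11              # number of previous blocks considered


def timestamp_is_acceptable(candidate_time, previous_times, node_time):
    """Return True if a claimed header timestamp passes both checks.

    candidate_time: timestamp claimed in the new header (Unix seconds)
    previous_times: timestamps of earlier blocks, oldest first
    node_time:      the validating node's own clock (Unix seconds)
    """
    window = sorted(previous_times[-MEDIAN_WINDOW:])
    median_time_past = window[len(window) // 2]
    return median_time_past < candidate_time <= node_time + MAX_FUTURE_DRIFT


# Invented example: 11 earlier blocks spaced exactly 600 s apart.
previous = [i * 600 for i in range(11)]   # last block at t = 6000
true_find_time = 6060                     # block really found 60 s later
claimed_time = 6010                       # miner claims it took 10 s

print(timestamp_is_acceptable(true_find_time, previous, node_time=6060))
print(timestamp_is_acceptable(claimed_time, previous, node_time=6060))
```

Both the honest and the shaved timestamp pass, because the allowed range is far wider than a few tens of seconds. Nothing in these two checks can tell a 10-second block from a 60-second one.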
Just a quick check of three of them shows a big difference in time: 6 to 32 seconds.
I don't think anyone knows exactly how long it takes a mining pool to fully download and validate the previous block, but I am pretty sure it's less than 32 seconds. Of course, more complex blocks take slightly longer, but 32 seconds can't be it. Why do I think so?
Applying the cumulative distribution function, the probability of a block being found between 1 and 32 seconds after the previous one can be derived as:
exp(−1/600) − exp(−32/600) ≈ 5.03%. If pools really needed 32 seconds, that share of blocks would be empty; in other words we would "in general" have an empty block every 19.9 blocks. That's a bit more than 7 blocks a day, which would result in roughly 2,640 blocks per year, which never happened even when was well-intended.
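The expression above can be evaluated directly so the figures can be re-derived. This is a small sketch that assumes block arrival times are exponentially distributed with a 600-second mean, which is the assumption behind the formula; the function name is my own.

```python
import math

MEAN_BLOCK_INTERVAL = 600.0  # seconds, the 10-minute target


def prob_block_between(t_start, t_end, mean=MEAN_BLOCK_INTERVAL):
    """P(t_start < T <= t_end) for an exponential arrival time T."""
    return math.exp(-t_start / mean) - math.exp(-t_end / mean)


p = prob_block_between(1, 32)
blocks_per_day = 24 * 60 * 60 / MEAN_BLOCK_INTERVAL  # 144 at the target rate

print(f"share of blocks found between 1 and 32 s: {p:.2%}")
print(f"one such block every {1 / p:.1f} blocks")
print(f"expected per day:  {p * blocks_per_day:.1f}")
print(f"expected per year: {p * blocks_per_day * 365:.0f}")
```

Changing the two arguments to `prob_block_between` gives the same estimate for any other validation window.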
So really, 32 seconds is more than enough to validate the previous block, deal with the mempool and reconstruct the next block WITH transactions in it. But where did the 32 seconds come from?
1- That node's mempool was empty. > Very unlikely.
2- The time reported is actually "wrong". > Very likely.
I would still do the analysis, but really the results will be meaningless.
People scream that empty blocks are pools being bad actors.
Is it simply that the pools are really big?
The fact that the time to mine those blocks is within seconds and the number of transactions is '1' (Coinbase TX) tells me that those are probably mined with 'Covert ASICBoost', because not including any transactions is the easiest way to generate a Merkle Root Hash (hashMerkleRoot) with fixed last 4 bytes. But there's no evidence, so I'll leave it at "probably".
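As a rough illustration of why an empty block makes the Merkle root cheap to vary, here is a sketch of a Bitcoin-style Merkle root calculation. The "txids" are placeholder hashes of made-up byte strings, not real transactions, and the helper names are my own.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Return (root, pair_hashes) for a list of 32-byte txids.

    pair_hashes counts the double-SHA256 calls spent combining nodes.
    """
    if not txids:
        raise ValueError("a block always has at least the coinbase transaction")
    level = list(txids)
    pair_hashes = 0
    while len(level) > 1:
        if len(level) % 2:            # odd count: duplicate the last hash
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        pair_hashes += len(level)
    return level[0], pair_hashes


# Placeholder txids (hypothetical data, for illustration only).
coinbase = double_sha256(b"hypothetical coinbase, extranonce 0")
others = [double_sha256(b"hypothetical tx %d" % i) for i in range(3)]

empty_root, empty_cost = merkle_root([coinbase])
full_root, full_cost = merkle_root([coinbase] + others)

print("empty block: root equals coinbase txid:", empty_root == coinbase)
print("empty block: pair hashes needed:", empty_cost)
print("4-tx block:  pair hashes needed:", full_cost)
print("last 4 bytes of empty-block root:", empty_root[-4:].hex())
```

With only the coinbase in the block, the root is simply the coinbase txid, so each new candidate root costs one transaction hash and no tree hashing at all. With more transactions, every change to the coinbase forces a rehash up the tree.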
I don't think anybody uses Covert ASICBoost anymore. Finding a block in the next second at any given time always has a probability of about 0.166%; if everyone were mining with pen and paper, people would probably still mine empty blocks.
To explain even further: for the past 4 months of 2020 we had exactly 14 empty blocks every month, which is a bit less than 0.5 blocks a day. This is below normal as far as statistics and probability are concerned. Applying the above formula and assuming, for the sake of argument, that 5 seconds is more than enough to validate the transactions, it is still OKAY to mine an empty block once every day. I think the number of empty blocks for the past 2 years is pretty normal.
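The comparison between the observed rate and the expected rate can be sketched as follows, again assuming exponentially distributed block times with a 600-second mean and 144 blocks a day. The 30-day month and the function names are my own simplifications.

```python
import math

MEAN_BLOCK_INTERVAL = 600.0  # seconds
BLOCKS_PER_DAY = 144


def expected_fast_blocks_per_day(window_seconds):
    """Expected daily count of blocks found within window_seconds of the previous one."""
    p = 1.0 - math.exp(-window_seconds / MEAN_BLOCK_INTERVAL)
    return p * BLOCKS_PER_DAY


def implied_window(blocks_per_day):
    """Window (seconds) that would produce the given daily count of fast blocks."""
    return -MEAN_BLOCK_INTERVAL * math.log(1.0 - blocks_per_day / BLOCKS_PER_DAY)


observed_per_day = 14 / 30  # 14 empty blocks a month, month taken as 30 days

for window in (1, 5, 32):
    print(f"{window:>2} s window: {expected_fast_blocks_per_day(window):.2f} blocks/day expected")

print(f"observed: {observed_per_day:.2f} empty blocks/day")
print(f"window implied by the observed rate: {implied_window(observed_per_day):.1f} s")
```

The last line inverts the formula: it shows how short the validation window would have to be for chance alone to explain the observed count, under the same assumptions.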