I'll see what I can do. I kinda want to just scrape everything and make a huge CSV. That'll come in handy for other analyses too.
If that's not too much to scrape, I would encourage you to do that. I am also interested in block-size analysis: I want to see whether we are really fully utilizing the block size we have now, or whether we aren't and all these calls for a block-size increase are illogical. Further analysis, such as how often we actually get unusually long block times like 1- or 2-hour blocks, is also interesting. Different studies require different data, so scraping everything, especially from blockchair.com, will be really useful. Good luck, and please keep me updated.
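Both of those checks are simple once the data is in a CSV. A minimal sketch, assuming the scraped file has one row per block with hypothetical columns `height`, `time` (Unix seconds) and `size` (bytes), and taking the legacy 1,000,000-byte limit as the cap (post-SegWit blocks are limited by weight, so this is only an approximation):

```python
import csv
import io

# Hypothetical column names; adjust to whatever the scraper actually writes.
MAX_BLOCK_SIZE = 1_000_000  # legacy 1 MB limit in bytes (ignores SegWit weight)
LONG_GAP_SECONDS = 3600     # treat gaps of an hour or more as "unusual"


def analyse(rows):
    """Return (average size utilization, list of (height, gap_seconds) for long gaps)."""
    rows = sorted(rows, key=lambda r: int(r["height"]))
    utilization = [int(r["size"]) / MAX_BLOCK_SIZE for r in rows]
    long_gaps = []
    for prev, cur in zip(rows, rows[1:]):
        # Header timestamps are set by miners, so gaps can be skewed or even negative.
        gap = int(cur["time"]) - int(prev["time"])
        if gap >= LONG_GAP_SECONDS:
            long_gaps.append((int(cur["height"]), gap))
    avg = sum(utilization) / len(utilization) if utilization else 0.0
    return avg, long_gaps


if __name__ == "__main__":
    # Illustrative rows, not real block data.
    sample = io.StringIO(
        "height,time,size\n"
        "100,1000,500000\n"
        "101,1600,1000000\n"
        "102,6000,250000\n"
    )
    avg, gaps = analyse(list(csv.DictReader(sample)))
    print(f"average utilization: {avg:.2%}")
    print("long gaps:", gaps)
```

Averaging per-block utilization answers the "are blocks full?" question; the gap list answers the "how often do we see 1- or 2-hour blocks?" question from the same file.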
Yesterday I saw this link:
https://gz.blockchair.com/bitcoin/blocks/ but I gave up when I saw their message that all the files require over 1 TB. If you are going to scrape everything anyway, downloading their files would save time.
1 TB is TOO MUCH. Those files probably contain everything about the blocks and more; the current blockchain size is less than 250 GB, and the data we need shouldn't be larger than 1 MB in a text/Excel format. I most certainly can't process 1 TB of data.
I still think including blocks that have the same generation and reward BTC is good, because then we will know what percentage of fake empty blocks have the same generation and reward BTC.
The generation is the same for every 210,000 blocks and has nothing to do with a block being empty or full. The generations are not needed in this study and most likely not in any other analysis; I could simply populate them manually if I had to, since we know the first block had 50 BTC, 210,000 blocks later it was 25 BTC, and 210,000 blocks after that it became 12.5 BTC. I don't know how to explain it better, but we really don't need them.
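The schedule described above can be computed from the block height rather than scraped. A minimal sketch following the halving rule (50 BTC at the start, halved every 210,000 blocks); amounts are in satoshis to avoid floating-point rounding:

```python
COIN = 100_000_000          # satoshis per BTC
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings


def block_subsidy(height: int) -> int:
    """Return the block subsidy ("generation") in satoshis for a given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Bitcoin Core returns 0 explicitly here, because a 64-bit right shift
        # by 64 or more is undefined in C++.
        return 0
    return (50 * COIN) >> halvings


if __name__ == "__main__":
    for h in (0, 209_999, 210_000, 420_000):
        print(h, block_subsidy(h) / COIN, "BTC")
```

Because the generation column is a pure function of height, it can be filled in after scraping instead of taking up space in the CSV.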
As for the reward, it's merely generation + fees; if you include the fee (BTC) column you will see that. Similar fees across more than one block prove nothing; that's pretty normal.
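Since reward = generation + fees, the fees can be derived from the other two columns. A minimal sketch, assuming hypothetical per-block fields `reward_sat`, `generation_sat` and `tx_count`; note that zero fees alone does not prove a block is empty, so the stricter test is a transaction count of 1 (coinbase only):

```python
from dataclasses import dataclass


@dataclass
class Block:
    height: int
    reward_sat: int      # total coinbase payout in satoshis (hypothetical field)
    generation_sat: int  # block subsidy in satoshis (hypothetical field)
    tx_count: int        # number of transactions, including the coinbase


def fees_sat(block: Block) -> int:
    """Fees are whatever the coinbase pays out beyond the subsidy."""
    return block.reward_sat - block.generation_sat


def is_empty(block: Block) -> bool:
    """A block is empty when it contains only the coinbase transaction."""
    return block.tx_count == 1


if __name__ == "__main__":
    # Illustrative values, not real block data.
    blocks = [
        Block(400_000, 2_500_000_000, 2_500_000_000, 1),
        Block(400_001, 2_540_000_000, 2_500_000_000, 1_800),
    ]
    for b in blocks:
        print(b.height, fees_sat(b), is_empty(b))
```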
Your current conditions return fewer than 90k results!
That simply means we have had fewer than 90k empty blocks since Satoshi mined the first block.
It would be cool to see whether a huge number of AntPool's empty blocks came 30-60 seconds after the previous block vs 1-10 seconds, as that would expose the pattern of bad behavior they are accused of.
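A minimal sketch of that comparison, assuming each scraped row carries hypothetical fields `pool`, `tx_count` and `interval` (seconds since the previous block's header timestamp); it counts one pool's empty blocks falling in the 1-10 s and 30-60 s windows:

```python
from collections import Counter

# Interval windows (seconds since the previous block) to compare, inclusive.
BUCKETS = {"1-10s": (1, 10), "30-60s": (30, 60)}


def bucket_empty_blocks(rows, pool):
    """Count a pool's empty (coinbase-only) blocks whose interval falls in each window."""
    counts = Counter({name: 0 for name in BUCKETS})
    for row in rows:
        if row["pool"] != pool or row["tx_count"] != 1:
            continue
        for name, (lo, hi) in BUCKETS.items():
            if lo <= row["interval"] <= hi:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative rows, not real data.
    rows = [
        {"pool": "AntPool", "tx_count": 1, "interval": 5},
        {"pool": "AntPool", "tx_count": 1, "interval": 45},
        {"pool": "AntPool", "tx_count": 1, "interval": 50},
        {"pool": "AntPool", "tx_count": 2100, "interval": 40},
        {"pool": "OtherPool", "tx_count": 1, "interval": 8},
    ]
    print(bucket_empty_blocks(rows, "AntPool"))
```

Since `interval` comes from miner-written header times, the buckets reflect what the headers claim, not necessarily when the blocks were actually found.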
Phill, I will probably still do that analysis for you and the other person who requested it, but I can tell you beforehand that it will prove NOTHING. Bitmain can find a block in 30 seconds and fake the time in the block header to be 3 seconds. Unless you could hack into their database, and assuming they even keep such records, the study will prove nothing; you are counting on their stupidity, integrity, and kindness, and we both know Bitmain doesn't have any of those three qualities.