While not terribly useful, this scatter plot of # of blocks to confirm
I think that's exactly the opposite... fee rate vs time is not very useful: time just adds in the noise of a Poisson process that fees have no influence on.
If you have a good estimator based on block count you can simply multiply its predictions by the exponential distribution of interblock intervals and get a good estimator of time... but taking data that has been smeared by random block times and constructing good estimates from it is hard due to all the injected noise.
In effect, your time data is heavily biased by random correlations between higher- and lower-fee intervals and luckier or less lucky block finding.
This remains true so long as people aren't turning hashrate on and off based on fees, and so far as I know, no one is today.
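A rough sketch of the block-count-to-time conversion described above, assuming block arrivals are Poisson with a constant 600-second mean interval, so that waiting for n blocks is Erlang-distributed. The function names and the `block_pmf` input (a mapping from blocks-to-confirm to probability, as a block-count estimator might produce) are made up for illustration:

```python
import math

MEAN_BLOCK_INTERVAL = 600.0  # seconds; assumes constant hashrate


def erlang_cdf(n, t, mean_interval=MEAN_BLOCK_INTERVAL):
    """P(the n-th block arrives within t seconds), n >= 1, Poisson arrivals."""
    if t <= 0:
        return 0.0
    x = t / mean_interval
    # 1 - sum_{k=0}^{n-1} e^{-x} x^k / k!
    term = math.exp(-x)
    partial = 0.0
    for k in range(n):
        partial += term
        term *= x / (k + 1)
    return 1.0 - partial


def time_cdf(block_pmf, t, mean_interval=MEAN_BLOCK_INTERVAL):
    """P(confirmed within t seconds), mixing Erlang CDFs over the block-count estimate."""
    return sum(p * erlang_cdf(n, t, mean_interval) for n, p in block_pmf.items())


def expected_time(block_pmf, mean_interval=MEAN_BLOCK_INTERVAL):
    """Mean confirmation time: expected block count times the mean interval."""
    return mean_interval * sum(n * p for n, p in block_pmf.items())
```

For example, `expected_time({1: 0.5, 2: 0.5})` gives 900 seconds. Going the other way, from observed times back to block counts, has no such clean form, which is the point being made about noise.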
An interesting chart is a grid over n-blocks-wait and fee-rate, where each cell holds the percentage of transactions paying at least that fee rate that were confirmed in that number of blocks or fewer.
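A minimal sketch of how such a grid could be computed, assuming each observation is a `(feerate, blocks_waited)` pair with `blocks_waited` set to `None` for transactions that never confirmed. That record layout and the function name are assumptions, not the format of the data being collected:

```python
def confirmation_grid(txs, fee_thresholds, block_targets):
    """Return grid[i][j]: percent of txs paying >= fee_thresholds[i]
    that confirmed in block_targets[j] or fewer blocks.

    A cell is None when no transaction pays at least that fee rate.
    """
    txs = list(txs)
    grid = []
    for fee in fee_thresholds:
        # Waits of every transaction paying at least this fee rate.
        waits = [blocks for feerate, blocks in txs if feerate >= fee]
        row = []
        for target in block_targets:
            if not waits:
                row.append(None)
                continue
            # Unconfirmed transactions (None) never count as hits.
            hits = sum(1 for w in waits if w is not None and w <= target)
            row.append(100.0 * hits / len(waits))
        grid.append(row)
    return grid


if __name__ == "__main__":
    sample = [(1.0, 6), (5.0, 3), (10.0, 1), (20.0, 1), (2.0, None)]
    for fee, row in zip([1, 5, 10], confirmation_grid(sample, [1, 5, 10], [1, 3, 6])):
        print(fee, row)
```

Rows are fee-rate thresholds and columns are block targets, so the result can be handed directly to a heatmap plot.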
How does your data handle transaction replacement? How do you compute feerates for CPFP transactions? One possibility is to only consider transactions which are not dependent on unconfirmed transactions and which have no children; and similarly do not consider replacement or replaced transactions.
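The filter suggested above could look something like this. The `TxRecord` fields are hypothetical; they stand in for whatever flags the collector would need to record at the time each transaction is first seen and while it sits in the mempool:

```python
from dataclasses import dataclass


@dataclass
class TxRecord:
    # Hypothetical record layout, for illustration only.
    txid: str
    feerate: float
    has_unconfirmed_parents: bool  # spends outputs of unconfirmed transactions
    has_children: bool             # an unconfirmed transaction spends its outputs
    is_replacement: bool           # replaced an earlier transaction
    was_replaced: bool             # was itself replaced before confirming


def is_simple(tx):
    """True when the tx's own feerate is the whole story (no CPFP, no replacement)."""
    return not (
        tx.has_unconfirmed_parents
        or tx.has_children
        or tx.is_replacement
        or tx.was_replaced
    )


def simple_transactions(txs):
    """Keep only transactions whose confirmation depends on their own feerate."""
    return [tx for tx in txs if is_simple(tx)]
```

This throws away data rather than trying to compute package feerates, which keeps the remaining sample easy to interpret at the cost of a smaller and possibly less representative set.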
I understand blocks to confirm is more useful, but just looking at that graph, I could see key information wasn't visible in 2D black and white. That's what I meant by that graph not being terribly useful. I like the grid chart idea.
A question that comes to mind with blocks is, has anyone done analysis to detect patterns in luck or transaction inclusion week by week? I suppose not many people are turning miners on and off due to weather or variable energy pricing over the day, but I'd be interested to look into it.
The data doesn't take into account CPFP or dependencies within unconfirmed transactions at this point but it was a thought I had.
I've only started collecting some of this data and want to see what people throw on the wall in terms of analysis, then iterate on the data that's being collected to learn more. I feel like diversity in fee estimation strategies is important to defend against some entity trying to game those strategies.