I am not sure what level of access you have to the AWS account sponsoring your site.
Just root access to loyce.club, but addresses.loyce.club and alladdresses.loyce.club aren't hosted at AWS. So far this month, they've passed 1 TB of traffic, so it was a good call not to use AWS (that would have cost $90).
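For reference, the $90 figure is consistent with a flat rate of $0.09 per GB of outbound traffic and 1 TB counted as 1000 GB. Both are assumptions inferred from the numbers above, not a quoted price list, and the function name is made up for illustration:

```python
def egress_cost_usd(traffic_gb, usd_per_gb=0.09):
    """Estimate outbound traffic cost at a flat per-GB rate.

    usd_per_gb=0.09 is an assumed rate chosen to match the $90 per TB
    mentioned in the post; real pricing may be tiered or different.
    """
    return traffic_gb * usd_per_gb


# 1 TB taken as 1000 GB
monthly_cost = egress_cost_usd(1000)
```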
However, it is possible to set up a storage bucket so that anyone can access it, provided that the requester's IP address is among the IP addresses of the same region the files are stored in.
That seems like overkill for this.
Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth.
With a database, it would only mean checking and adding 750k addresses per day, instead of sorting the entire data set again. I expect sort to be faster too when the majority of the ("old") data is already sorted, but I haven't tested for speed differences.
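The incremental idea can be sketched as follows: sort only the small daily batch, then merge it into the data that is already sorted, which takes one linear pass instead of a full re-sort. This is a minimal illustration, not the method actually used on the site; the function name and the sample addresses are hypothetical.

```python
import heapq


def merge_new_addresses(sorted_existing, new_batch):
    """Merge a small unsorted batch into already-sorted data, dropping duplicates.

    sorted_existing must already be in ascending order; only new_batch
    gets sorted here, so the cost is dominated by one pass over the data.
    """
    new_sorted = sorted(set(new_batch))
    merged = []
    last = None
    # heapq.merge lazily interleaves two sorted inputs in ascending order.
    for addr in heapq.merge(sorted_existing, new_sorted):
        if addr != last:  # skip addresses that were already present
            merged.append(addr)
            last = addr
    return merged


# Hypothetical sample data
existing = ["1A", "1C", "1E"]
todays_batch = ["1D", "1A", "1B"]
result = merge_new_addresses(existing, todays_batch)
```

For files too large for memory, the same pattern works with file iterators in place of lists, writing each line out instead of appending to a list.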
AWS is very reliable.
I have never experienced any downtime with AWS, unlike all VPS providers I've ever used. Those "external projects" don't have much priority for me: if one is down, I don't lose scraping data.
This works out to approximately a 24-minute download. I measured a download speed of ~125 Mbps using a colab instance.
It's doing the biweekly data update, which probably slowed it down too.
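As a sanity check on the 24-minute estimate: at ~125 Mbps, 24 minutes corresponds to roughly 22.5 GB. That size is back-computed from the two quoted figures and is not stated anywhere in the thread; the function name is made up for illustration, and 1 GB is taken as 1000 MB.

```python
def download_minutes(size_gb, speed_mbps):
    """Return the download time in minutes for a file of size_gb gigabytes
    at speed_mbps megabits per second (1 GB = 1000 MB, 1 byte = 8 bits)."""
    size_megabits = size_gb * 1000 * 8
    return size_megabits / speed_mbps / 60


# 22.5 GB is an assumed size derived from 24 minutes at 125 Mbps
minutes = download_minutes(22.5, 125)
```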