LoyceV (OP)
Legendary
Offline
Activity: 3584
Merit: 18196
Thick-Skinned Gang Leader and Golden Feather 2021
|
 |
August 26, 2020, 05:48:55 PM |
|
addresses.txt.gz: all addresses in chronological order, with duplicates (Warning: 31 GB). Sample:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg
Due to limited disk space, I'm considering removing this file unless anyone has a need for it. So: can anyone tell me what it can be used for? I know it can be used to make a Top 100 of addresses with the most receiving transactions.
Instead of this list, I want to make a new list without duplicates, but still in order of first appearance of each address. Thanks to bob123, I can do that now! I'll also keep the sorted list, because it is very convenient for finding matches against a list of your own.
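For readers wondering what "without duplicates, but still in order of first appearance" and the Top 100 idea look like in practice, here is a minimal Python sketch. It is not the method bob123 suggested (that is not shown in this post); the function names `unique_in_order` and `top_receivers` are made up for illustration, and the repeated entries in `sample` are invented. It keeps every address in memory, so it would not fit the full 31 GB file on a small machine.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def unique_in_order(addresses: Iterable[str]) -> Iterator[str]:
    """Yield each address once, in order of first appearance."""
    seen = set()  # holds every address seen so far (memory-hungry on big inputs)
    for line in addresses:
        address = line.strip()
        if address and address not in seen:
            seen.add(address)
            yield address


def top_receivers(addresses: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n addresses that appear most often, with their counts."""
    counts = Counter(line.strip() for line in addresses if line.strip())
    return counts.most_common(n)


# Hypothetical input: three addresses from the sample above, with made-up repeats.
sample = [
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
    "12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX",
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
    "1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1",
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
]

for address in unique_in_order(sample):
    print(address)
print(top_receivers(sample, 1))
```

For a real run, the input would be the lines of the decompressed file (for example via `gzip.open(path, "rt")`) instead of the small list.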
I need some time to process all data. When done, I'll rewrite some of my posts.
|
|
|
|
LoyceV (OP)
|
 |
August 30, 2020, 01:10:30 PM |
|
unique_addresses.txt.gz: all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB).
I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.
Since I got no response to my question above, I'll go with 2 versions:
- All addresses ever used, without duplicates, in order of first appearance.
- All addresses ever used, without duplicates, sorted.
The first file feels nostalgic, the second file will be very convenient to match addresses with a list of your own.
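As an illustration of why a sorted file is convenient for matching, the sketch below walks a sorted address stream and a sorted list of your own side by side, so the big file is read only once and never loaded into memory. The function name `matches_sorted` and the shortened addresses are hypothetical. It assumes both inputs are sorted in plain byte order (what `LC_ALL=C sort` produces), which for ASCII addresses matches Python's string comparison.

```python
from typing import Iterable, Iterator


def matches_sorted(big_sorted: Iterable[str], mine_sorted: Iterable[str]) -> Iterator[str]:
    """Yield the addresses from mine_sorted that also occur in big_sorted.

    Both inputs must be sorted ascending in the same (byte) order.
    """
    big = iter(big_sorted)
    current = next(big, None)
    for address in mine_sorted:
        # Advance the big stream until it catches up with our address.
        while current is not None and current < address:
            current = next(big, None)
        if current is None:
            return  # big stream exhausted, no further matches possible
        if current == address:
            yield address


# Shortened, made-up addresses for illustration only.
all_addresses = sorted(["1A1z", "12c6", "1HLo", "3GFf"])
my_list = sorted(["1HLo", "1zzz", "12c6"])
print(list(matches_sorted(all_addresses, my_list)))
```

The design choice here is a single forward pass over both inputs, which is the same idea the `comm` and `join` command-line tools rely on.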
|
|
|
|
LoyceV (OP)
|
 |
September 08, 2020, 11:33:39 AM Last edit: November 28, 2020, 03:01:20 PM by LoyceV |
|
Quote from LoyceV: unique_addresses.txt.gz: all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB). I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.
Well, that didn't go as planned. Although I can keep all unique addresses in order of first appearance, it turns out 100 GB of disk space is not enough for the temporary space it needs. Because of the large data traffic, I don't want to use loyce.club's AWS hosting for this, and I'm not sure yet if I should get another VPS just for this. An alternative would be to run it from my home PC, but the heavy writing will just wear out my SSD.
So this project is on hold for now. Daily updates still continue.
|
|
|
|
NotATether
Legendary
Offline
Activity: 1876
Merit: 7907
Wheel of Whales 🐳
|
 |
October 22, 2020, 10:47:29 AM |
|
@LoyceV
Are you downloading Blockchair dumps at the slow rate? I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.
If network bandwidth is a problem I'm able to host this on my hardware if you like.
|
|
|
|
LoyceV (OP)
|
 |
October 22, 2020, 06:21:22 PM |
|
Quote from NotATether: Are you downloading Blockchair dumps at the slow rate?
Yes. But 100 kB/s isn't a problem anymore: the initial download took a long time, but for daily updates it doesn't take that long.
Quote from NotATether: I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.
I thought they'd offer it for free to certain users, but this makes sense from a business point of view.
Quote from NotATether: If network bandwidth is a problem I'm able to host this on my hardware if you like.
Just this month I'm at 264 GB for this project, and 174 GB for all Bitcoin addresses with a balance. That means this full list is only downloaded a few times per month, but the funded address list is downloaded a few times per day. What I need more is disk space for sorting this data, but I haven't decided yet where to host it. 100 GB of disk space isn't enough.
|
|
|
|
LoyceV (OP)
|
 |
November 28, 2020, 07:47:21 PM |
|
Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP.
There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet; I first want to recreate this data from scratch to see which number is correct.
|
|
|
|
therealbtcdave
Newbie
Offline
Activity: 12
Merit: 3
|
 |
November 28, 2020, 08:08:49 PM |
|
Quote from LoyceV: Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP. There's a problem though. There are: 756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz and 756,524,407 addresses according to addresses_sorted.txt.gz. Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.
Thanks for the update. The last .gz you had was from September, I think.
|
|
|
|
PrimeNumber7
Copper Member
Legendary
Offline
Activity: 1694
Merit: 1904
Amazon Prime Member #7
|
 |
November 29, 2020, 06:01:21 AM |
|
Quote: Some results: The awk-thing uses just over 1 GB memory for 10 million addresses. So for 1.5 billion addresses, a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.
As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
Separately, sorting lists are not scalable, period. There are some things you can do to increase the speed, such as keep the list in RAM, or cutting the number of instances the entire list is reviewed, but you ultimately cannot sort an unordered very large list.
|
|
|
|
LoyceV (OP)
|
 |
November 29, 2020, 09:03:47 AM Last edit: March 26, 2022, 07:20:14 PM by LoyceV |
|
Quote from therealbtcdave: Thanks for the update the last .gz you had I think was from September.
Correct (August 6 and September 2).
Quote from PrimeNumber7: As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
Amazon charges $0.09 per GB of outgoing data, which is ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.
Quote from PrimeNumber7: If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
Still, that's quite excessive for just 2 files that are barely used.
Quote from PrimeNumber7: Separately, sorting lists are not scalable, period.
Actually, sort performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good. It just takes a while because it uses temporary disk storage. Given enough RAM, it can utilize multiple cores.
Quote from PrimeNumber7: There are some things you can do to increase the speed, such as keep the list in RAM, or cutting the number of instances the entire list is reviewed, but you ultimately cannot sort an unordered very large list.
The 256 GB RAM server idea would cost a few dollars per hour, so I'll do with less.
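To put the three timings above in perspective, the sketch below compares the observed slowdown per 5x step with what n log n growth (typical for comparison sorts) and quadratic growth would predict. It uses only the numbers from this post; treating the sort as n log n is an assumption, since the sort implementation and its disk overhead are not described here. The name `growth_ratios` is made up for the example.

```python
import math
from typing import Dict, List, Tuple

# Line count -> seconds, taken from the test results in the post above.
timings: Dict[int, float] = {10_000_000: 10.0, 50_000_000: 63.0, 250_000_000: 381.0}


def growth_ratios(measured: Dict[int, float]) -> List[Tuple[int, int, float, float, float]]:
    """For each consecutive pair of sizes, return
    (small, large, observed ratio, n log n ratio, quadratic ratio)."""
    sizes = sorted(measured)
    rows = []
    for small, large in zip(sizes, sizes[1:]):
        observed = measured[large] / measured[small]
        n_log_n = (large * math.log(large)) / (small * math.log(small))
        quadratic = (large / small) ** 2
        rows.append((small, large, observed, n_log_n, quadratic))
    return rows


for small, large, observed, n_log_n, quadratic in growth_ratios(timings):
    print(f"{small:>11,} -> {large:>11,}: observed x{observed:.2f}, "
          f"n log n x{n_log_n:.2f}, quadratic x{quadratic:.0f}")
```

Under these assumptions the observed factor of roughly 6 per 5x step sits a little above the n log n prediction of about 5.5 and far below the quadratic factor of 25, which is consistent with the extra time going to temporary files on disk.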
|
|
|
|
NotATether
|
 |
November 29, 2020, 12:31:11 PM |
|
Quote from PrimeNumber7: As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server. If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
That may save on local resources, but you will be paying a lot of money per month if people download several hundred gigabytes each month, particularly if the files are as large as the ones hosted in the OP.
If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too).
|
|
|
|
PrimeNumber7
|
 |
November 29, 2020, 10:23:57 PM |
|
Quote from PrimeNumber7: As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
Quote from LoyceV: Amazon charges $0.09 per GB outgoing data, that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.
I had used AWS as an example because I believed you used it for some of your other projects. Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
Quote from PrimeNumber7: Separately, sorting lists are not scalable, period.
Quote from LoyceV: Actually, sort performs quite well. I've tested: 10M lines: 10 seconds (fits in RAM); 50M lines: 63 seconds (starts using temporary files); 250M lines: 381 seconds (using 2 GB RAM and temporary files). So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.
I think you are proving my point. The more input you have, the more time it takes to process one additional input. To put it another way, it takes 1 unit of time to sort a list with a length of 2, it takes 1 + a units of time to sort a list with a length of 3, it takes 1 + a + b units of time to sort a list with a length of 4, and so on. The longer the list, the longer it will take to sort one additional line.
Quote from PrimeNumber7: As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server. If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
Quote from NotATether: That may save on local resources but you will be paying a lot of money per month if people download several hundred gigabytes each month particularly if the files are large like the files hosted in the OP. If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too)
Your local ISP might not like it very much if you are uploading that much data.
|
|
|
|
Vod
Legendary
Offline
Activity: 3976
Merit: 3208
Licking my boob since 1970
|
 |
November 29, 2020, 11:01:11 PM |
|
Quote from PrimeNumber7: Your local ISP might not like it very much if you are uploading that much data.
Quickseller, most ISPs have a download bottleneck - not upload. So few people upload more than they download that most ISPs don't even restrict uploads. What ISP does LoyceV use that does not like uploading?
|
|
|
|
NotATether
|
 |
November 29, 2020, 11:06:07 PM |
|
Quote from NotATether: ~snip If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too)
Quote from PrimeNumber7: Your local ISP might not like it very much if you are uploading that much data.
Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner. I guess I have been doing too much of my work on the cloud to tell the difference anymore.
|
|
|
|
PrimeNumber7
|
 |
November 30, 2020, 03:42:59 AM |
|
Quote from NotATether: ~snip If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too)
Quote from PrimeNumber7: Your local ISP might not like it very much if you are uploading that much data.
Quote from NotATether: Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner. I guess I have been doing too much of my work on the cloud to tell the difference anymore.
Ahh, gotcha. I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly. Migrating your data from AWS to GCS will incur a charge from AWS for the amount of your data. There might be ways around this, I'm not sure.
|
|
|
|
LoyceV (OP)
|
 |
November 30, 2020, 12:26:10 PM |
|
Quote from PrimeNumber7: I had used AWS as an example because I believed you used it for some of your other projects.
Correct, loyce.club runs on AWS (sponsored).
Quote from PrimeNumber7: Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
I highly doubt I'd find a cheaper deal. I hope not to use the full 5 TB though; I expect some overselling and don't want to push it to the limit.
Quote from PrimeNumber7: I think you are proving my point. The more input you have, the more time it takes to process one additional input.
An exponential increase in processing time is to be expected. I consider the increase acceptable for scaling: if the number of addresses is 5 times larger than it is now (20 years from now?), it takes only 6 times more processing power.
Quote from PrimeNumber7: The longer the list, the longer it will take to sort one additional line.
At some point a database might beat raw text sorting, but for now I'm good with this.
Quote from PrimeNumber7: Your local ISP might not like it very much if you are uploading that much data.
I should add a storage VPS to my shopping list. I now indeed have to transfer a large amount of data through my local internet, and it's terrible compared to server performance.
Quote from NotATether: I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.
I'm not using anything with "unmetered" traffic.
Still working on restoring all data from scratch. I'm curious to see if it matches either of the 2 existing files. I don't really get the focus on data traffic though, right after I got a good deal on a new VPS. I'm good for now.
Quote from PrimeNumber7: I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly.
AWS charges $0.09/GB, and especially since this one is sponsored, I don't want to abuse it. I love how stable the server is though, it has never been down.
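The traffic figures mentioned in this thread can be checked with a few lines of arithmetic. This is only a sketch: it uses the flat $0.09/GB rate quoted above, assumes 1 TB = 1000 GB, and ignores any tiered pricing or free allowance a real bill might include. The name `egress_cost_usd` is made up for the example.

```python
def egress_cost_usd(gigabytes: float, usd_per_gb: float = 0.09) -> float:
    """Cost of outgoing traffic at a flat per-GB rate (rate taken from the thread)."""
    return gigabytes * usd_per_gb


# 5 TB bandwidth limit, fully used (assuming 1 TB = 1000 GB).
print(f"5 TB:   ${egress_cost_usd(5_000):.2f}")
# Traffic reported for one month: 264 GB (all addresses) + 174 GB (funded addresses).
print(f"438 GB: ${egress_cost_usd(264 + 174):.2f}")
```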
|
|
|
|
LoyceV (OP)
|
 |
December 02, 2020, 03:49:36 PM Last edit: December 02, 2020, 05:58:45 PM by LoyceV |
|
Quote from LoyceV: There's a problem though. There are: 756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz and 756,524,407 addresses according to addresses_sorted.txt.gz. Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.
After recreating this data, I now have 757,437,766 unique addresses (don't click this link unless you want to download 18 GB). My next step would be to add a few days of data, and count addresses again. Next, I'll recreate all data "from scratch", and see if I end up with the same numbers.
I don't know why there's a difference, and I don't like loose ends in my data.
|
|
|
|
LoyceV (OP)
|
 |
December 15, 2020, 11:28:03 AM |
|
It took a while, and the new VPS has become a lot slower by now, but I've enabled updates again.
Downloads are fast, I've seen 20-100 MB/s. Enjoy.
My latest count: 764,534,424 Bitcoin addresses have been used.
|
|
|
|
LoyceV (OP)
|
 |
December 21, 2020, 10:22:11 AM |
|
I'm glad to see this service is being used too. I'd love to hear feedback (because I'm curious): what are you guys using this for?
|
|
|
|
PrimeNumber7
|
 |
December 22, 2020, 03:28:37 AM |
|
Quote from PrimeNumber7: I had used AWS as an example because I believed you used it for some of your other projects.
Quote from LoyceV: Correct, loyce.club runs on AWS (sponsored).
Quote from PrimeNumber7: Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
Quote from LoyceV: I highly doubt I'd find a cheaper deal. I hope not to use the full 5 TB though, I expect some overselling and don't want to push it to the limit.
I am not sure what level of access you have to the AWS account sponsoring your site. However, it is possible to set up a storage bucket so that anyone can access it, as long as the requester's IP address is among the IP addresses of the same region the files are stored in. See this stack overflow discussion. You can also set up the storage bucket such that the requester pays for egress traffic.
Quote from PrimeNumber7: The longer the list, the longer it will take to sort one additional line.
Quote from LoyceV: At some point a database might beat raw text sorting, but for now I'm good with this.
Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth. You make the argument that your input size is sufficiently small such that having exponential complexity is okay, and you may have a point.
Quote from PrimeNumber7: I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly.
Quote from LoyceV: AWS charges $0.09/GB, and especially since this one is sponsored, I don't want to abuse it. I love how stable the server is though, it has never been down.
AWS is very reliable. I would not expect much downtime when using AWS or other major cloud providers. Egress traffic is very expensive though.
Quote from LoyceV: Downloads are fast, I've seen 20-100 MB/s. Enjoy
This works out to approximately a 24-minute download. I measured a download speed of ~125 Mbps using a Colab instance.
|
|
|
|
LoyceV (OP)
|
 |
December 22, 2020, 09:16:24 AM |
|
Quote from PrimeNumber7: I am not sure what level of access you have to the AWS account sponsoring your site.
Just root access to loyce.club, but addresses.loyce.club and alladdresses.loyce.club aren't hosted at AWS. This month so far, they've passed 1 TB of traffic, so it was a good call not to use AWS (this would cost $90).
Quote from PrimeNumber7: However, it is possible to set up a storage bucket so that anyone can access it, as long as the requester's IP address is among the IP addresses of the same region the files are stored in.
That seems like overkill for this.
Quote from PrimeNumber7: Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth.
For a database it would only mean checking and adding 750k addresses per day, instead of sorting the entire data again. I expect sort to take less time too when the majority of ("old") data is already sorted, but I haven't tested for speed differences.
Quote from PrimeNumber7: AWS is very reliable.
I have never experienced any downtime with AWS, unlike all VPS providers I've ever used. Those "external projects" don't have much priority to me; if it's down I don't lose scraping data.
Quote from PrimeNumber7: This works out to approximately a 24-minute download. I measured a download speed of ~125 Mbps using a colab instance.
It's doing the biweekly data update, that probably slowed it down too.
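The idea of only adding the new day's addresses instead of sorting everything again can be sketched as a merge of two already-sorted inputs. This is an illustration under assumptions, not the update script actually used for these files: `merge_sorted_unique` is a made-up name, the addresses are shortened placeholders, and both inputs are assumed to be sorted in the same byte order. It relies on Python's standard `heapq.merge`, which streams its inputs rather than loading them.

```python
import heapq
from typing import Iterable, Iterator


def merge_sorted_unique(master_sorted: Iterable[str], batch_sorted: Iterable[str]) -> Iterator[str]:
    """Merge a large sorted list with a small sorted daily batch, dropping duplicates.

    Only one pass over each input is needed; nothing is re-sorted.
    """
    previous = None
    for address in heapq.merge(master_sorted, batch_sorted):
        if address != previous:  # equal items arrive next to each other
            yield address
            previous = address


# Placeholder data: an existing sorted file and one day's new (sorted) addresses.
master = ["12c6", "1A1z", "3GFf"]
daily_batch = ["1A1z", "1HLo"]
print(list(merge_sorted_unique(master, daily_batch)))
```

On the command line, GNU sort offers the same idea through its `-m` (merge already-sorted files) and `-u` (unique) options, so only the small daily batch needs a full sort.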
|
|
|
|
|