Bitcoin Forum

Bitcoin => Project Development => Topic started by: LoyceV on August 01, 2020, 09:05:46 AM



Title: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on August 01, 2020, 09:05:46 AM
Background
To follow up on List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0) and this post (https://bitcointalk.org/index.php?topic=5259621.msg54833270#msg54833270), I made a list of all Bitcoin addresses that have ever been used.

The data
See alladdresses.loyce.club (http://alladdresses.loyce.club/?C=M;O=D) (new location)
I now have the resources (RAM, CPU power and disk space) and code (https://bitcointalk.org/index.php?topic=5265993.msg55057504#msg55057504) to show unique addresses in their original order. Each address is only shown once. I have 2 large files:

1. All Bitcoin addresses ever used, in chronological order, without duplicates.
Sample: all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz (http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz) (Warning: 33 GB):
Code:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg

2. All Bitcoin addresses ever used, sorted by address, without duplicates.
Sample: all_Bitcoin_addresses_ever_used_sorted.txt.gz (http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_sorted.txt.gz) (Warning: 29 GB):
Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
.......
s-ffd80dee5966fb23c1a483b28f6bfcbc
s-fff5d0faa9628c188e97661f0e185fce
s-ffff291613d413b4ac128df96a462294

Updates
Updates happen on Tuesday!
Sorting a list that doesn't fit in the server's RAM is slow. Therefore I only do weekly updates (for now). Check the file date here (http://alladdresses.loyce.club/?C=M;O=D) to see how old it is. If an update fails, please post here.
In between updates, I create daily updates: alladdresses.loyce.club/daily_updates/ (http://alladdresses.loyce.club/daily_updates/?C=S;O=D). These txt-files contain unique addresses (for that day) in order of appearance.
I won't keep older snapshots.

Bandwidth
This server should have enough bandwidth to support all my blockchain data projects. If things get crazy, I may have to resort to using torrents.

Credits
Blockchair Database Dumps (https://blockchair.com/dumps) has a staggering amount of data, easily accessible (at 10 kB/s (https://gz.blockchair.com/README.html) (or recently 100 kB/s)) with daily updates. All data presented in this topic comes from Blockchair.

No spam please.
Self-moderated against spam. Discussion and questions are welcome.

Q&A
Can you please clarify, what is the type of these d- and s- addresses?
This is how Blockchair.com shows OP_RETURN. From the main page the search field doesn't show them, but you can replace a Bitcoin address in the URL to find them: https://blockchair.com/bitcoin/address/d-d0d953f2e7043342540a1407243e49fe.

Tips and tricks
Some suggestions for Linux/VPS users:
Code:
wget http://alladdresses.loyce.club/addresses_sorted.txt.gz -O - | gunzip > addresses_sorted.txt
This doesn't save the .gz but extracts it while downloading.

Code:
comm -12 <(sort list.txt) addresses_sorted.txt
This outputs all Bitcoin addresses from "list.txt" that have ever been funded.

Code:
comm -12 <(sort list.txt) addresses_sorted.txt > output.txt
This does the same, but writes to output.txt instead of console.
This search is fast, even with millions of addresses in list.txt; it's mainly limited by how fast your computer can read from disk.
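
A note on the comm commands above: comm expects both inputs to be sorted in the same collation order, so the locale matters. Below is a minimal sketch, assuming the extracted file is in plain byte order (LC_ALL=C). The function name and file names are only examples. If comm complains that a file is not in sorted order, re-sort the big file once with the same setting.

```shell
# Hypothetical helper: print every address from a small list that also
# appears in the big sorted file. Assumption: the big file is sorted in
# byte order (LC_ALL=C), which is not confirmed in this topic.
match_addresses() {
    list=$1      # your own addresses, one per line, in any order
    sorted=$2    # the extracted sorted address file
    tmp=$(mktemp) || return 1
    # Sort and deduplicate the small list with the same collation
    LC_ALL=C sort -u "$list" > "$tmp"
    # -12 hides lines unique to either file, leaving only the matches
    LC_ALL=C comm -12 "$tmp" "$sorted"
    rm -f "$tmp"
}

# Usage: match_addresses list.txt addresses_sorted.txt > output.txt
```

This avoids the bash-only <( ) syntax, so it also works in a plain sh.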



Related topics
Bitcoin block data available in CSV format (https://bitcointalk.org/index.php?topic=5246271.0)
List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0)
List of all Bitcoin addresses ever used (https://bitcointalk.org/index.php?topic=5265993.0)
[~500 GB] Bitcoin block data: inputs, outputs and transactions (https://bitcointalk.org/index.php?topic=5307550.0)
[800 GB] Ethereum data (https://bitcointalk.org/index.php?topic=5307550.msg56043463#msg56043463)


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 01, 2020, 09:06:06 AM
Some interesting (?) statistics (updated until blockchair_bitcoin_outputs_20200719.tsv.gz (https://gz.blockchair.com/bitcoin/outputs/blockchair_bitcoin_outputs_20200719.tsv.gz))
Total address count: 1,484,589,749
1... address count: 1,039,899,708
3... address count: 343,485,961
bc1q... address count: 55,006,904
...-... (with a "dash") address count: 46,197,161

Unique address count: 693,180,830
1... address count: 470,943,308
3... address count: 167,941,821
bc1q... address count: 39,137,878
...-... (with a "dash") weird address count: 15,157,808
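
Counts like these can be reproduced from any of the address lists with a small awk filter. This is only a sketch of one way to do it, not necessarily how the numbers above were made. The function name and the category labels are made up for the example.

```shell
# Sketch: count addresses per type from a list with one address per line
# on standard input. "other" catches anything that fits no category.
count_by_type() {
    awk '
        /-/      { n["dash"]++; next }   # Blockchair-style d-... / s-... entries
        /^bc1q/  { n["bc1q"]++; next }
        /^1/     { n["1"]++;    next }
        /^3/     { n["3"]++;    next }
                 { n["other"]++ }
        END {
            # Fixed output order; types that were never seen are skipped
            split("1 3 bc1q dash other", order, " ")
            for (i = 1; i <= 5; i++)
                if (order[i] in n) print order[i], n[order[i]]
        }
    '
}

# Usage: zcat addresses.txt.gz | count_by_type
```

Run on the list with duplicates it gives the totals; run on a deduplicated list it gives the unique counts.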

Addresses with most receiving transactions
This is the Top 100, the number in front of the address shows how many transactions it has received:
Code:
4467608 1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ
1900428 1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4
1601193 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp
1527471 1FoWyxwPXuj4C6abqwhjDWdz6D4PZgYRjA
1204787 1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK
1105406 1dice97ECuByXAvqXpaYzSaQuPVvrtmz6
1021575 3CD1QW6fjgTwKq3Pj97nty28WZAVkziNom
1009836 1G47mSr3oANXMafVrR8UC4pzV7FEAzo3r9
 929737 3JXRVxhrk2o9f4w3cQchBLwUeegJBj6BEp
 872274 1J37CY8hcdUXQ1KfBhMCsUVafa8XjDsdCn
 859422 3422VtS7UtCvXYxoXMVp6eZupR252z85oC
 841967 168o1kqNquEJeR9vosUB5fw4eAwcVAgh8P
 832807 1P9RQEr2XeE3PEb44ZE35sfZRRW1JHU8qx
 782811 1VayNert3x1KzbpzMGt2qdqrAThiRovi8
 689574 37Tm3Qz8Zw2VJrheUUhArDAoq58S6YrS3g
 676674 1DUb2YYbQA1jjaNYzVXLZ7ZioEhLXtbUru
 663458 bc1qwqdg6squsna38e46795at95yu9atm8azzmyvckulcc7kytlcckxswvvzej
 631610 17kb7c9ndg7ioSuzMWEHWECdEVUegNkcGc
 595853 1dice9wcMu5hLF4g81u8nioL5mmSHTApw
 580565 1Po1oWkD2LmodfkBYiAktwh76vkF93LKnh
 573787 1LAnF8h3qMGx3TSwNUHVneBZUEpwE4gu3D
 520889 1NDyJtNTjmwk5xPNhjgAMu4HDHigtobu1s
 505956 13vHWR3iLsHeYwT42RnuKYNBoVPrKKZgRv
 448252 1Fi9J5TeaWPHdU5cTJ4e9jr3V58SrWtUuT
 437634 1dice7fUkz5h4z2wPc1wLMPWgB5mDwKDx
 406471 1MPxhNkSzeTNTHSZAibMaS8HS1esmUL1ne
 395663 1dice7W2AicHosf5EL3GFDUVga7TgtPFn
 394249 1LuckyY9fRzcJre7aou7ZhWVXktxjjBb9S
 389038 1D5bPm1YAdn9WvAAixht7PbACU3TtkqtJJ
 376310 17A16QmavnUfCW11DAApiJxp7ARnxN5pGX
 364311 3HNSiAq7wFDaPsYDcUxNSRMD78qVcYKicw
 363898 3MfN5to5K5be2RupWE8rjJHQ6V9L8ypWeh
 357641 3HRZjedwF2AJejNTtgznWnas4E6froNP5r
 354691 1LuckyG4tMMZf64j6ea7JhCz7sDpk6vdcS
 346986 366Dgw4pi3rnvu5zizVWZF6nijWxZWc6RA
 341430 1dice6YgEVBf88erBFra9BHf6ZMoyvG88
 326839 d-d0d953f2e7043342540a1407243e49fe
 325099 38jMiiZs2C5n5MPkyc5pSA7wwW6H4p6hPa
 293567 38ENmTr2AD1avJrmmi9iM7PfS6nZVmuMKf
 289070 d-0e9deef32abfc454392d21725f9defef
 285507 1N52wHoVR79PMDishab2XmRHsbekCdGquK
 282321 3PUuiYu5cFMsagkffArrKZzQFtWdHttU3x
 280691 367f4YWz1VCFaqBqwbTrzwi2b1h2U3w1AF
 280107 1FoxBitjXcBeZUS4eDzPZ7b124q3N7QJK7
 262539 d-73fd8c31c9fc1d084f44b301bb7adb6a
 262317 1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3
 253795 1K2SXgApmo9uZoyahvsbSanpVWbzZWVVMF
 252344 1dice5wwEZT2u6ESAdUGG6MHgCpbQqZiy
 251282 3JnFBLxDCutY3bZEZsPTkHAaUA1bxmEMX2
 250862 1diceDCd27Cc22HV3qPNZKwGnZ8QwhLTc
 247797 352zT3Ts9piSDhZpBsDoZMvdtDmJioQNBo
 246472 12JYmnfYU2ghzjwUAspzJsSnmJtK9bZPYR
 243955 1x6YnuBVeeE65dQRZztRWgUPwyBjHCA5g
 240428 3A4U175prUGEn3B1gUDkz32u8fnF9Nx3Ly
 232303 357d4rAjQhDPaWhZrBAFY7aizVPkNSq2DH
 230290 18rdKmjrg1EawxgiVT3ikLExj6GWS2MNCk
 229128 3JjPf13Rd8g6WAyvg8yiPnrsdjJt1NP4FC
 226837 1HWqsgnSd12Gv8SpoUMi1Cj8hp79BTSpW7
 226259 1changemCPo732F6oYUyhbyGtFcNVjprq
 224451 138o15eFWEEPv2ayKW2CZCgVvv5ZaZvomP
 224217 d-752ed0099932a96fbc0a854a4d3a300f
 219697 bc1qnsupj8eqya02nm8v6tmk93zslu2e2z8chlmcej
 219174 s-e3b0c44298fc1c149afbf4c8996fb924
 215870 1Kr6QSydW9bFQG1mXiPNNu6WpJGmUa9i1g
 215691 37p9pUugydmoLpQyFLLqGAgjWmUFERa1Pq
 215520 19iVyH1qUxgywY8LJSbpV4VavjZmyuEyxV
 212059 1dice7EYzJag7SxkdKXLr8Jn14WUb3Cf1
 209001 1F89hmmrtonJfAQNAqDmeDadcw7AsZcvXG
 207701 1NDpZ2wyFekVezssSXv2tmQgmxcoHMUJ7u
 207697 1Bd5wrFxHYRkk4UCFttcPNMYzqJnQKfXUE
 207524 15fXdTyFL1p53qQ8NkrjBqPUbPWvWmZ3G9
 207499 14719bzrTyMvEPcr7ouv9R8utncL9fKJyf
 207424 18uvwkMJsg9cxFEd1QDFgQpoeXWmmSnqSs
 207385 1J4yuJFqozxLWTvnExR4Xxe9W4B89kaukY
 207376 1Bqm5MDo82m1FTxV3qYNUUEKnESPRhk9jd
 207256 1HVpyjYEPwQhvRQ3dL8tGe9kiydti616sX
 207228 17NKcZNXqAbxWsTwB1UJHjc9mQG3yjGALA
 207218 1HjDauL2kth6KJUz5vX198Nvp1xN1hgYRb
 207187 13h1DP2Boo9TAsenphroACxhNy7pGxDYXd
 207138 1MSzmVTBaaSpKDARK3VGvP8v7aCtwZ9zbw
 207053 1GoK6fv4tZKXFiWL9NuHiwcwsi8JAFiwGK
 207006 13HFqPr9Ceh2aBvcjxNdUycHuFG7PReGH4
 206834 1L4EThM6x3Rd2PjNbs1U136FpMq4Gmo3fJ
 206826 14ChPPM8rPYJeHnw6kMVUDnNNKx1KnjYW4
 206808 1AdN2my8NxvGcisPGYeQTAKdWJuUzNkQxG
 206760 1DpsR91YmHUDTtiuH1pPCuG3RqAkmg6YKB
 206707 1PeohaRGaTF8cSzDqP1yYfzDah66xiriEQ
 206664 1JmcV7G3r8k7ev2EkS84MmsvxGyhiRGP84
 206572 1HZHBnH2FbHNWieMxAh4xBPfgfuxW15UPt
 206469 18czPiA9PcCs7rFTBZnhvNAWuh1pEZRpGJ
 206346 12Cf6nCcRtKERh9cQm3Z29c9MWvQuFSxvT
 206344 1MPerpQzTABa1K2eXQxsQTDSZtDQHWf6vk
 206247 1dice1e6pdhLzzWQq7yMidf6j8eAg7pkY
 206243 18XSLnBZ8ydMUkaifU6sQBMJzmm7JvDeUp
 205690 bc1quq29mutxkgxmjfdr7ayj3zd9ad0ld5mrhh89l2
 203334 3QQB6AWxaga6wTs6Xwq8FYppgrGinGu15f
 201993 3M92sq9ssFaNbEwF47uteVKJsbw125juS7
 199135 1AScRhqdXMrJyxNmjEapMZi1PLFsqmLquG
 196271 18p9Ftp3m4435tdpZTvoBsm3yjUgkvTF2b
 193271 33fDiKKhr2F2uRv2jJzdKT3ECuK3wzCq5d
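
A ranking like this can be built from the list with duplicates by counting how often each line occurs. A hedged sketch follows. The function name is hypothetical, and this is not necessarily the exact command used for the list above.

```shell
# Sketch: read addresses (with duplicates) on standard input and print
# the N most frequent ones, each prefixed by its count.
top_addresses() {
    n=${1:-100}                # how many addresses to show, default 100
    LC_ALL=C sort |            # bring identical addresses together
        uniq -c |              # count each run of identical lines
        sort -rn |             # highest count first
        head -n "$n"
}

# Usage: zcat addresses.txt.gz | top_addresses 100
```

Sorting the full list this way needs a lot of temporary disk space, just like the sorted address file itself.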


Title: Re: List of all Bitcoin addresses ever used
Post by: MrFreeDragon on August 17, 2020, 05:56:46 PM
Very interesting statistics, thank you!

-snip-
Addresses with most receiving transactions
This is the Top 100, the number in front of the address shows how many transactions it has received:
-snip-
 326839 d-d0d953f2e7043342540a1407243e49fe
...
 289070 d-0e9deef32abfc454392d21725f9defef
...
 262539 d-73fd8c31c9fc1d084f44b301bb7adb6a
...
 224217 d-752ed0099932a96fbc0a854a4d3a300f
...
 219174 s-e3b0c44298fc1c149afbf4c8996fb924
-snip-

Can you please clarify, what is the type of these d- and s- addresses?


Title: Re: List of all Bitcoin addresses ever used
Post by: Casdinyard on August 18, 2020, 09:12:54 AM
~

Can you also scrape all the Bitcoin addresses used here on the forum, and the users who use them? Yes, some users will have used the same wallet because they are just alts of someone (although it takes a lot of investigation to prove that). And I think it would help labeling the users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other and which are disobeying campaign rules and even forum rules (enrolling many accounts in a single bounty or sig campaign).


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 18, 2020, 09:43:57 AM
Can you also scrape all the Bitcoin addresses used here on the forum, and the users who use them?
I actually can :D I found this regexp on Stackoverflow (https://stackoverflow.com/questions/52000440/grep-bitcoin-address-with-regexp):
Code:
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
egrep -w --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *

I could run this code on 53 million archived posts (https://bitcointalk.org/index.php?topic=5167469.0), but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.

Quote
I think it would help labeling the users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other
A smart user would simply use different addresses. An even smarter user would use different wallets, so they don't create a blockchain trail when they make a payment.

As a quick test, 51 out of 9999 posts (https://loyce.club/archive/posts/5500/) contain at least one Bitcoin address (starting with 1 or 3, ignoring Bech32).

For now I won't continue this search. If I ever do, I'll move this discussion to Reputation (https://bitcointalk.org/index.php?board=129.0).
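
To get just the addresses instead of the full matching lines, grep's -o option can be combined with the second regexp above. This is a sketch with a made-up function name. Note that the regexp only checks the shape of an address, not its checksum, and it does nothing about quoted posts.

```shell
# Sketch: print every string that looks like a legacy (1... or 3...)
# Bitcoin address, once each. Reads the given files, or standard input
# when no file is given.
extract_legacy_addresses() {
    # -o: print only the matching part   -h: no file name prefix
    # -w: whole words only               -E: extended regexp
    grep -ohwE '[13][a-km-zA-HJ-NP-Z1-9]{25,34}' "$@" | LC_ALL=C sort -u
}

# Usage: extract_legacy_addresses post1.txt post2.txt
```

Because of -w, a 1 or 3 in the middle of a longer hex string (such as an Eth-address) does not start a match.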


Title: Re: List of all Bitcoin addresses ever used
Post by: Casdinyard on August 18, 2020, 11:01:17 AM
Can you also scrape all the Bitcoin addresses used here on the forum, and the users who use them?
I actually can :D I found this regexp on Stackoverflow (https://stackoverflow.com/questions/52000440/grep-bitcoin-address-with-regexp):
Code:
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
egrep -w --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *
I could run this code on 53 million archived posts (https://bitcointalk.org/index.php?topic=5167469.0), but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.

I think it would be possible if and only if you scraped the following boards:
  • Services (https://bitcointalk.org/index.php?board=52.0)
  • Bounties (https://bitcointalk.org/index.php?board=238.0)
  • Marketplace in general (both BTC and Alt)
  • And Marketplaces of all local boards if applicable/available

That way, addresses posted as evidence against a scammer wouldn't be a problem. And yes, it would be hard, especially if threads or posts were deleted. But that shouldn't be a problem as long as a list can be made that simply serves as a reference of which users have used or mentioned which addresses throughout their post history.

Quote
I think it would help labeling the users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other
A smart user would simply use different addresses. An even smarter user would use different wallets, so they don't create a blockchain trail when they make a payment.
As a quick test, 51 out of 9999 posts (https://loyce.club/archive/posts/5500/) contain at least one Bitcoin address (starting with 1 or 3, ignoring Bech32).
For now I won't continue this search. If I ever do, I'll move this discussion to Reputation (https://bitcointalk.org/index.php?board=129.0).

I'm looking forward to making it happen. Have I already mentioned my project to make an app (a BPIP ripoff)? Such data would be helpful for it. I'm still in the planning stage, deciding what to do first, and with all the data you've already scraped, I could do less scraping myself and instead make an API that just looks up your data.


Title: Re: List of all Bitcoin addresses ever used
Post by: TryNinja on August 18, 2020, 11:31:28 AM
I could run this code on 53 million archived posts (https://bitcointalk.org/index.php?topic=5167469.0), but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.
This is planned for my post archive. I had done that but only with ETH addresses and the 15m posts you sent me + the new scraped one.

I plan to scan all old posts + new ones for ETH and BTC addresses after everything is working fine (new bot + full database with the whole post archive).


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 18, 2020, 11:36:26 AM
This is planned for my post archive. I had done that but only with ETH addresses and the 15m posts you sent me + the new scraped one.
Great, saves me the trouble :)
Can I request a CSV of all the results? That makes it so much easier to use all data than getting them per address through your site.
Just something with (at least) "address,userID,msgID" would be great for further analysis.

I'm still in the planning stage, deciding what to do first, and with all the data you've already scraped, I could do less scraping myself and instead make an API that just looks up your data.
I can get you a copy of all archived posts like I gave TryNinja if it helps. It beats scraping the forum again, although I didn't keep track of board names per topic.


Title: Re: List of all Bitcoin addresses ever used
Post by: TryNinja on August 18, 2020, 11:41:10 AM
Great, saves me the trouble :)
Can I request a CSV of all the results? That makes it so much easier to use all data than getting them per address through your site.
Just something with (at least) "address,userID,msgID" would be great for further analysis.
Of course. Once in the database, it's pretty easy to export them to the format I want.


Title: Re: List of all Bitcoin addresses ever used
Post by: BTCW on August 19, 2020, 11:03:18 PM

Updates
Sorting a list that doesn't fit in the server's RAM is very slow. Therefore I only update unique_addresses.txt.gz (http://alladdresses.loyce.club:20319/unique_addresses.txt.gz) twice a month (on the 6th and 21st). Check the file date here (http://alladdresses.loyce.club:20319/?C=M;O=D) to see how old it is. If an update fails, please post here.
In between updates, I create daily updates: alladdresses.loyce.club:20319/daily_updates/ (http://alladdresses.loyce.club:20319/daily_updates/?C=S;O=D). These txt-files contain unique addresses (for that day) in order of appearance.
Due to limitations in disk space, I don't do automatic updates for addresses.txt.gz (http://alladdresses.loyce.club:20319/addresses.txt.gz). It's complete until blockchair_bitcoin_outputs_20200719.tsv.gz (https://gz.blockchair.com/bitcoin/outputs/blockchair_bitcoin_outputs_20200719.tsv.gz).



This is a wonderful initiative! A comment: Sorting a very large list with little RAM is not necessarily a problem! Try:


Code:
mkdir tmp
cat unsorted.txt | sort -u -S 65% -T tmp > sorted.txt
rm -r tmp

-S will tell your machine to use at most 65% CPU; this is some sort of optimum, according to my experience
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby

I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 20, 2020, 09:08:20 AM
Code:
cat unsorted.txt | sort -u -S 65% -T tmp > sorted.txt
I'm already using "sort", which uses /tmp by default.

I'll try "sort -u" though, it might need less temporary storage than "sort | uniq". The next update is scheduled for tomorrow, I'll see how it performs.

Quote
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.

Quote
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour :) It doesn't have an SSD though, and I'm using "cputool (http://manpages.ubuntu.com/manpages/cosmic/man8/cputool.8.html)" to keep server load low. I'm okay without daily updates on this, I wouldn't want users to download this large file on a daily basis anyway.

Quote
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I'm using an AWS server donated by suchmoon (https://bitcointalk.org/index.php?action=profile;u=234771) for loyce.club. However, since AWS charges $0.15/GB (https://aws.amazon.com/blogs/aws/aws-data-transfer-prices-reduced/), I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour-VPS, and obviously it was a lot faster.

There's one thing on my wish list though: a method to show only unique addresses in order of appearance (without sorting them). It can be done with awk '!a[$0]++' (https://catonmat.net/awk-one-liners-explained-part-two), but this requires a lot of memory and doesn't use temporary files.
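
For reference, this is how the awk one-liner works: the array is indexed by the whole line ($0), the counter for a line is 0 the first time it is seen, so the negation is true and awk prints the line; every later copy finds a non-zero counter and is skipped. A small sketch, with a hypothetical function name and a more descriptive array name:

```shell
# Sketch: remove duplicate lines while keeping the order of first
# appearance. Every distinct line is kept in memory, which is why this
# needs so much RAM on a very large list.
unique_in_order() {
    awk '!seen[$0]++'
}

# Usage: unique_in_order < addresses.txt > unique_in_order.txt
```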


Title: Re: List of all Bitcoin addresses ever used
Post by: NotATether on August 20, 2020, 12:40:56 PM
Quote
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.

That is correct, the argument to -S is the amount of memory for sort(1) to use for its main buffer (manpage source (https://www.man7.org/linux/man-pages/man1/sort.1.html)). With a percentage it should calculate the amount of memory to reserve. But I think even a 256MB buffer is too small for the size of the dataset you're sorting, it will hit the disk too much.
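
To make the two options concrete, here is a minimal sketch of a sort with an explicit buffer size and temporary directory, assuming GNU sort. The function name, the default buffer of 50% and the directory layout are assumptions for the example, not settings taken from this topic.

```shell
# Sketch: sort a file and drop duplicates with a chosen memory buffer
# (-S) and a chosen location for temporary files (-T).
sort_unique() {
    input=$1
    output=$2
    buffer=${3:-50%}        # -S takes a size such as 64M, or a % of RAM
    tmp_parent=${4:-.}      # pick a disk with enough free space
    tmpdir=$(mktemp -d "$tmp_parent/sorttmp.XXXXXX") || return 1
    LC_ALL=C sort -u -S "$buffer" -T "$tmpdir" -o "$output" "$input"
    status=$?
    rm -rf "$tmpdir"        # clean up the temporary directory
    return $status
}

# Usage: sort_unique unsorted.txt sorted.txt 64M /mnt/bigdisk
```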

Quote
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour :) It doesn't have an SSD though, and I'm using "cputool (http://manpages.ubuntu.com/manpages/cosmic/man8/cputool.8.html)" to keep server load low. I'm okay without daily updates on this, I wouldn't want users to download this large file on a daily basis anyway.

Quote
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I'm using an AWS server donated by suchmoon (https://bitcointalk.org/index.php?action=profile;u=234771) for loyce.club. However, since AWS charges $0.15/GB (https://aws.amazon.com/blogs/aws/aws-data-transfer-prices-reduced/), I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour-VPS, and obviously it was a lot faster.

That's strange because all AWS servers have an SSD configured as the boot disk. If you are sorting in a VM, then all that sorting is done in a virtual hard disk, so not only are you moving memory into temporary host SSD space, it's being moved inside a virtual disk file inside said SSD and that puts extra strain on your hypervisor's emulated disk controller.

So, it's emulating all the disk controller calls that read and write data from the disk, update the disk cache and do its other jobs, while sort(1) moves data between its memory buffer in RAM and the hard disk (which is actually a file on your host). And it's doing that for the entire 31GB of addresses, and the algorithm sort uses needs O(n log(n)) space, which I calculate to be 310GB for your data. All this while running emulated disk writes and reads. On top of that there are the hardware-accelerated reads and writes that the host does for the VM to its disk file. That explains the poor performance while sorting.

You'll have better disk performance if you sort outside of a VM.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 20, 2020, 02:17:52 PM
That's strange because all AWS servers have an SSD configured as the boot disk.
I guess it wasn't clear that alladdresses.loyce.club:20319 (http://alladdresses.loyce.club:20319) doesn't run at AWS. It uses HDD.

Quote
And it's doing that for the entire 31GB of addresses, and the algorithm sort uses needs O(n log(n)) space, which I calculate to be 310GB for your data.
It takes many hours while keeping server load low, but it really isn't a problem.
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
awk '!a[$0]++' (https://catonmat.net/awk-one-liners-explained-part-two)


Title: Re: List of all Bitcoin addresses ever used
Post by: NotATether on August 20, 2020, 09:50:01 PM
@LoyceV how large is the uncompressed addresses.txt.gz? It is at least 200GB and counting, and it's still extracting legacy addresses. I'm worried I may run out of disk space before it's all extracted. I have a 1TB quota. While you're at it, if you know how big the uncompressed unique_addresses.txt.gz is, that would be useful to know too.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 21, 2020, 08:54:28 AM
@LoyceV how large is the uncompressed addresses.txt.gz?
It gets around 50% larger; Bitcoin addresses don't compress very well.
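
The uncompressed size can also be measured without writing anything to disk, by decompressing to a pipe and counting the bytes. A small sketch with a made-up function name. It still has to read and decompress the whole file, so it takes a while. The size field stored inside a .gz file is only 32 bits wide, so "gzip -l" can report a wrong number for files larger than 4 GB.

```shell
# Sketch: print the uncompressed size of a .gz file in bytes without
# storing the extracted data anywhere.
uncompressed_size() {
    gzip -dc "$1" | wc -c
}

# Usage: uncompressed_size addresses.txt.gz
```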


Title: Re: List of all Bitcoin addresses ever used
Post by: NotATether on August 21, 2020, 09:59:52 AM
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
awk '!a[$0]++' (https://catonmat.net/awk-one-liners-explained-part-two)

Instead of the awk one-liner, I suggest you look at gz-sort (http://kmkeen.com/gz-sort/). It is a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.

You sort the file using
Code:
gz-sort -u addresses.txt.gz addresses_sorted.txt.gz

The -u switch removes duplicate lines from the sorted output. You can give it a larger buffer with -S, but this isn't necessary. I used -S 1G to give it a 1 gigabyte buffer and it took around 7 hours to complete, so not much shorter than the advertised completion time of 9 or 10 hours. So this program will run well in your VM; the amount of RAM isn't important.

You need to compile it yourself using make but it has minimal dependencies, only zlib and GNU headers.

I used it to find the smallest address in the dump using
Code:
zcat addresses_sorted.txt.gz | head -n 55405 | uniq

This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)

Here are some of the other smallest addresses:

Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
1111111111111111111141MmnWZ
111111111111111111114ysyUW1
1111111111111111111184AqYnc
11111111111111111111BZbvjr
11111111111111111111CJawggc
11111111111111111111HV1eYjP
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS
111111111111111111121xzjPWX1
111111111111111111128gzo7iT
11111111111111111112AmVxQeF
11111111111111111112Fr3DURyz
11111111111111111112GvNtZ1K
11111111111111111112VUYD4wA
1111111111111111111313xyAwW
111111111111111111137vGPgFbT
11111111111111111113aT9ZSLG
111111111111111111168xDACCG
11111111111111111116B8w87yU



Maybe you can also make a list of addresses sorted by balance, now that you have an efficient way to deduplicate them.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 21, 2020, 11:29:22 AM
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
awk '!a[$0]++' (https://catonmat.net/awk-one-liners-explained-part-two)
Instead of the awk one-liner, I suggest you look at gz-sort (http://kmkeen.com/gz-sort/). It is a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.
I checked, but it does what I'm doing already. The awk-command removes duplicate lines without sorting the lines. I'd like to do it, but I can't run it.

Quote
This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
I'd be interested to see which real address is the shortest. The 111111111-addresses are all burn addresses. I'm not entirely sure what determines address length, but from what I've seen, shorter addresses are much harder to find. I've been looking for short addresses created from mini-private-keys, and they were quite rare.
To find a real short address, it needs to have sent funds too.

Quote
Maybe you can also make a list of addresses sorted by balance
See List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0).


Title: Re: List of all Bitcoin addresses ever used
Post by: naufragus on August 23, 2020, 10:14:58 PM
I actually can :D I found this regexp on Stackoverflow (https://stackoverflow.com/questions/52000440/grep-bitcoin-address-with-regexp):
Code:
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
egrep -w --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *


I have compiled these from various sources and use them to automatically set my blockchain explorer options based on user input, and also keep them in my .zshrc:
Code:
#cryptocurrency greps

#btc1 and btc2 combined
alias btcgrep="grep -Ee '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b' -e '\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b'"

#legacy addresses only
alias btcgrep1="grep -E '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#bech32 v1 and v0 addresses
alias btcgrep2="grep -E '\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b'"
#https://stackoverflow.com/questions/21683680/regex-to-match-bitcoin-addresses

#bech32 addresses only
alias btcgrep3="grep -E '\bbc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71}\b'"

#both legacy and bech32
alias btcgrep4="grep -E '\b([13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71})\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#private keys
alias btcgrep5="grep -E '\b[5KL][1-9A-HJ-NP-Za-km-z]{50,51}\b'"
#word boundary: '\b'
#https://bitcoin.stackexchange.com/questions/56737/how-can-i-find-a-bitcoin-private-key-that-i-saved-in-a-text-file

#transaction hashes
alias btcgrep6="grep -E '\b[a-fA-F0-9]{64}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex
#https://bitcoin.stackexchange.com/questions/70261/recognize-bitcoin-address-from-block-hash-and-transaction-hash

#block hashes
alias btcgrep7="grep -E '\b[0]{8}[a-fA-F0-9]{56}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex

#ethereum address hash
#test for 'plausibility'
alias ethgrep="grep -E '\b(0x)?[0-9a-fA-F]{40}\b'"
#https://ethereum.stackexchange.com/questions/1374/how-can-i-check-if-an-ethereum-address-is-valid

#ethereum transaction hash
alias ethgrep2="grep -E '\b(0x)?([A-Fa-f0-9]{64})\b'"  #parentheses are not necessary
#https://ethereum.stackexchange.com/questions/34285/what-is-the-regex-to-validate-an-ethereum-transaction-hash/34286

Flag -w is 'word boundary' and can also be set within the regex with '\b' at the ends.

Very good work on compiling those addresses, mate!


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 24, 2020, 11:39:02 AM
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
awk '!a[$0]++' (https://catonmat.net/awk-one-liners-explained-part-two)
This looks very promising:
Code:
cat -n input.txt | sort -uk2 | sort -nk1 | cut -f2- > output.txt
I'll be testing it soon.

Some results: The awk-thing uses just over 1 GB memory for 10 million addresses. So for 1.5 billion (https://bitcointalk.org/index.php?topic=5265993.msg54912292#msg54912292) addresses, a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.

I've tested with the first 10 million lines, and can confirm both give the same result:
Code:
head -n 10000000 addresses.txt | awk '!a[$0]++' | md5sum
head -n 10000000 addresses.txt | nl | sort -uk2 | sort -nk1 | cut -f2 | md5sum
As expected, awk is faster.
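
For reference, here is the sort-based pipeline wrapped in a function with the steps spelled out. This is a sketch that assumes GNU sort, where -u keeps the first of several lines with an equal key; the function name is made up. Unlike awk it can spill to temporary files, so it does not need to hold every address in RAM.

```shell
# Sketch: remove duplicate lines but keep the order of first appearance,
# using sort (which can use temporary files) instead of an in-memory array.
unique_in_order_lowmem() {
    tab=$(printf '\t')
    cat -n |                                   # 1. prefix each line with its line number
        LC_ALL=C sort -t "$tab" -u -k2,2 |     # 2. sort by address, keep one line per address
        LC_ALL=C sort -t "$tab" -n -k1,1 |     # 3. back to the original order by line number
        cut -f2-                               # 4. strip the line numbers again
}

# Usage: unique_in_order_lowmem < input.txt > output.txt
```

Setting the field separator to a tab makes the key exactly the address, independent of the spaces that cat -n puts in front of the line number.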


Title: Re: List of all Bitcoin addresses ever used
Post by: seoincorporation on August 24, 2020, 11:31:39 PM
This is an awesome contribution to the community. Some weeks ago I saw a user asking for a list like this for brute-forcing... Some users use their address as a password, which is why a list like this is a great tool. Thanks again to LoyceV for making it for us.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 26, 2020, 05:48:55 PM
Sample: addresses.txt.gz (http://alladdresses.loyce.club:20319/addresses.txt.gz): all addresses in chronological order, with duplicates (Warning: 31 GB):
Code:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg
Due to limitations on disk space, I'm considering removing this file. Unless anyone has a need for it, so: can anyone tell me what this can be used for? I know it can be used to make a Top 100 of addresses with most receiving transactions (https://bitcointalk.org/index.php?topic=5265993.msg54912292#msg54912292).
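For reference, the Top 100 use case boils down to counting lines. A minimal sketch, assuming each line of the file with duplicates is one appearance of an address (the function name top_addresses is made up for this example):

```python
from collections import Counter


def top_addresses(lines, n=100):
    """Count how often each address appears and return the n most frequent."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["a\n", "b\n", "a\n", "c\n", "a\n", "b\n"]
    print(top_addresses(sample, 2))
```

On the real file the lines could come from gzip.open("addresses.txt.gz", "rt"), but the counter keeps one entry per unique address in memory, so it runs into the same RAM limit as the awk approach.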

Instead of this list, I want to make a new list without duplicates, but still in order of first appearance of each address. Thanks to bob123 (https://bitcointalk.org/index.php?topic=5259621.msg55057381#msg55057381), I can do that now!
I'll also keep the sorted list, because that list is very convenient to find matches on a list (https://bitcointalk.org/index.php?topic=5265993.msg55037629#msg55037629).



I need some time to process all data. When done, I'll rewrite some of my posts.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on August 30, 2020, 01:10:30 PM
Sample: unique_addresses.txt.gz (http://alladdresses.loyce.club:20319/unique_addresses.txt.gz): all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB)
I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.

Since I got no response to my question above, I'll go with 2 versions:
  • All addresses ever used, without duplicates, in order of first appearance.
  • All addresses ever used, without duplicates, sorted.
The first file feels nostalgic, the second file will be very convenient to match addresses with a list of your own.
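As an illustration of why the sorted file is convenient for matching: if your own list is sorted too, one pass over both is enough. This is a sketch with a made-up function name; it assumes both inputs are sorted in plain byte order (what Python's string comparison gives for ASCII), which may or may not be how the published file is sorted.

```python
def matches_in_sorted(sorted_all, sorted_mine):
    """Yield addresses present in both inputs; both must be sorted ascending."""
    mine = iter(sorted_mine)
    current = next(mine, None)
    for address in sorted_all:
        # Skip entries of my list that sort before the current big-file line.
        while current is not None and current < address:
            current = next(mine, None)
        if current is None:
            return  # my list is exhausted, nothing left to match
        if current == address:
            yield address
            current = next(mine, None)


if __name__ == "__main__":
    print(list(matches_in_sorted(["a", "c", "d", "f"], ["b", "c", "f", "g"])))
```

Because sorted_all is only iterated once, it can be a file object read line by line instead of a list.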


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on September 08, 2020, 11:33:39 AM
Sample: unique_addresses.txt.gz (http://alladdresses.loyce.club/unique_addresses.txt.gz): all Bitcoin addresses ever used, without duplicates, sorted by address (Warning: 15 GB)
I didn't have enough disk space to process the 31 GB file the way I want it, so I've (temporarily) removed this file. After I'm done with that, I'll restore the missing file. Give it a few days.
Well, that didn't go as planned :( Although I can keep all unique addresses in order of first appearance (https://bitcointalk.org/index.php?topic=5265993.msg55057504#msg55057504), it turns out 100 GB disk space is not enough for the temporary space it needs. Because of the large data traffic, I don't want to use loyce.club's AWS hosting for this, and I'm not sure yet if I should get another VPS just for this.

An alternative would be to run it from my home PC, but the heavy writing will just wear out my SSD. So this project is on hold for now. Daily updates (http://alladdresses.loyce.club/daily_updates/) still continue.


Title: Re: List of all Bitcoin addresses ever used
Post by: NotATether on October 22, 2020, 10:47:29 AM
@LoyceV

Are you downloading Blockchair dumps at the slow rate? I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.

If network bandwidth is a problem I'm able to host this on my hardware if you like.


Title: Re: List of all Bitcoin addresses ever used
Post by: LoyceV on October 22, 2020, 06:21:22 PM
Are you downloading Blockchair dumps at the slow rate?
Yes. But 100 kB/s isn't a problem anymore: the initial download took a long time, but for daily updates it doesn't take that long.

Quote
I just contacted Blockchair for an API key, which enables people to download at the fast rate, and a support rep told me they cost $500/month.
I thought they'd offer it for free for certain users, but this makes sense from a business point of view.

Quote
If network bandwidth is a problem I'm able to host this on my hardware if you like.
Just this month I'm at 264 GB for this project, and 174 GB for all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0). That means this full list is only downloaded a few times per month, but the funded addy list is downloaded a few times per day.
I'm more in need of extra disk space for sorting this data, but I haven't decided yet where to host it. 100 GB disk space isn't enough.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on November 28, 2020, 07:47:21 PM
Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP.

There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: therealbtcdave on November 28, 2020, 08:08:49 PM
Quote
Just yesterday, I got a good deal on a new VPS (more memory, more disk, more CPU and more bandwidth). It's dedicated to only this project (and I have no idea how reliable it's going to be). I've updated the OP.

There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.

Thanks for the update. The last .gz you had was from September, I think.



Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: PrimeNumber7 on November 29, 2020, 06:01:21 AM
Some results: The awk-thing uses just over 1 GB memory for 10 million addresses. So for 1.5 billion (https://bitcointalk.org/index.php?topic=5265993.msg54912292#msg54912292) addresses, a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.
As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket (https://aws.amazon.com/s3/) that can be accessed by a server.

If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.

Separately, sorting lists are not scalable, period. There are some things you can do to increase the speed, such as keep the list in RAM, or cutting the number of instances the entire list is reviewed, but you ultimately cannot sort an unordered very large list.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on November 29, 2020, 09:03:47 AM
Thanks for the update. The last .gz you had was from September, I think.
Correct (August 6 and September 2).

As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket (https://aws.amazon.com/s3/) that can be accessed by a server.
Amazon charges $0.09 per GB outgoing data, that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my creditcard instead of Bitcoin.
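The $450 figure is simple multiplication. A small sketch, assuming a flat $0.09 per GB with no free tier or volume discount, and 1 TB = 1000 GB:

```python
PRICE_CENTS_PER_GB = 9  # $0.09 per GB of outgoing data, as quoted above


def egress_cost_usd(terabytes, cents_per_gb=PRICE_CENTS_PER_GB):
    """Cost in dollars of sending the given number of TB, with 1 TB = 1000 GB."""
    return terabytes * 1000 * cents_per_gb / 100


if __name__ == "__main__":
    print(egress_cost_usd(5))  # prints 450.0
```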

Quote
If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.
Still, that's quite excessive for just 2 files that are barely used.

Quote
Separately, sorting lists are not scalable, period.
Actually, sort (https://man7.org/linux/man-pages/man1/sort.1.html) performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.

It just takes a while because it uses temporary disk storage. Given enough RAM, it can utilize multiple cores.
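To put the timings above next to a textbook model: comparison sorts are usually described as taking time proportional to n log n. The sketch below compares the measured ratios with that prediction. The model is an assumption for illustration and ignores the extra disk passes sort needs once the data no longer fits in RAM.

```python
import math

# (lines, seconds) as measured above.
MEASUREMENTS = [(10_000_000, 10), (50_000_000, 63), (250_000_000, 381)]


def nlogn_ratio(n_small, n_large):
    """How much longer n_large should take than n_small if time ~ n * log(n)."""
    return (n_large * math.log(n_large)) / (n_small * math.log(n_small))


def compare(measurements=MEASUREMENTS):
    """Return (size ratio, observed time ratio, n log n prediction) per step."""
    rows = []
    for (n1, t1), (n2, t2) in zip(measurements, measurements[1:]):
        rows.append((n2 / n1, t2 / t1, nlogn_ratio(n1, n2)))
    return rows


if __name__ == "__main__":
    for size_ratio, observed, predicted in compare():
        print(f"{size_ratio:.0f}x lines: {observed:.2f}x time, n log n predicts {predicted:.2f}x")
```

Each 5x step comes out at roughly 6x measured against about 5.5x predicted, so the measurements sit a little above n log n and far below the 25x that quadratic growth would give.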

Quote
There are some things you can do to increase the speed, such as keep the list in RAM, or cutting the number of instances the entire list is reviewed, but you ultimately cannot sort an unordered very large list.
The 256 GB RAM server idea would cost a few dollars per hour, so I'll do with less.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: NotATether on November 29, 2020, 12:31:11 PM
As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket (https://aws.amazon.com/s3/) that can be accessed by a server.

If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.

That may save on local resources but you will be paying a lot of money per month if people download several hundred gigabytes each month particularly if the files are large like the files hosted in the OP.

If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too  >:()


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: PrimeNumber7 on November 29, 2020, 10:23:57 PM
As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket (https://aws.amazon.com/s3/) that can be accessed by a server.
Amazon charges $0.09 per GB outgoing data, that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my creditcard instead of Bitcoin.
I had used AWS as an example because I believed you used it for some of your other projects.

Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.

Quote
Separately, sorting lists are not scalable, period.
Actually, sort (https://man7.org/linux/man-pages/man1/sort.1.html) performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.
I think you are proving my point. The more input you have, the more time it takes to process one additional input.

To put it another way, it takes 1 unit of time to sort a list with a length of 2, it takes 1 + a units of time to sort a list with a length of 3, it takes 1 + a + b units of time to sort a list with a length of 4, and so on. The longer the list, the longer it will take to sort one additional line.

As a FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket (https://aws.amazon.com/s3/) that can be accessed by a server.

If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and uploads it to a S3 (on AWS) bucket. You would then be able to access that file using another VM that takes fewer resources.

That may save on local resources but you will be paying a lot of money per month if people download several hundred gigabytes each month particularly if the files are large like the files hosted in the OP.

If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too  >:()
Your local ISP might not like it very much if you are uploading that much data.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: Vod on November 29, 2020, 11:01:11 PM
Your local ISP might not like it very much if you are uploading that much data.

Quickseller, most ISPs have a download bottleneck - not upload.

So few people upload more than they download that most ISPs don't even restrict uploads. 

What ISP does LoyceV use that does not like uploading?


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: NotATether on November 29, 2020, 11:06:07 PM
~snip

If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too  >:()
Your local ISP might not like it very much if you are uploading that much data.

Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.

I guess I have been doing too much of my work on the cloud to tell the difference anymore.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: PrimeNumber7 on November 30, 2020, 03:42:59 AM
~snip

If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too  >:()
Your local ISP might not like it very much if you are uploading that much data.

Sorry, when I said locally, I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.

I guess I have been doing too much of my work on the cloud to tell the difference anymore.
Ahh, gotcha.

I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly. Migrating your data from AWS to GCS will incur a charge from AWS for the amount of your data. There might be ways around this, I'm not sure.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on November 30, 2020, 12:26:10 PM
I had used AWS as an example because I believed you used it for some of your other projects.
Correct, loyce.club runs on AWS (sponsored).

Quote
Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
I highly doubt I'd find a cheaper deal :D I hope not to use the full 5 TB though, I expect some overselling and don't want to push it to the limit.

Quote
I think you are proving my point. The more input you have, the more time it takes to process one additional input.
An exponential increase in processing time is to be expected. I consider the increase acceptable for scaling: if the number of addresses is 5 times larger than it is now (20 years from now?), it takes only 6 times more processing power.

Quote
The longer the list, the longer it will take to sort one additional line.
At some point a database might beat raw text sorting, but for now I'm good with this :)

Quote
Your local ISP might not like it very much if you are uploading that much data.
I should add a storage VPS to my shopping list. I now indeed have to transfer a large amount of data through my local internet, and it's terrible compared to server performance.

I meant on a VPS with another cloud provider with unmetered traffic, such as Hetzner.
I'm not using anything with "unmetered" traffic.



Still working on restoring all data from scratch. I'm curious to see if it matches any of the 2 existing files.
I don't really get the focus on data traffic though, right after I got a good deal on a new VPS (https://bitcointalk.org/index.php?topic=5265993.msg55705153#msg55705153). I'm good for now :)

I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly.
AWS charges $0.09/GB, and especially since this one is sponsored, I don't want to abuse it. I love how stable the server is though, it has never been down.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on December 02, 2020, 03:49:36 PM
There's a problem though. There are:
756,494,121 addresses according to addresses_in_order_of_first_appearance.txt.gz
756,524,407 addresses according to addresses_sorted.txt.gz
Obviously, these numbers should be the same. I haven't scheduled automated updates yet, I first want to recreate this data from scratch to see which number is correct.
After recreating this data, I now have 757,437,766 unique addresses (http://alladdresses.loyce.club/NEW_addresses_in_order_of_first_appearance.txt.gz) (don't click this link unless you want to download 18 GB).
My next step would be to add a few days of data, and count addresses again. Next, I'll recreate all data "from scratch", and see if I end up with the same numbers. I don't know why there's a difference, and I don't like loose ends in my data.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on December 15, 2020, 11:28:03 AM
It took a while, and the new VPS got a lot slower by now, but I've enabled updates again:
Updates
Sorting a list that doesn't fit in the server's RAM is slow. Therefore I only update both large files (addresses_sorted.txt.gz (http://alladdresses.loyce.club/addresses_sorted.txt.gz) and  addresses_in_order_of_first_appearance.txt.gz (http://alladdresses.loyce.club/addresses_in_order_of_first_appearance.txt.gz)) twice a month (on the 6th and 21st, updates take more than a day). Check the file date here (http://alladdresses.loyce.club/?C=M;O=D) to see how old it is. If an update fails, please post here.
In between updates, I create daily updates: alladdresses.loyce.club/daily_updates/ (http://alladdresses.loyce.club/daily_updates/?C=S;O=D). These txt-files contain unique addresses (for that day) in order of appearance.
I won't keep older snapshots.
Downloads are fast, I've seen 20-100 MB/s. Enjoy :)

My latest count: 764,534,424 Bitcoin addresses have been used.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on December 21, 2020, 10:22:11 AM
I'm glad to see this service is being used too:
https://loyce.club/other/traffic2.png

I'd love to hear feedback (because I'm curious): what are you guys using this for?


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: PrimeNumber7 on December 22, 2020, 03:28:37 AM
I had used AWS as an example because I believed you used it for some of your other projects.
Correct, loyce.club runs on AWS (sponsored).

Quote
Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
I highly doubt I'd find a cheaper deal :D I hope not to use the full 5 TB though, I expect some overselling and don't want to push it to the limit.
I am not sure what level of access you have to the AWS account sponsoring your site. However, it is possible to set up a storage bucket so that anyone can access it, provided that the requester's IP address is among the IP addresses of the same region the files are stored in. See this Stack Overflow (https://stackoverflow.com/questions/44638983/restrict-s3-bucket-access-to-specific-regions) discussion. You can also set up the storage bucket such that the requester pays for egress traffic.


Quote
The longer the list, the longer it will take to sort one additional line.
At some point a database might beat raw text sorting, but for now I'm good with this :)
Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth.

You make the argument that your input size is sufficiently small such that having exponential complexity is okay, and you may have a point.



I was under the impression that traffic out of the AWS network (for AWS) will count as egress traffic, and will be billed accordingly.
AWS charges $0.09/GB, and especially since this one is sponsored, I don't want to abuse it. I love how stable the server is though, it has never been down.
AWS is very reliable. I would not expect much downtime when using AWS or other major cloud providers. Egress traffic is very expensive though.

Downloads are fast, I've seen 20-100 MB/s. Enjoy :)

This works out to approximately a 24-minute download. I measured a download speed of ~125 Mbps using a colab instance.


Title: Re: List of all Bitcoin addresses ever used (OP rewritten, updates work again)
Post by: LoyceV on December 22, 2020, 09:16:24 AM
I am not sure what level of access you have to the AWS account sponsoring your site.
Just root access to loyce.club, but addresses.loyce.club (http://addresses.loyce.club/?C=M;O=D) and alladdresses.loyce.club (http://alladdresses.loyce.club/?C=M;O=D) aren't hosted at AWS. This month so far, they've passed 1 TB of traffic, so it was a good call not to use AWS (this would cost $90).

Quote
However, it is possible to set up a storage bucket so that anyone can access it, provided that the requester's IP address is among the IP addresses of the same region the files are stored in.
That seems like overkill for this.

Quote
Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth.
For a database it would only mean checking and adding 750k addresses per day, instead of sorting the entire data again. I expect sort to take less long too when the majority of ("old") data is already sorted, but haven't tested for speed differences.

Quote
AWS is very reliable.
I have never experienced any downtime with AWS, unlike all VPS providers I've ever used. Those "external projects" don't have much priority to me, if it's down I don't lose scraping data.

Quote
This works out to approximately a 24-minute download. I measured a download speed of ~125 Mbps using a colab instance.
It's doing the biweekly data update, that probably slowed it down too.


Title: Re: List of all Bitcoin addresses ever used - currently unavailable
Post by: LoyceV on January 07, 2021, 04:08:03 PM
Due to another VPS that decided to run off with my prepayment (Lol: for 2 weeks), this data is currently unavailable. I'm not sure yet where to move, if it takes too long I'll upload the data elsewhere (but in that case without regular backups).

Update:
I've uploaded the latest version to a temporary location: blockdata.loyce.club/alladdresses/ (http://blockdata.loyce.club/alladdresses/).


Title: Re: List of all Bitcoin addresses ever used - currently unavailable
Post by: brainless on January 12, 2021, 08:48:26 AM
Quote
Due to another VPS that decided to run off with my prepayment (Lol: for 2 weeks), this data is currently unavailable. I'm not sure yet where to move, if it takes too long I'll upload the data elsewhere (but in that case without regular backups).

Update:
I've uploaded the latest version to a temporary location: blockdata.loyce.club/alladdresses/ (http://blockdata.loyce.club/alladdresses/).

Daily updates also need to be posted there, if possible.
Thanks


Title: Re: List of all Bitcoin addresses ever used - currently unavailable
Post by: LoyceV on January 12, 2021, 09:22:49 AM
Daily updates also need to be posted there, if possible
This VPS is currently downloading other data from Blockchair (https://bitcointalk.org/index.php?topic=5307550.0), which only allows one connection at a time. I expect this to take another month (at 100 kB/s), after that I can enable daily updates (txt-files with unique addresses for that day) again.

I haven't decided yet how and where to do regular updates to the 20 GB files (this is quite resource intensive).


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 12, 2021, 12:20:12 PM
First of all, great project!



(...)
Quote
The longer the list, the longer it will take to sort one additional line.
At some point a database might beat raw text sorting, but for now I'm good with this :)
Using a database will not solve this problem. There are some things a DB can do to make sorting go from O^2 to O^2/n, but this is still exponential growth.

You make the argument that your input size is sufficiently small such that having exponential complexity is okay, and you may have a point.
Going with these two versions:
(...)
Since I got no response to my question above, I'll go with 2 versions:
  • All addresses ever used, without duplicates, in order of first appearance.
  • All addresses ever used, without duplicates, sorted.
The first file feels nostalgic, the second file will be very convenient to match addresses with a list of your own.

I don't see how sorting would be exponential for any of these lists..

All addresses ever used, without duplicates, sorted.
  • We already have a list with all the addresses ever used sorted by address (length n).
  • We have a list of (potentially) new addresses (length k).
  • We sort the list of new items in O(k log k).
  • We check for duplicates in the new addresses in O(k).
  • We then read the big list line by line while simultaneously running through the list of new addresses and comparing the values in O(n + k). In this case we can directly write the new file to disk line by line; only the list of new addresses is kept in memory.

Resulting in O(n + k log k + 2k). In this particular case one might even argue that n > k log k + 2k, therefore O(2n) = O(n). However, it's late here and I don't like to argue.

You only need enough memory to keep the new addresses in memory and enough disk space to keep both the new and old version on disk at the same time.

The 'All addresses ever used, without duplicates, in order of first appearance' list could be created in pretty much the same way.

I'll see if I can whip some code together.


File hosting
Have you considered releasing the big files as torrents with a webseed? This will allow downloaders to still download from your server and then (hopefully) continue to seed for a while; taking some strain off your server.

You might even release it in an RSS feed so that some contributors could automatically add it to their torrent clients and start downloading with e.g. max 1 Mb/s and uploading with >1 Mb/s; this will quickly allow the files to spread over the peers and further move downloads away from your server.




Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 12, 2021, 12:44:18 PM
We then read the big list line by line while simultaneously running through the list of new addresses and comparing the values in O(n + k). In this case we can directly write the new file to disk line by line; only the list of new addresses is kept in memory.
The problem with this is that running through a 20 MB list takes a lot of time if you need to do it 1.5 billion times. Keeping the 20 MB in memory isn't the problem, reading 30 quadrillion bytes from RAM still takes much longer than my current system.

I may be able to improve on the sorted list by merging lists, and I may be able to improve on everything by keeping big temp files instead of only compressed files (but as always I need some time to do this).

Quote
Have you considered releasing the big files as torrents with a webseed? This will allow downloaders to still download from your server and then (hopefully) continue to seed for a while; taking some strain off your server.
No, until now download bandwidth isn't a problem. Only a few people have been crazy enough to download these files. If this ever goes viral it would be a great solution though.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 12, 2021, 01:14:39 PM
We then read the big list line by line while simultaneously running through the list of new addresses and comparing the values in O(n + k). In this case we can directly write the new file to disk line by line; only the list of new addresses is kept in memory.
The problem with this is that running through a 20 MB list takes a lot of time if you need to do it 1.5 billion times. Keeping the 20 MB in memory isn't the problem, reading 30 quadrillion bytes from RAM still takes much longer than my current system.

(...)

I might be utterly mistaken, but hear me out:

Given two sorted lists:
n = 1 5 10 11 12 13 14 15 16 19 20
k = 3 6 18

We can read n from disk line by line and compare it to the current position in k.

1 < 3, write 1 to new file.
5 > 3, write 3 to file.
5 < 6, write 5 to file.
10 > 6, write 6 to file.
10 < 18, write 10 to file.
11 < 18, write 11 to file.
....
16 < 18, write 16 to file.
19 > 18, write 18 to file.
19 & nothing left in k, write 19 to file.
20 & nothing left in k, write 20 to file.

That's n + k instead of n * k, right?
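The walk-through above can be written out as a small merge function that also counts its steps. This is only a sketch with made-up names: it works on in-memory lists, whereas the real big list would be read from disk line by line, and it drops values that occur in both lists.

```python
def merge_sorted(big, new):
    """Merge two ascending lists into one ascending list without duplicates.

    Returns (merged, steps), where steps counts loop iterations.
    """
    merged = []
    steps = 0
    i = j = 0
    while i < len(big) or j < len(new):
        steps += 1
        # Take from big when new is exhausted or big has the smaller value.
        if j == len(new) or (i < len(big) and big[i] < new[j]):
            value = big[i]
            i += 1
        else:
            value = new[j]
            j += 1
        # Drop a value that equals the one just written (address in both lists).
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged, steps


if __name__ == "__main__":
    n = [1, 5, 10, 11, 12, 13, 14, 15, 16, 19, 20]
    k = [3, 6, 18]
    merged, steps = merge_sorted(n, k)
    print(merged)
    print(steps, "steps instead of", len(n) * len(k))  # prints 14 steps instead of 33
```

Every step consumes exactly one element from one of the two lists, so the step count is always len(n) + len(k).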


Title: Re: List of all Bitcoin addresses ever used - currently unavailable
Post by: NotATether on January 12, 2021, 04:47:56 PM
Quote
Due to another VPS that decided to run off with my prepayment (Lol: for 2 weeks), this data is currently unavailable. I'm not sure yet where to move, if it takes too long I'll upload the data elsewhere (but in that case without regular backups).

Update:
I've uploaded the latest version to a temporary location: blockdata.loyce.club/alladdresses/ (http://blockdata.loyce.club/alladdresses/).

I don't remember if I offered you this before but I can host this data for you if it's not too big (I can throw up to 300GB for this project). I can also set up an rsync cron job to pull updates from your temporary location too.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 12, 2021, 06:34:56 PM
We can read n from disk line by line and compare it to the current position in k.
Yes. In fact, just 2 days ago (on another forum) I was pointed at the existence of "sort (https://man7.org/linux/man-pages/man1/sort.1.html) -mu":
Code:
      -m, --merge
              merge already sorted files; do not sort
This does exactly what you described. I haven't tested it yet, but I assume it's much faster than "regular" sort.
Update: I'm testing this now.

However, the bigger problem remains: updating 1.5 billion unique addresses in chronological order. Those lists are unsorted, so for example:
Existing long list with 12 years of data:
Code:
5
3
7
2
9
New daily list:
Code:
4
3
The end result should be:
Code:
5
3
7
2
9
4
It can be done by awk '!a[$0]++' (https://bitcointalk.org/index.php?topic=5265993.msg55031112#msg55031112), but I don't have that kind of RAM. I'm not sure how efficient this is for large datasets, it might also run into the problem of having to read 30 quadrillion bytes (https://bitcointalk.org/index.php?topic=5265993.msg56079310#msg56079310). Either way, I can't test it due to lack of RAM.
I ended up with sort -uk2 | sort -nk1 | cut -f2 (https://bitcointalk.org/index.php?topic=5259621.msg55057381#msg55057381).
I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.
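That last option could look roughly like this. It is a sketch under assumptions: the function names are made up, and a small in-memory sorted list with binary search stands in for the sorted file, which in reality is far too big to load like this.

```python
from bisect import bisect_left


def in_sorted(sorted_known, address):
    """Binary search: is address present in the ascending list sorted_known?"""
    pos = bisect_left(sorted_known, address)
    return pos < len(sorted_known) and sorted_known[pos] == address


def new_in_order(daily, sorted_known):
    """Return addresses from daily that are not yet known, in daily order, once each."""
    seen_today = set()
    fresh = []
    for address in daily:
        if address in seen_today or in_sorted(sorted_known, address):
            continue
        seen_today.add(address)
        fresh.append(address)
    return fresh


if __name__ == "__main__":
    chronological = [5, 3, 7, 2, 9]
    daily = [4, 3]
    chronological += new_in_order(daily, sorted(chronological))
    print(chronological)  # prints [5, 3, 7, 2, 9, 4]
```

Only the small daily list is held in a set; the long chronological list is never re-read, the new addresses are simply appended to it.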



I don't remember if I offered you this before but I can host this data for you if it's not too big
You did (more or less):
If network bandwidth is a problem I'm able to host this on my hardware if you like.
So I guess you missed my reply too:
I'm more in need of extra disk space for sorting this data, but I haven't decided yet where to host it.

(I can throw up to 300GB for this project). I can also set up an rsync cron job to pull updates from your temporary location too.
It is a good offer :) Currently, disk space isn't the problem. I am looking for a webhost that allows me to abuse the disk for a few hours continuously once in a while. Most VPS providers aren't that happy when I do that, and my (sponsored) AWS server starts throttling I/O when I do this.
I'm (again) short on time to test everything, but after some discussion on another forum I created a RamNode account. This looks promising so far. If I can pull that off by automating everything, it's not that expensive to use it a couple hours per month only.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 12, 2021, 09:25:32 PM
Yes. In fact, just 2 days ago (on another forum) I was pointed at the existence of "sort (https://man7.org/linux/man-pages/man1/sort.1.html) -mu":
Code:
      -m, --merge
              merge already sorted files; do not sort
This does exactly what you described. I haven't tested it yet, but I assume it's much faster than "regular" sort.
Update: I'm testing this now.

Really curious how that test works out. I do hope it does a little bit more than just merge the files without sorting them.

I do see that for the other list it might be a bit more difficult...

It can be done by awk '!a[$0]++' (https://bitcointalk.org/index.php?topic=5265993.msg55031112#msg55031112), but I don't have that kind of RAM. I'm not sure how efficient this is for large datasets, it might also run into the problem of having to read 30 quadrillion bytes (https://bitcointalk.org/index.php?topic=5265993.msg56079310#msg56079310). Either way, I can't test it due to lack of RAM.

I think you wrote that you'd need about 256GB of RAM for that operation, right? Sorry... can't help you out there. However, a Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter) might be nice to implement if you have a 'bit' of RAM (a lot less than 256GB).
Some quick math:
1GB: 1 in 13 false positives
2GB: 1 in ~170
3GB: 1 in ~2,200
4GB: 1 in ~28,000
5GB: 1 in ~365,000
6GB: 1 in ~4,700,000
7GB: 1 in ~61,000,000
8GB: 1 in ~800,000,000


Of course this would add some hashing overhead, but that should still be far cheaper than looping over your 1.5 billion addresses. Unfortunately you'd still have to double check any positives, because they might be false.
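
The quick math above can be reproduced with the usual approximation for a Bloom filter that uses the optimal number of hash functions, p = exp(-(m/n) * ln(2)^2), where m is the filter size in bits and n the number of items. A sketch, assuming 1 GB = 10^9 bytes and n = 1.5 billion addresses; the function name is hypothetical.

```python
import math


def bloom_false_positive_rate(n_items, size_bytes):
    """Approximate false-positive rate with the optimal number of hash functions."""
    bits = size_bytes * 8
    return math.exp(-(bits / n_items) * math.log(2) ** 2)


N_ADDRESSES = 1_500_000_000  # "1.5 billion addresses" from the thread

for gb in range(1, 9):
    p = bloom_false_positive_rate(N_ADDRESSES, gb * 10**9)  # assumes 1 GB = 10^9 bytes
    print(f"{gb}GB: 1 in ~{1 / p:,.0f}")
```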

I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.
This would definitely work and was the solution I originally proposed:
The 'All addresses ever used, without duplicates, in order of first appearance' list could be created in pretty much the same way.
This would be faster than the bloom filter if there's more than 1 new address that's already in the list.

By the way, I just checked out (but not downloaded) the daily file on blockchair. It's close to 1GB (compressed), but you mentioned 20MB for new addresses on numerous occasions. I guess there's a lot of cleaning to do there. Could I maybe get one of your (old) daily files? I should be able to throw some code together that makes this work, fairly quickly.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: brainless on January 13, 2021, 05:55:06 PM
You are discussing how to sort, remove duplicates, and make the list available both raw and sorted.
My system is an i3-6100 processor with 16 GB of DDR4 RAM, and I manage to sort and remove duplicates from the raw 19 GB file within 1 hour. Working on the daily data is just a job of a few minutes.
Let me explain.
Simply do:
sort raw.txt >> sorted.txt
split -l 50000000 sorted.txt (this splits the file into parts named xaa, xab, ...)
Next, remove duplicates with perl, which is fast. It can load a file of approximately 3 GB, but we make it faster by using parts of 50 million lines:
perl -ne'print unless $_{$_}++' xaa > part1.txt
2nd file
perl -ne'print unless $_{$_}++' xab > part2.txt
This way all files are completed within 1 hour.

Now combine all files:
cat part*.txt >> full-sorted.txt
or list them in order (part1.txt ... part10.txt):
cat part1.txt part2.txt part3.txt >> full-sorted.txt
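
One thing to watch with the chunked approach: in a sorted file equal lines are always adjacent, so comparing neighbours is enough to remove duplicates, but a run of equal lines can straddle the boundary between two chunks. The sketch below (hypothetical names, 2 lines per chunk instead of 50 million) shows a duplicate surviving per-chunk deduplication and disappearing when the whole stream is checked.

```python
from itertools import groupby


def unique_sorted(lines):
    """Drop duplicates from sorted input; equal lines are always adjacent."""
    return [line for line, _ in groupby(lines)]


sorted_lines = ["1A", "1B", "1B", "1B", "3C", "3C"]

# Deduplicating fixed-size chunks separately can leave a duplicate
# where a run of equal lines crosses a chunk boundary.
chunks = [sorted_lines[i:i + 2] for i in range(0, len(sorted_lines), 2)]
per_chunk = [line for chunk in chunks for line in unique_sorted(chunk)]

whole_file = unique_sorted(sorted_lines)
print(per_chunk)   # ['1A', '1B', '1B', '3C']
print(whole_file)  # ['1A', '1B', '3C']
```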

Stage 2
The 2nd group continues from 21 Dec 2020 onward: take all daily update files, combine them, sort them and remove duplicates.
Name the result new-group.txt.

The command is:
join new-group.txt full-sorted.txt >> filter.txt

Here filter.txt holds the lines common to both files (new-group.txt and full-sorted.txt).
Now remove the lines in filter.txt from new-group.txt to get only the purely new addresses:

awk 'FNR==NR{ a[$1]; next } !($1 in a)' filter.txt new-group.txt >> pure-new-addresses.txt

Stage 3
If you still need everything in one file:

Combine pure-new-addresses.txt and full-sorted.txt:
cat pure-new-addresses.txt full-sorted.txt >> pre-full-sorted.txt
sort pre-full-sorted.txt >> new-full-addresses


I recommend leaving the 1st file as it was last created on 21 Dec 2020.
From the 2nd file onward, perform only stage 2, and you will have only the new addresses that do not appear in the first 19 GB file.

I hope I explained all points and that this helps you and the community. If you need any further info, ask me. I'd love to provide whatever info I have.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: NotATether on January 13, 2021, 06:30:35 PM
(I can throw up to 300GB for this project). I can also set up an rsync cron job to pull updates from your temporary location too.
It is a good offer :) Currently, disk space isn't the problem. I am looking for a webhost that allows me to abuse the disk for a few hours continuously once in a while. Most VPS providers aren't that happy when I do that, and my (sponsored) AWS server starts throttling I/O when I do this.
I'm (again) short on time to test everything, but after some discussion on another forum I created a RamNode account. This looks promising so far. If I can pull that off by automating everything, it's not that expensive to use it a couple hours per month only.

I have a server on RAID0 with 882MB/s read and 191MB/s write, so copying this stuff to a different place on the same disk will take about 40 seconds or so for a 30GB dataset.
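
As a rough sanity check of these numbers (assuming sustained sequential throughput and 1 GB = 10^9 bytes): reading 30 GB at 882MB/s takes about 34 seconds, while writing it at 191MB/s takes closer to 2.6 minutes, so a copy within the same disk is probably bounded by the write side.

```python
DATASET_BYTES = 30 * 10**9        # "30GB dataset", assuming 1 GB = 10^9 bytes
READ_BYTES_PER_S = 882 * 10**6    # 882MB/s read
WRITE_BYTES_PER_S = 191 * 10**6   # 191MB/s write

read_seconds = DATASET_BYTES / READ_BYTES_PER_S
write_seconds = DATASET_BYTES / WRITE_BYTES_PER_S
print(f"read:  {read_seconds:.0f} s")   # read:  34 s
print(f"write: {write_seconds:.0f} s")  # write: 157 s
```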

AWS VPS's run on shared hardware so that's probably why you're getting throttled. There are dedicated servers on AWS you can get where you're in total control over the hardware and they don't throttle you and stuff. But I'm glad the RamNode account worked out for you. Let me know if you need help writing automation stuff.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: Vod on January 13, 2021, 07:18:26 PM
It can be done by awk '!a[$0]++' (https://bitcointalk.org/index.php?topic=5265993.msg55031112#msg55031112), but I don't have that kind of RAM. I'm not sure how efficient this is for large datasets, it might also run into the problem of having to read 30 quadrillion bytes (https://bitcointalk.org/index.php?topic=5265993.msg56079310#msg56079310). Either way, I can't test it due to lack of RAM... it's not that expensive to use it a couple hours per month only.

You are on AWS, right?   Why not have your sponsor upgrade your instance to a higher class for a few hours?  That's the beauty of on-demand processing. :)

EC2?   Those are dedicated resources, not shared.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 14, 2021, 06:18:06 PM
This post is the result of some trial&error. I also noticed blockdata.loyce.club/ (http://blockdata.loyce.club/) gets terribly slow once in a while, which made it useless to use RamNode for this data.

Really curious how that test works out. I do hope it does a little bit more than just merge the files without sorting them.
It merges all lines from both sorted files in sorted order. After several tests (on my old desktop with HDD), these are the relevant results:
Code:
Old process:
time cat <(gunzip -c addresses_sorted.txt.gz) daily_updates/*.txt | sort -uS80% | gzip > test1.txt.gz
real    90m2.883s

Faster new process:
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz
real    51m26.730s
The output is the same.
Interestingly, when I tell sort -m to use up to 40% of my RAM, it actually uses that (even though it doesn't need it), which slows it down by 7 minutes.
Most CPU time is spent compressing the new gzip file.
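
For readers who want to see what the merge step does, here is a small Python sketch of the same idea as sort -mu: both inputs are already sorted, so they can be merged in a single pass while dropping lines equal to the previous one. The function name and the sample data are made up; the real inputs are the gunzipped sorted list and the sorted daily updates.

```python
import heapq


def merge_unique(*sorted_inputs):
    """Merge already sorted iterables and drop duplicates, like `sort -mu`."""
    previous = None
    for line in heapq.merge(*sorted_inputs):
        if line != previous:
            yield line
        previous = line


all_sorted = ["1A", "1C", "3F"]       # stands in for addresses_sorted.txt
daily_sorted = ["1B", "1C", "bc1q"]   # stands in for `sort -u daily_updates/*.txt`
merged = list(merge_unique(all_sorted, daily_sorted))
print(merged)  # ['1A', '1B', '1C', '3F', 'bc1q']
```

Because heapq.merge reads its inputs lazily, nothing but the current line of each input has to be held in memory.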

Quote
I think you wrote that you'd need about 256GB of RAM for that operation, right? Sorry... can't help you out there. However, a Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter) might be nice to implement if you have a 'bit' of RAM (a lot less than 256GB).
That's going over my head, and probably far too complicated for something this simple.

Quote
I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.
This would definitely work and was the solution I originally proposed:
The 'All addresses ever used, without duplicates, in order of first appearance' list could be created in pretty much the same way.
This would be faster than the bloom filter if there's more than 1 new address that's already in the list.
I'll try:
Code:
Old code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 | gzip > newchronological.txt.gz
real    194m24.456s

New:
time comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) > newaddresses.txt
real    8m4.045s
time cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 > all_daily_addresses_chronological_order.txt
real    1m14.593s
time cat all_daily_addresses_chronological_order.txt newaddresses.txt | nl -nln | sort -k2 -S80% > test.txt
real    0m36.948s

I discovered uniq -f1 on stackexchange (https://unix.stackexchange.com/questions/204747/get-or-filter-duplicated-lines-by-column):
Code:
time cat test.txt | uniq -df1 | sort -nk1 -S80% | cut -f2 > test2.txt
real    0m7.721s

Code:
Combined:
time cat <(cat <(cat ../daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c ../addresses_sorted.txt.gz) <(sort -u ../daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2 > newaddresses_chronological.txt
real    9m45.163s
Even more combined:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) > new.alladdresses_chronological.txt
real    19m34.926s
This can significantly improve performance, especially if I keep uncompressed files for faster access. But something is wrong: the 3 different methods give me 3 different output files.
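
As a reference for what the result should look like (not a diagnosis of why the three outputs differ), the intended logic can be sketched in Python: first find the addresses that occur only in the daily data, as comm -13 does, then walk the daily data in its original order and keep the first appearance of each of them. All names are hypothetical and the lists are toy data.

```python
def only_in_second(first_sorted, second_sorted):
    """Yield lines found only in the second sorted input (like `comm -13`)."""
    first = iter(first_sorted)
    current = next(first, None)
    for line in second_sorted:
        # Advance the first input until it catches up with `line`.
        while current is not None and current < line:
            current = next(first, None)
        if current != line:
            yield line


def new_in_chronological_order(all_sorted, daily_in_order):
    """New addresses from the daily list, kept in order of first appearance."""
    new = set(only_in_second(all_sorted, sorted(set(daily_in_order))))
    result = []
    for address in daily_in_order:
        if address in new:
            new.discard(address)  # keep only the first appearance
            result.append(address)
    return result


all_sorted = ["2", "3", "5", "7", "9"]  # sorted list of known addresses
daily = ["4", "3", "8", "4"]            # daily update, chronological
new_addresses = new_in_chronological_order(all_sorted, daily)
print(new_addresses)  # ['4', '8']
```

Only the daily addresses are held in memory here; the large sorted list is read once from start to end.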

Quote
By the way, I just checked out (but not downloaded) the daily file on blockchair. It's close to 1GB (compressed), but you mentioned 20MB for new addresses on numerous occasions. I guess there's a lot of cleaning to do there. Could I maybe get one of your (old) daily files?
I use Blockchair's daily outputs (https://gz.blockchair.com/bitcoin/outputs/) to update this, not the daily list of addresses (https://gz.blockchair.com/bitcoin/addresses/).
See: http://blockdata.loyce.club/alladdresses/daily_updates/ for old daily files.



split -l 50000000 sorted ( it will split filesstarting with name xaa next xab....)
I don't see any benefit in splitting files for processing.



I have a server on RAID0 with 882MB/s read and 191MB/s write, so copying this stuff to a different place on the same disk will take about 40 seconds or so for a 30GB dataset.
Dedicated? :D That's the dream :o But even then, sorting data means reading and writing the same data several times.

Quote
AWS VPS's run on shared hardware so that's probably why you're getting throttled. There are dedicated servers on AWS you can get where you're in total control over the hardware and they don't throttle you and stuff.
AWS dedicated is totally out of my price range (for this side project that already got out of hand). I wasn't planning on spending a lot of money on this long-term, but if I can find a very affordable solution I can just keep adding servers to my collection.
So far it's looking good on speed improvements, especially getting rid of the sort-command executed on disk helps a lot.



You are on AWS, right?   Why not have your sponsor upgrade your instance to a higher class for a few hours?  That's the beauty of on-demand processing. :)
I don't want to be demanding, and AWS charges (my sponsor) $0.09 per GB. That's okay for HTML, but not for large files. My List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0) alone transferred 450 GB since the start of this year. That would be $1000 per year on AWS, while it costs only a fraction elsewhere. I love how reliable AWS is, it just always works, but that's not necessary for my blockdata.
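
The $1000 per year estimate follows from extrapolating the traffic so far to a full year. A quick check, assuming the 450 GB was transferred in the first 14 days of January:

```python
TRANSFERRED_GB = 450   # since the start of the year
DAYS_ELAPSED = 14      # assumption: 1 to 14 January
PRICE_PER_GB = 0.09    # USD, AWS data transfer out

yearly_gb = TRANSFERRED_GB / DAYS_ELAPSED * 365
yearly_cost = yearly_gb * PRICE_PER_GB
print(f"{yearly_gb:,.0f} GB/year, about ${yearly_cost:,.0f} per year")
```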


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: Vod on January 14, 2021, 06:42:40 PM
I don't want to be demanding, and AWS charges (my sponsor) $0.09 per GB. That's okay for HTML, but not for large files. My List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0) alone transferred 450 GB since the start of this year. That would be $1000 per year on AWS, while it costs only a fraction elsewhere. I love how reliable AWS is, it just always works, but that's not necessary for my blockdata.

Full price storage for AWS:
First 50 TB / Month   $0.023 per GB
Next 450 TB / Month   $0.022 per GB
Over 500 TB / Month   $0.021 per GB
You can then reduce these costs up to 72% if you commit to a certain spend.

Data transfer out of AWS:
Up to 1 GB / Month      $0.00 per GB
Next 9.999 TB / Month   $0.09 per GB (About $500 a year)

Consider too that your data is all alone on your VPS.  If you were on AWS, you could transfer your data to other AWS clients (like me) for $0.01 per GB.  :)

Also, you can get a server with 256GB of RAM and 32 processors for $1.50 per hour.  You attach your storage to the VPS, run your queries for however long it takes, then terminate the instance and move your storage back to your existing lower powered system.

Now that you have an established site and use case for AWS services, I can get you $1,000 in AWS credits for your own account, if you are interested.  I'm in training to be certified as a cloud consultant.



Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 15, 2021, 08:48:57 AM
Next 9.999 TB / Month   $0.09 per GB (About $500 a year)
Today's counter is at 500 GB (in 15 days), which brings me to $1000 per year if I had to pay $0.09 per GB. At the current rate, I'll hit the data limit for this VPS by the end of this month, and so far traffic is still going up. My current limit is 1 TB/month, and for $0.00067 per GB I can double that.

Quote
Also, you can get a server with 256GB of RAM and 32 processors for $1.50 per hour.  You attach your storage to the VPS, run your queries for however long it takes, then terminate the instance and move your storage back to your existing lower powered system.

Now that you have an established site and use case for AWS services, I can get you $1,000 in AWS credits for your own account, if you are interested.  I'm in training to be certified as a cloud consultant.
The offer is good, but AWS wants my creditcard, which I don't want to link to this. I only use hosting that accepts crypto.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: Vod on January 15, 2021, 06:09:41 PM
The offer is good, but AWS wants my creditcard, which I don't want to link to this. I only use hosting that accepts crypto.

Hello!  (waving)   You can have full control of an account minus billing.  I can pay the bill and accept crypto.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 16, 2021, 09:20:19 AM
Really curious how that test works out. I do hope it does a little bit more than just merge the files without sorting them.
It merges all lines from both sorted files in sorted order. After several tests (on my old desktop with HDD), these are the relevant results:
Code:
Old process:
time cat <(gunzip -c addresses_sorted.txt.gz) daily_updates/*.txt | sort -uS80% | gzip > test1.txt.gz
real    90m2.883s

Faster new process:
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz
real    51m26.730s
The output is the same.
Interestingly, when I tell sort -m to use up to 40% of my RAM, it actually uses that (even though it doesn't need it), which slows it down by 7 minutes.
Most CPU time is spent compressing the new gzip file.
That's a significant improvement. You could give pigz a try, see: https://unix.stackexchange.com/a/88739/314660. I'm not sure what the drawbacks would be, I've never tried pigz myself.

Quote
I think you wrote that you'd need about 256GB of RAM for that operation, right? Sorry... can't help you out there. However, a Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter) might be nice to implement if you have a 'bit' of RAM (a lot less than 256GB).
That's going over my head, and probably far too complicated for something this simple.
Honestly, the Bloom filter was a silly suggestion. It will probably not be a big improvement (if any) compared to your current code.

I use Blockchair's daily outputs (https://gz.blockchair.com/bitcoin/outputs/) to update this, not the daily list of addresses (https://gz.blockchair.com/bitcoin/addresses/).
See: http://blockdata.loyce.club/alladdresses/daily_updates/ for old daily files.
Thanks! Hoping to do some experimenting soon (if I have the time...)


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 16, 2021, 04:56:49 PM
You can have full control of an account minus billing.  I can pay the bill and accept crypto.
It's really not worth it for this project. I prefer to pay a low amount once a year, and once it reaches its data limit, it just shuts down until the next month starts.

You could give pigz a try, see: https://unix.stackexchange.com/a/88739/314660. I'm not sure what the drawbacks would be, I've never tried pigz myself.
Parallel compression is only useful when server load isn't a restriction. For now I stick to the standard.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: MrFreeDragon on January 18, 2021, 11:42:25 PM
Hi! Is it possible to link the public key for every bitcoin address in your database? (Of course only for those whose public key was exposed.)


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 19, 2021, 09:53:27 PM
Is it possible to link the public key for every bitcoin address in your database?
If I can get the data I can add it. I'm no expert on this, can I use anything from inputs (http://blockdata.loyce.club/inputs/blockchair_bitcoin_inputs_20110313.tsv.gz) (maybe spending_signature_hex?) to get this data?


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: renedx on January 20, 2021, 04:23:15 AM
Just want to say what a great job you did. We use your data to build graphs and do some fun stuff (we download it only twice a month, so as not to be too demanding on your bandwidth).

We were building a pubkey list too, but in the end it wasn't worth the effort on our part (there wasn't much fun you could really have with it).

For a living we do high-end enterprise hosting, so just in case you ever need some space or mirrors, you're welcome.

Thanks  ;)


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: NotATether on January 20, 2021, 05:01:21 AM
Is it possible to link the public key for every bitcoin address in your database?
If I can get the data I can add it. I'm no expert on this, can I use anything from inputs (http://blockdata.loyce.club/inputs/blockchair_bitcoin_inputs_20110313.tsv.gz) (maybe spending_signature_hex?) to get this data?

I looked for compressed keys at the end of the spending_signature_hex values and I found that a lot of them don't have public keys at the end. Makes me think they are signatures of transactions, not scripts.

So the real solution (fell asleep while studying the dataset  :D) is to take the transaction_hex field and pass it as the argument to the "decoderawtransaction" RPC call. It'll return JSON where the signature script is located at ["vin"][N]["scriptSig"]["hex"] for each input index N; the compressed public key is then the last 33 bytes of that hex.

You'll obviously need a bitcoind for that and it's possible to configure access from another machine if you're resource-constrained on the box you're parsing this address data on.

I think bitcoind should be able to handle the load, especially if it's running locally.
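
A minimal sketch of walking that JSON in Python. The decoded dict below is a hand-written, heavily shortened stand-in for the RPC result (no bitcoind is contacted) and the function name is hypothetical. In Bitcoin Core's output the inputs are a list under "vin", and coinbase inputs carry a "coinbase" field instead of "scriptSig".

```python
def script_sig_hexes(decoded_tx):
    """Return the scriptSig hex of every non-coinbase input of a decoded transaction."""
    hexes = []
    for tx_input in decoded_tx.get("vin", []):
        script_sig = tx_input.get("scriptSig")
        if script_sig is not None:  # coinbase inputs have no scriptSig
            hexes.append(script_sig["hex"])
    return hexes


# Hypothetical example data, trimmed to the fields used above.
decoded = {
    "vin": [
        {"txid": "aa" * 32, "vout": 0, "scriptSig": {"asm": "", "hex": "0201ff"}},
        {"coinbase": "04ffff001d", "sequence": 4294967295},
    ]
}
sig_hexes = script_sig_hexes(decoded)
print(sig_hexes)  # ['0201ff']
```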


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 20, 2021, 10:10:16 AM
we host high-end enterprise, just in case you need some space or mirrors, you’re welcome if ever in need.
For now, I'm covered for bandwidth, thanks :)

So the real solution (fell asleep while studying the dataset  :D) is to take the transaction_hex field and pass it as the argument to the "decoderawtransaction" RPC call. It'll return JSON where the signature script is located at ["vin"][N]["scriptSig"]["hex"] for each input index N; the compressed public key is then the last 33 bytes of that hex.
And this just went over my head :P

Quote
You'll obviously need a bitcoind for that and it's possible to configure access from another machine if you're resource-constrained on the box you're parsing this address data on.

I think bitcoind should be able to handle the load, especially if it's running locally.
Although I'd like to be able to extract all data myself from Bitcoin Core (so I don't need to rely on Blockchair anymore), it also makes it much more complicated. So for now, I'll pass on this.
And I don't want to add more local data processing to what I'm doing already. If anything, I want to move more to a VPS.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 20, 2021, 06:20:10 PM
I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.

I found a bit of time to write this. Testing it now..

Just to check with you, I was sorta wrong here:
Given two sorted lists:
n = 1 5 10 11 12 13 14 15 16 19 20
k = 3 6 18

We can read n from disk line by line and compare it to the current position in k.

1 < 3, write 1 to new file.
5 > 3, write 3 to file.
5 < 6, write 5 to file.
10 > 6, write 6 to file.
10 < 18, write 10 to file.
11 < 18, write 11 to file.
....
16 < 18, write 16 to file.
19 > 18, write 18 to file.
19 & nothing left in k, write 19 to file.
20 & nothing left in k, write 20 to file.

That's n + k instead of n * k, right?

Since we're sorting as strings it would actually be:
n = 1 10 11 12 13 14 15 16 19 20 5
k = 18 3 6

The whole list would then become:
all = 1 10 11 12 13 14 15 16 18 19 20 3 5 6

Correct?
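
The walk-through above can be written out as a short Python sketch (the function name is made up). It merges the two lists as strings in one pass and counts one step per line written, which gives n + k steps rather than n * k.

```python
def merge_sorted(n, k):
    """Merge two sorted lists in one pass; returns (merged, steps)."""
    merged = []
    i = j = steps = 0
    while i < len(n) and j < len(k):
        steps += 1
        if n[i] <= k[j]:
            merged.append(n[i])
            i += 1
        else:
            merged.append(k[j])
            j += 1
    # One list is exhausted: copy the rest of the other one.
    steps += (len(n) - i) + (len(k) - j)
    merged.extend(n[i:])
    merged.extend(k[j:])
    return merged, steps


# Sorting as strings, as `sort` does without -n.
n = sorted(str(x) for x in [1, 5, 10, 11, 12, 13, 14, 15, 16, 19, 20])
k = sorted(str(x) for x in [3, 6, 18])
merged, steps = merge_sorted(n, k)
print(" ".join(merged))         # 1 10 11 12 13 14 15 16 18 19 20 3 5 6
print(steps, len(n) + len(k))   # 14 14
```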


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: JustHereReading on January 21, 2021, 12:41:01 PM
I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.

I found a bit of time to write this. Testing it now..

The first results of yesterday's testing look promising. I should go back and double check if the outputs are correct, but they seem to be.

I created a VM with 2 cores/2 threads (so no hyperthreading or whatever AMD's equivalent is called) of my Ryzen 3600 and 512 MB of RAM (just because Ubuntu Server, for which I had an ISO handy, wouldn't boot with 256 MB). To make the numbers mean anything I first benchmarked your current setup:

Code:
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz
LoyceV: real    51m26.730s
JustHereReading: real 40m13.684s

My cores were pushed to ~50% so unsurprisingly pigz yielded an improvement in my setup. However, I was a little surprised by the amount of improvement.
Code:
time sort -mu <(pigz -dc addresses_sorted.txt.gz) <(sort -u daily-file-long.txt) | pigz > output.txt.gz
real    14m29.865s

And now... for the main event:
Code:
time gunzip -c addresses_sorted.txt.gz | python3 add_new_adresses_sorted.py | gzip > output.txt.gz
real    39m42.574s
The script ran slightly faster than your current setup. In that time it sorted (and compressed) the first list in addition to creating a text file that can be appended to the second list. Unfortunately I overwrote the results of your current setup, so I didn't verify the output (yet).


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 24, 2021, 06:32:18 PM
Since we're sorting as strings it would actually be:
n = 1 10 11 12 13 14 15 16 19 20 5
k = 18 3 6

The whole list would then become:
all = 1 10 11 12 13 14 15 16 18 19 20 3 5 6
Correct:
Code:
echo '1 10 11 12 13 14 15 16 19 20 5 18 3 6' | tr ' ' '\n' | sort -u | tr '\n' ' '
1 10 11 12 13 14 15 16 18 19 20 3 5 6

I should go back and double check if the outputs are correct, but they seem to be.
I haven't had the time yet to find my bug(s).
For comparison, here's the md5sum for the result from my old code (https://bitcointalk.org/index.php?topic=5265993.msg56096599#msg56096599) (gunzipped):
Code:
md5sum newchronological.txt
4070c03f974da0ee05ea51084d0f04ac  newchronological.txt

Quote
However, I was a little surprised by the amount of improvement.
Using pigz instead of gzip is interesting indeed. It seems to be more efficient, and in that case it will also be worth using on a VPS with only one core. I didn't know it's in the default repository, so I've installed it now.
It doesn't use multiple cores to decompress, but it's significantly faster anyway:
Code:
time gunzip -c addresses_sorted.txt.gz | md5sum
real    7m27.541s
time pigz -dc addresses_sorted.txt.gz | md5sum
real    4m35.826s

From 40m13 to 14m29 can't be explained by just using 2 instead of 1 core, so it must be more efficient. The performance difference is less spectacular on my system:
Code:
time sort -mu <(pigz -dc addresses_sorted.txt.gz) <(sort -u daily-file-long.txt) | pigz > pigz_output.txt.gz
real    31m54.478s

As for file size:
Code:
gzip: 17970427375 bytes
pigz: 17990501927 bytes
The 0.1% size difference is negligible.
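
For reference, the arithmetic behind these comparisons, using only the numbers posted above:

```python
gzip_bytes = 17970427375
pigz_bytes = 17990501927

difference = pigz_bytes - gzip_bytes
percent = difference / gzip_bytes * 100
print(f"{difference} bytes larger, {percent:.2f}%")  # 20074552 bytes larger, 0.11%

# Decompression timings: gunzip 7m27.541s versus pigz 4m35.826s
gunzip_seconds = 7 * 60 + 27.541
pigz_seconds = 4 * 60 + 35.826
speedup = gunzip_seconds / pigz_seconds
print(f"pigz -dc is about {speedup:.2f}x faster")    # about 1.62x
```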

Quote
And now... for the main event:
Code:
time gunzip -c addresses_sorted.txt.gz | python3 add_new_adresses_sorted.py | gzip > output.txt.gz
real    39m42.574s
Can you post your add_new_adresses_sorted.py?


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: MrFreeDragon on January 28, 2021, 03:37:37 PM
Is it possible to link the public key for every bitcoin address in your database?
If I can get the data I can add it. I'm no expert on this, can I use anything from inputs (http://blockdata.loyce.club/inputs/blockchair_bitcoin_inputs_20110313.tsv.gz) (maybe spending_signature_hex?) to get this data?

Yes, the spending signature hex contains the public key.
As the example from your blockchair_bitcoin_inputs_20110313.tsv.gz:

1) In the case of pubkey (column 'type'), the public key is exactly the value of the spending signature. The public key was recorded in the blockchain in the early days. So, all the early coinbase transactions contain the public key.
Code:
block id: 112995
transaction_hash: 523f57581390203da2aef169b543fc1ddcd84be6dd35cfb248228b0912dc97e3

public key: 483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801

2) For the other transactions in your example, the public key is part of the hex spending signature. That field contains the DER-encoded signature and the public key.

Code:
transaction hash: d9ff0a82399ae55ab95093020ed6f17bab80697dcafe74d264900cdd56f8c0aa

48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383

And the public key will be the following part:
48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383

Here is the example of the full structure: https://pastebin.com/Q55PyUgB


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on January 29, 2021, 09:55:06 AM
And the public key will be the following part:
48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383
So if I take these inputs (http://blockdata.loyce.club/inputs/blockchair_bitcoin_inputs_20110313.tsv.gz) as an example:
Code:
block_id        transaction_hash        index   time    value   value_usd       recipient       type    script_hex      is_from_coinbase        is_spendable    spending_block_id       spending_transaction_hash       spending_index  spending_time   spending_value_usd      spending_sequence       spending_signature_hex  spending_witness        lifespan        cdd
66558   d9ff0a82399ae55ab95093020ed6f17bab80697dcafe74d264900cdd56f8c0aa        0       2010-07-13 09:19:38     4000000000      0.4     12Y4RVpkQ4uKCNUpea4jJRWgd2nxScNkaG      pubkeyhash      76a91410d7e0a01323508f1881407713971fab34b7898788ac      0       -1      113331  c0ef633ac227c17860abeed2ebbbcfedf4a108e61d9950c0c68171e20e32e945        0       2011-03-13 00:06:05     36      4294967295      48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383          20961987        9704.62361111111
112995  523f57581390203da2aef169b543fc1ddcd84be6dd35cfb248228b0912dc97e3        0       2011-03-10 09:08:48     5001000000      42.5085 1GEX1ZHu5aHjnL6HogKYHfqCAFQvcVWpKA      pubkey  4104a2032ae7a01f69747fac92c9c343ce5f1c841d450ada9b9498e84050de80a3e72d7d5f92967c2b1edc061222a28eee5d0256d60b5eafaf8b33e358c6dafa3923ac  1       1       113331  b25bacedd7ecc2d945e69cfe40dade836d57edf7583da70952ee37743d3852a8        0       2011-03-13 00:06:05     45.009  4294967295      483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801              226637  131.18190243055554
The part you're looking for is:
Code:
type          spending_signature_hex
pubkeyhash    48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383
pubkey        483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801
If type is pubkeyhash, you want the whole spending_signature_hex (146 characters). Should I remove the first 16 characters to make it 130 characters long?
If type is pubkey, you want the last 130 characters?
If this is correct, I can just take the last 130 characters for both types, right?
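For reference, the lengths in the two sample rows can be measured directly. This is only a sketch: the variable names are made up for illustration, and the hex strings are copied from the spending_signature_hex column above. On this sample, the pubkeyhash value is 278 hex characters (a 72-byte signature push followed by a 65-byte public key push), the pubkey value is 146 hex characters (signature only), and the last 130 characters of the pubkeyhash value start with 04.

```shell
# Hex strings copied from the spending_signature_hex column of the two sample rows.
# Variable names are illustrative only.
pubkeyhash_sig=48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383
pubkey_sig=483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801

# Length of each value in hex characters
echo "pubkeyhash: ${#pubkeyhash_sig} characters"
echo "pubkey:     ${#pubkey_sig} characters"

# Last 130 hex characters (65 bytes) of the pubkeyhash value
last130=$(printf '%s' "$pubkeyhash_sig" | tail -c 130)
echo "last 130 of pubkeyhash: $last130"
```

Since the pubkey value is only 146 characters long and holds just the signature, taking its last 130 characters would cut into the signature rather than return a public key.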


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: naufragus on January 30, 2021, 04:04:12 AM
Just to let you guys know, I updated my bitcoin-all-addresses list on 2021 Jan 19.
That is available in my github repo https://github.com/mountaineerbr/bitcoin-all-addresses (https://github.com/mountaineerbr/bitcoin-all-addresses)
All addresses are uniquely printed in the order they first appeared in Blockchair output dumps.

I was able to reproduce my methodology 6 months after the first lists.
The methodology is described in the README of the git repo,
and some code I used is here: https://github.com/mountaineerbr/bitcoin-all-addresses/blob/master/blockchair.btcoutputs.process.sh (https://github.com/mountaineerbr/bitcoin-all-addresses/blob/master/blockchair.btcoutputs.process.sh)

If you export LANG=C and LC_ALL=C, that will speed up sorting, and as we are dealing
only with Base58 and SegWit addresses (plain ASCII), that should be OK.
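
A small sketch of what the C locale changes (the four strings are made-up stand-ins, not real addresses): with LC_ALL=C, sort compares raw bytes, so digits come before uppercase letters and uppercase before lowercase. LC_ALL overrides LANG, so setting LC_ALL alone is normally enough.

```shell
# Made-up sample strings standing in for addresses (assumption for illustration).
sorted=$(printf '%s\n' bc1q 3J98 1a1z 1A1z | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In byte order this prints 1A1z, 1a1z, 3J98, bc1q, one per line.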


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: PrimeNumber7 on February 01, 2021, 03:44:02 AM
Resulting in O(n + k log k + 2k). In this particular case one might even argue that n > k log k + 2k, therefore O(2n) = O(n). However, it's late here and I don't like to argue.

You only need enough memory to keep the new addresses in memory and enough disk space to keep both the new and old version on disk at the same time.

You are correct. I had not considered updating the list incrementally rather than from scratch. Accessing a single line from a file can degrade performance, and it is worth considering whether paying for more RAM would be cost effective given the additional time required to update the list.
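
A minimal sketch of such an incremental update, using tiny made-up files instead of the real lists (file names and sample values are assumptions). comm -13 prints the lines that occur only in the second sorted input, and sort -m merges inputs that are already sorted without re-sorting them.

```shell
export LC_ALL=C
workdir=$(mktemp -d)

# Existing sorted list and an unsorted update with duplicates (placeholder values).
printf '%s\n' 1AAA 1CCC 3EEE > "$workdir/old_sorted.txt"
printf '%s\n' 3EEE 1BBB 1CCC 1BBB > "$workdir/update.txt"

# Only the small update needs a real sort.
sort -u "$workdir/update.txt" > "$workdir/update_sorted.txt"

# Lines that are in the update but not in the old list.
comm -13 "$workdir/old_sorted.txt" "$workdir/update_sorted.txt" > "$workdir/new_only.txt"

# Merge the two sorted files in a single pass, dropping duplicates.
sort -m -u "$workdir/old_sorted.txt" "$workdir/update_sorted.txt" > "$workdir/merged.txt"

new_only=$(cat "$workdir/new_only.txt")
merged=$(cat "$workdir/merged.txt")
echo "new: $new_only"
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Here the only new entry is 1BBB, and the merged list is 1AAA, 1BBB, 1CCC, 3EEE.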


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on February 01, 2021, 05:44:59 PM
Quoting myself:
Code:
Old code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 | gzip > newchronological.txt.gz
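
To see what the old pipeline does, here is the same chain on five made-up lines (the values are placeholders and the function name is hypothetical). nl numbers the lines, sort -uk2 keeps one line per address, sort -nk1 puts the survivors back in input order, and cut -f2 drops the numbers. It relies on GNU sort -u keeping the first line of each run of equal keys.

```shell
# Hypothetical helper wrapping the old pipeline's four steps.
dedupe_keep_order() {
  nl |          # number each line, separated from the text by a TAB
  sort -uk2 |   # one line per address (key = field 2 onward)
  sort -nk1 |   # back to the original order by line number
  cut -f2       # strip the line number
}

# Toy input with duplicates; values are placeholders, not real addresses.
result=$(printf '%s\n' 1CCC 1AAA 1CCC 3BBB 1AAA | dedupe_keep_order)
printf '%s\n' "$result"
```

With GNU coreutils this prints 1CCC, 1AAA, 3BBB: each address once, in order of first appearance.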

Code:
New code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) > new.alladdresses_chronological.txt
But it's wrong
I tried again:
Code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 > oldcode.txt
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) > newcode.txt
The files were too large to diff, so I split them in parts (10 million lines each). There are a few differences:
Code:
< 3BS6oQKHDwrzz4RC69iAbSV13xpbGZvXLj
---
> 3BS6oQKHDwrzz4SC69iAbSV13xpbGZvXLj

< 17Q7LN9nCmS6HdjkDj3C4MdhduFobGp4hv
---
> 17Q7LN9nCmS6HdjkDk3C4MdhduFobGp4hv

< 1rVH156qu1djPVFGoKaZ29Kw8zEpmh283
9863597a9863597
> 1rVH156qu1djPVFGoKaZ29Kw8zEpmh283

< 3Kw9pkLTLExTd9LZW2qbbNUdZRpUW3JTac
---
> 3Kw9pkLTLExTd9LZW2qbbNUdZSpUW3JTac

< 1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG
---
> 1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG

< 331xujHAg6AGvKzPwUKZ9AJxukaemCXeRw
8496039c8496038
< 3PLoa4ccMdxyGY6mAEStSu45xwqdRftd1b

> 3PLoa4ccMdxyGY6mAEStSu55xwqdRftd1b
10000000a10000000
> 3B92y4bFFPZvjviNhtLWeBKoYXmHVwr3CD

< 3B92y4bFFPZvjviNhtLWeBKoYXmHVwr3CD
7735278a7735278
> 331xujHAg6AGvKzPwUKZ9AJxukaemCXeRw

Let's highlight this one:
Quote
< 1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG
---
> 1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG
The first one (with x) is correct (https://blockchair.com/bitcoin/address/1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG?_type=address&_search=header).

I have no idea what causes this. I'm now checking if I can reproduce the exact same data change, or that it's caused by hardware failure.
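
One thing that can be checked mechanically is how the two versions of an address differ. The sketch below (plain POSIX shell; the variable names are made up) walks through the highlighted pair and prints the XOR of the byte values wherever the characters differ. For x and y the XOR is 1, meaning the two bytes differ in a single bit, and the same holds for the other substitutions in the diff (R/S, j/k, 4/5). That pattern is what a flipped bit would look like, although on its own it does not say where the flip happened.

```shell
# The highlighted pair from the diff above.
a=1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG
b=1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG

xor_values=""
i=1
while [ "$i" -le "${#a}" ]; do
  ca=$(printf '%s' "$a" | cut -c "$i")
  cb=$(printf '%s' "$b" | cut -c "$i")
  if [ "$ca" != "$cb" ]; then
    # printf '%d' "'x" prints the numeric value of the character x
    xor=$(( $(printf '%d' "'$ca") ^ $(printf '%d' "'$cb") ))
    echo "position $i: $ca -> $cb, XOR of byte values = $xor"
    xor_values="$xor_values$xor"
  fi
  i=$((i + 1))
done
```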


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on February 03, 2021, 09:29:54 PM
What can be the cause of this? The first and third run gave the same results, the others are different.
Code:
cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
7d2f923c7ce1d9534629b4502c37680d  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
cede1315137bb4a2ab20c5438e4525ba  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
112100c359f74c0e60b95afa92de990d  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
1bc7138bb4a367c117002234a604d444  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
cc91ef352ffa1e641a7c47dcc3d743f3  -
I'm still running the same command on the same (old) system, but from a different HDD (with tmpfiles on a different drive too).
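
A repeatability check like the one above can be scripted. This is a sketch on toy data: run_pipeline is a made-up stand-in for the real command, and the loop counts how many distinct checksums a number of runs produce. For a deterministic pipeline the count should be 1.

```shell
# Hypothetical stand-in for the real command; toy input only.
run_pipeline() {
  printf '%s\n' 1CCC 1AAA 1CCC 3BBB | nl | sort -uk2 | sort -nk1 | cut -f2
}

# Run the pipeline several times and print one checksum per run.
checksum_runs() {
  i=0
  while [ "$i" -lt "$1" ]; do
    run_pipeline | md5sum
    i=$((i + 1))
  done
}

runs=3
distinct=$(checksum_runs "$runs" | sort -u | wc -l | tr -d ' ')
echo "distinct checksums over $runs runs: $distinct"
```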


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: PlutonowyPokrzycz on March 01, 2021, 11:04:02 AM
Hi LoyceV,
Thanks for the nice resource of data!

Your URL alladdresses.loyce.club/?C=M;O=D is not working. Can you please check?

Background
To follow up on List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0) and this post (https://bitcointalk.org/index.php?topic=5259621.msg54833270#msg54833270), I made a list of all Bitcoin addresses that have ever been used.

The data
See alladdresses.loyce.club (http://alladdresses.loyce.club/?C=M;O=D) (new location)
I now have the resources (RAM, CPU power and disk space) and code (https://bitcointalk.org/index.php?topic=5265993.msg55057504#msg55057504) to show unique addresses in their original order. Each address is only shown once. I have 2 large files:



Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on March 01, 2021, 11:13:57 AM
Your URL alladdresses.loyce.club/?C=M;O=D is not working. Can you please check?
I still haven't found a new host for this, so it's still on its temporary location:
I've uploaded the latest version to a temporary location: blockdata.loyce.club/alladdresses/ (http://blockdata.loyce.club/alladdresses/).
The latest update was 2.5 months ago (because of weird problems (https://bitcointalk.org/index.php?topic=5265993.msg56262241#msg56262241)).

Thanks for the reminder though, I'll do some more testing to get updates working again.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on May 07, 2021, 02:56:54 PM
What can be the cause of this? The first and third run gave the same results, the others are different.
~
I'm still running the same command on the same (old) system, but from a different HDD (with tmpfiles on a different drive too).
I'm now trying the same on a fresh RamNode cloud instance, and have the same problem:
Code:
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
f96f2952151451b88edcf01332ec907d  -
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
70d3d472590b3fb8356348e9fd189ddb  -
I no longer think my very old PC is the problem. As far as I know, these commands should produce the exact same output given the exact same input data. But the data changes somehow.

I realize Bitcointalk is probably not the best forum, but I'm not actively using a more specialized forum, so I post it here.

Update: I tried two more times:
Code:
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
f96f2952151451b88edcf01332ec907d  -
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
d4327e249819af8d025862bd4079d44d  -
I reproduced the same checksum only once. I need booze :P



This was how I started:
For comparison, here's the md5sum for the result from my old code (https://bitcointalk.org/index.php?topic=5265993.msg56096599#msg56096599) (gunzipped):
Code:
md5sum newchronological.txt
4070c03f974da0ee05ea51084d0f04ac  newchronological.txt
And if I split up my above command string and write some temporary files to disk, I get the exact same (correct) result again:
Code:
cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 > firstcat
cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) > firstgunzip
cat <(sort -u daily_updates/*.txt) > firstsort
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
cat <(comm -13 secondgunzip firstsort) > firstcomm
#cat <(cat firstcat firstcomm | nl -nln | sort -k2 -S80%) > secondsort
cat <(cat firstcat firstcomm | nl -nln | sort -k2) > secondsort
cat <(cat secondsort | uniq -df1 | sort -nk1 | cut -f2) > thirdsort
cat firstgunzip thirdsort | md5sum
4070c03f974da0ee05ea51084d0f04ac  -



It gets weirder: I now can't even reproduce the same weird problem again, so I can't know whether or not my changes fixed it.


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: mike2077 on May 25, 2021, 09:57:34 AM
Hi there, first of all thanks for nice data source.

Could you check the links? All links are down.
Thanks


Title: Re: List of all Bitcoin addresses ever used - currently available on temp location
Post by: LoyceV on May 25, 2021, 10:14:06 AM
all links are down.
Sorry, I forgot to post this here:
This server is currently offline. I don't know why (yet).
Still no response from my webhost :(


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mike2077 on May 26, 2021, 04:55:39 AM
I can suggest sharing files via torrent. Technically you'd have to upload a lot less since a lot of data will be transferred between people downloading.
You can share a magnet link here, seed for some time and then let people share it. Just a suggestion.

Linux question

What is the benefit of the two-directional syntax:
Code:
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
instead of just
Code:
gunzip addresses_sorted.txt.gz
or
Code:
gunzip -c addresses_sorted.txt.gz > out.txt
?

Or another example:

Code:
cat <(sort -u daily_updates/*.txt) > firstsort

instead of
 
Code:
cat daily_updates/*.txt |sort -u > firstsort

Thanks


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on May 26, 2021, 08:15:51 AM
I can suggest sharing files via torrent. Technically you'd have to upload a lot less since a lot of data will be transferred between people downloading.
I haven't tried that yet for these reasons: I don't want to upload from my desktop, so I still need a VPS. I don't expect many simultaneous downloads, so most of the uploads will still come from me. Every update will make an existing torrent useless again (and I don't want to keep posting new magnet links).

Quote
Linux question

What is the benefit of the two-directional syntax:
Code:
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
instead of just
Code:
gunzip addresses_sorted.txt.gz
or
Code:
gunzip -c addresses_sorted.txt.gz > out.txt
?
I was isolating a part from the longer code:
Code:
This:
<(gunzip -c addresses_sorted.txt.gz)
Came from:
....t -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(s....
I didn't edit it so it's clear where it came from.

Quote
Or another example:
Same reason :)



I am now pretty sure the inconsistent results were caused by using sort -S, --buffer-size=SIZE (https://man7.org/linux/man-pages/man1/sort.1.html). I was trying to be smart enforcing efficient memory usage, but I now believe this sometimes showed an error, which was then piped into the next command. If I omit the -S40% part, it works fine. This is actually good news, because it's much faster than my previous method.
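
One way a failing sort can go unnoticed: the exit status of a pipeline is, by default, that of its last command. The sketch below uses false as a stand-in for a sort that fails (an assumption for illustration, not the actual failure) and runs each case in a fresh sh so the result does not depend on the current shell's options. In bash, set -o pipefail makes the pipeline report the failure instead; writing intermediate files, as in the split-up version earlier in this thread, also lets each step be checked on its own.

```shell
# `false` stands in for a sort that exits with an error.
status_in_pipeline=$(sh -c 'false | cut -f2; echo "$?"')
status_alone=$(sh -c 'false; echo "$?"')

echo "exit status after the pipeline: $status_in_pipeline"
echo "exit status of the step alone:  $status_alone"
```

The pipeline reports 0 because cut succeeded, while the step on its own reports 1.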


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mike2077 on June 02, 2021, 12:40:38 PM
Hi LoyceV,

Any news from the provider, when your server is coming back up?
Do you have a temp location I can download files from?

Thanks


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 02, 2021, 02:03:40 PM
Any news from the provider, when your server is coming back up?
Nope, they're awfully quiet :(

Quote
Do you have a temp location I can download files from?
I can boot up a pay-by-the-hour VPS for you and upload the files (recently updated, until July 26, 2021).
I have 2 versions:
1. All Bitcoin addresses ever used, in chronological order, without duplicates.
Sample: addresses_in_order_of_first_appearance.txt.gz (http://alladdresses.loyce.club/addresses_in_order_of_first_appearance.txt.gz): (Warning: 18 GB):
Code:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg

2. All Bitcoin addresses ever used, sorted by address, without duplicates.
Sample: addresses_sorted.txt.gz (http://alladdresses.loyce.club/addresses_sorted.txt.gz): (Warning: 16 GB):
Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
.......
s-ffd80dee5966fb23c1a483b28f6bfcbc
s-fff5d0faa9628c188e97661f0e185fce
s-ffff291613d413b4ac128df96a462294
Which one would you prefer? The sorted version is much more practical for most uses, so unless you have a specific reason to want the addresses in chronological order, I'd say go for the sorted file.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mike2077 on June 03, 2021, 02:20:25 PM
If you can and it's not too hard for you, version 1 would be awesome!

1. All Bitcoin addresses ever used, in chronological order, without duplicates.
Sample: addresses_in_order_of_first_appearance.txt.gz: (Warning: 18 GB):
Code:

1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg

Thanks.

BTW, I think mega (mega.co.nz) gives you something like 50GB of storage for free.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 03, 2021, 02:55:10 PM
BTW, I think mega (mega.co.nz) gives you something like 50GB of storage for free.
That's a terrible site, I used it once to download a large file, it forced me to install their program first. So I prefer a VPS.
I'll let you know when it's available.

Update: I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB. Please let me know when I can nuke the VPS again.
Update: link expired.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 04, 2021, 04:08:15 AM
BTW, I think mega (mega.co.nz) gives you something like 50GB of storage for free.
That's a terrible site, I used it once to download a large file, it forced me to install their program first. So I prefer a VPS.

Most cloud storage sites can't upload files several GB large well without constantly breaking the connection, and will throttle the download speed even more which makes them unsuitable for downloading those files as well.

Also it's not 50GB of free storage, it's much smaller than that. Most of that free storage is temporary and expires after a year.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: PrimeNumber7 on June 04, 2021, 04:30:42 AM
Update: I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB. Please let me know when I can nuke the VPS again.
"Your download will take ~4 hours to complete"

IMO, you should upload the file to a GCS/AWS/Azure/Oracle/etc storage bucket, set the permissions to "anyone can access" but set the object so that the "requestor pays" for downloads. This will result in you paying under a dollar per month in storage costs, but anyone who accesses your file will pay a few dollars to get your data in seconds.

Maintaining a multi-gigabyte file that is accessible to the public for free and can be downloaded an unlimited number of times is really not feasible.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 04, 2021, 05:54:36 AM
IMO, you should upload the file to a GCS/AWS/Azure/Oracle/etc storage bucket, set the permissions to "anyone can access" but set the object so that the "requestor pays" for downloads. This will result in you paying under a dollar per month in storage costs, but anyone who accesses your file will pay a few dollars to get your data in seconds. 

Or you can just ask me nicely and I'll host it on my site's public directory. (https://files.notatether.com/public/loycev/addresses_sorted.txt.gz)


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 04, 2021, 09:28:23 AM
Update: I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB. Please let me know when I can nuke the VPS again.
"Your download will take ~4 hours to complete"
I get this (in England):
Code:
-                     0%[                    ] 139.10M  33.9MB/s    eta 10m 15s

Quote
IMO, you should upload the file to a GCS/AWS/Azure/Oracle/etc storage bucket, set the permissions to "anyone can access" but set the object so that the "requestor pays" for downloads. This will result in you paying under a dollar per month in storage costs, but anyone who accesses your file will pay a few dollars to get your data in seconds.
There are 2 problems with that: I don't want to use a credit card, and I don't want anyone who downloads it to require a credit card. If I need to charge a few dollars per download, I'd rather set it up myself to accept Bitcoin payments.

Quote
Maintaining a multi-gigabyte file that is accessible to the public for free and can be downloaded an unlimited number of times is really not feasible.
My other project (List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0)) is closing in on its 2 TB monthly bandwidth limit. I'd hate to have to set up a payment system, especially since this is basically just mirroring data from Blockchair.com. I've never used Torrent from CLI, that might work. It's a privacy problem for the user though: Torrent shares IP addresses with other downloaders, so they need a VPN again.

Or you can just ask me nicely and I'll host it on my site's public directory. (https://files.notatether.com/public/loycev/addresses_sorted.txt.gz)
What bandwidth limitations do you have? :D I don't just want to make my problem your problem.

I remember I had another offer:
For living we host high-end enterprise, just in case you need some space or mirrors, you’re welcome if ever in need.
Is this offer still valid?


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mike2077 on June 04, 2021, 02:29:59 PM
BTW, I think mega (mega.co.nz) gives you something like 50GB of storage for free.
That's a terrible site, I used it once to download a large file, it forced me to install their program first. So I prefer a VPS.
I'll let you know when it's available.

Update: I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB. Please let me know when I can nuke the VPS again.

I've got the file, thank you so much for sharing.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 04, 2021, 02:38:19 PM
I've got the file, thank you so much for sharing.
You're welcome :)

Judging by the 64 GB outgoing traffic, the file was downloaded 3 times. If anyone else wants it:
I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB.
I'll nuke this VPS tomorrow. It's gone, until I find a more permanent solution.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: PrimeNumber7 on June 04, 2021, 06:53:48 PM
Update: I got you http://107.191.98.18/addresses_sorted.txt.gz ! It's 19 GB. Please let me know when I can nuke the VPS again.
"Your download will take ~4 hours to complete"
I get this (in England):
Code:
-                     0%[                    ] 139.10M  33.9MB/s    eta 10m 15s
The 4 hour quote appears to be the result of my crappy WiFi connection on my back porch. I was able to reproduce a ~10 minute download estimate via a datacenter. I have been able to transfer ~a half terabyte worth of videos stored in a storage bucket in seconds.

Quote from: PN7
IMO, you should upload the file to a GCS/AWS/Azure/Oracle/etc storage bucket, set the permissions to "anyone can access" but set the object so that the "requestor pays" for downloads. This will result in you paying under a dollar per month in storage costs, but anyone who accesses your file will pay a few dollars to get your data in seconds.
There are 2 problems with that: I don't want to use a credit card, and I don't want anyone who downloads it to require a credit card. If I need to charge a few dollars per download, I'd rather set it up myself to accept Bitcoin payments.
That is a reasonable desire, however it is something that is more difficult as you are making big data available to the public. Service providers have limited network infrastructure, and need to pay for data sent to the internet, regardless of if they have data caps, or charge you for egress/outgoing data. If you are freely sharing a 10 or 20 GB file(s) using a service provider that does not charge per data transferred, you will eventually get kicked off from that service provider.

Another point is that many people look at bitcoin-related data today. The fact that someone is looking at blockchain data is not the privacy leak that it might have been 10 years ago.

Quote from: PN7
Maintaining a multi-gigabyte file that is accessible to the public for free and can be downloaded an unlimited number of times is really not feasible.
My other project (List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0)) is closing in on its 2 TB monthly bandwidth limit. I'd hate to have to set up a payment system, especially since this is basically just mirroring data from Blockchair.com.
There is a reason why blockchair throttles downloads, and why they charge as much as they do for an API key.



Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 05, 2021, 08:07:25 AM
Service providers have limited network infrastructure, and need to pay for data sent to the internet, regardless of if they have data caps, or charge you for egress/outgoing data. If you are freely sharing a 10 or 20 GB file(s) using a service provider that does not charge per data transferred, you will eventually get kicked off from that service provider.
Obviously, I won't use a host with "unlimited" bandwidth. That's never real. And I've seen shared hosts that don't allow hosting large files. But for a VPS, I pay for the bandwidth limit, and it's up to the provider to ensure it's profitable for them.

Quote
Another point is that many people look at bitcoin-related data today. The fact that someone is looking at blockchain data is not the privacy leak that it might have been 10 years ago.
I've never used a credit card for anything crypto-related, and I want to keep it that way.
I found another reason to pay upfront in crypto instead of using my credit card: The tale of the July 4th surprise $2700 AWS bill. (https://chrisshort.net/the-aws-bill-heard-around-the-world/)

Quote
There is a reason why blockchair throttles downloads, and why they charge as much as they do for an API key.
They're also in the money making business, and their paying customers pay for the bandwidth used by, well, people like me :P


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 05, 2021, 02:45:47 PM
Or you can just ask me nicely and I'll host it on my site's public directory. (https://files.notatether.com/public/loycev/addresses_sorted.txt.gz)
What bandwidth limitations do you have? :D I don't just want to make my problem your problem.

My provider gives me a cool 100TB monthly bandwidth cap so I will be fine  :) these boxes are designed for highly intensive torrenting.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 05, 2021, 06:29:35 PM
My provider gives me a cool 100TB monthly bandwidth cap so I will be fine  :)
Interesting :D
Can it also handle the occasional update? I have "2 methods" now: one that takes a long time but works, and one that's much faster but gives different results once in a while. And because testing takes so long, I haven't found the problem yet.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 06, 2021, 06:06:14 AM
Can it also handle the occasional update? I have "2 methods" now: one that takes a long time but works, and one that's much faster but gives different results once in a while. And because testing takes so long, I haven't found the problem yet.

Depends on how much time is "long time".

Leaving it single threaded (as most shell commands already are) will probably be alright as long as it doesn't take more than a few hours.

But if it has obscene memory requirements then it'll be too much for my box. I only got 9 out of 16GB ram free and I hate force-rebooting the notatether.com webserver  :-\


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: PrimeNumber7 on June 06, 2021, 07:14:23 AM
Can it also handle the occasional update? I have "2 methods" now: one that takes a long time but works, and one that's much faster but gives different results once in a while. And because testing takes so long, I haven't found the problem yet.

Depends on how much time is "long time".

Leaving it single threaded (as most shell commands already are) will probably be alright as long as it doesn't take more than a few hours.

But if it has obscene memory requirements then it'll be too much for my box. I only got 9 out of 16GB ram free and I hate force-rebooting the notatether.com webserver  :-\
Memory/thread constraints should not be an issue. Executing a script remotely on a server optimized for script requirements is trivial, and uploading an output file a single time to your server should not be an issue for a ~20 GB file.



Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 06, 2021, 02:29:51 PM
Leaving it single threaded (as most shell commands already are) will probably be alright as long as it doesn't take more than a few hours.

But if it has obscene memory requirements then it'll be too much for my box. I only got 9 out of 16GB ram free and I hate force-rebooting the notatether.com webserver  :-\
It uses a bunch of pipes, so the load it causes is more like 2-3 than 1. Memory requirements can be low (then sort uses more tmp files instead). It should be done within a few hours, and doesn't need frequent updates (once every 2 weeks will be fine).


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 06, 2021, 06:51:54 PM
Memory requirements can be low (then sort uses more tmp files instead).

This is what I'm worried about. I remember you writing something along the lines of the sort process taking an obscene amount of memory on this file, or maybe that was in the addresses-with-a-balance project. That's why I'm trying to figure out how much RAM it uses in the worst case.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 06, 2021, 08:01:35 PM
I'm trying to figure out how much RAM it uses in the worst case.
I can limit sort's memory usage. I'm more worried about the grinding this causes on the hard drive. I have no idea how much data gets read and written to sort 30 GB, but I assume every bit gets picked up at least several times.
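
For reference, a sketch of capping sort's memory and pointing its temporary files at a chosen drive. -S sets the buffer size and -T the directory for temporary files (both are GNU sort options); the 64M value, the directory and the file names are placeholders.

```shell
workdir=$(mktemp -d)
printf '%s\n' 3BBB 1AAA 1CCC 1AAA > "$workdir/in.txt"

# Cap the sort buffer at 64 MiB and keep temporary files inside $workdir.
LC_ALL=C sort -u -S 64M -T "$workdir" -o "$workdir/out.txt" "$workdir/in.txt"

capped_result=$(cat "$workdir/out.txt")
printf '%s\n' "$capped_result"
rm -rf "$workdir"
```

A smaller buffer means more temporary files and more disk traffic, so the choice of -T drive matters for the grinding mentioned above.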


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: NotATether on June 12, 2021, 07:59:14 AM
I'm trying to figure out how much RAM it uses in the worst case.
I can limit sort's memory usage. I'm more worried about the grinding this causes on the hard drive. I have no idea how much data gets read and written to sort 30 GB, but I assume every bit gets picked up at least several times.

On second thought, I just had one of my servers' disks fail a couple days ago (all data was lost), so I'm not comfortable running these updating scripts on the rest of my hardware with all that grinding until I can set up a proper backup plan for my TBs of data.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 12, 2021, 08:16:31 AM
I'm not comfortable running these updating scripts on the rest of my hardware
No worries, I'll just wait for the right hosting offer again.
This is also the reason I don't want to do a lot of testing on my laptop SSD.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: PrimeNumber7 on June 13, 2021, 02:28:25 AM
I'm trying to figure out how much RAM it uses in the worst case.
I can limit sort's memory usage. I'm more worried about the grinding this causes on the hard drive. I have no idea how much data gets read and written to sort 30 GB, but I assume every bit gets picked up at least several times.

On second thought, I just had one of my servers' disks fail a couple days ago (all data was lost), so I'm not comfortable running these updating scripts on the rest of my hardware with all that grinding until I can set up a proper backup plan for my TBs of data.
Have you considered creating a script to update your existing list once you have a list of addresses? Your script could start at block_n and add an address if it is not already in your list. This should reduce read/write operations pretty significantly.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on June 13, 2021, 08:33:53 AM
Have you considered creating a script to update your existing list once you have a list of addresses? Your script could start at block_n and add an address if it is not already in your list. This should reduce read/write operations pretty significantly.
That's what I tried to do for the addresses in chronological order, but it gave inconsistent results (and due to the time it takes to test it, I'm still not sure what caused it). I could probably do the same for the sorted addresses, by using comm (https://ss64.com/bash/comm.html) instead of sort (https://ss64.com/bash/sort.html). That means only reading the compressed big file from disk, reading and sorting the weekly addition, and writing the new compressed file to disk. This does make more sense and reduces disk writes :)
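
The comm-style update can be pictured as a streaming merge: both inputs are already sorted, so the new sorted list comes out of a single pass over the big file. Below is a minimal Python sketch of that idea, not the shell pipeline used for this project; merge_sorted_unique is a hypothetical name, and the short in-memory lists stand in for the compressed files.

```python
import heapq
import itertools


def merge_sorted_unique(existing, additions):
    """Merge two sorted iterables of addresses, dropping duplicates.

    Both inputs must already be sorted in the same order; the result
    is streamed, so neither input has to fit in memory.
    """
    merged = heapq.merge(existing, additions)
    # groupby collapses runs of equal neighbours into a single key.
    return (address for address, _ in itertools.groupby(merged))


existing = ["111aaa", "1A1zP1", "3GFfFQ"]        # stand-in for the big sorted list
weekly = sorted(["3Mbtv4", "1A1zP1", "12c6DS"])  # only the small addition gets sorted
updated = list(merge_sorted_unique(existing, weekly))
```

Only the weekly addition needs a real sort; the big list is read once and written once.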


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: BoraxTributary on August 12, 2021, 10:48:20 PM
For anyone stumbling upon this in the future, I have created a torrent for this file, which accesses the web link.

Web Link: https://files.notatether.com/public/loycev/addresses_sorted.txt.gz

Torrent Magnet URL (magnet:?xt=urn:btih:894581b3d8c8867a9d9e9b6c32e20deae1bf4a66&dn=addresses_sorted.txt.gz&tr=http%3a%2f%2fp4p.arenabg.com%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2f9.rarbg.com%3a2810%2fannounce&tr=udp%3a%2f%2fexodus.desync.com%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a6969%2fannounce&tr=udp%3a%2f%2fretracker.lanta-net.ru%3a2710%2fannounce&tr=udp%3a%2f%2fopen.stealth.si%3a80%2fannounce&tr=http%3a%2f%2fopenbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2fwww.torrent.eu.org%3a451%2fannounce&tr=udp%3a%2f%2ftracker.torrent.eu.org%3a451%2fannounce&tr=udp%3a%2f%2fopentor.org%3a2710%2fannounce&tr=udp%3a%2f%2ftracker.moeking.me%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.loadbt.com%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.dler.org%3a6969%2fannounce&tr=udp%3a%2f%2fopentracker.i2p.rocks%3a6969%2fannounce&tr=udp%3a%2f%2fipv4.tracker.harry.lu%3a80%2fannounce&tr=udp%3a%2f%2fexplodie.org%3a6969%2fannounce&tr=udp%3a%2f%2fbt2.archive.org%3a6969%2fannounce&tr=udp%3a%2f%2fbt1.archive.org%3a6969%2fannounce&tr=https%3a%2f%2ftrakx.herokuapp.com%3a443%2fannounce&ws=https%3a%2f%2ffiles.notatether.com%2fpublic%2floycev%2faddresses_sorted.txt.gz)

.torrent file, good until 2022-09 (https://tempsend.com/agwje)

This is up to date as of 2021-08-12. It will not be updated in the future, but is being left here to help, at least for now.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: BoraxTributary on August 13, 2021, 12:17:47 AM
Another way to generate the file yourself, from a local copy of the blockchain: https://github.com/graymauser/btcposbal2csv


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on August 13, 2021, 10:01:49 AM
This is up to date as of 2021-08-12.
If it's the same file notatether.com published, it was last updated in May.

I still haven't found a decent affordable replacement host.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mrxtraf on September 19, 2021, 07:49:56 AM
What are the addresses with prefix s-, prefix m- and prefix d-?


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: pooya87 on September 19, 2021, 07:56:40 AM
What are the addresses with prefix s- and prefix m-?
They are not addresses, they are output scripts that don't have any corresponding address and blockchair.com explorer uses an undocumented method to convert them into these strange looking strings.


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mrxtraf on September 19, 2021, 08:00:21 AM
What are the addresses with prefix s- and prefix m-?
They are not addresses, they are output scripts that don't have any corresponding address and blockchair.com explorer uses an undocumented method to convert them into these strange looking strings.
I understand that these are scripts. But where can I look up how they are formed? And what is the difference between s, m and d?


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on September 19, 2021, 08:02:35 AM
But where can I look up how they are formed? And what is the difference between s, m and d?
Nobody knows :D But if you enter one of them into Blockchair.com's Search field, it shows the transaction.
Sorry, I forgot it works differently: if you edit them into the URL of a Bitcoin address on Blockchair.com, it shows the transaction.
Example: blockchair.com/bitcoin/address/s-ffd80dee5966fb23c1a483b28f6bfcbc (https://blockchair.com/bitcoin/address/s-ffd80dee5966fb23c1a483b28f6bfcbc).


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: mrxtraf on September 19, 2021, 08:30:42 AM
But where can I look up how they are formed? And what is the difference between s, m and d?
Nobody knows :D But if you enter one of them into Blockchair.com's Search field, it shows the transaction.
I tried. But nothing was found  :(


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: LoyceV on September 19, 2021, 08:46:42 AM
But nothing was found  :(
My mistake, sorry. Read my updated post :)


Title: Re: List of all Bitcoin addresses ever used - currently UNavailable on temp location
Post by: pooya87 on September 19, 2021, 10:43:09 AM
And what is the difference between s, m and d?
As I said these are undocumented. Basically this particular explorer decided to use their own convention for naming these types of scripts and never told anyone how they are doing it or what they mean.
If I had to guess, they are probably computing a hash of the script (possibly SHA1 or MD5) and then storing that smaller hash in their database so they can search by hash. The starting letter (s, m, ...) may be a quick indicator of where in the database to look them up.
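
One thing that can be checked without knowing Blockchair's method is the shape of these labels. The minimal Python sketch below (split_label is a hypothetical helper) splits the example label from this thread into its prefix and hex part and counts the bits. A 32-digit hex string is 128 bits, the length of an MD5 digest, while a full SHA-1 digest would be 160 bits. That is compatible with the guess above, but it does not confirm it.

```python
def split_label(label):
    """Split a label such as 's-<hex>' into (prefix, payload bytes)."""
    prefix, _, hex_part = label.partition("-")
    return prefix, bytes.fromhex(hex_part)


prefix, payload = split_label("s-ffd80dee5966fb23c1a483b28f6bfcbc")
bits = len(payload) * 8
print(prefix, bits)
```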


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on December 08, 2021, 12:36:56 PM
Finally, this service is back online! I got myself a new VPS:
The data
See alladdresses.loyce.club (http://alladdresses.loyce.club/?C=M;O=D) (new location)
I'm still working on an update, I expect to upload it tomorrow.

This server allows 8 TB of bandwidth per month (and I'm still hoping to have it doubled because of the Black Friday deal), but doesn't have the disk space to run updates. I'm not sure yet how often I'll update it (after tomorrow's update).
For now: enjoy!



An update on my statistics from August last year (https://bitcointalk.org/index.php?topic=5265993.msg54912292#msg54912292):

Some interesting (?) statistics (updated until blockchair_bitcoin_outputs_20211202 (https://gz.blockchair.com/bitcoin/outputs/blockchair_bitcoin_outputs_20211202.tsv.gz))
Total address count: 1,967,537,866
1... address count: 1,227,646,688
3... address count: 543,236,056
bc1q... address count: 145,894,845
...-... (with a "dash") address count: 50,758,929

Unique address count: 927,366,160
1... address count: 547,749,478
3... address count: 272,205,566
bc1q... address count: 89,702,297
...-... (with a "dash") weird address count: 17,707,628
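
For anyone who wants percentages instead of raw counts, the figures above can be processed with a few lines of Python. This is only a sketch over the numbers in this post; the variable names are made up. The four listed categories fall slightly short of both totals (by 1,348 and 1,191), presumably because of address types that are not listed separately, but that is an assumption: the post does not say.

```python
total = 1_967_537_866
total_by_type = {
    "1...": 1_227_646_688,
    "3...": 543_236_056,
    "bc1q...": 145_894_845,
    "with a dash": 50_758_929,
}
unique = 927_366_160
unique_by_type = {
    "1...": 547_749_478,
    "3...": 272_205_566,
    "bc1q...": 89_702_297,
    "with a dash": 17_707_628,
}


def shares(counts, overall):
    """Return each category as a percentage of `overall`."""
    return {name: round(100 * count / overall, 2) for name, count in counts.items()}


# Counts not covered by the four listed categories.
total_unlisted = total - sum(total_by_type.values())
unique_unlisted = unique - sum(unique_by_type.values())

print(shares(total_by_type, total))
print(shares(unique_by_type, unique))
print(total_unlisted, unique_unlisted, round(total / unique, 2))
```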


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: icopress on December 10, 2021, 07:31:20 PM
LoyceV, can you tell me where I can find a list of all companies represented on CoinMarketCap with verified accounts in third-party networks? For example, Twitter filter or Etherscan filter, etc. I apologize in advance for moving away from the topic of discussion, but I need advice ... and I figured that since you are a data guru, you can tell me where to look for what I need, (I would appreciate any feedback).  :P


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on December 10, 2021, 08:04:54 PM
LoyceV, can you tell me where I can find a list of all companies represented on CoinMarketCap with verified accounts in third-party networks?
No.

Quote
For example, Twitter filter or Etherscan filter, etc.
I'm so glad I barely know what "Twitter" or "Etherscan" means :D I stay away from shitcoins and spammy websites.

Quote
I apologize in advance for moving away from the topic of discussion, but I need advice ... and I figured that since you are a data guru, you can tell me where to look for what I need, (I would appreciate any feedback).  :P
Sorry, I have no useful feedback on this :( It seems more like a manual thing to collect custom data on custom networks.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: MrFreeDragon on December 11, 2021, 06:23:26 PM
LoyceV, can you tell me where I can find a list of all companies represented on CoinMarketCap with verified accounts in third-party networks? For example, Twitter filter or Etherscan filter, etc. I apologize in advance for moving away from the topic of discussion, but I need advice ... and I figured that since you are a data guru, you can tell me where to look for what I need, (I would appreciate any feedback).  :P

Please take this list of approx. 7k coins represented on CoinMarketCap with Telegram groups and Discord channels:

https://anonfiles.com/18Mcs300v5/coinmarketcap_chats_csv (https://anonfiles.com/18Mcs300v5/coinmarketcap_chats_csv)


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: cixegz on January 01, 2022, 05:19:27 PM
@LoyceV
I am downloading some tsv files from these sites:
https://gz.blockchair.com/bitcoin/inputs/
http://addresses.loyce.club/
Please explain how to extract a .tsv file in the terminal, and suggest the best viewer tool.
Thanks,

Please read this topic, is there any update? https://bitcointalk.org/index.php?topic=5379443.0
If you solve this, I'll pay $2022.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on January 01, 2022, 05:36:09 PM
@LoyceV
I am downloading some tsv files from these sites:
https://gz.blockchair.com/bitcoin/inputs/
That takes a while. I have the data, but still don't have a server big enough to offer faster downloads.

Quote
http://addresses.loyce.club/
Please explain how to extract a .tsv file in the terminal, and suggest the best viewer tool.
Thanks,
After extracting the .gz file you don't need to extract the .tsv anymore: it's just plain text. I wouldn't suggest any "viewer", it's not really meant for humans to read. I use command line tools on Linux, a simple "head" or "tail" is all the viewing I need.
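
If you would rather peek at a file without extracting it first, the same "head" idea can be sketched in Python. This is a minimal example, not a tool from this project; head is a hypothetical helper and the sample file is a throwaway made up for the demo. gzip.open decompresses as it reads, so only the start of the file is touched.

```python
import gzip
import itertools
import os
import tempfile


def head(path, count=10):
    """Return the first `count` lines of a gzip-compressed text file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, count)]


# Demo on a tiny throwaway file with placeholder content.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.tsv.gz")
    with gzip.open(sample, mode="wt", encoding="utf-8") as handle:
        handle.write("col_a\tcol_b\nx\ty\n")
    first_lines = head(sample, count=1)
```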

Quote
Please read this topic, is there any update? https://bitcointalk.org/index.php?topic=5379443.0
If you solve this, I'll pay $2022.
Sorry, that's out of my league.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: mynonce on January 01, 2022, 08:11:16 PM
Quote
Please read this topic, is there any update? https://bitcointalk.org/index.php?topic=5379443.0
If you solve this, I'll pay $2022.
Sorry, that's out of my league.

...
I know a random private key: how to find whether y is in the small range or the big range? At least a guess :-\
...
If you have a private key, you can't know whether your y coordinate is in the small range or the big range before calculating it.
There is no exploitable relationship between the y value of your public key and your private key. Otherwise you would be able to calculate the private key from the public key.
Or we could say:
... It is impossible ... because then ECDSA would be broken.

As of today 01/01/2022  :)


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on January 16, 2022, 02:02:41 PM
Finally, this service is back online! I got myself a new VPS:
The data
See alladdresses.loyce.club (http://alladdresses.loyce.club/?C=M;O=D) (new location)
I'm currently experimenting with updates. The VPS I'm using doesn't have enough space, so I use an external ("pay by the hour") host for updates. If all works out, I'll be able to provide monthly updates again with only a few hours of additional server time, and a few minutes of my own time.

The VPS got upgraded to 16 TB bandwidth per month, and downloads reach almost 1 Gbit/s.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: nullama on January 16, 2022, 11:55:53 PM
This is great, thanks!

I have a question. Looking at the website, it shows this:

Quote
all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz   2021-12-08 20:07    23G   
all_Bitcoin_addresses_ever_used_sorted.txt.gz   2021-12-08 15:25    20G

Why is there a 3GB difference in what seems to be just an order difference? Is it because the compression of sorted addresses is able to compress the data more, or am I missing something and these files have different data?


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on January 17, 2022, 06:10:42 AM
Why is there a 3GB difference in what seems to be just an order difference? Is it because the compression of sorted addresses is able to compress the data more
Correct. Ordered data can be compressed more than random data.
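
This is easy to try on a small scale. The Python sketch below is a toy experiment, not a measurement on the real files: it uses 100,000 random 10-digit hex strings as stand-ins for addresses and zlib (the same DEFLATE method gzip uses) as the compressor. In sorted order, neighbouring lines share their leading characters, which the compressor can encode as short back-references.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# 100,000 random 10-digit hex strings stand in for addresses.
lines = [f"{rng.getrandbits(40):010x}" for _ in range(100_000)]


def compressed_size(items):
    """Return the DEFLATE-compressed size in bytes of the joined lines."""
    return len(zlib.compress("\n".join(items).encode("ascii"), 9))


unsorted_size = compressed_size(lines)
sorted_size = compressed_size(sorted(lines))
print(unsorted_size, sorted_size)
```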



Update complete!
See:
http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz (23 GB)
http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_sorted.txt.gz (21 GB)
Updated up until yesterday's data dump.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on February 12, 2022, 05:20:21 PM
I'm currently experimenting with updates. The VPS I'm using doesn't have enough space, so I use an external ("pay by the hour") host for updates. If all works out, I'll be able to provide monthly updates again with only a few hours of additional server time, and a few minutes of my own time.
I tried another update, this time with the "Premium KVM" (with NVMe) instead of "VDS" (with SSD). Unfortunately this was slower, but the update is complete again.


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on March 19, 2022, 10:25:14 AM
It's been more than a month, I'm running another update. I love 99 MB/s download speeds. Never mind, it dropped.
It should be done in a few hours.

Update: done!
See:
http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz
http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_sorted.txt.gz


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on July 30, 2022, 06:25:33 AM
Bump: I updated this data again yesterday. This VPS has 3.9 GB disk space left, so at some point it will run out. If that happens, I hope to move it to one that's big enough to handle updating it too (so I can do more frequent updates).


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on September 10, 2022, 02:26:10 PM
I need help, I may be going crazy!

I'm moving this project to a different server (a kindly donated dedicated Xeon server). I've transferred the data, updated the DNS, and updated Apache2's configuration for http://alladdresses.loyce.club/

This is what it shows:
https://loyce.club/other/apache2.png
However, as you can see in the console, I've updated the 2 large files: they now have a newer date. I've also added a "test" file and a "test2" directory. Neither shows up online, but I can navigate into that directory so it exists.
It looks as if I'm viewing a cached version, except that I'm using a private browser window. I tried Tor too, all with the same result.

What sorcery is this?


Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: seoincorporation on September 10, 2022, 03:14:09 PM
I need help, I may be going crazy!

I'm moving this project to a different server (a kindly donated dedicated Xeon server). I've transferred the data, updated the DNS, and updated Apache2's configuration for http://alladdresses.loyce.club/

This is what it shows:
https://loyce.club/other/apache2.png
However, as you can see in the console, I've updated the 2 large files: they now have a newer date. I've also added a "test" file and a "test2" directory. Neither shows up online, but I can navigate into that directory so it exists.
It looks as if I'm viewing a cached version, except that I'm using a private browser window. I tried Tor too, all with the same result.

What sorcery is this?

Your server must have an index.html file, but it doesn't have one under the alladdresses subdomain; that's why it opens like a folder.

If we take a look at http://alladdresses.loyce.club/indextest.html the site looks fine. So, just rename that file to index.html and it should work fine.

---------------------

Trying to understand the problem.

If we open the new folder test2, it shows an empty folder. Maybe your .htaccess file is configured to hide empty folders, I'm not sure, but the folder is there.



Title: Re: List of all Bitcoin addresses ever used - NOW BACK online!
Post by: LoyceV on September 10, 2022, 04:11:16 PM
Your server must have an index.html file, but it doesn't have one under the alladdresses subdomain; that's why it opens like a folder.
That's intentional: I want to see the files (but some files aren't showing correctly).

Quote
If we open the new folder test2 it shows an empty folder, maybe your htaccess file is configured to hide empty folders
There is no .htaccess in use.

I use the same settings for different subdomains, but they don't have this problem.


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: seoincorporation on September 10, 2022, 04:59:44 PM
Ok, then let's run some tests.

Could you add a text file to that folder? Let's see if it gets listed that way.

Another reason for a folder not to show up is its permissions, but yours are fine as we can see in the image, so we can rule this one out.

Quote
The default Ubuntu document root is /var/www/html. You can make your own virtual hosts under /var/www.

Since you are working on a virtual host, the way to fix this is with the configuration:

Quote
/etc/apache2/apache2.conf

I just ran a test on my local server:

Code:
root@root:/var/www/html# ls
index.html  info.php
root@root:/var/www/html# cat index.html > index2.html
root@root:/var/www/html# rm index.html
root@root:/var/www/html# mkdir test2


And it shows the empty folder, so the problem should be in your Apache2 config.


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: LoyceV on September 10, 2022, 05:08:39 PM
Could you add a text file to that folder? Let's see if it gets listed that way.
See http://alladdresses.loyce.club/test2/hi.txt
The file is visible on http://alladdresses.loyce.club/test2/
But the directory is not visible on http://alladdresses.loyce.club/

Quote
Since you are working on a virtual host, the way to fix this is with the configuration:

Quote
/etc/apache2/apache2.conf
I think that can't be the problem, because it works for other virtual hosts. I copied the files:
Code:
cp -a test* ../../blockdata.loyce.club/public_html/
They show up on http://blockdata.loyce.club/ as expected.

Quote
I just ran a test on my local server:
That's how it should be indeed, hence the "I'm going crazy" O0 It looks as if http://alladdresses.loyce.club/ still runs on my old server, but that can't be (otherwise hi.txt wouldn't be visible).


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: seoincorporation on September 10, 2022, 05:25:34 PM
...
That's how it should be indeed, hence the "I'm going crazy" O0 It looks as if http://alladdresses.loyce.club/ still runs on my old server, but that can't be (otherwise hi.txt wouldn't be visible).

And now there are two of us going crazy, lol. As you mention, this is working on the new server, but the fact that it isn't indexing the new folder doesn't make sense.

Did you try restarting the apache service?

Code:
sudo service apache2 restart


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: LoyceV on September 10, 2022, 05:30:30 PM
And now there are two of us going crazy, lol.
Lol.

Quote
Did you try restarting the apache service?
I did:
Code:
systemctl status apache2

Quote
Code:
sudo service apache2 restart
I tried this one too, just in case. No improvement.


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: seoincorporation on September 10, 2022, 05:36:54 PM
Ok, let's try this one. It worked on my local server:

Code:
chmod 0755 test2

I hope this one does the magic.


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: LoyceV on September 10, 2022, 05:52:45 PM
Code:
chmod 0755 test2
That only reduced permissions (and didn't work).


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: TryNinja on September 10, 2022, 06:17:58 PM
I'm far from being an expert with Apache, but maybe this is related?

https://cwiki.apache.org/confluence/display/httpd/DirectoryListings

More specifically, "Directory Listings" and "Some files aren't listed".


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: seoincorporation on September 10, 2022, 07:11:05 PM
Ok, looks like I found something, mate:

Code:
http://blockdata.loyce.club/
It shows the test2 folder with the same modification time.

https://i.imgur.com/5SRqh06.png

I have no idea why, but I know this is a nice hint for the solution ;)

You are creating the folder under blockdata and somehow it's linked to alladdresses.

-----------------------


Another solution is to build a site for that subdomain. That way you can manipulate all the links without problems.


Title: Re: {help needed, see last post} List of all Bitcoin addresses ever used
Post by: LoyceV on September 11, 2022, 08:00:03 AM
I'm far from being an expert with Apache, but maybe this is related?

https://cwiki.apache.org/confluence/display/httpd/DirectoryListings

More specifically, "Directory Listings" and "Some files aren't listed".
None of the files are forbidden, they're just regular files, like the other files.

It's even weirder that the date of the 2 large files is incorrect. That's the date it had before I updated the file; the file as listed doesn't even exist anymore. If you download it, you get the correct updated file.

https://i.imgur.com/5SRqh06.png
You are creating the folder under blockdata and somehow it's linked to alladdresses.
I copied it there to show it works fine on another subdomain:
I think that can't be the problem, because it works for other virtual hosts. I copied the files:
Code:
cp -a test* ../../blockdata.loyce.club/public_html/
They show up on http://blockdata.loyce.club/ as expected.
I've deleted them from "blockdata" again.

Another solution is to build a site for that subdomain. That way you can manipulate all the links without problems.
FML! I feel sooooooooooooooooooooooo stupid now O0
I used wget to transfer files from the old server to the new server, and as you can see in my terminal screenshot (https://bitcointalk.org/index.php?topic=5265993.msg60915273#msg60915273), that created an index.html file. So instead of getting a fresh listing from Apache, it shows the old one. That explains everything, no sorcery involved, just my own stupidity.
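
A leftover index file like this can be spotted with a quick scan of the document root. The Python sketch below is only an illustration: find_index_files is a hypothetical helper, and the list of index names is an assumption (Apache's DirectoryIndex setting decides which file names take precedence over a generated listing). The demo recreates the situation from this thread in a throwaway directory.

```python
import os
import tempfile


def find_index_files(root, names=("index.html",)):
    """Return paths under `root` whose file name is in `names`."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        hits.extend(os.path.join(folder, name) for name in files if name in names)
    return sorted(hits)


# Throwaway directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as docroot:
    os.mkdir(os.path.join(docroot, "test2"))
    for name in ("index.html", "test", os.path.join("test2", "hi.txt")):
        open(os.path.join(docroot, name), "w").close()
    found = [os.path.relpath(path, docroot) for path in find_index_files(docroot)]
```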



Updates work again!
I've moved this data to a (kindly donated) dedicated server. The large files are updated each Tuesday. Enjoy!


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: GR Sasa on March 01, 2023, 12:26:56 PM
Hello LoyceV,

How/why could this be useful for us bitcoin users?

I mean, yes, we get an overview of all used addresses, but I am not sure how this could be helpful for us.

Thanks


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: DaveF on March 01, 2023, 12:34:11 PM
Hello LoyceV,

How/why could this be useful for us bitcoin users?

I mean, yes, we get an overview of all used addresses, but I am not sure how this could be helpful for us.

Thanks

For some people it's interesting statistical information. You can see how many new addresses were added / used in a particular timeframe.
Depending on what you need it for there are other ways of getting it but he is doing the back end work for you.
For you to use BTC day to day it's not important, but if you are doing blockchain research it's a handy thing to have.

-Dave


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on March 01, 2023, 08:01:34 PM
How/why could this be useful for us bitcoin users?
I'm not sure :-\ If you don't know what to do with this, I guess it's not for you. But some people download it, and it's actually consuming quite a bit of bandwidth so there must be something.

You can see how many new addresses were added / used in a particular timeframe.
There's no time stamp information included, so you'll need to get Bitcoin block data (728 GB): inputs, outputs and transactions (https://bitcointalk.org/index.php?topic=5307550.0) for the full data.

Quote
Depending on what you need it for there are other ways of getting it but he is doing the back end work for you.
What can I say: I like data :D


Title: Ltc
Post by: NecroMortis on September 30, 2023, 10:57:28 AM
Is it possible to parse all Litecoin addresses ever used?


Title: Re: Ltc
Post by: LoyceV on September 30, 2023, 11:00:56 AM
Is it possible to parse all Litecoin addresses ever used?
I can do it: make me an offer :)


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: philipma1957 on October 10, 2023, 03:24:01 AM
Hey, how much RAM do you need to run this on your server?

128 GB or 256 GB?

I have a Threadripper doing nothing; it has 128 GB RAM. Would it be better than the server you are using?


I could donate it to you.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on October 10, 2023, 07:34:21 AM
Hey, how much RAM do you need to run this on your server?
I recently tested it: the more cores I use for sorting, the more RAM it takes. I tested the difference, and reduced the sorting to only 2 cores. It writes a lot of temp files (on HDD) with 32 GB RAM. I guess I could use 128 or even 256 GB to speed this up, but I update the large file once a week, so it doesn't matter that it takes a few hours.

Quote
I have a Threadripper doing nothing; it has 128 GB RAM. Would it be better than the server you are using?

I could donate it to you.
Thanks for the offer, but I'm good for now. This server takes about 12 TB of bandwidth per month.

If you want to find a good use for your server: how about using it to run your own block explorer (mempool.space clone) or Electrum server?


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: DecipherBTC on April 21, 2024, 01:25:05 PM
Is this thread still being updated?

Thanks.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on April 22, 2024, 05:45:51 AM
Is this thread still being updated?
Yes. Click (http://addresses.loyce.club/) and see :)


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: BoraxTributary on April 28, 2024, 06:00:49 PM
Have you considered posting torrents for all these links? It would make downloading faster, and would help ease the strain on your server.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on April 29, 2024, 07:02:31 AM
Have you considered posting torrents for all these links? It would make downloading faster, and would help ease the strain on your server.
This question pops up once in a while (including your own post about it back in 2021). Short answer: there's no need, and frequent updates are annoying with torrents. Long answer: click All (https://bitcointalk.org/index.php?topic=5265993.0;all), CTRL-F "torrent".


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: RecRanger on May 05, 2024, 10:30:47 PM
I made a Python library and CLI tool to check this list for addresses in bulk and in other scripts. Check it out on my GitHub: https://github.com/RecRanger/used-addr-check


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on May 10, 2024, 07:03:51 AM
Is this thread still being updated?
It looks like the daily updates (http://alladdresses.loyce.club/daily_updates/) didn't happen, which means the weekly update had no fresh data. I'm not sure why, I'll check it tomorrow.

Update: it's fixed. It was waiting for another stuck update (https://bitcointalk.org/index.php?topic=5265993.msg64025835#msg64025835). The weekly update is running now, this takes a few hours to complete. Thanks for reporting this.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: RecRanger on May 13, 2024, 12:29:09 AM
Would you consider open-sourcing the code that generates this list? It seems like it would be helpful to the community to have available.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on May 13, 2024, 05:46:27 AM
Would you consider open-sourcing the code that generates this list?
I don't really see the point. There's not much to it, but you'll need to download a lot of data to use it.
I get the data from Blockchair (https://gz.blockchair.com/bitcoin/outputs/), extract the files, get the addresses, and sort them. That's it.
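
Those steps can be sketched in a few lines of Python. This is not the code behind this project: it assumes a tab-separated dump with a header row and an address column named recipient (an assumption about the dump format), and addresses_from_dump and unique_in_order are hypothetical helpers. It produces both orderings offered in this topic: first appearance and sorted.

```python
import csv
import gzip
import io


def addresses_from_dump(handle, column="recipient"):
    """Yield the address column of a tab-separated dump with a header row."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row[column]


def unique_in_order(addresses):
    """Drop duplicates while keeping the order of first appearance."""
    return list(dict.fromkeys(addresses))


# Tiny in-memory stand-in for one gzip-compressed daily dump.
raw = "block_id\trecipient\n1\taddrB\n2\taddrA\n3\taddrB\n"
dump = gzip.compress(raw.encode("utf-8"))

with gzip.open(io.BytesIO(dump), mode="rt", encoding="utf-8", newline="") as handle:
    in_order = unique_in_order(addresses_from_dump(handle))
by_address = sorted(in_order)
```

Keeping every address in a dict is fine for a toy input, but for the full history it would need far more RAM than a typical server has, which is why the real job leans on sort and temporary files.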


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: RecRanger on May 16, 2024, 01:41:02 AM
The download is going at 100 KB/s for me right now. This would be solvable with a better distribution system (e.g., torrents).

If you use torrents to distribute the file, you'd be able to publish an RSS feed which automatically lets people download the latest version if they want to help seed.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on May 16, 2024, 05:40:11 AM
The download is going at 100 KB/s for me right now.
That sounds like you're downloading from Blockchair, they have speed restrictions. Try my mirror (https://bitcointalk.org/index.php?topic=5307550.0).

Quote
This would be solvable with a better distribution system (e.g., torrents).
As long as my (donated) server is only using 20% of its allowed bandwidth, I won't bother.


Title: Re: List of all Bitcoin addresses ever used - weekly updates work again!
Post by: LoyceV on September 22, 2024, 05:49:31 PM
4 month bump :)