Bitcoin Forum
October 04, 2024, 06:09:54 PM *
News: Latest Bitcoin Core release: 27.1 [Torrent]
 
Topic: List of all Bitcoin addresses ever used - weekly updates work again! (Read 4036 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (3 posts by 1+ user deleted.)
LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 01, 2020, 09:05:46 AM
Last edit: April 29, 2024, 06:54:09 AM by LoyceV
Merited by pooya87 (8), Welsh (8), bitmover (4), hosseinimr93 (2), marlboroza (2), BTCW (2), vapourminer (1), seoincorporation (1), ABCbits (1), NotATether (1), friends1980 (1), MrFreeDragon (1), naufragus (1)
 #1

Background
To follow up on List of all Bitcoin addresses with a balance and this post, I made a list of all Bitcoin addresses that have ever been used.

The data
See alladdresses.loyce.club (new location)
I now have the resources (RAM, CPU power and disk space) and code to show unique addresses in their original order. Each address is only shown once. I have 2 large files:

1. All Bitcoin addresses ever used, in chronological order, without duplicates.
Sample: all_Bitcoin_addresses_ever_used_in_order_of_first_appearance.txt.gz: (Warning: 33 GB):
Code:
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1
.......
3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t
3Mbtv47gZ2eN6Fy7owpgHHwSLYHS42P56P
38JyF2RQknBUMETyRT2yGndDJFYSp6hJNg

2. All Bitcoin addresses ever used, sorted by address, without duplicates.
Sample: all_Bitcoin_addresses_ever_used_sorted.txt.gz: (Warning: 29 GB):
Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
.......
s-ffd80dee5966fb23c1a483b28f6bfcbc
s-fff5d0faa9628c188e97661f0e185fce
s-ffff291613d413b4ac128df96a462294

Updates
Updates happen on Tuesday!
Sorting a list that doesn't fit in the server's RAM is slow. Therefore I only do weekly updates (for now). Check the file date here to see how old it is. If an update fails, please post here.
In between updates, I create daily updates: alladdresses.loyce.club/daily_updates/. These txt-files contain unique addresses (for that day) in order of appearance.
I won't keep older snapshots.

Bandwidth
This server should have enough bandwidth to support all my blockchain data projects. If things get crazy, I may have to resort to using torrents.

Credits
Blockchair Database Dumps has a staggering amount of data, easily accessible (at 10 kB/s, or recently 100 kB/s) with daily updates. All data presented in this topic comes from Blockchair.

No spam please.
Self-moderated against spam. Discussion and questions are welcome.

Q&A
Could you please clarify what type these d- and s- addresses are?
This is how Blockchair.com shows OP_RETURN. The search field on the main page doesn't find them, but you can put one in place of a Bitcoin address in the URL: https://blockchair.com/bitcoin/address/d-d0d953f2e7043342540a1407243e49fe.

Tips and tricks
Some suggestions for Linux/VPS users:
Code:
wget http://alladdresses.loyce.club/all_Bitcoin_addresses_ever_used_sorted.txt.gz -O - | gunzip > addresses_sorted.txt
This doesn't save the .gz but extracts it while downloading.

Code:
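LC_ALL=C comm -12 <(LC_ALL=C sort list.txt) addresses_sorted.txt
This outputs all Bitcoin addresses from "list.txt" that have ever been funded. comm only works if both inputs are sorted in the same order; LC_ALL=C makes sort and comm compare plain bytes, which appears to be the order the sorted file uses.

Code:
LC_ALL=C comm -12 <(LC_ALL=C sort list.txt) addresses_sorted.txt > output.txt
This does the same, but writes to output.txt instead of the console.
This search is fast, even with millions of addresses in list.txt; it's mainly limited by how fast your computer can read from disk.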
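A minimal, self-contained sketch of the lookup above, using tiny stand-in files instead of the real list. It assumes the real addresses_sorted.txt is sorted in plain byte order (LC_ALL=C), which is what the sample at the top suggests; list.txt and the entry 1HypotheticalUnusedAddr are made up for the demo. It sorts list.txt into a temporary file instead of using <( ), so it also runs under a plain POSIX sh.

Code:
```shell
#!/bin/sh
# Sketch: which addresses from list.txt appear in addresses_sorted.txt?
# Assumption: the big file is sorted in byte order, so force the C locale
# for both sort and comm (they must agree on the collation order).
set -eu
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Tiny stand-in for the real addresses_sorted.txt (sorted, no duplicates).
printf '%s\n' 3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
  12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX 111111111111111111112BEH2ro \
  | sort -u > "$workdir/addresses_sorted.txt"

# Your own list, in any order; 1HypotheticalUnusedAddr is a made-up entry.
printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 1HypotheticalUnusedAddr \
  12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX > "$workdir/list.txt"

# comm -12 keeps only lines present in both sorted inputs.
sort "$workdir/list.txt" > "$workdir/list_sorted.txt"
found=$(comm -12 "$workdir/list_sorted.txt" "$workdir/addresses_sorted.txt")
printf '%s\n' "$found"
```

On the real data, only the stand-in files change; the comm step stays the same.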



Related topics
Bitcoin block data available in CSV format
List of all Bitcoin addresses with a balance
List of all Bitcoin addresses ever used
[~500 GB] Bitcoin block data: inputs, outputs and transactions
[800 GB] Ethereum data

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 01, 2020, 09:06:06 AM
Merited by pooya87 (1), ABCbits (1)
 #2

Some interesting (?) statistics (updated until blockchair_bitcoin_outputs_20200719.tsv.gz)
Total address count: 1,484,589,749
1... address count: 1,039,899,708
3... address count: 343,485,961
bc1q... address count: 55,006,904
...-... (with a "dash") address count: 46,197,161

Unique address count: 693,180,830
1... address count: 470,943,308
3... address count: 167,941,821
bc1q... address count: 39,137,878
...-... (with a "dash") weird address count: 15,157,808
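Counts like these can be produced roughly as follows. This is a sketch, not the OP's actual script: it assumes one address per line (one line per output, duplicates included) and treats anything containing a dash as Blockchair's d-/s- form. Unique counts come from deduplicating first. Address types not listed above (for example bc1p) would land in "other".

Code:
```shell
#!/bin/sh
# Sketch: count addresses per type, as in the statistics above.
# Assumption: one address per line; "dash" entries are Blockchair's d-/s- forms.
set -eu
LC_ALL=C
export LC_ALL

count_types() {
  awk '
    /^1/    { legacy++; next }
    /^3/    { p2sh++;   next }
    /^bc1q/ { segwit++; next }
    /-/     { dash++;   next }
            { other++ }
    END {
      # Unset awk variables print as 0 with %d.
      printf "1=%d 3=%d bc1q=%d dash=%d other=%d\n", legacy, p2sh, segwit, dash, other
    }'
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Tiny stand-in for the full list (with duplicates).
printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t \
  1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa bc1qnsupj8eqya02nm8v6tmk93zslu2e2z8chlmcej \
  d-d0d953f2e7043342540a1407243e49fe s-e3b0c44298fc1c149afbf4c8996fb924 \
  > "$workdir/addresses.txt"

total=$(count_types < "$workdir/addresses.txt")
unique=$(sort -u "$workdir/addresses.txt" | count_types)
echo "Total:  $total"
echo "Unique: $unique"
```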

Addresses with most receiving transactions
This is the Top 100, the number in front of the address shows how many transactions it has received:
Code:
4467608 1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ
1900428 1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4
1601193 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp
1527471 1FoWyxwPXuj4C6abqwhjDWdz6D4PZgYRjA
1204787 1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK
1105406 1dice97ECuByXAvqXpaYzSaQuPVvrtmz6
1021575 3CD1QW6fjgTwKq3Pj97nty28WZAVkziNom
1009836 1G47mSr3oANXMafVrR8UC4pzV7FEAzo3r9
 929737 3JXRVxhrk2o9f4w3cQchBLwUeegJBj6BEp
 872274 1J37CY8hcdUXQ1KfBhMCsUVafa8XjDsdCn
 859422 3422VtS7UtCvXYxoXMVp6eZupR252z85oC
 841967 168o1kqNquEJeR9vosUB5fw4eAwcVAgh8P
 832807 1P9RQEr2XeE3PEb44ZE35sfZRRW1JHU8qx
 782811 1VayNert3x1KzbpzMGt2qdqrAThiRovi8
 689574 37Tm3Qz8Zw2VJrheUUhArDAoq58S6YrS3g
 676674 1DUb2YYbQA1jjaNYzVXLZ7ZioEhLXtbUru
 663458 bc1qwqdg6squsna38e46795at95yu9atm8azzmyvckulcc7kytlcckxswvvzej
 631610 17kb7c9ndg7ioSuzMWEHWECdEVUegNkcGc
 595853 1dice9wcMu5hLF4g81u8nioL5mmSHTApw
 580565 1Po1oWkD2LmodfkBYiAktwh76vkF93LKnh
 573787 1LAnF8h3qMGx3TSwNUHVneBZUEpwE4gu3D
 520889 1NDyJtNTjmwk5xPNhjgAMu4HDHigtobu1s
 505956 13vHWR3iLsHeYwT42RnuKYNBoVPrKKZgRv
 448252 1Fi9J5TeaWPHdU5cTJ4e9jr3V58SrWtUuT
 437634 1dice7fUkz5h4z2wPc1wLMPWgB5mDwKDx
 406471 1MPxhNkSzeTNTHSZAibMaS8HS1esmUL1ne
 395663 1dice7W2AicHosf5EL3GFDUVga7TgtPFn
 394249 1LuckyY9fRzcJre7aou7ZhWVXktxjjBb9S
 389038 1D5bPm1YAdn9WvAAixht7PbACU3TtkqtJJ
 376310 17A16QmavnUfCW11DAApiJxp7ARnxN5pGX
 364311 3HNSiAq7wFDaPsYDcUxNSRMD78qVcYKicw
 363898 3MfN5to5K5be2RupWE8rjJHQ6V9L8ypWeh
 357641 3HRZjedwF2AJejNTtgznWnas4E6froNP5r
 354691 1LuckyG4tMMZf64j6ea7JhCz7sDpk6vdcS
 346986 366Dgw4pi3rnvu5zizVWZF6nijWxZWc6RA
 341430 1dice6YgEVBf88erBFra9BHf6ZMoyvG88
 326839 d-d0d953f2e7043342540a1407243e49fe
 325099 38jMiiZs2C5n5MPkyc5pSA7wwW6H4p6hPa
 293567 38ENmTr2AD1avJrmmi9iM7PfS6nZVmuMKf
 289070 d-0e9deef32abfc454392d21725f9defef
 285507 1N52wHoVR79PMDishab2XmRHsbekCdGquK
 282321 3PUuiYu5cFMsagkffArrKZzQFtWdHttU3x
 280691 367f4YWz1VCFaqBqwbTrzwi2b1h2U3w1AF
 280107 1FoxBitjXcBeZUS4eDzPZ7b124q3N7QJK7
 262539 d-73fd8c31c9fc1d084f44b301bb7adb6a
 262317 1Fi57hAqyYYwaQVdA7a9qSKfiukBbt31G3
 253795 1K2SXgApmo9uZoyahvsbSanpVWbzZWVVMF
 252344 1dice5wwEZT2u6ESAdUGG6MHgCpbQqZiy
 251282 3JnFBLxDCutY3bZEZsPTkHAaUA1bxmEMX2
 250862 1diceDCd27Cc22HV3qPNZKwGnZ8QwhLTc
 247797 352zT3Ts9piSDhZpBsDoZMvdtDmJioQNBo
 246472 12JYmnfYU2ghzjwUAspzJsSnmJtK9bZPYR
 243955 1x6YnuBVeeE65dQRZztRWgUPwyBjHCA5g
 240428 3A4U175prUGEn3B1gUDkz32u8fnF9Nx3Ly
 232303 357d4rAjQhDPaWhZrBAFY7aizVPkNSq2DH
 230290 18rdKmjrg1EawxgiVT3ikLExj6GWS2MNCk
 229128 3JjPf13Rd8g6WAyvg8yiPnrsdjJt1NP4FC
 226837 1HWqsgnSd12Gv8SpoUMi1Cj8hp79BTSpW7
 226259 1changemCPo732F6oYUyhbyGtFcNVjprq
 224451 138o15eFWEEPv2ayKW2CZCgVvv5ZaZvomP
 224217 d-752ed0099932a96fbc0a854a4d3a300f
 219697 bc1qnsupj8eqya02nm8v6tmk93zslu2e2z8chlmcej
 219174 s-e3b0c44298fc1c149afbf4c8996fb924
 215870 1Kr6QSydW9bFQG1mXiPNNu6WpJGmUa9i1g
 215691 37p9pUugydmoLpQyFLLqGAgjWmUFERa1Pq
 215520 19iVyH1qUxgywY8LJSbpV4VavjZmyuEyxV
 212059 1dice7EYzJag7SxkdKXLr8Jn14WUb3Cf1
 209001 1F89hmmrtonJfAQNAqDmeDadcw7AsZcvXG
 207701 1NDpZ2wyFekVezssSXv2tmQgmxcoHMUJ7u
 207697 1Bd5wrFxHYRkk4UCFttcPNMYzqJnQKfXUE
 207524 15fXdTyFL1p53qQ8NkrjBqPUbPWvWmZ3G9
 207499 14719bzrTyMvEPcr7ouv9R8utncL9fKJyf
 207424 18uvwkMJsg9cxFEd1QDFgQpoeXWmmSnqSs
 207385 1J4yuJFqozxLWTvnExR4Xxe9W4B89kaukY
 207376 1Bqm5MDo82m1FTxV3qYNUUEKnESPRhk9jd
 207256 1HVpyjYEPwQhvRQ3dL8tGe9kiydti616sX
 207228 17NKcZNXqAbxWsTwB1UJHjc9mQG3yjGALA
 207218 1HjDauL2kth6KJUz5vX198Nvp1xN1hgYRb
 207187 13h1DP2Boo9TAsenphroACxhNy7pGxDYXd
 207138 1MSzmVTBaaSpKDARK3VGvP8v7aCtwZ9zbw
 207053 1GoK6fv4tZKXFiWL9NuHiwcwsi8JAFiwGK
 207006 13HFqPr9Ceh2aBvcjxNdUycHuFG7PReGH4
 206834 1L4EThM6x3Rd2PjNbs1U136FpMq4Gmo3fJ
 206826 14ChPPM8rPYJeHnw6kMVUDnNNKx1KnjYW4
 206808 1AdN2my8NxvGcisPGYeQTAKdWJuUzNkQxG
 206760 1DpsR91YmHUDTtiuH1pPCuG3RqAkmg6YKB
 206707 1PeohaRGaTF8cSzDqP1yYfzDah66xiriEQ
 206664 1JmcV7G3r8k7ev2EkS84MmsvxGyhiRGP84
 206572 1HZHBnH2FbHNWieMxAh4xBPfgfuxW15UPt
 206469 18czPiA9PcCs7rFTBZnhvNAWuh1pEZRpGJ
 206346 12Cf6nCcRtKERh9cQm3Z29c9MWvQuFSxvT
 206344 1MPerpQzTABa1K2eXQxsQTDSZtDQHWf6vk
 206247 1dice1e6pdhLzzWQq7yMidf6j8eAg7pkY
 206243 18XSLnBZ8ydMUkaifU6sQBMJzmm7JvDeUp
 205690 bc1quq29mutxkgxmjfdr7ayj3zd9ad0ld5mrhh89l2
 203334 3QQB6AWxaga6wTs6Xwq8FYppgrGinGu15f
 201993 3M92sq9ssFaNbEwF47uteVKJsbw125juS7
 199135 1AScRhqdXMrJyxNmjEapMZi1PLFsqmLquG
 196271 18p9Ftp3m4435tdpZTvoBsm3yjUgkvTF2b
 193271 33fDiKKhr2F2uRv2jJzdKT3ECuK3wzCq5d
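A sketch of how a ranking like this can be produced from an unsorted list with one address per line. Note the assumption: it counts lines (outputs), so an address that receives two outputs in one transaction is counted twice; whether the numbers above count transactions or outputs isn't stated here. sort | uniq -c uses disk-backed sorting, so it doesn't need the whole list in RAM.

Code:
```shell
#!/bin/sh
# Sketch: rank addresses by how often they appear (one line per received output).
set -eu
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Tiny stand-in: one address per output, unsorted, with repeats.
printf '%s\n' 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp 1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ \
  1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp 1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK \
  1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp \
  > "$workdir/outputs.txt"

# sort groups equal lines, uniq -c counts each group, sort -rn ranks by count,
# head keeps the Top N (100 in the post above), awk formats like the list above.
top=$(sort "$workdir/outputs.txt" | uniq -c | sort -rn | head -n 100 \
      | awk '{ printf "%7d %s\n", $1, $2 }')
printf '%s\n' "$top"
```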

MrFreeDragon
Sr. Member
Offline

Activity: 443
Merit: 350


August 17, 2020, 05:56:46 PM
 #3

Very interesting statistics, thank you!

Quote
-snip-
Addresses with most receiving transactions
This is the Top 100, the number in front of the address shows how many transactions it has received:
-snip-
 326839 d-d0d953f2e7043342540a1407243e49fe
...
 289070 d-0e9deef32abfc454392d21725f9defef
...
 262539 d-73fd8c31c9fc1d084f44b301bb7adb6a
...
 224217 d-752ed0099932a96fbc0a854a4d3a300f
...
 219174 s-e3b0c44298fc1c149afbf4c8996fb924
-snip-

Could you please clarify what type these d- and s- addresses are?

Casdinyard
Hero Member
Offline

Activity: 2170
Merit: 891


Leading Crypto Sports Betting and Casino Platform


August 18, 2020, 09:12:54 AM
 #4

~

Can you also scrape all the Bitcoin addresses used here on the forum and the users who posted them? Some users will have used the same wallet because they are just alts of one person (although proving that takes a lot of investigation). I think it would help label users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other and which are breaking campaign rules or even forum rules (such as enrolling many accounts in a single bounty or signature campaign).

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 18, 2020, 09:43:57 AM
 #5

Quote
Can you also scrape all the Bitcoin addresses used here on the forum and the users who posted them?
I actually can :D I found this regexp on Stack Overflow:
Code:
grep -E --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
grep -Ew --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *
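A small demonstration of why the -w version avoids matching inside Ethereum addresses, as a sketch on two made-up lines of text (the hex string is just an Ethereum-style sample). Without -w, the pattern finds a run of 26 to 35 Base58 characters inside the hex string; with -w, a match must start and end at a word boundary, and inside 0x... there is none before the 3.

Code:
```shell
#!/bin/sh
# Sketch: compare the unanchored pattern with and without -w.
set -eu

pattern='[13][a-km-zA-HJ-NP-Z1-9]{25,34}'

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/posts.txt" <<'EOF'
send to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa please
eth: 0x3f5CE5FBFe3E9af3971dD833D26bA9b5C936f0bE
EOF

# -c counts matching lines.
plain_count=$(grep -Ec "$pattern" "$workdir/posts.txt")   # also matches inside the hex string
word_count=$(grep -Ecw "$pattern" "$workdir/posts.txt")   # whole words only
echo "without -w: $plain_count line(s), with -w: $word_count line(s)"
```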

I could run this code on 53 million archived posts, but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.

Quote
I think it would help label users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other
A smart user would simply use different addresses. An even smarter user would use different wallets, so they don't create a blockchain trail when they make a payment.

As a quick test, 51 out of 9999 posts contain at least one Bitcoin address (starting with 1 or 3, ignoring Bech32).
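A quick test like this can be sketched with grep -l, which lists each file containing at least one match, so counting its output counts posts rather than addresses. Assumption: one archived post per file in a directory (a hypothetical layout; the real archive format isn't described here), and, as in the test above, only legacy 1/3 addresses are counted and quotes are not excluded. -r avoids hitting the argument-list limit with millions of files.

Code:
```shell
#!/bin/sh
# Sketch: how many posts contain at least one legacy Bitcoin address?
# Assumption (hypothetical layout): one post per file in posts/.
set -eu

pattern='[13][a-km-zA-HJ-NP-Z1-9]{25,34}'

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/posts"

echo 'My address: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa'   > "$workdir/posts/1.txt"
echo 'No address in this post.'                          > "$workdir/posts/2.txt"
echo 'eth: 0x3f5CE5FBFe3E9af3971dD833D26bA9b5C936f0bE'   > "$workdir/posts/3.txt"

# -l prints each matching file once; -w keeps Eth-address fragments out.
total=$(find "$workdir/posts" -type f | wc -l | tr -d ' ')
with_address=$(grep -rlEw "$pattern" "$workdir/posts" | wc -l | tr -d ' ')
echo "$with_address out of $total posts contain at least one Bitcoin address"
```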

For now I won't continue this search. If I ever do, I'll move this discussion to Reputation.

Casdinyard
Hero Member
Offline

Activity: 2170
Merit: 891


Leading Crypto Sports Betting and Casino Platform


August 18, 2020, 11:01:17 AM
 #6

Quote
Can you also scrape all the Bitcoin addresses used here on the forum and the users who posted them?
I actually can :D I found this regexp on Stack Overflow:
Code:
grep -E --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
grep -Ew --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *
I could run this code on 53 million archived posts, but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.

I think it would be possible if and only if you scraped the following boards:
  • Services
  • Bounties
  • Marketplace in general (both BTC and Alt)
  • And Marketplaces of all local boards if applicable/available

With that, addresses posted as evidence about scams wouldn't be a problem. And yes, it would be hard, especially if threads or posts were deleted. But that shouldn't matter, as long as the list simply serves as a reference of which addresses each user has used or mentioned throughout their post history.

Quote
I think it would help label users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other
A smart user would simply use different addresses. An even smarter user would use different wallets, so they don't create a blockchain trail when they make a payment.
As a quick test, 51 out of 9999 posts contain at least one Bitcoin address (starting with 1 or 3, ignoring Bech32).
For now I won't continue this search. If I ever do, I'll move this discussion to Reputation.

I'm looking forward to seeing it happen. I think I've already mentioned my project to build an app (a BPIP ripoff), and data like this would be helpful for it. I'm still in the planning stage, deciding what to build first, and with all the data you've already scraped, I could do less scraping and instead build an API that looks things up in your data.

TryNinja
Legendary
Offline

Activity: 2968
Merit: 7398



August 18, 2020, 11:31:28 AM
 #7

Quote
I could run this code on 53 million archived posts, but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.
This is planned for my post archive. I've already done it, but only for ETH addresses, and only on the 15M posts you sent me plus the newly scraped ones.

I plan to scan all old posts + new ones for ETH and BTC addresses after everything is working fine (new bot + full database with the whole post archive).

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 18, 2020, 11:36:26 AM
 #8

Quote
This is planned for my post archive. I've already done it, but only for ETH addresses, and only on the 15M posts you sent me plus the newly scraped ones.
Great, that saves me the trouble :)
Can I request a CSV of all the results? That makes it much easier to use all the data than looking it up per address on your site.
Just something with (at least) "address,userID,msgID" would be great for further analysis.

Quote
I'm still in the planning stage, deciding what to build first, and with all the data you've already scraped, I could do less scraping and instead build an API that looks things up in your data.
I can get you a copy of all archived posts like I gave TryNinja if it helps. It beats scraping the forum again, although I didn't keep track of board names per topic.

TryNinja
Legendary
Offline

Activity: 2968
Merit: 7398



August 18, 2020, 11:41:10 AM
 #9

Quote
Great, that saves me the trouble :)
Can I request a CSV of all the results? That makes it much easier to use all the data than looking it up per address on your site.
Just something with (at least) "address,userID,msgID" would be great for further analysis.
Of course. Once in the database, it's pretty easy to export them to the format I want.

BTCW
Copper Member
Full Member
Offline

Activity: 193
Merit: 255

Click "+Merit" top-right corner


August 19, 2020, 11:03:18 PM
Last edit: August 20, 2020, 08:12:25 AM by BTCW
Merited by LoyceV (6), MrFreeDragon (2)
 #10


Quote
Updates
Sorting a list that doesn't fit in the server's RAM is very slow. Therefore I only update unique_addresses.txt.gz twice a month (on the 6th and 21st). Check the file date here to see how old it is. If an update fails, please post here.
In between updates, I create daily updates: alladdresses.loyce.club:20319/daily_updates/. These txt-files contain unique addresses (for that day) in order of appearance.
Due to limitations in disk space, I don't do automatic updates for addresses.txt.gz. It's complete until blockchair_bitcoin_outputs_20200719.tsv.gz.



This is a wonderful initiative! A comment: Sorting a very large list with little RAM is not necessarily a problem! Try:


Code:
mkdir tmp
cat unsorted.txt | sort -u -S 65% -T tmp > sorted.txt
rm -r tmp

-S will tell your machine to use at most 65% CPU; this is some sort of optimum, according to my experience
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby

I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 20, 2020, 09:08:20 AM
 #11

Quote
Code:
cat unsorted.txt | sort -u -S 65% -T tmp > sorted.txt
I'm already using "sort", which uses /tmp by default.

I'll try "sort -u" though; it might need less temporary storage than "sort | uniq". The next update is scheduled for tomorrow, and I'll see how it performs.

Quote
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.

Quote
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour :) It doesn't have an SSD though, and I'm using "cputool" to keep server load low. I'm okay without daily updates on this; I wouldn't want users to download this large file on a daily basis anyway.

Quote
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I've been using an AWS server donated by suchmoon for loyce.club. However, since AWS charges $0.15/GB, I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour VPS, and obviously it was a lot faster.

There's one thing on my wish list though: a method to show only unique addresses in order of appearance (without sorting them). It can be done with awk '!a[$0]++', but this requires a lot of memory and doesn't use temporary files.
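For reference, this is what the awk one-liner does, as a sketch on a tiny sample: seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is true and awk prints the line; later copies evaluate to a positive count and are skipped. Every distinct line stays in the seen array until the end, which is why memory grows with the number of unique addresses. The array name seen is arbitrary (the post uses a).

Code:
```shell
#!/bin/sh
# Sketch: keep only the first occurrence of each line, preserving order.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX \
  1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1 \
  12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX > "$workdir/addresses.txt"

# The whole set of distinct lines is held in memory (no temporary files).
awk '!seen[$0]++' "$workdir/addresses.txt" > "$workdir/unique_in_order.txt"
result=$(paste -s -d ' ' - < "$workdir/unique_in_order.txt")
echo "$result"
```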

NotATether
Legendary
Offline

Activity: 1750
Merit: 7305


In memory of o_e_l_e_o


August 20, 2020, 12:40:56 PM
Merited by LoyceV (4)
 #12

Quote
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.

That is correct: the argument to -S is the amount of memory sort(1) uses for its main buffer (manpage source). With a percentage, it reserves that percentage of physical memory. But I think even a 256 MB buffer is too small for the size of the dataset you're sorting; it will hit the disk too much.
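A sketch of a sort invocation that states the memory and temp-file choices explicitly (GNU coreutils sort assumed). -S sets the main buffer, either as a size (64M here, an arbitrary example) or as a percentage of physical memory; -T picks where temporary runs go; --compress-program=gzip compresses those runs, which may help on a slow HDD at the cost of CPU (worth measuring before relying on it). LC_ALL=C compares bytes instead of applying locale rules, which is usually much faster and matches the byte order of the sample lists in this topic.

Code:
```shell
#!/bin/sh
# Sketch: external sort with an explicit memory cap and temp directory (GNU sort).
set -eu
LC_ALL=C          # byte-order comparison: fast, and locale-independent
export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/tmp"

printf '%s\n' 3GFfFQAFgXKiA1qqUK6rqBpEpG4vZDos6t 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
  12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
  111111111111111111112BEH2ro > "$workdir/unsorted.txt"

# -u: drop duplicates, -S: main memory buffer, -T: temp files on disk,
# --compress-program: gzip the temporary runs (only used if sort spills to disk).
sort -u -S 64M -T "$workdir/tmp" --compress-program=gzip \
  -o "$workdir/sorted.txt" "$workdir/unsorted.txt"

sorted=$(paste -s -d ' ' - < "$workdir/sorted.txt")
echo "$sorted"
```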

Quote
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour :) It doesn't have an SSD though, and I'm using "cputool" to keep server load low. I'm okay without daily updates on this; I wouldn't want users to download this large file on a daily basis anyway.

Quote
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I've been using an AWS server donated by suchmoon for loyce.club. However, since AWS charges $0.15/GB, I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour VPS, and obviously it was a lot faster.

That's strange, because all AWS servers have an SSD configured as the boot disk. If you are sorting in a VM, all that sorting is done on a virtual hard disk: not only are you moving data from memory into temporary space on the host's SSD, it's being moved into a virtual disk file on that SSD, which puts extra strain on your hypervisor's emulated disk controller.

So the hypervisor is emulating all the disk controller calls that read and write data, update the disk cache and so on, while sort(1) moves data between its memory buffer in RAM and the hard disk (which is actually a file on your host). And it's doing that for the entire 31GB of addresses, and the algorithm sort uses needs an O(n log(n)) space, which I calculate to be 310GB for your data. All this while running emulated disk writes and reads. On top of that, there are the hardware-accelerated reads and writes the host performs for the VM on its disk file. That explains the poor performance while sorting.

You'll have better disk performance if you sort outside of a VM.

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 20, 2020, 02:17:52 PM
 #13

Quote
That's strange because all AWS servers have an SSD configured as the boot disk.
I guess it wasn't clear that alladdresses.loyce.club:20319 doesn't run on AWS. It uses an HDD.

Quote
And it's doing that for the entire 31GB of addresses, and the algorithm sort uses needs an O(n log(n)) space, which I calculate to be 310GB for your data.
It takes many hours while keeping server load low, but it really isn't a problem.
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):

NotATether
Legendary
Offline

Activity: 1750
Merit: 7305


In memory of o_e_l_e_o


August 20, 2020, 09:50:01 PM
 #14

@LoyceV how large is the uncompressed addresses.txt.gz? It's already at least 200GB and counting, and it's still extracting legacy addresses. I'm worried I may run out of disk space before it's all extracted; I have a 1TB quota. If you know how big the uncompressed unique_addresses.txt.gz is, that would be useful to know too.

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 21, 2020, 08:54:28 AM
Last edit: November 28, 2020, 02:33:15 PM by LoyceV
 #15

Quote
@LoyceV how large is the uncompressed addresses.txt.gz?
It gets around 50% larger; Bitcoin addresses don't compress very well.
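To check the uncompressed size without writing the extracted file to disk, one option is to stream it through wc -c, as in this sketch. gzip -l is quicker, but the gzip format stores the uncompressed length modulo 2^32, so for files larger than 4 GiB its "uncompressed" column is wrong. The sample file here is made up; on the real dump this reads the whole file, so it takes about as long as a full decompression.

Code:
```shell
#!/bin/sh
# Sketch: measure the uncompressed size of a .gz by streaming it.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX \
  1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1 > "$workdir/addresses.txt"
original=$(wc -c < "$workdir/addresses.txt" | tr -d ' ')
gzip -c "$workdir/addresses.txt" > "$workdir/addresses.txt.gz"

compressed=$(wc -c < "$workdir/addresses.txt.gz" | tr -d ' ')
# gzip -dc decompresses to stdout; nothing extracted is written to disk.
uncompressed=$(gzip -dc "$workdir/addresses.txt.gz" | wc -c | tr -d ' ')
echo "compressed: $compressed bytes, uncompressed: $uncompressed bytes"
```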

NotATether
Legendary
Offline

Activity: 1750
Merit: 7305


In memory of o_e_l_e_o


August 21, 2020, 09:59:52 AM
 #16

Quote
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):

I suggest that instead of the awk one-liner, you look at gz-sort, a small Linux program that sorts gzip-compressed files on disk using a very small memory buffer (as low as 4 megabytes).

You sort the file using
Code:
gz-sort -u addresses.txt.gz addresses_sorted.txt.gz

The -u switch removes duplicate lines from the sorted output. You can increase the buffer size, but this isn't necessary: I used -S 1G to give it a 1 gigabyte buffer, and it took around 7 hours to complete, not much shorter than the advertised completion time of 9 or 10 hours. So this program should run well in your VM; the amount of RAM isn't important.

You need to compile it yourself using make, but it has minimal dependencies: only zlib and GNU headers.

I used it to find the smallest address in the dump using
Code:
zcat addresses_sorted.txt.gz | head -n 55405 | uniq

This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
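The head -n 55405 | uniq approach needs the count to be known in advance. A sketch of a streaming alternative: on a sorted list that still contains duplicates (sorted without -u), uniq -c prints each distinct line once with its number of occurrences, so head shows the first few addresses with their counts. The sample below is made up; uniq only compares adjacent lines, which is why the input must be sorted.

Code:
```shell
#!/bin/sh
# Sketch: count occurrences of the first (lexicographically smallest) addresses.
set -eu
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for a sorted list that still contains duplicates, gzip-compressed.
printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 1111111111111111111114oLvT2 \
  1111111111111111111114oLvT2 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
  1111111111111111111114oLvT2 | sort | gzip -c > "$workdir/addresses_sorted_dups.txt.gz"

# uniq -c collapses each run of identical adjacent lines into "count line".
first=$(gzip -dc "$workdir/addresses_sorted_dups.txt.gz" | uniq -c | head -n 5)
printf '%s\n' "$first"
```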

Here are some of the other smallest addresses:

Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
1111111111111111111141MmnWZ
111111111111111111114ysyUW1
1111111111111111111184AqYnc
11111111111111111111BZbvjr
11111111111111111111CJawggc
11111111111111111111HV1eYjP
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS
111111111111111111121xzjPWX1
111111111111111111128gzo7iT
11111111111111111112AmVxQeF
11111111111111111112Fr3DURyz
11111111111111111112GvNtZ1K
11111111111111111112VUYD4wA
1111111111111111111313xyAwW
111111111111111111137vGPgFbT
11111111111111111113aT9ZSLG
111111111111111111168xDACCG
11111111111111111116B8w87yU



Maybe you can also make a list of addresses sorted by balance, now that you have an efficient way to deduplicate them.

LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 21, 2020, 11:29:22 AM
Last edit: November 28, 2020, 02:37:12 PM by LoyceV
 #17

Quote
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
I suggest that instead of the awk one-liner, you look at gz-sort, a small Linux program that sorts gzip-compressed files on disk using a very small memory buffer (as low as 4 megabytes).
I checked, but it does what I'm doing already. The awk command removes duplicate lines without sorting them. I'd like to use it, but I can't run it on this server.

Quote
This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
I'd be interested to see which real address is the shortest. The 111111111-addresses are all burn addresses. I'm not entirely sure what determines address length, but from what I've seen, shorter addresses are much harder to find. I've been looking for short addresses created from mini-private-keys, and they were quite rare.
To count as a real short address, it needs to have sent funds too.
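A rough way to see why short addresses are rare, sketched for legacy (P2PKH) addresses only and treating the hash as uniformly random (an approximation): Base58Check encodes 25 bytes (version byte 0x00, the 20-byte HASH160, a 4-byte checksum). Each leading zero byte becomes a '1', and the remaining value N, the 25 bytes read as one big-endian integer, is written in base 58. With z leading zero bytes, the length L works out as:

Code:
```latex
\begin{aligned}
L &= z + \lfloor \log_{58} N \rfloor + 1, \qquad N < 2^{192} \text{ when } z = 1,\\
\log_{58} 2^{192} &= \frac{192}{\log_2 58} \approx \frac{192}{5.858} \approx 32.78
  \;\Rightarrow\; L \le 34,\\
\Pr(L \le 33 \mid z = 1) &\approx \frac{58^{32}}{2^{192}}
  = 2^{\,32 \log_2 58 \,-\, 192} \approx 2^{-4.5} \approx 4\%.
\end{aligned}
```

So a typical address is 34 characters, about 4% are 33, and each further character shorter is roughly 58 times rarer. The all-1 burn addresses are the extreme case: their HASH160 is all zeros, so z = 21 and only the checksum is left to encode, which gives the 21 leading '1's plus 6 characters seen above.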

Quote
Maybe you can also make a list of addresses sorted by balance
See List of all Bitcoin addresses with a balance.

naufragus
Newbie
Offline

Activity: 29
Merit: 50


August 23, 2020, 10:14:58 PM
Merited by LoyceV (12), ABCbits (4)
 #18

Quote
I actually can :D I found this regexp on Stack Overflow:
Code:
grep -E --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
Code:
grep -Ew --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *


I have compiled these from various sources and use them to automatically set my blockchain explorer options based on user input; I also keep them in my .zshrc:
Code:
#cryptocurrency greps

#btc1 and btc2 combined
alias btcgrep="grep -Ee '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b' -e '\bbc1[ac-hj-np-z02-9]{8,87}\b'"

#legacy addresses only
alias btcgrep1="grep -E '\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#bech32 addresses, lowercase (all start with bc1; there is no bc0 prefix)
alias btcgrep2="grep -E '\bbc1[ac-hj-np-z02-9]{8,87}\b'"
#https://stackoverflow.com/questions/21683680/regex-to-match-bitcoin-addresses

#bech32 addresses only
alias btcgrep3="grep -E '\bbc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71}\b'"

#both legacy and bech32
alias btcgrep4="grep -E '\b([13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71})\b'"
#http://mokagio.github.io/tech-journal/2014/11/21/regex-bitcoin.html

#private keys
alias btcgrep5="grep -E '\b[5KL][1-9A-HJ-NP-Za-km-z]{50,51}\b'"
#word boundary: '\b'
#https://bitcoin.stackexchange.com/questions/56737/how-can-i-find-a-bitcoin-private-key-that-i-saved-in-a-text-file

#transaction hashes
alias btcgrep6="grep -E '\b[a-fA-F0-9]{64}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex
#https://bitcoin.stackexchange.com/questions/70261/recognize-bitcoin-address-from-block-hash-and-transaction-hash

#block hashes
alias btcgrep7="grep -E '\b[0]{8}[a-fA-F0-9]{56}\b'"
#https://stackoverflow.com/questions/46255833/bitcoin-block-and-transaction-regex

#ethereum address hash
#test for 'plausibility'
alias ethgrep="grep -E '\b(0x)?[0-9a-fA-F]{40}\b'"
#https://ethereum.stackexchange.com/questions/1374/how-can-i-check-if-an-ethereum-address-is-valid

#ethereum transaction hash
alias ethgrep2="grep -E '\b(0x)?([A-Fa-f0-9]{64})\b'"  #parentheses are not necessary
#https://ethereum.stackexchange.com/questions/34285/what-is-the-regex-to-validate-an-ethereum-transaction-hash/34286

The -w flag restricts matches to whole words ("word boundary"); the same can be done inside the regex with '\b' at both ends.
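A quick check of the combined legacy + Bech32 pattern (the one in btcgrep4), as a sketch assuming GNU grep (for \b). It puts the pattern in a variable rather than using the alias, because bash does not expand aliases in non-interactive scripts by default; -o prints each match on its own line, so the count is per address rather than per line. The text is made up; the addresses are taken from this topic.

Code:
```shell
#!/bin/sh
# Sketch: extract legacy and Bech32 addresses with the btcgrep4 pattern (GNU grep).
set -eu

pattern='\b([13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[ac-hj-np-zAC-HJ-NP-Z02-9]{11,71})\b'

text='pay 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa or bc1qnsupj8eqya02nm8v6tmk93zslu2e2z8chlmcej, not 0x3f5CE5FBFe3E9af3971dD833D26bA9b5C936f0bE
also bc1qwqdg6squsna38e46795at95yu9atm8azzmyvckulcc7kytlcckxswvvzej'

# -o: print only the matched part, one match per line.
found=$(printf '%s\n' "$text" | grep -oE "$pattern")
printf '%s\n' "$found"
```

The Ethereum-style hex string produces no match, because inside 0x... there is no word boundary before the 3.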

Very good work on compiling those addresses, mate!
LoyceV (OP)
Legendary
Offline

Activity: 3458
Merit: 17485


Thick-Skinned Gang Leader and Golden Feather 2021


August 24, 2020, 11:39:02 AM
Last edit: August 24, 2020, 05:16:35 PM by LoyceV
 #19

Quote
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
This looks very promising:
Code:
cat -n input.txt | sort -uk2 | sort -nk1 | cut -f2- > output.txt
I'll be testing it soon.
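A sketch of this disk-friendly pipeline on a tiny sample, next to the awk version for comparison (GNU sort assumed). cat -n prefixes each line with its line number; sort -uk2 deduplicates on the address alone, and because -u also turns off sort's last-resort whole-line comparison, the earliest copy of each address is the one kept; sort -nk1 restores the original order by line number; cut -f2- strips the number again. Both sorts can spill to temporary files, so memory use stays bounded, at the cost of two full sorts.

Code:
```shell
#!/bin/sh
# Sketch: unique lines in order of first appearance, without holding them all in RAM.
set -eu
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX \
  1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 1HLoD9E4SDFFPDiYfNYnkBLQ85Y51J3Zb1 \
  12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX > "$workdir/input.txt"

# Number the lines, dedupe on the address (field 2), restore the original order
# by line number (field 1), then drop the number again.
cat -n "$workdir/input.txt" | sort -uk2 -T "$workdir" | sort -nk1 -T "$workdir" \
  | cut -f2- > "$workdir/output.txt"

via_sort=$(paste -s -d ' ' - < "$workdir/output.txt")
via_awk=$(awk '!seen[$0]++' "$workdir/input.txt" | paste -s -d ' ' -)
echo "$via_sort"
```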

Some results: the awk approach uses just over 1 GB of memory for 10 million addresses. So for 1.5 billion addresses (roughly 150 GB), a 256 GB server should be enough. At AWS, that would cost a few dollars per hour.

I've tested with the first 10 million lines, and can confirm both give the same result:
Code:
head -n 10000000 addresses.txt | awk '!a[$0]++' | md5sum
head -n 10000000 addresses.txt | nl | sort -uk2 | sort -nk1 | cut -f2 | md5sum
As expected, awk is faster.

seoincorporation
Legendary
Offline

Activity: 3304
Merit: 3094



August 24, 2020, 11:31:39 PM
 #20

This is an awesome contribution to the community. A few weeks ago I saw a user asking for a list like this to use for brute-forcing... Some users use their address as a password, which is why a list like this is a great tool. Thanks again to LoyceV for making it for us.
