LoyceV (OP)
Legendary
Offline
Activity: 3612
Merit: 18439
Thick-Skinned Gang Leader and Golden Feather 2021
|
 |
December 11, 2024, 10:17:05 AM |
|
It's back! My site addresses.loyce.club/ is back online. For now, there's only the most recent snapshot. During the next year, I'll keep more snapshots again.
I've received a few PMs from people who missed my data. It's always good to see it fills a need.
|
|
|
|
LoyceV (OP)
|
 |
January 03, 2025, 04:45:12 PM |
|
It was brought to my attention that my "sort" is "different" now, and I got these results when testing:
    cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sha256sum
    df0baad2301e9b897a02bd3fccb115968c82eb3956143e2f5b4c3ad7b2c227bf  -
So far so good. Now, this file is sorted on my server from a cronjob. But when I sort it on my local computer, I get this:
    cat Bitcoin_addresses_LATEST.txt.gz | gunzip | sort -S20% | sha256sum
    27c2541369d0546ec7c7e70d09d807d8fc6d39435f8857e5ebbf8386584be2d2  -
Has anyone else noticed an incompatible sorting method? Should I change this to a different sorting? Or would that break scripts from people who are currently using it?
|
|
|
|
pbies
|
We were talking about that in our private messages. My suggestions:
- Use pv instead of cat so you can see the progress; it won't affect the result.
- Sorting should probably be run with LC_ALL=C or LC_ALL=C.UTF-8 in front of the sort command, so the same collation is used on every system (it should work like that).
- Because systems, servers and OSes differ, we should always state the collation explicitly for each sort command (LC_ALL=...).
- If we change that now, we can break people's scripts, but we should settle on one way of sorting forever; that's how it should be from an engineering point of view.
- On the first screenful of the sorted file you can see that the order differs depending on the system or the given LC_ALL; it is visible to the naked eye that the addresses are sorted differently (mainly, lowercase and uppercase letters are in a different order).
|
BTC: bc1qmrexlspd24kevspp42uvjg7sjwm8xcf9w86h5k
|
|
|
LoyceV (OP)
|
 |
January 03, 2025, 05:51:18 PM |
|
Quote: "We were talking about that in our private messages."
Thank you for pointing this out.
Quote: "Sorting should probably be run with LC_ALL=C or LC_ALL=C.UTF-8 in front of the sort command, so the same collation is used on every system (it should work like that)."
I'll wait to see whether someone responds with a good reason to keep things the way they are. If not, I think I'll go for LC_ALL=C.
Quote: "Because systems, servers and OSes differ, we should always state the collation explicitly for each sort command (LC_ALL=...)."
I agree. I just didn't know about the difference, and (before my dedicated server disappeared) I never stumbled upon this problem.
Quote: "If we change that now, we can break people's scripts, but we should settle on one way of sorting forever."
Let's say we give it 2 weeks. But I guess most people don't read here until after I've broken their script by changing things.
Quote: "It is visible to the naked eye that the addresses are sorted differently (mainly, lowercase and uppercase letters are in a different order)."
Here's the difference:
    11111111111111111111HV1eYjP
    11111111111111111111HeBAGj
    11111111111111111111QekFQw
    11111111111111111111UpYBrS
    11111111111111111111g4hiWR
    11111111111111111111jGyPM8
    11111111111111111111o9FmEC
    11111111111111111111ufYVpS
vs:
    11111111111111111111g4hiWR
    11111111111111111111HeBAGj
    11111111111111111111HV1eYjP
    11111111111111111111jGyPM8
    11111111111111111111o9FmEC
    11111111111111111111QekFQw
    11111111111111111111ufYVpS
    11111111111111111111UpYBrS
That is annoying to deal with!
This can of course easily be avoided by sorting the data on your local system before using it. For this project, it's quite easy. But for all Bitcoin addresses ever used, it can take hours to sort the data.
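To make the difference reproducible, here is a minimal sketch that sorts the eight sample addresses from this post with LC_ALL=C (plain byte order) and then checks the order without re-sorting. It only assumes a standard sort; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# The eight sample addresses from the post above, in arbitrary order.
addresses='11111111111111111111g4hiWR
11111111111111111111HeBAGj
11111111111111111111HV1eYjP
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111QekFQw
11111111111111111111ufYVpS
11111111111111111111UpYBrS'

# LC_ALL=C compares raw bytes, so uppercase letters sort before lowercase ones.
c_sorted=$(printf '%s\n' "$addresses" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# "sort -c" only checks the order; it does not sort the input again.
if printf '%s\n' "$c_sorted" | LC_ALL=C sort -c; then
    echo "input is in C order"
fi
```

The output matches the first of the two listings above, which is the plain byte order. On a big file, running the same sort -c check first tells you whether a local re-sort is needed at all.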
|
|
|
|
pbies
|
 |
January 29, 2025, 04:50:50 AM Last edit: January 29, 2025, 05:02:24 AM by pbies Merited by LoyceV (8), vapourminer (1) |
|
@LoyceV So I've made some further tests, and it seems it's all OK! LC_ALL=C should be set on systems where the locale is a local or otherwise different one. I've put LC_ALL=C before the sort command and before the compare command (my cmn script, which uses the comm program) to test.
sort-u-mt:
    #!/usr/bin/env bash
    # Sort a file in place (unique lines, byte order), showing progress with pv
    FILESIZE=$(stat -c%s "$1")
    time pv -cN input "$1" | dos2unix -f | LC_ALL=C sort -u -S 20% --parallel=16 | pv -cN output -s "$FILESIZE" > "$1.sorted~"
    # Only replace the original if the sorted output is not empty
    if [[ -s "$1.sorted~" ]]
    then
        mv "$1.sorted~" "$1"
        echo Done.
    else
        echo Error!
    fi
    >&2 echo -ne "\a"
cmn:
    #!/usr/bin/env bash
    # Write the lines common to sorted files $1 and $2 into $3
    time LC_ALL=C comm -12 <(pv -cN in1 "$1") <(pv -cN in2 "$2") | (pv -cN out) > "$3"
    echo -e "\nResult file has $(wc -l < "$3") lines, head:"
    head "$3"
    >&2 echo -ne "\a"
So your files are sorted with LC_ALL=C, but the files that went through my sort were not. If we add LC_ALL=C before sort and comm, we get the expected results. So in my opinion no change is needed.
|
|
|
|
LoyceV (OP)
|
 |
January 29, 2025, 08:20:43 AM |
|
Quote: "I've put LC_ALL=C before sort command"
Why not just put it at the start of your script once?
    export LC_ALL=C
I'll add this to the OP. And I'll add it to my own script, so it no longer depends on the server I'm using. And right when I wanted to do this, I realized it's there already:
    export LC_ALL=C # This makes "sort" a few percent faster
The reason I added this a long time ago is in the comment. I completely forgot this was in there. I'll also add it to "all addresses ever used".
Quote: "sort -u -S 20% --parallel=16"
Are you sure this makes it faster? When I tested it (on a server with HDD), adding more CPU threads only helped if the data fits in RAM, and with more threads, sort needs more memory, so you don't want that if it means writing more to /tmp. Without the parallel setting, sort already uses many cores. So I used this setting to limit it.
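A minimal sketch of the "export once" approach: with LC_ALL=C exported at the top, every later sort and comm in the script uses the same byte order, which matters because comm expects both inputs to be sorted in the collation it runs under. The file names and sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # applies to every sort and comm below

tmp=$(mktemp -d)
printf '%s\n' b A a > "$tmp/one.txt"
printf '%s\n' a C A > "$tmp/two.txt"

# Sort both files in place; both now use the same byte order.
sort -u -o "$tmp/one.txt" "$tmp/one.txt"
sort -u -o "$tmp/two.txt" "$tmp/two.txt"

# Lines present in both files.
common=$(comm -12 "$tmp/one.txt" "$tmp/two.txt")
printf '%s\n' "$common"

rm -rf "$tmp"
```

This prints A and then a, the two lines the files share.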
|
|
|
|
pbies
|
 |
January 29, 2025, 02:38:15 PM Last edit: January 29, 2025, 03:10:52 PM by pbies |
|
Quote: "sort -u -S 20% --parallel=16 Are you sure this makes it faster? When I tested it (on a server with HDD), adding more CPU threads only helps if it fits in RAM, and with more threads, sort needs more memory so you don't want that if it means writing more to /tmp. Without the parallel-setting, sort already uses many cores. So I used this setting to limit it."
Well, I need to test different values for the memory percentage and the number of threads to find the fastest combination. I thought sort was single-threaded when no --parallel setting was given.
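One way to run that kind of test is a small timing loop. This is only a sketch, assuming GNU sort (for -S and --parallel) and bash (for the time keyword and TIMEFORMAT); bench_sort is a hypothetical helper name, and the values it tries are arbitrary examples, not recommendations.

```shell
#!/usr/bin/env bash
export LC_ALL=C
TIMEFORMAT='%R s'   # make bash's "time" print wall-clock seconds only

# bench_sort FILE: time GNU sort for a few --parallel / -S combinations.
bench_sort() {
    local file=$1 threads mem
    for threads in 1 4 16; do
        for mem in 10% 20%; do
            # Label and timing both go to stderr, so they end up on one line.
            printf 'parallel=%s S=%s: ' "$threads" "$mem" >&2
            time sort -u -S "$mem" --parallel="$threads" "$file" > /dev/null
        done
    done
}

# Usage: bench_sort Bitcoin_addresses_LATEST.txt
```

Keep in mind that the first run reads the file from disk while later runs may be served from the page cache, so the numbers are only comparable if each combination is measured under the same conditions.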
|
|
|
|
LoyceV (OP)
|
 |
January 29, 2025, 02:45:17 PM |
|
Quote: "I think sort was single-threaded when no --parallel setting was given."
Mine uses all cores until available memory becomes a limitation.
|
|
|
|
timon174174
Newbie
Offline
Activity: 2
Merit: 2
|
 |
March 03, 2025, 07:21:02 PM |
|
Hello, I can't download blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz
Forbidden
|
|
|
|
LoyceV (OP)
|
 |
March 03, 2025, 07:36:13 PM Last edit: March 04, 2025, 08:49:18 AM by LoyceV Merited by timon174174 (1) |
|
Quote: "I can't download blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz Forbidden"
Thanks for reporting this. I'm not sure what went wrong. I've started an update, but that will take a while to complete. Please check again tomorrow.
Update: it works again.
|
|
|
|
timon174174
|
 |
March 04, 2025, 09:17:10 AM |
|
Yes, it works. Thank you for what you do.
|
|
|
|
pbies
|
 |
March 08, 2025, 05:07:48 AM |
|
It seems there is still something wrong with the downloaded files.
For a few days the file hasn't changed; I keep downloading the same file again and again...
|
|
|
|
LoyceV (OP)
|
 |
March 08, 2025, 09:40:13 AM |
|
Quote: "For a few days the file hasn't changed; I keep downloading the same file again and again..."
Thanks for letting me know. My last manual fix didn't remove a temporary directory, which prevented the update from running again. It should be okay again (but it takes a while to update).
|
|
|
|
pbies
|
 |
March 08, 2025, 03:03:57 PM |
|
When are the dumps created?
Every day at what hour?
I would make a crontab entry to download a fresh file each day...
|
|
|
|
LoyceV (OP)
|
Quote: "When are the dumps created? Every day at what hour?"
There's no fixed time: Blockchair's source data is sometimes delayed, which is why I check the file date before proceeding. If there's no new file, I try again an hour later.
Quote: "I would make a crontab entry to download a fresh file each day..."
Instead of downloading LATEST, you could download "today's date": http://addresses.loyce.club/Bitcoin_addresses_March_08_2025.txt.gz
If it doesn't exist, try again after an hour. That avoids duplicate downloads.
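The dated-download idea could be sketched like this. It assumes GNU date (for -d) and curl; dated_url and fetch_today are hypothetical helper names, and the file name pattern is inferred from the single example URL above. Which time zone the server uses for "today" is not stated here, so near midnight the date may be off by one.

```shell
#!/usr/bin/env bash
# dated_url [DATE]: print the snapshot URL for DATE (default: today).
dated_url() {
    local stamp
    # LC_ALL=C keeps the month name in English regardless of the system locale.
    stamp=$(LC_ALL=C date -d "${1:-today}" +%B_%d_%Y)
    printf 'http://addresses.loyce.club/Bitcoin_addresses_%s.txt.gz\n' "$stamp"
}

# fetch_today: download today's snapshot unless it is already present.
# Returns non-zero if the file does not exist on the server yet.
fetch_today() {
    local url out
    url=$(dated_url)
    out=${url##*/}
    [ -s "$out" ] && return 0          # already downloaded, nothing to do
    if curl -fsS -o "$out.part" "$url"; then
        mv "$out.part" "$out"
    else
        rm -f "$out.part"              # -f makes curl fail on HTTP errors such as 404
        return 1
    fi
}

# From an hourly cron job: fetch_today || echo "not there yet, retrying next hour"
```

Run hourly, this downloads each day's file at most once and simply retries while the dated file is still missing.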
|
|
|
|
pbies
|
 |
March 08, 2025, 06:00:21 PM |
|
Bash script below, to be run every hour from /etc/crontab. It downloads a new file when the local and remote files differ in length. In the crontab entry you need to cd to the folder that holds the script and the target file.
    #!/usr/bin/env bash
    # apt install aria2
    echo Checking file size...
    # Remote size from the Content-Length header (header names are case-insensitive)
    h=$(curl -sI http://addresses.loyce.club/Bitcoin_addresses_LATEST.txt.gz)
    cl=$(echo "$h"|grep -i Content-Length)
    l=$(echo "$cl"|grep -oE '[0-9]+')
    # Local size; 0 if the file has not been downloaded yet
    f=$(stat -c%s Bitcoin_addresses_LATEST.txt.gz 2>/dev/null || echo 0)
    if [ "$f" = "$l" ]; then echo Duplicate! ; exit 0 ; fi
    echo Removing old file...
    rm -f ./Bitcoin_addresses_LATEST.txt.gz
    echo Downloading...
    aria2c -x4 http://addresses.loyce.club/Bitcoin_addresses_LATEST.txt.gz
    echo Unpacking...
    pv Bitcoin_addresses_LATEST.txt.gz | gunzip > addrs-with-bal.txt
    echo Done!
    >&2 echo -ne "\a"
|
|
|
|
AliBah
Newbie
Offline
Activity: 42
Merit: 0
|
 |
March 13, 2025, 07:17:51 AM |
|
The last file downloaded correctly, but when I try to use or extract it, I get this error:
D:\blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv.gz: Checksum error in D:\blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv\blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv. The file is corrupt
|
|
|
|
LoyceV (OP)
|
 |
March 13, 2025, 07:49:02 AM |
|
I tested the file:
    gzip -t blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv.gz
This doesn't give any errors. Just download it again.
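The same check can be wrapped in a small helper to run after every download. This is a sketch assuming gzip and GNU sha256sum are installed; check_gz and report are hypothetical names. The hash is printed so that two people can compare their copies of the same file.

```shell
#!/usr/bin/env bash
# check_gz FILE: succeed only if FILE passes gzip's integrity test.
check_gz() {
    gzip -t -- "$1" 2>/dev/null
}

# report FILE: print the verdict plus a SHA-256 hash of the file.
report() {
    if check_gz "$1"; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1 (delete it and download again)"
    fi
    sha256sum -- "$1"
}

# Usage: report blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv.gz
```

If the test fails on one machine but passes on another, differing hashes show that the local copy was damaged in transit or on disk.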
|
|
|
|
AliBah
|
 |
March 13, 2025, 08:58:46 AM |
|
SalaR@PC-User MINGW64 /d
# gzip -t blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv.gz
gzip: blockchair_bitcoin_addresses_and_balance_March_12_2025.tsv.gz: invalid compressed data--format violated
I redownloaded it and still got the error.
I tried this file: blockchair_bitcoin_addresses_and_balance_March_11_2025.tsv.gz and that one is OK!
|
|
|
|
|