Topic: List of all Bitcoin addresses ever used - weekly updates work again!
renedx
Jr. Member
January 20, 2021, 04:23:15 AM
Merited by LoyceV (2)
 #61

Just want to say what a great job you did. We use your data to build graphs and do some fun stuff (we download only twice a month so as not to be too demanding on your bandwidth).

We were building a pubkey list too, but in the end it wasn't worth the effort on our part (there wasn't much fun you could really have with it).

For a living we host high-end enterprise, so if you ever need some space or mirrors, you're welcome.

Thanks ;)
NotATether
Legendary
January 20, 2021, 05:01:21 AM
 #62

Is it possible to link the public key for every bitcoin address in your database?
If I can get the data I can add it. I'm no expert on this, can I use anything from inputs (maybe spending_signature_hex?) to get this data?

I looked for compressed keys at the end of the spending_signature_hex values and found that a lot of them don't have public keys at the end. That makes me think they are transaction signatures, not scripts.

So the real solution (fell asleep while studying the dataset :D) is to take the transaction_hex field and pass it as the argument to the "decoderawtransaction" RPC call. It returns JSON in which the signature script is located at ["vin"][N]["scriptSig"]["hex"] for each input index N; you then get the compressed public key from the last 33 bytes of that hex.

You'll obviously need a bitcoind for that and it's possible to configure access from another machine if you're resource-constrained on the box you're parsing this address data on.

I think bitcoind should be able to handle the load, especially if it's running locally.
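
To make the JSON walk above a bit more concrete, here is a minimal Python sketch. It assumes `decoded` is the already-parsed result of "decoderawtransaction"; the function name `scriptsig_hexes` and the sample dictionary (with placeholder hex values) are made up for illustration and were not tested against a live node. Coinbase inputs carry a "coinbase" field instead of "scriptSig", so the sketch skips them.

```python
def scriptsig_hexes(decoded):
    """Return (input_index, scriptSig hex) pairs from a decoded transaction."""
    pairs = []
    for index, vin in enumerate(decoded.get("vin", [])):
        script_sig = vin.get("scriptSig")
        if script_sig is None:
            # Coinbase inputs have a "coinbase" field and no scriptSig.
            continue
        pairs.append((index, script_sig["hex"]))
    return pairs


# Hypothetical, heavily shortened stand-in for the RPC result (placeholder hex).
sample = {
    "vin": [
        {"coinbase": "04ffff001d", "sequence": 4294967295},
        {"txid": "aa" * 32, "vout": 0, "scriptSig": {"asm": "", "hex": "0101"}},
        {"txid": "bb" * 32, "vout": 1, "scriptSig": {"asm": "", "hex": "0102"}},
    ]
}

for index, script_hex in scriptsig_hexes(sample):
    print(index, script_hex)
```

Note that older transactions often use uncompressed keys (65 bytes instead of 33), so a fixed "last 33 bytes" slice would only fit inputs that spend with a compressed key.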

LoyceV (OP)
Legendary
January 20, 2021, 10:10:16 AM
 #63

we host high-end enterprise, so if you ever need some space or mirrors, you're welcome.
For now, I'm covered for bandwidth, thanks :)

So the real solution (fell asleep while studying the dataset :D) is to take the transaction_hex field and pass it as the argument to the "decoderawtransaction" RPC call. It returns JSON in which the signature script is located at ["vin"][N]["scriptSig"]["hex"] for each input index N; you then get the compressed public key from the last 33 bytes of that hex.
And this just went over my head :P

Quote
You'll obviously need a bitcoind for that and it's possible to configure access from another machine if you're resource-constrained on the box you're parsing this address data on.

I think bitcoind should be able to handle the load, especially if it's running locally.
Although I'd like to be able to extract all data myself from Bitcoin Core (so I don't need to rely on Blockchair anymore), it also makes it much more complicated. So for now, I'll pass on this.
And I don't want to add more local data processing to what I'm doing already. If anything, I want to move more to a VPS.

JustHereReading
Newbie
January 20, 2021, 06:20:10 PM
 #64

I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.

I found a bit of time to write this. Testing it now..

Just to check with you, I was sorta wrong here:
Given two sorted lists:
n = 1 5 10 11 12 13 14 15 16 19 20
k = 3 6 18

We can read n from disk line by line and compare it to the current position in k.

1 < 3, write 1 to new file.
5 > 3, write 3 to file.
5 < 6, write 5 to file.
10 > 6, write 6 to file.
10 < 18, write 10 to file.
11 < 18, write 11 to file.
....
16 < 18, write 16 to file.
19 > 18, write 18 to file.
19 & nothing left in k, write 19 to file.
20 & nothing left in k, write 20 to file.

That's n + k instead of n * k, right?

Since we're sorting as strings it would actually be:
n = 1 10 11 12 13 14 15 16 19 20 5
k = 18 3 6

The whole list would then become:
all = 1 10 11 12 13 14 15 16 18 19 20 3 5 6

Correct?
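
For reference, the single-pass walk described above can be sketched in Python as follows. This is only an illustration: the name `merge_sorted` is made up, both inputs are assumed to be sorted with the same ordering already, and duplicates are written once (similar in spirit to sort -mu).

```python
def merge_sorted(old_sorted, new_sorted):
    """Merge two sorted iterables in one pass, emitting each value once."""
    new_iter = iter(new_sorted)
    pending = next(new_iter, None)   # current position in k
    last = None                      # last value written, to drop duplicates
    for item in old_sorted:          # n is streamed, e.g. line by line from disk
        while pending is not None and pending < item:
            if pending != last:
                yield pending
                last = pending
            pending = next(new_iter, None)
        if item != last:
            yield item
            last = item
    while pending is not None:       # n is exhausted, flush what is left of k
        if pending != last:
            yield pending
            last = pending
        pending = next(new_iter, None)


n_numeric = [1, 5, 10, 11, 12, 13, 14, 15, 16, 19, 20]
k_numeric = [3, 6, 18]
print(list(merge_sorted(n_numeric, k_numeric)))

# Sorted as strings, the way sort treats the address files.
n_strings = sorted(str(x) for x in n_numeric)
k_strings = sorted(str(x) for x in k_numeric)
print(" ".join(merge_sorted(n_strings, k_strings)))
```

Each element of n and k is looked at once, which is where the n + k comes from.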
JustHereReading
Newbie
January 21, 2021, 12:41:01 PM
Merited by LoyceV (4)
 #65

I can think of another option that might work: if I use the sorted list to get the new addresses, I can get those out of the daily update while keeping the chronological order. This way I only have to deal with two 20 MB files which is easy. After this, all I have to do is add them to the total file.

I found a bit of time to write this. Testing it now..

The first results of yesterday's testing look promising. I should go back and double check if the outputs are correct, but they seem to be.

I created a VM with 2 cores/2 threads (so no hyperthreading, or whatever AMD's equivalent is called) of my Ryzen 3600 and 512 MB of RAM (only because Ubuntu Server, for which I had an ISO handy, wouldn't boot with 256 MB). To make the numbers meaningful, I first benchmarked your current setup:

Code:
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz
LoyceV: real    51m26.730s
JustHereReading: real 40m13.684s

My cores were pushed to ~50% so unsurprisingly pigz yielded an improvement in my setup. However, I was a little surprised by the amount of improvement.
Code:
time sort -mu <(pigz -dc addresses_sorted.txt.gz) <(sort -u daily-file-long.txt) | pigz > output.txt.gz
real    14m29.865s

And now... for the main event:
Code:
time gunzip -c addresses_sorted.txt.gz | python3 add_new_adresses_sorted.py | gzip > output.txt.gz
real    39m42.574s
The script ran slightly faster than your current setup. In that time it sorted (and compressed) the first list in addition to creating a text file that can be appended to the second list. Unfortunately I overwrote the results of your current setup, so I didn't verify the output (yet).
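
The script itself is not shown in this post. Purely to illustrate the idea quoted at the top (finding the new addresses while keeping their chronological order), a sketch could look like the one below. It is not the actual add_new_adresses_sorted.py; the function name and the sample addresses are invented, and it assumes the daily update fits in memory while the big list is streamed once.

```python
def new_addresses_in_order(old_addresses, daily_chronological):
    """Return daily addresses missing from old_addresses, in first-seen order.

    old_addresses: iterable over the existing list, read once (can be a stream).
    daily_chronological: list of daily addresses in order of appearance.
    """
    candidates = set(daily_chronological)
    for address in old_addresses:
        candidates.discard(address)      # already known, so not new
    seen = set()
    new_in_order = []
    for address in daily_chronological:
        if address in candidates and address not in seen:
            seen.add(address)            # keep only the first appearance
            new_in_order.append(address)
    return new_in_order


# Made-up short stand-ins for real addresses.
old = ["1A", "1B", "3C"]
daily = ["3C", "bc1x", "1Z", "bc1x", "1A"]
print(new_addresses_in_order(old, daily))
```

Because this version uses set lookups, the old list does not even have to be sorted for this step; only the daily update is held in memory.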
LoyceV (OP)
Legendary
January 24, 2021, 06:32:18 PM
Last edit: January 24, 2021, 06:43:01 PM by LoyceV
 #66

Since we're sorting as strings it would actually be:
n = 1 10 11 12 13 14 15 16 19 20 5
k = 18 3 6

The whole list would then become:
all = 1 10 11 12 13 14 15 16 18 19 20 3 5 6
Correct:
Code:
echo '1 10 11 12 13 14 15 16 19 20 5 18 3 6' | tr ' ' '\n' | sort -u | tr '\n' ' '
1 10 11 12 13 14 15 16 18 19 20 3 5 6

I should go back and double check if the outputs are correct, but they seem to be.
I haven't had the time yet to find my bug(s).
For comparison, here's the md5sum for the result from my old code (gunzipped):
Code:
md5sum newchronological.txt
4070c03f974da0ee05ea51084d0f04ac  newchronological.txt

Quote
However, I was a little surprised by the amount of improvement.
Using pigz instead of gzip is interesting indeed. It seems to be more efficient; if so, it will also be worth it on a VPS with only one core. I didn't know it's in the default repository, so I've installed it now.
It doesn't use multiple cores to decompress, but it's significantly faster anyway:
Code:
time gunzip -c addresses_sorted.txt.gz | md5sum
real    7m27.541s
time pigz -dc addresses_sorted.txt.gz | md5sum
real    4m35.826s

From 40m13 to 14m29 can't be explained by just using 2 instead of 1 core, so it must be more efficient. The performance difference is less spectacular on my system:
Code:
time sort -mu <(pigz -dc addresses_sorted.txt.gz) <(sort -u daily-file-long.txt) | pigz > pigz_output.txt.gz
real    31m54.478s

As for file size:
Code:
gzip: 17970427375 bytes
pigz: 17990501927 bytes
The 0.1% size difference is negligible.
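
The ratios behind these remarks can be checked with a few lines of Python. The timings and byte counts are copied from the posts above; the variable names are just for illustration, and the two benchmark commands did not use exactly the same input files, so treat the speedups as rough.

```python
def seconds(minutes, secs):
    """Convert a 'real XmY.YYYs' timing to seconds."""
    return minutes * 60 + secs


# gzip vs pigz wall-clock times reported in this thread.
speedup_justherereading = seconds(40, 13.684) / seconds(14, 29.865)
speedup_loycev = seconds(51, 26.730) / seconds(31, 54.478)

# Compressed output sizes reported above.
gzip_bytes = 17970427375
pigz_bytes = 17990501927
size_increase_pct = (pigz_bytes - gzip_bytes) / gzip_bytes * 100

print(f"JustHereReading speedup: {speedup_justherereading:.2f}x")
print(f"LoyceV speedup:          {speedup_loycev:.2f}x")
print(f"pigz output is {size_increase_pct:.2f}% larger")
```

That works out to roughly 2.8x on the 2-core VM, roughly 1.6x on my system, and a size difference of about 0.11%.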

Quote
And now... for the main event:
Code:
time gunzip -c addresses_sorted.txt.gz | python3 add_new_adresses_sorted.py | gzip > output.txt.gz
real    39m42.574s
Can you post your add_new_adresses_sorted.py?

MrFreeDragon
Sr. Member
January 28, 2021, 03:37:37 PM
Merited by LoyceV (12)
 #67

Is it possible to link the public key for every bitcoin address in your database?
If I can get the data I can add it. I'm no expert on this, can I use anything from inputs (maybe spending_signature_hex?) to get this data?

Yes, the spending signature hex contains the public key.
Taking examples from your blockchair_bitcoin_inputs_20110313.tsv.gz:

1) In the case of pubkey (column 'type'), the public key is exactly the value of the spending signature. The public key was recorded in the blockchain in the early days, so all the early coinbase transactions contain the public key.
Code:
block id: 112995
transaction_hash: 523f57581390203da2aef169b543fc1ddcd84be6dd35cfb248228b0912dc97e3

public key: 483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801

2) For the other transactions in your example, the public key is part of the spending signature hex. That value contains the DER-encoded signature followed by the public key.

Code:
transaction hash: d9ff0a82399ae55ab95093020ed6f17bab80697dcafe74d264900cdd56f8c0aa

48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383

And the public key will be the following part:
48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383

Here is the example of the full structure: https://pastebin.com/Q55PyUgB

LoyceV (OP)
Legendary
January 29, 2021, 09:55:06 AM
 #68

And the public key will be the following part:
48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383
So if I take these inputs as an example:
Code:
block_id        transaction_hash        index   time    value   value_usd       recipient       type    script_hex      is_from_coinbase        is_spendable    spending_block_id       spending_transaction_hash       spending_index  spending_time   spending_value_usd      spending_sequence       spending_signature_hex  spending_witness        lifespan        cdd
66558   d9ff0a82399ae55ab95093020ed6f17bab80697dcafe74d264900cdd56f8c0aa        0       2010-07-13 09:19:38     4000000000      0.4     12Y4RVpkQ4uKCNUpea4jJRWgd2nxScNkaG      pubkeyhash      76a91410d7e0a01323508f1881407713971fab34b7898788ac      0       -1      113331  c0ef633ac227c17860abeed2ebbbcfedf4a108e61d9950c0c68171e20e32e945        0       2011-03-13 00:06:05     36      4294967295      48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383          20961987        9704.62361111111
112995  523f57581390203da2aef169b543fc1ddcd84be6dd35cfb248228b0912dc97e3        0       2011-03-10 09:08:48     5001000000      42.5085 1GEX1ZHu5aHjnL6HogKYHfqCAFQvcVWpKA      pubkey  4104a2032ae7a01f69747fac92c9c343ce5f1c841d450ada9b9498e84050de80a3e72d7d5f92967c2b1edc061222a28eee5d0256d60b5eafaf8b33e358c6dafa3923ac  1       1       113331  b25bacedd7ecc2d945e69cfe40dade836d57edf7583da70952ee37743d3852a8        0       2011-03-13 00:06:05     45.009  4294967295      483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801              226637  131.18190243055554
The part you're looking for is:
Code:
type          spending_signature_hex
pubkeyhash    48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383
pubkey        483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801
If type is pubkeyhash, you want the whole spending_signature_hex (146 characters). Should I remove the first 16 characters to make it 130 characters long?
If type is pubkey, you want the last 130 characters?
If this is correct, I can just take the last 130 characters for both types, right?
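
One way to check this against the two sample rows is to read the hex as script data pushes (one length byte, then that many bytes) instead of cutting at fixed offsets. The sketch below is an assumption-laden illustration, not a general script parser: the function names are made up, it only understands direct pushes of 1 to 75 bytes, and it ignores multisig, P2SH and segwit inputs. Read this way, the pubkeyhash row ends with a 65-byte push starting with 04 (the uncompressed key), while the pubkey row's spending_signature_hex holds a single 72-byte push (the DER signature plus hash type byte) and the key sits in script_hex in front of the trailing ac (OP_CHECKSIG).

```python
def parse_pushes(script_hex):
    """Split a script into data pushes (direct pushes of 1-75 bytes only)."""
    data = bytes.fromhex(script_hex)
    pushes, pos = [], 0
    while pos < len(data):
        length = data[pos]
        if not 1 <= length <= 75:
            raise ValueError(f"unsupported opcode 0x{length:02x} at byte {pos}")
        chunk = data[pos + 1 : pos + 1 + length]
        if len(chunk) != length:
            raise ValueError("truncated push")
        pushes.append(chunk)
        pos += 1 + length
    return pushes


def public_key_hex(output_type, script_hex, spending_signature_hex):
    """Return the public key for the two row types shown above."""
    if output_type == "pubkeyhash":
        # scriptSig is <signature> <public key>: the key is the last push.
        return parse_pushes(spending_signature_hex)[-1].hex()
    if output_type == "pubkey":
        # Output script is <public key> OP_CHECKSIG (0xac).
        if not script_hex.endswith("ac"):
            raise ValueError("not a pay-to-pubkey script")
        return parse_pushes(script_hex[:-2])[0].hex()
    raise ValueError(f"type {output_type!r} is not handled in this sketch")


# Values copied from the two rows above.
P2PKH_SCRIPT = "76a91410d7e0a01323508f1881407713971fab34b7898788ac"
P2PKH_SIG = "48304502200a30a8634f41dbd92c007f937f6b0f220cc1f5a321252cc459092e45b1c53778022100caace1a19e1d195e8510ac290c13863f4ada16c2837d6af80ea6b237ba65975c01410418a808ea3fc6fd99c11b51f572d2e85b25a33048de2c0afda336c32da4619897629943638def6df1ab0d3acd35faa7e8c73ff1fc275677f82191b253a7bbf383"
P2PK_SCRIPT = "4104a2032ae7a01f69747fac92c9c343ce5f1c841d450ada9b9498e84050de80a3e72d7d5f92967c2b1edc061222a28eee5d0256d60b5eafaf8b33e358c6dafa3923ac"
P2PK_SIG = "483045022100a66be0435ed532f7a065073e2549f6cb13a546efc7f55d49072f5038d0764df502206f426bff777dd44ccbab703541a38d8ff86ae09e350862c8fb9cb2b0dd79c5e801"

print(public_key_hex("pubkeyhash", P2PKH_SCRIPT, P2PKH_SIG))
print(public_key_hex("pubkey", P2PK_SCRIPT, P2PK_SIG))
```

Taking the last 130 characters would also fail for later transactions that use compressed keys, which are 33 bytes (66 hex characters) long.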

naufragus
Newbie
January 30, 2021, 04:04:12 AM
Merited by adaseb (1)
 #69

Just to let you guys know, I updated my bitcoin-all-addresses list on 2021 Jan 19.
It is available in my GitHub repo https://github.com/mountaineerbr/bitcoin-all-addresses
All addresses are uniquely printed in the order they first appeared in Blockchair output dumps.

I was able to reproduce my methodology 6 months after the first lists.
The methodology is described in the README of the git repo
and some code I used is here: https://github.com/mountaineerbr/bitcoin-all-addresses/blob/master/blockchair.btcoutputs.process.sh

If you export LANG=C and LC_ALL=C, that will speed up sorting, and as we are dealing
with base58 and segwit (bech32) addresses, that should be OK.
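
With LC_ALL=C, sort compares raw bytes, so digits come before uppercase letters and uppercase before lowercase. Tools that merge or compare sorted files expect both inputs to use that same order. A small Python sketch for checking a list against byte order could look like this (the function name and the sample values are made up; comparing Python bytes objects uses the same byte-by-byte order):

```python
def first_out_of_order(lines):
    """Return the index of the first line that sorts before its predecessor.

    Returns None when the lines are already in ascending byte order.
    """
    previous = None
    for index, line in enumerate(lines):
        key = line.encode("ascii")       # addresses are plain ASCII
        if previous is not None and key < previous:
            return index
        previous = key
    return None


# Shortened, made-up stand-ins for addresses.
print(first_out_of_order(["1A", "3B", "bc1q"]))   # already in byte order
print(first_out_of_order(["a", "B"]))             # "B" is byte 0x42, "a" is 0x61
```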
PrimeNumber7
Copper Member
Legendary
February 01, 2021, 03:44:02 AM
 #70

Resulting in O(n + k log k + 2k). In this particular case one might even argue that n > k log k + 2k, therefore O(2n) = O(n). However, it's late here and I don't like to argue.

You only need enough memory to keep the new addresses in memory and enough disk space to keep both the new and old version on disk at the same time.

You are correct. I had not considered updating the list incrementally instead of rebuilding it from scratch. Accessing a single line from a file can degrade performance, so it is worth considering whether paying for more RAM would be cost-effective, given the additional time required to update the list.
LoyceV (OP)
Legendary
February 01, 2021, 05:44:59 PM
 #71

Quoting myself:
Code:
Old code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 | gzip > newchronological.txt.gz

Code:
New code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) > new.alladdresses_chronological.txt
But it's wrong
I tried again:
Code:
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 > oldcode.txt
time cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) > newcode.txt
The files were too large to diff, so I split them in parts (10 million lines each). There are a few differences:
Code:
< 3BS6oQKHDwrzz4RC69iAbSV13xpbGZvXLj
---
> 3BS6oQKHDwrzz4SC69iAbSV13xpbGZvXLj

< 17Q7LN9nCmS6HdjkDj3C4MdhduFobGp4hv
---
> 17Q7LN9nCmS6HdjkDk3C4MdhduFobGp4hv

< 1rVH156qu1djPVFGoKaZ29Kw8zEpmh283
9863597a9863597
> 1rVH156qu1djPVFGoKaZ29Kw8zEpmh283

< 3Kw9pkLTLExTd9LZW2qbbNUdZRpUW3JTac
---
> 3Kw9pkLTLExTd9LZW2qbbNUdZSpUW3JTac

< 1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG
---
> 1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG

< 331xujHAg6AGvKzPwUKZ9AJxukaemCXeRw
8496039c8496038
< 3PLoa4ccMdxyGY6mAEStSu45xwqdRftd1b

> 3PLoa4ccMdxyGY6mAEStSu55xwqdRftd1b
10000000a10000000
> 3B92y4bFFPZvjviNhtLWeBKoYXmHVwr3CD

< 3B92y4bFFPZvjviNhtLWeBKoYXmHVwr3CD
7735278a7735278
> 331xujHAg6AGvKzPwUKZ9AJxukaemCXeRw

Let's highlight this one:
Quote
< 1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG
---
> 1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG
The first one (with x) is correct.

I have no idea what causes this. I'm now checking whether I can reproduce the exact same data change, or whether it's caused by hardware failure.
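
One thing that can be read off the pairs above: in each of them a single character differs, and the two characters are neighbours in ASCII (R/S, j/k, x/y, 4/5). The Python sketch below (function name made up, addresses copied from the diff) prints the XOR of the differing character codes, which shows that only the lowest bit changed in every pair. That is consistent with a single flipped bit somewhere along the way, but it does not say where.

```python
def differing_chars(a, b):
    """List (index, char_a, char_b, xor of codes) where two strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [
        (index, x, y, ord(x) ^ ord(y))
        for index, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


PAIRS = [
    ("3BS6oQKHDwrzz4RC69iAbSV13xpbGZvXLj", "3BS6oQKHDwrzz4SC69iAbSV13xpbGZvXLj"),
    ("17Q7LN9nCmS6HdjkDj3C4MdhduFobGp4hv", "17Q7LN9nCmS6HdjkDk3C4MdhduFobGp4hv"),
    ("3Kw9pkLTLExTd9LZW2qbbNUdZRpUW3JTac", "3Kw9pkLTLExTd9LZW2qbbNUdZSpUW3JTac"),
    ("1Q7NSpgjxDHTTPUkGskTNDioCYw6MQazBG", "1Q7NSpgjyDHTTPUkGskTNDioCYw6MQazBG"),
    ("3PLoa4ccMdxyGY6mAEStSu45xwqdRftd1b", "3PLoa4ccMdxyGY6mAEStSu55xwqdRftd1b"),
]

for a, b in PAIRS:
    for index, x, y, xor in differing_chars(a, b):
        print(f"{a[:8]}... position {index}: {x!r} vs {y!r}, xor = {xor:#010b}")
```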

LoyceV (OP)
Legendary
February 03, 2021, 09:29:54 PM
 #72

What can be the cause of this? The first and third run gave the same results, the others are different.
Code:
cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
7d2f923c7ce1d9534629b4502c37680d  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
cede1315137bb4a2ab20c5438e4525ba  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
112100c359f74c0e60b95afa92de990d  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
1bc7138bb4a367c117002234a604d444  -

cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
cc91ef352ffa1e641a7c47dcc3d743f3  -
I'm still running the same command on the same (old) system, but from a different HDD (with tmpfiles on a different drive too).

PlutonowyPokrzycz
Newbie
March 01, 2021, 11:04:02 AM
Merited by LoyceV (1)
 #73

Hi LoyceV,
Thanks for the nice resource of data!

Your URL alladdresses.loyce.club/?C=M;O=D is not working. Can you please check?

Background
To follow up on List of all Bitcoin addresses with a balance and this post, I made a list of all Bitcoin addresses that have ever been used.

The data
See alladdresses.loyce.club (new location)
I now have the resources (RAM, CPU power and disk space) and code to show unique addresses in their original order. Each address is only shown once. I have 2 large files:

LoyceV (OP)
Legendary
March 01, 2021, 11:13:57 AM
 #74

Your URL alladdresses.loyce.club/?C=M;O=D is not working. Can you please check?
I still haven't found a new host for this, so it's still at its temporary location:
I've uploaded the latest version to a temporary location: blockdata.loyce.club/alladdresses/.
The latest update was 2.5 months ago (because of weird problems).

Thanks for the reminder though, I'll do some more testing to get updates working again.

LoyceV (OP)
Legendary
May 07, 2021, 02:56:54 PM
Last edit: May 09, 2021, 02:16:18 PM by LoyceV
 #75

What can be the cause of this? The first and third run gave the same results, the others are different.
~
I'm still running the same command on the same (old) system, but from a different HDD (with tmpfiles on a different drive too).
I'm now trying the same on a fresh RamNode cloud instance, and have the same problem:
Code:
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
f96f2952151451b88edcf01332ec907d  -
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
70d3d472590b3fb8356348e9fd189ddb  -
I no longer think my very old PC is the problem. As far as I know, these commands should produce the exact same output given the exact same input data. But the data changes somehow.

I realize Bitcointalk is probably not the best forum, but I'm not actively using a more specialized forum, so I post it here.

Update: I tried two more times:
Code:
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
f96f2952151451b88edcf01332ec907d  -
loyce@160gb:~/alladdresses.loyce.club$ cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) <(cat <(cat <(cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt)) | nl -nln | sort -k2 -S80%) | uniq -df1 | sort -nk1 -S80% | cut -f2) | md5sum
d4327e249819af8d025862bd4079d44d  -
I reproduced the same checksum only once. I need booze :P



This was how I started:
For comparison, here's the md5sum for the result from my old code (gunzipped):
Code:
md5sum newchronological.txt
4070c03f974da0ee05ea51084d0f04ac  newchronological.txt
And if I split up my above command string and write some temporary files to disk, I get the exact same (correct) result again:
Code:
cat firstgunzip thirdsort | md5sum
cat daily_updates/*.txt | nl | sort -uk2 -S40% | sort -nk1 -S40% | cut -f2 > firstcat
cat <(gunzip -c addresses_in_order_of_first_appearance.txt.gz) > firstgunzip
cat <(sort -u daily_updates/*.txt) > firstsort
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
cat <(comm -13 secondgunzip firstsort) > firstcomm
#cat <(cat firstcat firstcomm | nl -nln | sort -k2 -S80%) > secondsort
cat <(cat firstcat firstcomm | nl -nln | sort -k2) > secondsort
cat <(cat secondsort | uniq -df1 | sort -nk1 | cut -f2) > thirdsort
cat firstgunzip thirdsort | md5sum
4070c03f974da0ee05ea51084d0f04ac  -



It gets weirder: I now can't even reproduce the same weird problem again, so I can't know whether or not my changes fixed it.

mike2077
Newbie
May 25, 2021, 09:57:34 AM
 #76

Hi there, first of all thanks for the nice data source.

Could you check the links? All links are down.
Thanks
LoyceV (OP)
Legendary
May 25, 2021, 10:14:06 AM
 #77

all links are down.
Sorry, I forgot to post this here:
This server is currently offline. I don't know why (yet).
Still no response from my webhost :(

mike2077
Newbie
May 26, 2021, 04:55:39 AM
Merited by LoyceV (5), Symmetrick (5)
 #78

I can suggest sharing files via torrent. Technically you'd have to upload a lot less since a lot of data will be transferred between people downloading.
You can share a magnet link here, seed for some time and then let people share it. Just a suggestion.

Linux question

what is the benefit of two directional syntax -
Code:
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
instead of just
Code:
gunzip addresses_sorted.txt.gz
or
Code:
gunzip -c addresses_sorted.txt.gz > out.txt
?

or another example:

Code:
cat <(sort -u daily_updates/*.txt) > firstsort

instead of
 
Code:
cat daily_updates/*.txt |sort -u > firstsort

Thanks
LoyceV (OP)
Legendary
May 26, 2021, 08:15:51 AM
 #79

I can suggest sharing files via torrent. Technically you'd have to upload a lot less since a lot of data will be transferred between people downloading.
I haven't tried that yet for these reasons: I don't want to upload from my desktop, so I still need a VPS. I don't expect many simultaneous downloads, so most of the uploads will still come from me. Every update will make an existing torrent useless again (and I don't want to keep posting new magnet links).

Quote
Linux question

what is the benefit of two directional syntax -
Code:
cat <(gunzip -c addresses_sorted.txt.gz) > secondgunzip
instead of just
Code:
gunzip addresses_sorted.txt.gz
or
Code:
gunzip -c addresses_sorted.txt.gz > out.txt
?
I was isolating a part from the longer code:
Code:
This:
<(gunzip -c addresses_sorted.txt.gz)
Came from:
....t -f2) <(comm -13 <(gunzip -c addresses_sorted.txt.gz) <(s....
I didn't edit it so it's clear where it came from.

Quote
or another example:
Same reason :)



I am now pretty sure the inconsistent results were caused by using sort -S (--buffer-size=SIZE). I was trying to be smart by enforcing efficient memory usage, but I now believe this sometimes produced an error, which was then piped into the next command. If I omit the -S40% part, it works fine. This is actually good news, because it's much faster than my previous method.
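
If that guess is right, the underlying issue is that a plain shell pipeline keeps going when an earlier stage fails, unless something checks each exit status (in bash, set -o pipefail does this). The Python sketch below shows the same idea with subprocess; the helper name `run_pipeline` and the toy stages are made up for illustration, and it reads the final output into memory, so it is only meant for small tests.

```python
import subprocess
import sys


def run_pipeline(stages):
    """Run commands as a pipe chain and raise if any stage exits non-zero."""
    procs = []
    upstream = subprocess.DEVNULL        # the first stage reads nothing
    for argv in stages:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # Close our copy so the upstream stage notices if this one exits.
            procs[-1].stdout.close()
        upstream = proc.stdout
        procs.append(proc)
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    codes = [proc.wait() for proc in procs]
    failed = [(argv[0], code) for argv, code in zip(stages, codes) if code != 0]
    if failed:
        raise RuntimeError(f"pipeline stage(s) failed: {failed}")
    return output


PY = sys.executable
emit = [PY, "-c", "print('b'); print('a')"]
sort_lines = [PY, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
broken = [PY, "-c", "import sys; sys.stdin.read(); sys.exit(3)"]

print(run_pipeline([emit, sort_lines]))
```

Replacing `sort_lines` with `broken` makes `run_pipeline` raise instead of silently handing partial data to whatever comes next.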

mike2077
Newbie
June 02, 2021, 12:40:38 PM
 #80

Hi LoyceV,

Any news from the provider about when your server is coming back up?
Do you have a temporary location I can download the files from?

Thanks