Title: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 19, 2022, 10:44:38 AM I like simple (Linux) commands that process a lot of data. A console is much more powerful than a GUI.
I've posted them in various topics, but from now on will collect them here. Use them at your own risk. Warning Don't just copy paste anything (https://www.cyberciti.biz/faq/understanding-bash-fork-bomb/) you find online into a console! Try to understand it first. Self-moderated No spam please. Questions are of course okay. Adding content is appreciated. Overview
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 19, 2022, 11:08:31 AM Get pubkeys out of Bitcoin block data (https://bitcointalk.org/index.php?topic=5307550.0) (this was requested here (https://bitcointalk.org/index.php?topic=5254914.msg59902358#msg59902358)).
Note: this list is not meant for verbatim copy/pasting; these are my own notes of what I did (more or less).

Get outputs data (currently 148 GB)
Code: wget -r blockdata.loyce.club/outputs/

Get all addresses with pubkey
Code: for day in `ls outputs/*gz`; do echo $day; gunzip -c $day | grep -v is_from_coinbase | grep -v pubkeyhash | grep pubkey | cut -f 1,2,4-11 >> output.txt; done

Get currently funded Bitcoin addresses and their balance (1 GB)
Code: wget addresses.loyce.club/blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz

Get all unique addresses that are in both lists
Code: comm -12 <(cat output.txt | cut -f6 | sort -u -S40%) <(gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep -v balance | cut -f1 | grep "^1" | sort -S40% ) > list

Get list of balances, addresses and pubkeys
Code: gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep "^1" | sort -rS60% | uniq -w30 -d > address_and_balance

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 25, 2022, 10:51:10 PM Hi LoyceV,
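A minimal self-contained illustration of the `comm -12` step above, with made-up file names: it prints only the lines that appear in both (sorted) inputs.

```shell
# comm needs sorted input; -12 suppresses lines unique to either
# file, leaving only the intersection
printf '1abc\n1def\n1ghi\n' > list_a.txt
printf '1def\n1xyz\n'       > list_b.txt
comm -12 <(sort -u list_a.txt) <(sort -u list_b.txt)
# prints: 1def
```

The `-S40%` on `sort` in the real command only caps how much memory sort may use for its buffer; it doesn't change the result.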
Can you do something similar to what you did for the outputs? Can you process the inputs data dumps by extracting the following columns from them: recipient, type=pubkeyhash only, spending_witness. Once done, remove duplicate entries and cross-reference with the "List of all funded Bitcoin addresses", keeping only the recipients with a positive balance. Compare the recipients with balance_addy_pubkey.txt and the difference will be the results I'm looking for. The spending_witness will contain the pubkey of those pubkeyhash addresses (truncating the first 148 characters will leave only the pubkey remaining). The instructions would be useful too. Regards,
Code: 53225 24cf2dedab3c7898ec0f0532e177f15f41984b0c820202bcb143e43eec0c25d2 1 2010-04-27 00:08:11 4106000000 0.4106 15Z5YJaaNSxeynvr6uW6jQZLwq3n1Hu6RX pubkeyhash 76a91431f19a7d0379f56cb3be0761c21f1f0c9553a47f88ac 0 -1 53241 8084468e05c6faa4029fbfd7b9b9d33e2274e9f40634207bcf86197fc6f83af5 1 2010-04-27 02:45:48 0.4106 4294967295 483045022069e95f67cc6fed7db01885d76e3294e443d9833316228bbd057f0f3a3bdd51630221009ef89fa8c34f37a245333c67d56feaef27b26651c4a30b6499e1bc386337ca8f0141047a51392bace353f4c3788c9c090ef4f635ec211159ec3b9f1bb7da7679517e126e98e0012bcb4d2b023c479afaaa1ad703ea1b24e1910e2cdad38744ba7aab8a 9457 4.494264120370371

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 26, 2022, 08:56:45 AM
Quote
Can you process the inputs data dumps by extracting the following columns from them: recipient, type=pubkeyhash only, spending_witness.
I did this:
Code: for day in `ls inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,19 | grep -v spending_witness | grep pubkeyhash | grep -vP "\t$" >> output.txt; done
Update: as expected: no output.
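In the command above, `cut -f7,8,19` picks the recipient, type and spending_witness columns, and `grep -vP "\t$"` drops the rows whose last selected field is empty (after the cut, such rows end in a trailing tab). A toy illustration with made-up rows:

```shell
# a row with an empty last field ends in a tab after cut;
# grep -vP '\t$' filters those rows out
printf '1abc\tpubkeyhash\tdeadbeef\n1def\tpubkeyhash\t\n' | grep -vP '\t$'
# prints only the first row (the one with a witness)
```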
Quote Code: 53225 24cf2dedab3c7898ec0f0532e177f15f41984b0c820202bcb143e43eec0c25d2 1 2010-04-27 00:08:11 4106000000 0.4106 15Z5YJaaNSxeynvr6uW6jQZLwq3n1Hu6RX pubkeyhash 76a91431f19a7d0379f56cb3be0761c21f1f0c9553a47f88ac 0 -1 53241 8084468e05c6faa4029fbfd7b9b9d33e2274e9f40634207bcf86197fc6f83af5 1 2010-04-27 02:45:48 0.4106 4294967295 483045022069e95f67cc6fed7db01885d76e3294e443d9833316228bbd057f0f3a3bdd51630221009ef89fa8c34f37a245333c67d56feaef27b26651c4a30b6499e1bc386337ca8f0141047a51392bace353f4c3788c9c090ef4f635ec211159ec3b9f1bb7da7679517e126e98e0012bcb4d2b023c479afaaa1ad703ea1b24e1910e2cdad38744ba7aab8a 9457 4.494264120370371 Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on April 26, 2022, 01:08:57 PM Hi LoyceV,
Yes, I'm looking for spending_signature_hex. Sorry about the mixup. Kindly let it process all the blocks up to the current one. Thanks again Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 26, 2022, 04:33:44 PM <this messy post is work in progress>
Quote
Yes, I'm looking for spending_signature_hex. Sorry about the mixup.
I'm currently running this:
Code: for day in `ls inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" >> output2.txt; done

This is a bit smaller:
Code: for day in `ls /var/www/blockdata.loyce.club/public_html/inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" | cut -f1,3 >> output2.txt; done

MrFreeDragon (https://bitcointalk.org/index.php?topic=5265993.msg56210642#msg56210642) already described what we're looking for here. I'll try this:
Code: for day in `ls /var/www/blockdata.loyce.club/public_html/inputs/*gz`; do echo $day; gunzip -c $day | cut -f7,8,18 | grep -v spending_signature_hex | grep pubkeyhash | grep -vP "\t$" | cut -f1,3 > /dev/shm/tmp.file; paste <(cat /dev/shm/tmp.file | cut -f1) <(cat /dev/shm/tmp.file | cut -f2 | cut -c149-) | sort -u -S40% >> output2.txt; rm /dev/shm/tmp.file; done
I'll let it run overnight.

Update: This produces different results:
Code: 1zxhKVZtMBt8kf7km2shn2mkR4NLHGigT 0405eec604993048314294f7c1f9b45c3ed8424ef940426336153831f8813228a788f845e1df353c2021174573e33f2fab05d94e1dd5e5449832ec83ac3d5db17e
Another example with full data:
Code: block_id transaction_hash index time value value_usd recipient type script_hex is_from_coinbase is_spendable spending_block_id spending_transaction_hash spending_index spending_time spending_value_usd spending_sequence spending_signature_hex spending_witness lifespan cdd

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on April 26, 2022, 06:55:53 PM Hi, I have no idea why it is doing that.
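The fixed `cut -c149-` offset in the command above assumes the signature push byte is always 0x48 (72 bytes): 2 hex chars for the push byte, 144 for the signature plus sighash byte, and 2 for the pubkey push byte make 148. When the signature is shorter, the offset is off by two, which is presumably why the results differ. A toy illustration of the fixed-offset case:

```shell
# 2 (push 0x48) + 144 (72-byte sig + sighash) + 2 (push 0x41) = 148
# chars before the pubkey -- but only when the signature really is
# 72 bytes
sig=$(printf 'a%.0s' $(seq 144))   # stand-in for the 144 hex chars
printf '48%s41PUBKEY\n' "$sig" | cut -c149-
# prints: PUBKEY
```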
Is it possible to keep just one of the lines? I'd guess that the address will be empty anyway, or I might have to do some manual work on the data and the final comparison for a positive balance. Regards,

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 28, 2022, 08:27:52 AM
Quote
Hi, I have no idea why it is doing that.
I'd need to know which one to keep, and even better: how to decide that from the raw data. The current output is 81 GB (815,497,912 lines), which is too large to sort on this server (only 93 GB disk space remaining). Sorting is needed to remove all duplicates, which is needed to reduce the file size before checking for addresses with a balance.
Quote
Is it possible to keep just one of the lines?

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on April 28, 2022, 04:48:19 PM Keep the records that are 131 and 66 characters long. 131 chars are for the uncompressed pubkey and 66 chars for the compressed pubkey. The uncompressed pubkey prefix is 04 and the compressed pubkey prefix is 02 or 03.
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 28, 2022, 05:40:46 PM
Quote
Keep the records that are 131 and 66 characters long. 131 chars are for the uncompressed pubkey and 66 chars for the compressed pubkey. The uncompressed pubkey prefix is 04 and the compressed pubkey prefix is 02 or 03.
This one is 130 characters:
Code: 0405eec604993048314294f7c1f9b45c3ed8424ef940426336153831f8813228a788f845e1df353c2021174573e33f2fab05d94e1dd5e5449832ec83ac3d5db17e
Could it be the 148 characters isn't always the same? Compare those 2:
Code: 483045022100a83ca95b6b3153c5fce971c1eebbeebc892ba6c297157c326a8359c9b408ce1902201904060ce4e1fbd455403546232779dc9ca7bfe3582d3055270f27f245575d0901410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on April 28, 2022, 10:32:44 PM It seems like it's not always 148 chars that can be truncated.
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 29, 2022, 07:45:46 AM
Quote
It seems like it's not always 148 chars that can be truncated.
It seems like it :) Combined with the fact that only a part of the spending_witness data is the pubkey, and that the pubkey length itself can vary, I don't know how to proceed. If you can figure it out, I'll continue this, but I don't have the time to search for it myself now.

Title: Re: LoyceV's small Linux commands for handling big data Post by: PawGo on April 29, 2022, 02:51:15 PM
Quote
Keep the records that are 131 and 66 characters long. 131 chars are for the uncompressed pubkey and 66 chars for the compressed pubkey. The uncompressed pubkey prefix is 04 and the compressed pubkey prefix is 02 or 03.
This one is 130 characters:
Code: 0405eec604993048314294f7c1f9b45c3ed8424ef940426336153831f8813228a788f845e1df353c2021174573e33f2fab05d94e1dd5e5449832ec83ac3d5db17e
Could it be the 148 characters isn't always the same? Compare those 2:
Code: 483045022100a83ca95b6b3153c5fce971c1eebbeebc892ba6c297157c326a8359c9b408ce1902201904060ce4e1fbd455403546232779dc9ca7bfe3582d3055270f27f245575d0901410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4
Hmmm, could it be a leading zero was removed somewhere during the process? I thought pubkeys should always have 65 or 33 bytes (including the one for the flag).

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on April 29, 2022, 04:29:34 PM
Quote
Could it be a leading zero was removed somewhere during the process?
I'm not sure. If there are leading zeros when the pubkey is "shorter", I may be able to include them by simply counting from the right instead of from the left.
Quote
I thought pubkeys should always have 65 or 33 bytes (including the one for the flag).

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on April 29, 2022, 05:11:04 PM That is a good idea.
You'll have to search for both the uncompressed and compressed keys in the list.
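Counting from the right, as suggested above, can be sketched without knowing the signature length in front. This is a hypothetical filter of my own (not a command from the thread), assuming each line ends with the pubkey:

```shell
# hypothetical: take the trailing 130 chars if they look like an
# uncompressed pubkey (04...), else the trailing 66 if they look
# compressed (02/03...); counting from the right makes the
# variable-length signature in front irrelevant
printf '48aaaa4104%0128d\n' 0 | awk '{
  t130 = substr($0, length($0) - 129)
  t66  = substr($0, length($0) - 65)
  if (t130 ~ /^04/)        print t130
  else if (t66 ~ /^0[23]/) print t66
}'
# prints 04 followed by 128 zeros (the toy "pubkey")
```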
Quote
Could it be a leading zero was removed somewhere during the process?
I'm not sure. If there are leading zeros when the pubkey is "shorter", I may be able to include them by simply counting from the right instead of from the left.
Quote
I thought pubkeys should always have 65 or 33 bytes (including the one for the flag).

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on May 01, 2022, 11:55:52 PM Hi LoyceV,
Any updates? Regards,

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on May 02, 2022, 06:35:27 AM
Quote
Any updates?
Nope:
Quote
I don't know how to proceed. If you can figure it out, I'll continue this, but I don't have the time to search for it myself now.

Title: Re: LoyceV's small Linux commands for handling big data Post by: iceland2k14 on May 08, 2022, 07:50:36 AM @LoyceV The values in the sigscript (containing R, S and the pubkey) are not fixed in length, but they have a defined structure. One piece of the structure is shown by @MrFreeDragon in this link: https://pastebin.com/Q55PyUgB
But even in this structure the length is not always 0x21 or 0x20 or 0x41; it varies, and therefore the lengths of R, S and the pubkey will vary. You will need dynamically sized variables to extract them. Perhaps use an AWK script or Python; that might be easier. I don't know if the Bash shell can do all of it. The basic way to decode and extract the variable-size data can be taken from the code below...
Code: def get_rs(sig):
Code: script: 8b4830450221008bf415b6c4bc7118a1d93ef8f6c63b0801d9abe2e41e390670acf9677ee58e5602200da3df76f11ae04758c947a975f84dd7dba990e00c146b451dc4fa514c6cb52d01410421557041f930252b79b0fa28e6587680053b3a3672ff0c1dca6a623c79bdc0b6125a7a2be5450e28e49731ba8f60231dd8eceeff170923717d97a1ca5a67acd4
This way you can not only extract all the pubkeys, but also all the R & S values of the signatures, if needed.

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on May 15, 2022, 02:37:18 PM @iceland2k14: thanks, but it feels like I'm in over my head. I've abandoned the pubkey project (at least for now).
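For what it's worth, the push-length parsing iceland2k14 describes can be done in plain bash for the simple P2PKH case. A minimal sketch of my own (not the pastebin script), assuming the input is a scriptSig of the form [push byte][signature][push byte][pubkey] in hex:

```shell
#!/bin/bash
# read the push-length bytes instead of assuming a fixed 148-char
# offset: [len][DER sig + sighash][len][pubkey], hex-encoded
extract_pubkey() {
  local s=$1
  local siglen=$((16#${s:0:2}))        # signature push length, in bytes
  local rest=${s:$((2 + siglen * 2))}  # skip push byte + signature
  local publen=$((16#${rest:0:2}))     # pubkey push length, in bytes
  echo "${rest:2:$((publen * 2))}"
}

# toy script: a 2-byte "signature" (aaaa) and a 3-byte "pubkey"
extract_pubkey "02aaaa03020304"
# prints: 020304
```

Because the lengths are read from the data, a 0x47 (71-byte) signature push works the same as a 0x48 one.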
It looks like I'm going to need to learn to use a database. Let's say I have a list like this:
Code: sender recipient value fee
Given that I know nothing about databases, how would I start doing this? Is it going to be a problem if the database is larger than my RAM? If needed, I can (easily) split this list up into 2 lists: one with sender, value and fee, and the other with recipient and value.
@TryNinja: Considering the performance you managed to get on ninjastic.space, I think you're the right person to ask :) Allow me to notify you :)
To make it easier to understand what I need, I can turn the above table into this:
Code: 0x930509a276601ca55d508cb5983c2c0d699fd7e9 1
Sorting gives this:
Code: 0x1c7e19f5283aa41a496c1f351b36e96dbaad507f -42016257624091770
Main question: how do I put this in .db format?

Title: Re: LoyceV's small Linux commands for handling big data Post by: PawGo on May 15, 2022, 05:07:00 PM I do not really understand what you mean by a "database". Do you have a particular implementation in mind? What do you mean by ".db format"?
Why not launch a MySQL or, maybe better, a PostgreSQL server? Loading a file like that into a database table is trivial. RAM has nothing to do with it, I think. I mean, it helps, but it's not a blocking constraint.

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on May 15, 2022, 05:14:56 PM
Quote
I do not really understand what you mean by a "database". Do you have a particular implementation in mind? What do you mean by ".db format"?
That confirms that I know nothing about databases :(
Quote
Why not launch a MySQL or, maybe better, a PostgreSQL server? Loading a file like that into a database table is trivial.
I've heard of MySQL, but not PostgreSQL.
Quote
RAM has nothing to do with it, I think. I mean, it helps, but it's not a blocking constraint.
"Trivial" sounds great :D But I have no idea how :P Google shows this (https://stackoverflow.com/questions/18223665/postgresql-query-from-bash-script-as-database-user-postgres), if that's the right track I can try it. Any idea how to handle duplicate addresses: 1 address with 2 balances that have to be added together?
I will not force you to install MS SQLServer or any monster from Oracle. Let's say you decide to use postgresql. Then you receive a very nice client - pgAdmin https://www.pgadmin.org/ Using tool like that will be very helpful for you. Then you may for example: create table (tx id, address, balanceChange), txid could be our primary key (unique), you should also create index on recipient, as you will launch search using that field. load data: https://sunitc.dev/2021/07/17/load-csv-data-into-a-database-table-mysql-postgres-etc/ (create index after you load data, otherwise loading will take ages) Then you may very easily check balance change (delta) for each recipient: Code: select address, sum(balance) from tableName group by address Just try to list all the possible use cases, think what do you need, how you want to use it - to build a correct data model. It may be the most difficult task - just not to duplicate data, etc https://www.guru99.com/relational-data-model-dbms.html https://en.wikipedia.org/wiki/Database_normalization But if you start, sky is the limit ;) it will be much easier than playing with text files. Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on May 15, 2022, 05:59:46 PM Thanks! I only now realize I can just add all balances for each address, and sum them later when needed.
The drawback of such a versatile database is that I have a lot of catching up to do. Thanks for the links, I'll see if I can get something working tomorrow :)
Update: I don't need a database anymore, I'll stick to what I know: clear text :)

Title: Re: LoyceV's small Linux commands for handling big data Post by: DeepComplex on May 21, 2022, 01:34:40 PM I also prefer the clear text model.
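For the clear-text route, the per-address summing that PawGo's GROUP BY query does can be done with awk. A small sketch with made-up data, summing a tab-separated balance column per address:

```shell
# sum column 2 per address in column 1 -- the plain-text version of:
# select address, sum(balance) from tableName group by address
printf '1abc\t5\n1def\t3\n1abc\t-2\n' | awk -F'\t' '
  { sum[$1] += $2 }
  END { for (a in sum) printf "%s\t%d\n", a, sum[a] }' | sort
# prints one line per address: 1abc with 3, 1def with 3
```

The final `sort` is only there because awk's `for (a in sum)` iterates in an unspecified order.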
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on August 20, 2022, 11:30:30 AM Question
I have a file (300 MB) with 8.9 million Bitcoin addresses. I also have a directory (67 GB) with all Bitcoin addresses. I want to know which address from the file is in the directory more than once. I use this:
Code: grep -hf addresses.txt alladdys/* | sort | uniq -d > output.txt
What would be a better solution?

Title: Re: LoyceV's small Linux commands for handling big data Post by: seoincorporation on August 20, 2022, 04:38:37 PM
Quote
... Main question: how do I put this in .db format?
You don't have to put it in a .db format at all, because you can import text files into a database. The trick is to use tabs and not spaces between the address and the balance.
Code: echo "hello world" | sed -e 's/ /\t/g'
Once you have changed that, you can load it into a table with:
Code: LOAD DATA INFILE '/tmp/addys.txt' INTO TABLE AddresTable;
Source: https://stackoverflow.com/questions/13579810/how-to-import-data-from-text-file-to-mysql-database

Title: Re: LoyceV's small Linux commands for handling big data Post by: PawGo on August 21, 2022, 10:23:55 AM
Quote
Question I have a file (300 MB) with 8.9 million Bitcoin addresses. I also have a directory (67 GB) with all Bitcoin addresses. I want to know which address from the file is in the directory more than once.
Hi, I do not understand what you mean by "directory" - is it a file with addresses? Or a directory on the HDD where each file has a name like an address? Then how can you have the same address twice? I have prepared a small program for you: https://github.com/PawelGorny/Loyce60787783 It reads a list of addresses into memory and then reads the "directory" file with addresses - if an address exists in memory, it is marked; if the same address is hit for the second time, it is removed from memory and saved to a file. If you want to calculate how many times an address was hit, a change is needed.
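As a hedged alternative to `grep -hf` above (which holds all 8.9 million patterns in memory for every invocation), the duplicates can be found first and then intersected with the address file. A self-contained sketch with made-up file names:

```shell
# find addresses occurring more than once across all daily files,
# then keep only those that are also in the big address list
printf '1abc\n1def\n' > alladdys_day1.txt
printf '1abc\n1xyz\n' > alladdys_day2.txt
printf '1abc\n1def\n1xyz\n' | sort > addresses_sorted.txt
sort alladdys_day1.txt alladdys_day2.txt | uniq -d \
  | comm -12 - addresses_sorted.txt
# prints: 1abc
```

This is essentially the "3 GB list of duplicates" idea mentioned below: the expensive sort happens once, and the address file only needs to be sorted, not loaded as grep patterns.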
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on August 21, 2022, 11:46:18 AM
Quote
I do not understand what you mean by "directory" - is it a file with addresses? Or a directory on the HDD where each file has a name like an address? Then how can you have the same address twice?
It's a directory with files. Each file has all Bitcoin addresses that were used that day, some of them more than once. To my surprise, my grep script actually completed! It used up all RAM and added some swap, and after 24 hours of high load, it's done :D
Quote
I have prepared a small program for you: https://github.com/PawelGorny/Loyce60787783 It reads a list of addresses into memory and then reads the "directory" file with addresses - if an address exists in memory, it is marked; if the same address is hit for the second time, it is removed from memory and saved to a file.
Thanks for this! I tried to test it, but I don't really want to install Java on the server just for this. I am curious how it would perform though.

Title: Re: LoyceV's small Linux commands for handling big data Post by: PawGo on August 21, 2022, 11:54:37 AM
Quote
It's a directory with files. Each file has all Bitcoin addresses that were used that day, some of them more than once.
Ok, I understand now. That way I may change the program to process all the files from the given directory, not only the one file (daily snapshot). The question would be whether you look for double hits per day/per file, or in total across all files. If you change your mind, give it a try; maybe it will use fewer resources. I do not know how much memory 8.7 million addresses will take.
Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on August 21, 2022, 12:11:39 PM
Quote
The question would be whether you look for double hits per day/per file, or in total across all files.
The total :) For now, I got it covered. I can try something else too: if I use the list of addresses that are used more than once, it's only 3 GB (instead of 67), and I can search against that list. That was slower in my initial test, but it didn't cause memory problems, and I have to do it less often, so it may pay off.
I'm looking for all addresses funded with 1, 2, 4, ..., 8192 mBTC in one transaction, that have no more transactions than one funding and (possibly) one spending. I want to count how many of those chips exist on each day. It could be a good measure for privacy.

Title: Re: LoyceV's small Linux commands for handling big data Post by: citb0in on August 30, 2022, 03:23:16 PM Hello all, and thanks to LoyceV for providing this great resource of information. For a certain query I'd like to have a file containing all addresses which
either are funded (=positive balance) or had an output in the past (=sent some coins to someone). Is it possible somehow to generate such a big file with this data which I could use for a query? Alternatively, I don't mind having two separate files: one that already exists <blockchair_bitcoin_addresses_and_balance_LATEST.tsv> and one additional file which contains all addresses with outputs. I could run my query against both of those; that would certainly do the job. I'm grateful for any helpful information. Thank you so much!

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on August 30, 2022, 04:29:20 PM
Quote
are funded (=positive balance)
See List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0).
Quote
had an output in the past (=sent some coins to someone)
Interesting, I don't have such a list, but it should be quite easy to get from outputs (http://blockdata.loyce.club/outputs/blockchair_bitcoin_outputs_20111111.tsv.gz).
Quote
Is it possible somehow to generate such a big file with this data which I could use for a query?
Are you sure you're not looking for "and" instead of "or", so all addresses that sent funds before, and still hold a balance?
Quote
I could run my query against both of those; that would certainly do the job.
Adding them together and removing duplicates is easy. To be clear, from outputs (http://blockdata.loyce.club/outputs/blockchair_bitcoin_outputs_20111111.tsv.gz): would 17aA19GvhzMHsq8xPwSXAPZutyr6kuzLEB and 1KPxwAbFVoDimPrVECF2zgiyfX9jGW9TCy be what you're looking for?
Code: block_id transaction_hash index time value value_usd recipient type script_hex is_from_coinbase is_spendable

Title: Re: LoyceV's small Linux commands for handling big data Post by: citb0in on August 30, 2022, 04:43:39 PM Not really. I am interested in addresses like that:
address, balance, outputs
1aDdressExampLeFundedxXxx, 123456, 789
bc1qnotfundedbutspent0utput, 0, 3
Addresses with balance=0 AND outputs=0 should not be listed. Only those matching this condition: balance>0 OR (balance=0 AND outputs>0)

Title: Re: LoyceV's small Linux commands for handling big data Post by: LoyceV on August 30, 2022, 05:40:10 PM
Quote
Addresses with balance=0 AND outputs=0 should not be listed.
No balance and no outputs means the address is unused. Those aren't in any of the data dumps.
Quote
balance>0
That list I have :)
Quote
OR (balance=0 AND outputs>0)
I'm confused: why would the 2 addresses I gave above not qualify for this?

Title: Re: LoyceV's small Linux commands for handling big data Post by: seoincorporation on August 30, 2022, 05:50:49 PM Hello LoyceV, I have been working on the address-to-HASH160 conversion and I made some scripts that I would like to add to your Linux commands.
Script to get all the HASH160 values from the addyBalance.tsv file for addresses starting with 1:
Code: for a in $(cat addyBalance.tsv | cut -f1 | sed '/^b/d' | sed '/^3/d')

Script to get all the HASH160 values from the addyBalance.tsv file for addresses starting with bc1:
Code: for a in $(cat addyBalance.tsv | cut -f1 | sed '/.\{70\}/d' | sed '/^3/d' | sed '/^1/d')

Run: you can print the HASH with:
Code: sh addy.sh
Or save it to a file:
Code: sh addy.sh > a.txt
The script prints an error because the first word in the file is 'Addres', but it works fine:
Code: $ sh addy.sh

And I made a small script for a single address too:
sh bc.sh bc1qBitcoinAddress
Code: python3 -c "import bech32; hash1 = bech32.decode(\"bc\", \"$1\"); hash2 = bytes(hash1[1]); print(hash2.hex())"
sh 1.sh 1BitcoinAddress
Code: python3 -c "import binascii, hashlib, base58; hash160 = binascii.hexlify(base58.b58decode_check(b'$1')).decode()[2:]; print(hash160)"
You will need the Python dependencies to run these scripts (binascii and hashlib are part of the standard library):
Code: pip install base58 bech32

Title: Re: LoyceV's small Linux commands for handling big data Post by: citb0in on August 31, 2022, 06:19:59 AM @seoincorporation: thanks for the scripts you provided, but this should be very time-consuming and slow. Imagine running your script against LoyceV's file which contains all funded addresses (1.8 GB file size). Would it take weeks (?) until it finishes? What do you think, any ways to optimize it?
Title: Re: LoyceV's small Linux commands for handling big data Post by: seoincorporation on September 01, 2022, 03:55:28 AM
Quote
@seoincorporation: thanks for the scripts you provided, but this should be very time-consuming and slow. Imagine running your script against LoyceV's file which contains all funded addresses (1.8 GB file size). Would it take weeks (?) until it finishes? What do you think, any ways to optimize it?
Code: $ cat addyBalance.tsv | cut -f1 | sed '/.\{70\}/d' | sed '/^3/d' | sed '/^1/d' |wc -l
I know the data is big - 1.1 million addys starting with bc1 - but I don't think it would take weeks. I replaced cat with head -n 10000, and with the time command I get:
Code: real 0m37.877s
So, 10,000 in 40 seconds; that's 4,000 seconds for 1 million, or 66 minutes - a little more than 1 hour. I think it would be faster to do it all in Python rather than calling Python from Bash as I did.
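Most of the time in the per-address scripts goes into starting one python3 process per address. A hedged sketch of my own (stdlib only, not the scripts above) that converts a whole stream of legacy (1...) addresses to HASH160 in a single process:

```shell
# decode base58check in one long-lived python3 process; feed it one
# legacy address per line, get the hash160 back
printf '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa\n' | python3 -c '
import sys
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
for line in sys.stdin:
    n = 0
    for c in line.strip():
        n = n * 58 + B58.index(c)
    raw = n.to_bytes(25, "big")   # version byte + hash160 + 4-byte checksum
    print(raw[1:-4].hex())        # strip version byte and checksum
'
# prints: 62e907b15cbf27d5425399ebf6f0fb50ebb88f18
```

The example input is the well-known genesis-block address; this sketch skips checksum verification and only handles standard 25-byte legacy payloads, so treat it as a starting point.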