Bitcoin Forum

Bitcoin => Bitcoin Technical Support => Topic started by: eugx0 on June 22, 2023, 12:09:46 PM



Title: Hardware requirements for analyzing output of rusty-blockparser
Post by: eugx0 on June 22, 2023, 12:09:46 PM
Hello,

I am trying to do some data analytics on the Bitcoin blockchain. I am running a node and parsing the .dat files with rusty-blockparser.
This produces four .csv files, the biggest of which is nearly 100 GB. Due to its size, I can't load it into a Python Jupyter notebook. I am wondering what programs and strategies people commonly use to process this data.

Thanks for the help,

Best,


Title: Re: Hardware requirements for analyzing output of rusty-blockparser
Post by: LoyceV on June 23, 2023, 10:48:49 AM
Quote
This produces four .csv files, the biggest of which is nearly 100 GB. Due to its size, I can't load it into a Python Jupyter notebook.
You can load the data into a database that supports the queries you need, or use tools like grep and split to extract only the lines you need.
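As a rough sketch of the database route (not necessarily how anyone in this thread does it), the Python below streams one of the CSV files into SQLite using the standard-library `csv` and `sqlite3` modules, so only one batch of rows is held in memory at a time. The `;` delimiter, the four-column layout (`txid`, `idx`, `value`, `address`), the table name and the function name `load_csv_into_sqlite` are all assumptions for illustration; check the actual columns of your rusty-blockparser output before relying on it.

```python
import csv
import sqlite3
from itertools import islice

# ASSUMPTION (hypothetical layout): each line is "txid;idx;value;address",
# ';'-delimited, no header row. Verify against your own output files.
COLUMNS = ("txid", "idx", "value", "address")


def load_csv_into_sqlite(csv_path, db_path, table="tx_out",
                         delimiter=";", batch_size=100_000):
    """Stream a large CSV into SQLite in fixed-size batches.

    Only `batch_size` rows are in memory at once, so the input file can be
    much larger than RAM. `table` is interpolated into SQL, so it must be a
    trusted identifier, not user input.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(txid TEXT, idx INTEGER, value INTEGER, address TEXT)"
    )
    placeholders = ", ".join("?" for _ in COLUMNS)
    insert_sql = f"INSERT INTO {table} VALUES ({placeholders})"
    with open(csv_path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            # INTEGER column affinity converts numeric strings such as "5000".
            conn.executemany(insert_sql, batch)
            conn.commit()
    # An index on address makes per-address lookups fast after the bulk load.
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS idx_{table}_address ON {table}(address)"
    )
    conn.commit()
    return conn
```

Once loaded, a query such as `conn.execute("SELECT SUM(value) FROM tx_out WHERE address = ?", (addr,))` returns only the small result you asked for, which is easy to bring into a notebook. SQLite is used here only because it ships with Python; a server database would follow the same load-then-query pattern.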

Quote
I am wondering what programs and strategies people commonly use to process this data.
Can you start by sharing what you're trying to accomplish? I usually pipe the data through a few Linux command-line tools to get what I need; you'll find some examples in these topics:
Bitcoin block data available in CSV format (https://bitcointalk.org/index.php?topic=5246271.0)
List of all Bitcoin addresses with a balance (https://bitcointalk.org/index.php?topic=5254914.0)
List of all Bitcoin addresses ever used (https://bitcointalk.org/index.php?topic=5265993.0)
[~500 GB] Bitcoin block data: inputs, outputs and transactions (https://bitcointalk.org/index.php?topic=5307550.0)
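The command-line approach works because tools like grep and split read their input line by line instead of loading the whole file into memory. A minimal Python sketch of the same idea is below, in case you prefer to stay in Python; `grep_lines` and `split_lines` are hypothetical helper names, the chunk naming scheme is made up, and Python's `re` syntax is close to, but not identical to, `grep -E`.

```python
import re
from itertools import islice


def grep_lines(in_path, out_path, pattern):
    """Copy only lines matching the regex `pattern` to out_path.

    Roughly what `grep -E pattern in_path > out_path` does: the input is
    read one line at a time, so memory use stays constant.
    Returns the number of lines kept.
    """
    regex = re.compile(pattern)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if regex.search(line):
                dst.write(line)
                kept += 1
    return kept


def split_lines(in_path, lines_per_chunk, prefix):
    """Split a file into pieces of at most `lines_per_chunk` lines.

    Similar in spirit to `split -l`; the names prefix + "000", "001", ...
    are an illustrative scheme, not the one split itself uses.
    Returns the list of chunk paths written.
    """
    paths = []
    with open(in_path) as src:
        while True:
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break
            path = f"{prefix}{len(paths):03d}"
            with open(path, "w") as dst:
                dst.writelines(chunk)
            paths.append(path)
    return paths
```

For example, `grep_lines("tx_out.csv", "addr1.csv", r";addr1$")` would keep only the rows ending in a given address, assuming (hypothetically) that the address is the last column. On Linux, the real `grep` and `split` are likely to be faster.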


Title: Re: Hardware requirements for analyzing output of rusty-blockparser
Post by: DaveF on June 23, 2023, 11:02:38 AM
Jupyter Notebook is also bloated software.

I agree with what ETF and Loyce said: use the proper tool for the job, because there are better ways to get at the data. I'm assuming (yes, I know, don't assume) that you are a student and have to use Jupyter for class, so in the end the data has to wind up in there. But you can't just go directly from A to B; you need to get the data into a size and format that Jupyter can handle.
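One way to get the data down to a size Jupyter can handle is a single streaming pass that keeps only aggregates and writes a small summary file. This Python sketch assumes a `;`-delimited file with an integer value column (e.g. satoshis) and takes hypothetical 0-based column positions for the grouping key and the value; the function name `summarize` is also just for illustration.

```python
import csv
from collections import defaultdict


def summarize(csv_path, out_path, key_col, value_col, delimiter=";"):
    """Stream a large CSV once, keeping only per-key count and total.

    key_col and value_col are 0-based column positions (ASSUMPTION: adjust
    them to your actual file). Memory grows with the number of distinct
    keys, not the number of rows. Returns the number of distinct keys.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            key = row[key_col]
            counts[key] += 1
            # ASSUMPTION: the value column holds whole numbers.
            totals[key] += int(row[value_col])
    # Write a small, comma-separated summary that a notebook can load directly.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "count", "total"])
        for key in sorted(counts):
            writer.writerow([key, counts[key], totals[key]])
    return len(counts)
```

The resulting summary file can then be opened in the notebook, for example with `pandas.read_csv("summary.csv")`. Pick a key with relatively few distinct values (such as a block height or a date, if your file has one), since memory use grows with the number of keys.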

-Dave