Bitcoin Forum
Author Topic: Extracting data from a blk file.  (Read 247 times)
BlackHatCoiner (OP)
Legendary
*
Online Online

Activity: 1526
Merit: 7401


Farewell, Leo


View Profile
December 16, 2020, 07:24:33 AM
 #1

A block file (blk*.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example by converting it into text with a readable block header, version number, previous block hash, merkle root, time, target and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.

mamuu
Member
**
Offline Offline

Activity: 71
Merit: 19


View Profile
December 16, 2020, 09:20:08 AM
Merited by BlackHatCoiner (1)
 #2

Quote
A block file (blk*.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example by converting it into text with a readable block header, version number, previous block hash, merkle root, time, target and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.

Hi, you can use this Python parser -> https://github.com/ragestack/blockchain-parser


PawGo
Legendary
*
Offline Offline

Activity: 952
Merit: 1367


View Profile
December 16, 2020, 02:11:03 PM
 #3

There is an interesting tool if you want to dump the data to CSV files for analysis:
https://github.com/gcarq/rusty-blockparser
NotATether
Legendary
*
Offline Offline

Activity: 1610
Merit: 6753


bitcoincleanup.com / bitmixlist.org


View Profile WWW
December 16, 2020, 03:42:52 PM
Merited by vapourminer (2), ABCbits (2), Heisenberg_Hunter (1)
 #4

According to https://learnmeabitcoin.com/technical/blkdat, a blk*.dat file is simply a sequence of records. Each record consists of 4 magic bytes, a 4-byte size field giving the length of the block that follows, and then the serialized block itself: an 80-byte block header, the number of transactions, and the transactions themselves. It is easy to build a parser for it in your language of choice, because everything after each magic/size pair is an ordinary serialized block, in the same format that is used on the network.

Multiple blocks are stored in each blk*.dat file, each one preceded by the magic bytes and its size. Bitcoin Core keeps appending blocks to a file until it would exceed the 128 MiB limit, then starts a new file. It's faster to read from a handful of large files than to put each block in its own file and open every single one of them. It's even possible to read a whole 128 MiB file in one shot.
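The record layout described above can be walked with a few lines of Python. This is only a minimal sketch: the function name `iter_raw_blocks` is made up for illustration, the magic value is the mainnet one, and stopping at the first non-magic bytes (for example zero padding at the end of a file) is a simplifying assumption rather than what any particular parser does.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Mainnet magic 0xD9B4BEF9 as it appears on disk (little-endian byte order).
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_raw_blocks(stream: BinaryIO, magic: bytes = MAINNET_MAGIC) -> Iterator[bytes]:
    """Yield each serialized block (header + transactions) from a blk*.dat stream."""
    while True:
        prefix = stream.read(8)              # 4 magic bytes + 4-byte size
        if len(prefix) < 8:
            return                           # end of file
        if prefix[:4] != magic:
            return                           # padding or unknown data: stop, don't guess
        (size,) = struct.unpack("<I", prefix[4:])
        payload = stream.read(size)
        if len(payload) < size:
            return                           # truncated final record
        yield payload


# Demo on a synthetic stream: two fake "blocks" followed by zero padding.
fake_file = io.BytesIO(
    MAINNET_MAGIC + struct.pack("<I", 3) + b"abc"
    + MAINNET_MAGIC + struct.pack("<I", 2) + b"xy"
    + b"\x00" * 16
)
blocks = list(iter_raw_blocks(fake_file))
print([len(b) for b in blocks])  # prints [3, 2]
```

For a real file, the same function would be called on something like `open("blk00000.dat", "rb")`, where the path depends on your own data directory.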

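From the example block in the link above, the magic bytes are 0xd9b4bef9 (mainnet), which appear on disk as f9 be b4 d9. They mark the beginning of each record. Next comes the size of the block as a 4-byte field (0x011d, i.e. 285 bytes, is the example size), followed by the block header, which starts with a version field. To move from one block to the next, it is more reliable to skip ahead by the size field than to scan for the magic bytes, since the same four bytes could also occur inside block data.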
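As a rough illustration of what comes right after the size field, here is a sketch that decodes the 80-byte block header into the fields the original question asks about. The function name `parse_header` and the dictionary keys are chosen for this example only; the sample bytes are the header of the Bitcoin genesis block.

```python
import hashlib
import json
import struct


def parse_header(header: bytes) -> dict:
    """Decode an 80-byte block header into readable fields."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    # version (int32), previous block hash, merkle root, time, bits, nonce
    version, prev, merkle, timestamp, bits, nonce = struct.unpack("<i32s32sIII", header)
    # The block hash is the double SHA-256 of the header, displayed byte-reversed.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return {
        "hash": digest[::-1].hex(),
        "version": version,
        "previous_block": prev[::-1].hex(),
        "merkle_root": merkle[::-1].hex(),
        "time": timestamp,
        "bits": f"{bits:08x}",
        "nonce": nonce,
    }


# Header of the genesis block, assembled field by field as stored on disk.
GENESIS_HEADER = bytes.fromhex(
    "01000000"                                                            # version
    + "00" * 32                                                           # previous block
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"                                                          # time
    + "ffff001d"                                                          # bits (target)
    + "1dac2b7c"                                                          # nonce
)
genesis = parse_header(GENESIS_HEADER)
print(json.dumps(genesis, indent=2))
```

The `bits` field is the compact encoding of the target; expanding it into the full 256-bit target is left out of this sketch.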

This link https://en.bitcoin.it/wiki/Protocol_documentation will be very helpful for you to understand the fields in each structure.

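Keep in mind that the integer fields in the file are little-endian, so if you write a parser yourself you have to reverse the byte order of 32-bit fields, 16-bit fields, etc. For example, 0x12345678 is stored as the bytes 78 56 34 12. Hashes (previous block, merkle root, transaction IDs) also have to be byte-reversed to match the form that block explorers display.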
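The byte-order point can be checked directly in Python. The sketch below also shows the variable-length integer (CompactSize) that the block format uses for counts such as the number of transactions; the helper name `read_compact_size` is an assumption made for this example.

```python
import io
import struct

# A 32-bit integer and the bytes it occupies in the file.
value = 0x12345678
on_disk = struct.pack("<I", value)               # "<I" = little-endian uint32
print(on_disk.hex())                             # prints 78563412
print(hex(int.from_bytes(on_disk, "little")))    # prints 0x12345678


def read_compact_size(stream) -> int:
    """Read a CompactSize integer: 1 byte, or a marker byte plus 2, 4 or 8 bytes."""
    first = stream.read(1)
    if not first:
        raise EOFError("no data left")
    marker = first[0]
    if marker < 0xFD:
        return marker                            # value fits in the marker byte itself
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[marker]
    data = stream.read(width)
    if len(data) != width:
        raise EOFError("truncated CompactSize")
    return int.from_bytes(data, "little")


print(read_compact_size(io.BytesIO(bytes.fromhex("fd2c01"))))  # prints 300
```

In a serialized block, one such integer follows the 80-byte header and gives the transaction count.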

DougM
Full Member
***
Offline Offline

Activity: 173
Merit: 120


View Profile
December 17, 2020, 12:04:02 AM
Merited by vapourminer (2)
 #5

Quote
A block file (blk*.dat) is not in human-readable form. I wanted to know if there is a way to extract the data from it, for example by converting it into text with a readable block header, version number, previous block hash, merkle root, time, target and all of the transactions included in that block, the same way Bitcoin Core translates it to JSON.
I did something just like that last summer in Python, using parser code from GitHub that got me 95% of the way to where I wanted to go. I was parsing the earliest blocks to do some analysis on coinbase transactions, so my code likely doesn't handle the latest blocks as is. I parsed an entire block file in a matter of minutes (granted, I was parsing the early, smaller ones!) and harvested the fields I was interested in into an SQLite database for post-processing.
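For illustration only, harvesting header fields into SQLite might look roughly like the sketch below. The table layout, column names and function names are hypothetical, not the schema used in the post, and the sample row holds the well-known genesis block values.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical 'blocks' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blocks (
               hash        TEXT PRIMARY KEY,
               version     INTEGER,
               prev_hash   TEXT,
               merkle_root TEXT,
               time        INTEGER,
               bits        TEXT,
               nonce       INTEGER
           )"""
    )
    return conn


def store_block(conn: sqlite3.Connection, block: dict) -> None:
    """Insert one parsed header; a block that is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO blocks VALUES "
        "(:hash, :version, :prev_hash, :merkle_root, :time, :bits, :nonce)",
        block,
    )
    conn.commit()


genesis_row = {
    "hash": "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
    "version": 1,
    "prev_hash": "00" * 32,
    "merkle_root": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    "time": 1231006505,
    "bits": "1d00ffff",
    "nonce": 2083236893,
}

conn = open_db()
store_block(conn, genesis_row)
store_block(conn, genesis_row)   # second insert is ignored (same primary key)
row_count = conn.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
print(row_count)  # prints 1
```

Using the block hash as the primary key keeps re-runs over the same blk file from creating duplicate rows.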

The following reference was useful to understand the fields and how they were encoded.

https://developer.bitcoin.org/reference/block_chain.html

I tried a number of parsers and code examples until I got one that worked for what I was doing. I think this was one of the more useful ones because it is standalone and I could review and edit any of the code if I needed to. I didn't want to trust a 'black box' running on my PC ;)

https://github.com/ragestack/blockchain-parser/
https://github.com/ragestack/blockchain-parser/blob/master/README.md
Quote
Blockchain parser
Author: Denis Leonov 466611@gmail.com

Simple script for parsing blkXXXXX.dat files of Bitcoin blockchain database.

This script also compatible with most of altcoins, after making some tiny tricks.

The one realisation of blockchain parser that allows you to explore the main database as close as possible.
Don't worry to email me your questions or suggestions about this parser.

No dependencies, no third-parties modules or libs needed. Just install Python standart release and run.

Make sure you change the paths for blkXXXXX.dat files and for the parsing results to yours. The script works only with fully downloaded blockchain dat files (that are ~134Mb).

This script convert the full blockchain raw database that is stored in blkXXXXX.dat files to the simple txt view.

If this was helpfull for you, don't hesistate to make a donations!!!
Bitcoin (BTC): 1FvssyzXNnmgHbJg2DYwb7rkzTrtT8adcL
a_6apcyk
Jr. Member
*
Offline Offline

Activity: 35
Merit: 10


View Profile WWW
December 17, 2020, 05:15:20 PM
Merited by ABCbits (1)
 #6

I understand that the question is about reading blk files, but why do you need it? You can ask Bitcoin Core's RPC for all the data stored in these files.
I also tried to read blk files last summer, and also found the tool https://github.com/ragestack/blockchain-parser/. After that I found another tool, https://github.com/blockchain-etl/bitcoin-etl. It connects to a node and gets data about blocks, transactions, inputs and outputs, and for me it works great.
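As a sketch of the RPC route, the snippet below builds JSON-RPC requests for a local Bitcoin Core node using only the Python standard library. The URL, user name and password are placeholders that must match your own bitcoin.conf, and the helper names `build_rpc_request` and `rpc_call` are made up for this example; `getblockhash` and `getblock` are existing Bitcoin Core RPC methods.

```python
import base64
import json
import urllib.request


def build_rpc_request(method, params, url="http://127.0.0.1:8332/",
                      user="rpcuser", password="rpcpassword"):
    """Build (but do not send) a JSON-RPC request for a Bitcoin Core node."""
    body = json.dumps(
        {"jsonrpc": "1.0", "id": "blk-demo", "method": method, "params": params}
    ).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


def rpc_call(method, params, **connection):
    """Send the request and return the 'result' field of the reply."""
    with urllib.request.urlopen(build_rpc_request(method, params, **connection)) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


# Building a request needs no running node:
request = build_rpc_request("getblockhash", [0])
print(request.data.decode())

# With a running node, the genesis block could be fetched as JSON like this:
# block_hash = rpc_call("getblockhash", [0])
# block = rpc_call("getblock", [block_hash, 2])   # verbosity 2 includes decoded transactions
```

The result of `getblock` with verbosity 2 is the same kind of JSON the original question refers to, so no blk file parsing is needed on this path.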
BlackHatCoiner (OP)
Legendary
*
Online Online

Activity: 1526
Merit: 7401


Farewell, Leo


View Profile
December 18, 2020, 08:02:39 PM
 #7

Quote
I understand that the question is about reading blk files, but why do you need it? You can ask Bitcoin Core's RPC for all the data stored in these files.

Well, to answer that: first of all, as ETFBitcoin wrote, to understand how it works. Secondly, I was thinking of creating a different kind of block explorer. Right now, block explorers answer each request by sending RPC commands to a node and returning the result to the client. I was thinking of inserting the blocks' information into a MySQL database instead. This way, RPC is not needed.

a_6apcyk
Jr. Member
*
Offline Offline

Activity: 35
Merit: 10


View Profile WWW
December 18, 2020, 10:00:57 PM
Merited by vapourminer (3), ABCbits (1)
 #8

Quote
I understand that the question is about reading blk files, but why do you need it? You can ask Bitcoin Core's RPC for all the data stored in these files.
Quote
Secondly, I was thinking of creating a different kind of block explorer. Right now, block explorers answer each request by sending RPC commands to a node and returning the result to the client. I was thinking of inserting the blocks' information into a MySQL database instead. This way, RPC is not needed.

I'm pretty sure that block explorers already use some database internally; I think they use NoSQL databases. I did this myself some time ago: I loaded the BTC blockchain into Postgres, and it was a bad idea. For example, 438 600 blocks contain 481 744 165 transactions, and those contain 1 285 285 104 outputs, so you end up with huge tables and problems with indexing. Then I decided to use Elasticsearch, and that was much better. So you should think about which database would be better suited to this problem.
As for RPC, I used the Bitcoin Core RPC only to extract block data and write it to Elasticsearch, and I found bitcoinetl the best way to extract it.

 