Bitcoin Forum
Topic: Reading Block directory: Sequential write? (Read 1372 times)
Nicolas Dorier (OP)
Hero Member

Activity: 714
Merit: 662


August 08, 2014, 08:35:36 PM
 #1

In NBitcoin, I have coded a class that allows me to enumerate blocks in the Block Directory folder of bitcoind.

It works perfectly.
I am now building an open-source indexer similar to BlockChain.info, with stealth address and colored coin (CC) support. Here is how it works:
- I run bitcoind, which maintains the block directory.
- Every minute, the indexer runs, traverses the block directory from its last saved position to the end, indexes everything along the way, and then saves the new position.
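A minimal C++ sketch of one such pass over a single blk file, assuming the record layout bitcoind uses: the 4-byte network magic (mainnet bytes f9 be b4 d9), a 4-byte little-endian length, then the serialized block. `ScanBlkFile` and `BlockRecord` are hypothetical names. The sketch assumes the file is already in memory and not obfuscated (recent Bitcoin Core releases can XOR-obfuscate block files, which is ignored here). It stops at zero bytes, because bitcoind may preallocate space at the end of a file, and at a truncated record, which may not be fully written yet. The returned offset is what the indexer would save as its new position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mainnet network magic as it appears on disk (assumption: mainnet files only).
constexpr uint8_t kMainnetMagic[4] = {0xf9, 0xbe, 0xb4, 0xd9};

struct BlockRecord {
    std::size_t offset;        // offset of the block data, just past the 8-byte record header
    std::vector<uint8_t> raw;  // serialized block
};

// Read a little-endian uint32 at pos (caller guarantees pos + 4 <= buf.size()).
uint32_t ReadLE32(const std::vector<uint8_t>& buf, std::size_t pos) {
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Scan one blk file's bytes from `start`, appending each complete record to `out`.
// Returns the offset to resume from on the next run.
std::size_t ScanBlkFile(const std::vector<uint8_t>& buf, std::size_t start,
                        std::vector<BlockRecord>& out) {
    std::size_t pos = start;
    while (pos + 8 <= buf.size()) {
        // Zero padding (preallocated space) or unexpected bytes: stop here.
        if (std::memcmp(buf.data() + pos, kMainnetMagic, 4) != 0) break;
        const std::size_t len = ReadLE32(buf, pos + 4);
        // Record not fully on disk yet: retry from this offset next time.
        if (pos + 8 + len > buf.size()) break;
        const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos + 8);
        out.push_back({pos + 8, std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(len))});
        pos += 8 + len;
    }
    return pos;
}
```

Calling it again with the returned offset picks up only records appended since the previous run.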

It works fine, under the assumption that bitcoind will never append a block to blk5.dat once its last block file is blk10.dat.
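One way to avoid depending on that assumption is to keep a consumed offset per file instead of a single (file, offset) position, and on each run rescan every file that now extends past what was consumed. A minimal C++ sketch; `Checkpoint` and `FilesToScan` are hypothetical names, and the per-file offsets are assumed to be whatever the indexer's scan of each file returned. Because bitcoind may preallocate file space, the consumed offset can stay below the file size, so such a file is re-checked on every run; that is cheap if the scan stops at the first zero padding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical checkpoint: blk file index -> number of bytes already indexed.
using Checkpoint = std::map<int, uint64_t>;

// Return (file index, start offset) for every blk file with bytes past what was
// indexed last time, instead of assuming only the newest file can grow.
// currentSizes: blk file index -> current size on disk.
// Files present in `done` but no longer on disk (e.g. pruned) are simply skipped.
std::vector<std::pair<int, uint64_t>> FilesToScan(const std::map<int, uint64_t>& currentSizes,
                                                  const Checkpoint& done) {
    std::vector<std::pair<int, uint64_t>> work;
    for (const auto& [file, size] : currentSizes) {
        const auto it = done.find(file);
        const uint64_t consumed = (it == done.end()) ? 0 : it->second;
        if (size > consumed) work.emplace_back(file, consumed);
    }
    return work;
}
```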

But one of my users tells me that this assumption is wrong and that he hit a bug because of it.
So I looked at the bitcoind source code.

I noticed that the LoadBlockIndexDB() method, which is called at startup, retrieves the last position where it wrote a block into the nLastBlockFile and infoLastBlockFile global variables.
Quote
    pblocktree->ReadLastBlockFile(nLastBlockFile);
    LogPrintf("LoadBlockIndexDB(): last block file = %i\n", nLastBlockFile);
    if (pblocktree->ReadBlockFileInfo(nLastBlockFile, infoLastBlockFile))
        LogPrintf("LoadBlockIndexDB(): last block file info: %s\n", infoLastBlockFile.ToString());

Then I saw that when a new block is saved, FindBlockPos finds the next free position in the block files.
The search for free space starts from nLastBlockFile.

From this, I conclude that bitcoind writes sequentially to the block directory and never writes back into an earlier file.
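To make that reasoning concrete, here is a hypothetical, heavily simplified C++ model of the allocation behaviour described above; it is not bitcoind's FindBlockPos. It tracks only the last file and how much of it is used, and when a block does not fit under the 128 MiB per-file limit (bitcoind's MAX_BLOCKFILE_SIZE) it moves to the next file. Since `lastFile` is only ever incremented, no allocation lands in an earlier file. The real function also deals with the record header, preallocation and reindexing, none of which is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Per-file size limit, mirroring bitcoind's MAX_BLOCKFILE_SIZE (128 MiB).
constexpr uint64_t kMaxBlockFileSize = 128ull * 1024 * 1024;

struct DiskPos {
    int file;         // blk file index
    uint64_t offset;  // byte offset inside that file
};

// Hypothetical simplified allocator, NOT bitcoind's actual code.
struct BlockFileAllocator {
    int lastFile = 0;       // plays the role of nLastBlockFile
    uint64_t lastSize = 0;  // bytes used in lastFile (cf. infoLastBlockFile)

    // Reserve `size` bytes, starting the search from lastFile.
    DiskPos Allocate(uint64_t size) {
        if (lastSize + size > kMaxBlockFileSize) {
            // Does not fit: move forward to a fresh file, never backwards.
            ++lastFile;
            lastSize = 0;
        }
        DiskPos pos{lastFile, lastSize};
        lastSize += size;
        return pos;
    }
};
```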

Can a dev confirm my conclusion, or am I missing something?

Bitcoin address 15sYbVpRh6dyWycZMwPdxJWD4xbfxReeHe
gmaxwell
Moderator
Legendary
expert

Activity: 4270
Merit: 8805



August 08, 2014, 10:37:57 PM
 #2

That's certainly the case today, but we make no promise to maintain it in the future if changing it serves some useful end. The block files are not really a user-facing interface. Headers-first will make it write to them out of order (but still append-only), pruning may delete whole blocks out from under you, and in the future we may implement things like compression, which changes the format.
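With headers-first, file order is no longer chain order, so an indexer reading the files directly has to rebuild the order from each header's previous-block hash. A minimal C++ sketch, assuming the hashes have already been parsed from the block headers (strings stand in for the 32-byte hashes to keep it short); `HeaderLink` and `AssignHeights` are hypothetical names. Heights are assigned by walking children from the genesis block, so the on-disk order does not matter. Stale blocks get heights too, so choosing the active chain still needs extra information, such as the node's current tip.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Block identity reduced to (hash, prevHash); strings stand in for 32-byte hashes.
struct HeaderLink {
    std::string hash;
    std::string prev;
};

// Assign a height to every block whose ancestry reaches `genesis`, whatever
// order the blocks appear in. Blocks with an unknown parent get no entry.
std::map<std::string, int> AssignHeights(const std::vector<HeaderLink>& blocks,
                                         const std::string& genesis) {
    // prevHash -> hashes of blocks that build on it
    std::map<std::string, std::vector<std::string>> children;
    for (const auto& b : blocks) children[b.prev].push_back(b.hash);

    std::map<std::string, int> height{{genesis, 0}};
    std::vector<std::string> stack{genesis};
    while (!stack.empty()) {
        const std::string h = stack.back();
        stack.pop_back();
        const auto it = children.find(h);
        if (it == children.end()) continue;
        for (const auto& c : it->second) {
            height[c] = height[h] + 1;
            stack.push_back(c);
        }
    }
    return height;
}
```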
Nicolas Dorier (OP)
Hero Member

Activity: 714
Merit: 662


August 08, 2014, 11:01:32 PM
 #3

Thanks, I understand this approach is fragile.
However, I don't yet see any other solution that allows enumerating bitcoind's blocks with high performance.
RPC is usable, but enumerating 300,000 blocks over RPC is 10,000 times slower than reading the blk directory directly.

I don't want to implement a full node in NBitcoin either: this is serious business, and any subtle incompatibility with Core would provoke a fork.

Is there another solution? If not, could I at least expect, if this were to change in the future, a bitcoind flag to always store full blocks in the directory (but don't use it)?

Or a getblocks (with an 's') call in the RPC API?

gmaxwell
Moderator
Legendary
expert

Activity: 4270
Merit: 8805



August 08, 2014, 11:57:13 PM
 #4

You can speak the P2P protocol just to fetch blocks; right now this is the fastest way. Note that I'm not suggesting you implement a full node (you are wise to avoid that), but rather that you use bitcoind as a filter and fetch blocks from it over the P2P protocol.
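As a rough illustration of the P2P route, this C++ sketch builds the payload of a getdata message requesting full blocks: a CompactSize count, then one inventory entry per block, each a 4-byte little-endian type (2, MSG_BLOCK) followed by the 32-byte block hash. In a real session this would come after the version/verack handshake and a getheaders or getblocks exchange that tells you which hashes to request; the 24-byte message header (magic, command, length, double-SHA256 checksum) is left out. `WriteCompactSize` and `BuildGetDataPayload` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Block hash in the internal byte order used on the wire.
using Hash256 = std::array<uint8_t, 32>;

constexpr uint32_t kMsgBlock = 2;  // inventory type MSG_BLOCK

// Append a CompactSize (Bitcoin's variable-length integer) to `out`.
void WriteCompactSize(std::vector<uint8_t>& out, uint64_t n) {
    auto le = [&out](uint64_t v, int bytes) {
        for (int i = 0; i < bytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    };
    if (n < 0xfd) {
        out.push_back(static_cast<uint8_t>(n));
    } else if (n <= 0xffff) {
        out.push_back(0xfd);
        le(n, 2);
    } else if (n <= 0xffffffff) {
        out.push_back(0xfe);
        le(n, 4);
    } else {
        out.push_back(0xff);
        le(n, 8);
    }
}

// Payload of a getdata message asking for the given blocks (no message header).
std::vector<uint8_t> BuildGetDataPayload(const std::vector<Hash256>& hashes) {
    std::vector<uint8_t> out;
    WriteCompactSize(out, hashes.size());
    for (const auto& h : hashes) {
        // 4-byte little-endian inventory type, then the 32-byte hash.
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(kMsgBlock >> (8 * i)));
        out.insert(out.end(), h.begin(), h.end());
    }
    return out;
}
```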

An RPC getblocks would likely not be much faster, because much of the time is spent on JSON handling.
Nicolas Dorier (OP)
Hero Member

Activity: 714
Merit: 662


August 11, 2014, 01:20:29 AM
 #5

After benchmarking on my machine:
one whole scan of the blk folder takes 7 minutes, while a full local download over the P2P protocol (not RPC) takes between 3 and 6 hours. I might improve that a bit with some multithreading, though.

I guess I will keep scanning the folder directly for now, and I'll follow any changes to the block storage format on GitHub.
If I hit a problem, I'll write a custom process that maintains its own block directory through a P2P connection to the local, trusted node.
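A custom process like that could store blocks in the same framing bitcoind uses, so the existing NBitcoin block directory reader could keep reading them. A minimal C++ sketch, assuming mainnet magic and an in-memory buffer standing in for the file (flushing it to disk, for example through a `std::ofstream` opened with `std::ios::app`, is omitted); `AppendBlockRecord` is a hypothetical name. Records are only ever appended, matching the sequential-write behaviour discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mainnet magic bytes as stored on disk (assumption: mainnet only).
constexpr uint8_t kMagic[4] = {0xf9, 0xbe, 0xb4, 0xd9};

// Append one serialized block as: magic, 4-byte little-endian length, raw block.
// Returns the offset of the block data inside `file`.
std::size_t AppendBlockRecord(std::vector<uint8_t>& file, const std::vector<uint8_t>& rawBlock) {
    assert(rawBlock.size() <= 0xffffffffu);  // length must fit in 4 bytes
    file.insert(file.end(), kMagic, kMagic + 4);
    const uint32_t len = static_cast<uint32_t>(rawBlock.size());
    for (int i = 0; i < 4; ++i) file.push_back(static_cast<uint8_t>(len >> (8 * i)));
    const std::size_t offset = file.size();
    file.insert(file.end(), rawBlock.begin(), rawBlock.end());
    return offset;
}
```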
Thanks,
