Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: Nancarrow on September 26, 2015, 11:57:40 AM



Title: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: Nancarrow on September 26, 2015, 11:57:40 AM
Running core 0.10.2 here - just on and off - to get hold of the whole blockchain. Still ~42 weeks behind. Anyway I'm trawling through the blk00xxx.dat files with a python script, and I find that part way into my blk00065.dat, there's a long string of zeroes. 249209 bytes to be precise. They start just after the end of a previous block, where my python script is obviously expecting to see 0xF9BEB4D9. After them, I do indeed get the magic bytes and it appears the blockdata resumes as normal.

Bitcoin Core doesn't seem to mind, so I presume this is expected behaviour. Is there any reason for it? Should I expect more 'deserts' later on in the blockfiles? Should I be aware of any other 'surprises', along the lines of expecting to see magic bytes but instead seeing something else?

(Oh and the answer to all 'why?' enquiries is: for fun. No, I don't get out much.)


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: TierNolan on September 27, 2015, 09:29:08 AM
Quote
Running core 0.10.2 here - just on and off - to get hold of the whole blockchain. Still ~42 weeks behind. Anyway I'm trawling through the blk00xxx.dat files with a python script, and I find that part way into my blk00065.dat, there's a long string of zeroes. 249209 bytes to be precise. They start just after the end of a previous block, where my python script is obviously expecting to see 0xF9BEB4D9. After them, I do indeed get the magic bytes and it appears the blockdata resumes as normal.

Skipping zeros is the expected behaviour.  In fact, you are supposed to skip any data you find until you hit the magic pattern.  Random data would work too (as long as it didn't contain the magic pattern).
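A minimal Python sketch of that skip-until-magic rule. It assumes mainnet files (magic bytes F9 BE B4 D9) and the usual record layout: magic, a 4-byte little-endian length, then the serialized block. scan_blk is a hypothetical helper, not part of Bitcoin Core. It also reports every skipped region, so zero "deserts" show up as gaps instead of silently disappearing.

```python
import struct
from typing import List, Tuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def scan_blk(data: bytes, magic: bytes = MAINNET_MAGIC
             ) -> Tuple[List[Tuple[int, bytes]], List[Tuple[int, int]]]:
    """Split a blk*.dat image into block records and skipped gaps.

    Returns (records, gaps): records are (offset of magic, raw block bytes),
    gaps are (offset, length) of bytes skipped while looking for a magic.
    """
    records: List[Tuple[int, bytes]] = []
    gaps: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        found = data.find(magic, pos)
        if found == -1:
            gaps.append((pos, len(data) - pos))  # trailing filler, no more blocks
            break
        if found > pos:
            gaps.append((pos, found - pos))  # zeros (or junk) before the magic
        if found + 8 > len(data):
            gaps.append((found, len(data) - found))  # record header cut off at EOF
            break
        (length,) = struct.unpack_from("<I", data, found + 4)
        start = found + 8
        if start + length > len(data):
            gaps.append((found, len(data) - found))  # block body cut off at EOF
            break
        records.append((found, data[start:start + length]))
        pos = start + length  # jump over the block; never search inside it
    return records, gaps
```

Called as `records, gaps = scan_blk(Path("blk00065.dat").read_bytes())`, the 249209-byte run of zeroes should appear as one entry in `gaps`, and the parser carries on at the next magic.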

Bitcoin Core preallocates the blk*.dat files in large sections.  I am not sure what the size is, but assume it is 16MB.  When Core creates a new blk*.dat file, it allocates 16MB of all zeros.  It then overwrites the zeros as new blocks arrive.  If there isn't enough space for new blocks, it allocates another 16MB of zeros at the end of the file.
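The chunk arithmetic can be sketched like this, assuming the 16MB figure means 16 MiB (16 × 1024 × 1024 bytes; Bitcoin Core's source calls this constant BLOCKFILE_CHUNK_SIZE). The function names are hypothetical; they only model "round the used space up to whole chunks".

```python
CHUNK = 16 * 1024 * 1024  # assumed preallocation unit (16 MiB)


def preallocated_size(used: int, chunk: int = CHUNK) -> int:
    """Bytes reserved on disk when `used` bytes hold block records,
    assuming space is only ever reserved in whole chunks."""
    return -(-used // chunk) * chunk  # ceiling division, then back to bytes


def zero_tail(used: int, chunk: int = CHUNK) -> int:
    """Zero bytes still waiting at the end of the file to be overwritten."""
    return preallocated_size(used, chunk) - used


MiB = 1024 * 1024
print(preallocated_size(20 * MiB) // MiB)  # 32
print(zero_tail(20 * MiB) // MiB)          # 12
```

So a file holding 20MB of blocks occupies two chunks (32MB) on disk, with 12MB of zeros at the end until more blocks arrive.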

Eventually, it reaches the maximum file size (128MB) and moves to the next file.  When it leaves a file, it truncates it to the length actually used, so any zeros that weren't overwritten with block data are trimmed off the end.
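A rough simulation of the rollover rule, assuming a 128 MiB limit (MAX_BLOCKFILE_SIZE in Bitcoin Core's source) and the 8-byte magic-plus-length prefix written before every block. assign_files is a hypothetical name, and Core's exact bookkeeping may differ in detail.

```python
from typing import List, Tuple

MAX_FILE = 128 * 1024 * 1024  # assumed per-file limit (128 MiB)
PREFIX = 8  # 4-byte magic + 4-byte length before every block


def assign_files(block_sizes: List[int], max_file: int = MAX_FILE
                 ) -> List[Tuple[int, int]]:
    """Return (file number, record offset) for each block in arrival order,
    moving to a new file when the next record would reach `max_file`."""
    placements: List[Tuple[int, int]] = []
    file_no, used = 0, 0
    for size in block_sizes:
        needed = size + PREFIX
        if used > 0 and used + needed >= max_file:
            file_no, used = file_no + 1, 0  # start the next blkNNNNN.dat
        placements.append((file_no, used))
        used += needed
    return placements


# Three 10-byte toy blocks with a 40-byte limit: the third starts file 1.
print(assign_files([10, 10, 10], max_file=40))  # [(0, 0), (0, 18), (1, 0)]
```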

This method reduces file fragmentation.  The smallest allocation for the file is 16MB.

I didn't realise that it left zeros in the middle of a file though.  What probably happened was that you shut down the program, so there were zeros left at the end of the file.  When it started again, it started a new section and those zeros were left and never overwritten.  This saves it having to scan the file to find where the last block ended.

Does the section of zeros end at an even location (like 1MB aligned)?


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: Nancarrow on September 27, 2015, 10:21:28 AM
Thank you for your very lucid explanation!

The pile of zeroes doesn't seem to be on any clear boundary - starting offset is 0x03fe526d and blocks resume at 0x04021fe6. I think your idea of shutting down midway is probably the right answer.
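A quick check of those two offsets in Python: their difference matches the 249209-byte count from the first post, and neither end sits on a 1MB (1 MiB) boundary.

```python
MiB = 1024 * 1024

gap_start = 0x03FE526D  # offset of the first zero byte
gap_end = 0x04021FE6    # offset where the magic bytes reappear

print(gap_end - gap_start)  # 249209, the same count as in the first post

# Both remainders are non-zero, so neither end is 1 MiB aligned.
print(gap_start % MiB, gap_end % MiB)  # 938605 139238

# The run does straddle the 64 MiB mark (0x04000000).
print(gap_start < 64 * MiB < gap_end)  # True
```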

The python script I'm writing actually scans the block files bitcoin-core generates, and builds a new set of block files with the blocks strictly in order and without gaps. Could I replace the original block files with my new ones and expect bitcoin-core to work just fine? I'm thinking not, as the database files would probably not match up. So, if I deleted everything in the data folders (not using any wallets here) and just plonked my new blk00xxx.dat files there, would core just build a new database from them? I would not want it to try to redownload the whole blockchain from scratch!
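The reordering step might look something like this minimal sketch. It assumes each raw block starts with the standard 80-byte header: a 4-byte version, then hashPrevBlock at bytes 4 to 36. It follows the longest chain by block count from the genesis block. That is a simplification: Core itself picks the chain with the most cumulative work, and blocks whose parents are missing are not handled. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def block_hash(raw_block: bytes) -> bytes:
    """Double SHA-256 of the 80-byte header, in internal (not display) byte order."""
    return hashlib.sha256(hashlib.sha256(raw_block[:80]).digest()).digest()


def prev_hash(raw_block: bytes) -> bytes:
    """hashPrevBlock: header bytes 4..36, right after the 4-byte version."""
    return raw_block[4:36]


def order_by_chain(blocks: List[bytes]) -> List[bytes]:
    """Return the longest chain (by block count) in parent-before-child order."""
    by_hash: Dict[bytes, bytes] = {block_hash(raw): raw for raw in blocks}
    if not by_hash:
        return []

    # Assign heights iteratively (no recursion: real chains are very long).
    # Walk back until reaching a block with a known height, or a parent that
    # is not in the set (the genesis block's parent is 32 zero bytes).
    height: Dict[bytes, int] = {}
    for start in by_hash:
        path = []
        cur = start
        while cur in by_hash and cur not in height:
            path.append(cur)
            cur = prev_hash(by_hash[cur])
        h = height.get(cur, -1)
        for node in reversed(path):
            h += 1
            height[node] = h

    # Walk back from the highest tip, then reverse into chain order.
    cur = max(height, key=height.__getitem__)
    chain = []
    while cur in by_hash:
        chain.append(by_hash[cur])
        cur = prev_hash(by_hash[cur])
    chain.reverse()
    return chain
```

For the full chain, holding every raw block in memory is impractical; the same idea works with `by_hash` mapping each hash to a (file, offset) pair found during scanning, with the blocks copied out only at the end.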


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: moneyart on September 27, 2015, 11:11:37 AM
Where do I find the blk00xxx.dat file and what is it used for? Is the blockchain saved in this file?


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: Chemistry1988 on September 27, 2015, 11:21:27 AM
Quote
Where do I find the blk00xxx.dat file and what is it used for? Is the blockchain saved in this file?

For the default location of the files, check https://en.bitcoin.it/wiki/Data_directory#Default_Location. Yes, those blk*.dat files contain all the raw bitcoin blocks.
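A small Python helper for finding the files, assuming the default mainnet locations (the wiki page above lists them, and a -datadir setting overrides them). Since 0.8 the blk*.dat files live in a blocks subdirectory of the data directory. default_blocks_dir is a hypothetical name.

```python
import os
import sys
from pathlib import Path


def default_blocks_dir(platform: str = sys.platform) -> Path:
    """Default directory holding blk*.dat for a mainnet node without -datadir."""
    home = Path.home()
    if platform.startswith("win"):
        # %APPDATA%\Bitcoin, usually under the user's AppData\Roaming
        base = Path(os.environ.get("APPDATA", str(home / "AppData" / "Roaming"))) / "Bitcoin"
    elif platform == "darwin":
        base = home / "Library" / "Application Support" / "Bitcoin"
    else:
        base = home / ".bitcoin"
    return base / "blocks"


# Lists the first few block files, if any exist on this machine.
print(sorted(p.name for p in default_blocks_dir().glob("blk*.dat"))[:3])
```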


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: TierNolan on September 27, 2015, 11:22:38 AM
Quote
The python script I'm writing actually scans the block files bitcoin-core generates, and builds a new set of block files with the blocks strictly in order and without gaps. Could I replace the original block files with my new ones and expect bitcoin-core to work just fine? I'm thinking not, as the database files would probably not match up.

That's right.  The index stores the location of each block in the blk*.dat files.  It is a LevelDB database in the blocks/index directory.

If you change the blk*.dat files, then the index will be out of alignment.
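A toy illustration of why rewritten files break the index. Core's real index is a LevelDB database with its own record format; here a hypothetical index entry is just the offset of a block's data (just past its 8-byte prefix), and `lookup` re-reads the preceding length field and checks the block hash. Once the gap is removed, the stored offset no longer points at the block.

```python
import hashlib
import struct
from typing import Optional

MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network magic


def record(block: bytes) -> bytes:
    """Wrap a block the way blk*.dat stores it: magic, length, data."""
    return MAGIC + struct.pack("<I", len(block)) + block


def block_hash(block: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(block[:80]).digest()).digest()


def lookup(blk: bytes, data_pos: int, expected: bytes) -> Optional[bytes]:
    """Fetch the block an index entry points at; None if it is not there."""
    if data_pos < 8 or data_pos > len(blk):
        return None
    (length,) = struct.unpack_from("<I", blk, data_pos - 4)
    block = blk[data_pos:data_pos + length]
    if len(block) != length or block_hash(block) != expected:
        return None
    return block


# Toy data: two fake 81-byte "blocks" separated by 100 zero bytes.
A = bytes(80) + b"A"
B = b"\x01" * 80 + b"B"
original = record(A) + bytes(100) + record(B)
pos_b = len(record(A)) + 100 + 8  # offset the index would hold for B

rewritten = record(A) + record(B)  # same blocks, gap removed
print(lookup(original, pos_b, block_hash(B)) == B)  # True
print(lookup(rewritten, pos_b, block_hash(B)))      # None
```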

Quote
So, if I deleted everything in the data folders (not using any wallets here) and just plonked my new blk00xxx.dat files there, would core just build a new database from them? I would not want it to try to redownload the whole blockchain from scratch!

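That is possible, but you need to either pass the files with the -loadblock=<file> command line option (it can be repeated, once per file) or combine them into a file called bootstrap.dat in the data directory.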
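Both routes can be sketched in Python, assuming the reordered files already exist. loadblock_args only builds the argument list, one -loadblock=<file> per file in chain order. write_bootstrap concatenates the files into a single bootstrap.dat in the data directory. Both helper names are hypothetical, and bitcoind is assumed as the executable (bitcoin-qt accepts the same options).

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def loadblock_args(blk_files: Iterable[str], executable: str = "bitcoind") -> List[str]:
    """Command line importing each file, in order, via -loadblock=<file>."""
    return [executable] + ["-loadblock=" + str(p) for p in blk_files]


def write_bootstrap(blk_files: Iterable[str], datadir: str) -> Path:
    """Concatenate the files, in order, into <datadir>/bootstrap.dat."""
    target = Path(datadir) / "bootstrap.dat"
    with open(target, "wb") as out:
        for p in blk_files:
            with open(p, "rb") as src:
                shutil.copyfileobj(src, out)  # streams; never loads a whole file
    return target


print(loadblock_args(["/tmp/blk00000.dat", "/tmp/blk00001.dat"]))
# ['bitcoind', '-loadblock=/tmp/blk00000.dat', '-loadblock=/tmp/blk00001.dat']
```

The list can be handed to `subprocess.run` once the old blocks and index directories are out of the way.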


Title: Re: blk00xxx.dat file contains ~240kb of zeroes - why?
Post by: Nancarrow on September 27, 2015, 08:41:18 PM
Thanks, I'll give it a go.