Bitcoin Forum
Topic: Different sites report different blockchain sizes? (Read 310 times)
jnano (OP)
Member
Activity: 301
Merit: 74
June 15, 2018, 03:21:00 PM
Last edit: June 15, 2018, 03:35:54 PM by jnano
 #1

Different sites report different blockchain sizes. Some of that might be explained by base-2 or base-10 sizes, but that's not enough to explain the large difference between the extremes.

Is there a site that accurately reports in bytes both the raw size and Core's on-disk size (without the UTXO set)?

201.57 GB (= 187.72 GB2 if it's GB10) - BitInfoCharts (block 527,582)
178.99 GB (= 192.18 GB10 if it's GB2) - Coin Dance
171,292 MB (= 179.61 GB10 if it's MB2) - Blockchain.info (updated once per day)
161.20 GiB (= 173.08 GB10) - Bitcoin.com
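One way to check the claim that units alone can't explain the spread is to treat each reported figure as either decimal (10^9) or binary (2^30) gigabytes and ask whether a single byte count could fit every site. The figures are the ones listed above. Which unit each site really uses is unknown, so both readings are allowed, except for Bitcoin.com, which labels its figure GiB. This is a rough sketch under those assumptions, not anything the sites publish.

```python
# Reported figures copied from the list above. Which unit each site really
# uses is unknown, so every plausible reading is kept (Bitcoin.com says GiB).
GB, GIB = 10**9, 2**30   # decimal gigabyte vs. binary gibibyte
MB, MIB = 10**6, 2**20   # decimal megabyte vs. binary mebibyte

REPORTS = {
    "BitInfoCharts": (201.57, (GB, GIB)),
    "Coin Dance": (178.99, (GB, GIB)),
    "Blockchain.info": (171_292, (MB, MIB)),
    "Bitcoin.com": (161.20, (GIB,)),
}


def byte_range(value, units):
    """Smallest and largest byte count a reported figure could stand for."""
    candidates = [value * unit for unit in units]
    return min(candidates), max(candidates)


def unexplained_gap(reports):
    """Bytes by which the readings still disagree when every site is given
    its most favourable unit. A positive result means unit confusion alone
    cannot make all the figures agree."""
    ranges = [byte_range(value, units) for value, units in reports.values()]
    highest_low = max(low for low, _ in ranges)
    lowest_high = min(high for _, high in ranges)
    return highest_low - lowest_high


print(f"gap not explained by units: {unexplained_gap(REPORTS) / GB:.1f} GB")
```

Even with the most favourable readings, BitInfoCharts' smallest interpretation (201.57 x 10^9 bytes) is still about 28.5 GB above Bitcoin.com's 161.20 GiB, so something other than units has to account for the rest.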
aleksej996
Sr. Member
Activity: 490
Merit: 389
June 15, 2018, 10:12:17 PM
 #2

Well, I can tell you that the blockchain data on my Ubuntu node is about 183GB, while on my OpenBSD node it is around 199GB.
Other data, such as the chainstate, adds maybe another 2GB to these numbers.

So I would assume it has to do with the file system as well. I guess not all systems keep data written in the same way.
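One way to see how much of such a difference is file system overhead is to compare the apparent size of the files (the sum of their byte lengths) with the space the file system actually allocates for them, which is roughly the difference between `du --apparent-size` and plain `du` on Linux. Below is a minimal Python sketch. It assumes a Unix-like system where `os.stat` reports `st_blocks` in 512-byte units (Linux and the BSDs); where `st_blocks` is missing it falls back to the apparent size.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) summed over all files under path.

    apparent_bytes  - sum of st_size, i.e. the files' logical lengths
    allocated_bytes - sum of st_blocks * 512, i.e. space the file system has
                      actually reserved; falls back to st_size where the
                      platform does not report st_blocks (e.g. Windows)
    """
    apparent = allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))  # don't follow symlinks
            apparent += st.st_size
            blocks = getattr(st, "st_blocks", None)
            allocated += blocks * 512 if blocks is not None else st.st_size
    return apparent, allocated
```

Running it on each node's `blocks` directory (for example `apparent_and_allocated(os.path.expanduser("~/.bitcoin/blocks"))`, `~/.bitcoin` being only the default Unix datadir) would show whether the Ubuntu and OpenBSD figures diverge because of allocation or because the files themselves differ.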
Thirdspace
Hero Member
Activity: 1232
Merit: 738
June 15, 2018, 11:52:47 PM
 #3

I'm not sure about this but maybe it's related to segwit stuff?
Since the introduction of segwit addresses, we can save fees on segwit-enabled transactions,
and I've started seeing explorers report different sizes for the same transaction: bytes and vbytes (virtual bytes?).
Maybe this is the cause of the discrepancies, byte vs vbyte, and not the real size used on disk?
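For reference, BIP141 defines a transaction's weight as its witness-stripped (base) size times 3 plus its total serialized size, and its virtual size as weight / 4, rounded up. Here is a small Python sketch of those two formulas. The sizes used below are made-up numbers, not a real transaction.

```python
def tx_weight(stripped_size: int, total_size: int) -> int:
    """BIP141 weight: witness-stripped (base) size * 3 + total size, in weight units."""
    return stripped_size * 3 + total_size


def tx_vsize(stripped_size: int, total_size: int) -> int:
    """BIP141 virtual size in vbytes: weight / 4, rounded up."""
    return (tx_weight(stripped_size, total_size) + 3) // 4


# Hypothetical SegWit transaction: 110 bytes without witness, 220 bytes with it.
print(tx_weight(110, 220), tx_vsize(110, 220))
```

For that hypothetical transaction the weight is 550 and the vsize 138 vbytes, while 220 bytes are actually stored. For a legacy transaction the numbers line up (250 bytes gives weight 1000 and 250 vbytes). So a site that summed vbytes would report a smaller chain than one that summed stored bytes, and the gap would grow with SegWit usage.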

achow101
Moderator
Legendary
Activity: 3388
Merit: 6578
June 16, 2018, 12:11:12 AM
Merited by ABCbits (1)
 #4

There are many reasons the reported sizes may be different. It depends on what they are actually measuring as "blockchain size".

A naive way of measuring would be to just take the size of the datadir for a bitcoind instance. However, this is going to include a bunch of extra data which is not actually the blockchain. This would include data like log files, wallet files, the UTXO set database, the block index, and the transaction index if that is enabled.
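To see how far a naive datadir total is from the block data itself, you could break a Bitcoin Core datadir down by what the files are. Below is a minimal Python sketch. It assumes the usual mainnet layout: `blocks/blk*.dat` (raw blocks), `blocks/rev*.dat` (undo data), `blocks/index/` (the block index, which in the 2018-era versions also held the txindex), `chainstate/` (the UTXO set) and `indexes/` (where later versions keep optional indexes such as txindex). Everything else (wallets, logs, `peers.dat`, ...) counts as "other". The category names are invented for this example.

```python
import os

CATEGORIES = ("blocks", "undo", "block_index", "chainstate", "optional_indexes", "other")


def classify(parts):
    """Map a datadir-relative path (as a tuple of components) to a category."""
    if len(parts) == 2 and parts[0] == "blocks":
        name = parts[1]
        if name.startswith("blk") and name.endswith(".dat"):
            return "blocks"      # raw serialized blocks
        if name.startswith("rev") and name.endswith(".dat"):
            return "undo"        # undo data used to disconnect blocks
    if parts[:2] == ("blocks", "index"):
        return "block_index"     # LevelDB block index
    if parts[0] == "chainstate":
        return "chainstate"      # LevelDB UTXO set
    if parts[0] == "indexes":
        return "optional_indexes"
    return "other"               # wallets, logs, peers.dat, ...


def datadir_breakdown(datadir):
    """Sum apparent file sizes (bytes) per category under a datadir."""
    totals = dict.fromkeys(CATEGORIES, 0)
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            full = os.path.join(root, name)
            parts = tuple(os.path.relpath(full, datadir).split(os.sep))
            totals[classify(parts)] += os.lstat(full).st_size
    return totals
```

Only the `blocks` total approximates stored block data. Even that figure still includes any stale (orphaned) blocks the node happened to receive.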

There may also be a discrepancy if one site does not run a segwit enabled node. Since a Segwit block is physically larger than the stripped block that a non-segwit node would receive, sites that support Segwit will report a larger blockchain.

Additionally, a site may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well which will cause the reported blockchain size to differ from sites that have received different orphan blocks.

bob123
Legendary
Activity: 1624
Merit: 2481
June 16, 2018, 01:36:14 PM
 #5

Quote from: achow101
Additionally, a site may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well which will cause the reported blockchain size to differ from sites that have received different orphan blocks.

Is there a practical reason for orphaned blocks being stored? Once a block turns out to be orphaned, the valid transactions get added to the mempool again.
But what is the actual reason to keep storing this block if the information it contains is no longer needed?

Unfortunately, a quick search didn't turn up a reasonable answer.


Xynerise
Sr. Member
Activity: 322
Merit: 363
June 16, 2018, 02:00:23 PM
Merited by achow101 (1), ABCbits (1)
 #6


Quote from: bob123
Is there a practical reason for orphaned blocks being stored?
In case of a blockchain reorganisation, perhaps.
Quote
Once a block turns out to be orphaned the valid transactions get added to the mempool again.
But what is the actual reason to further store this block if the information it contains is no longer needed?
There's no way to tell it's "no longer needed" though.
There's no central arbiter that decides whether a block is needed/valid or not.
Nodes take the longest chain with the highest cumulative PoW as the canonical blockchain.
The canonical blockchain could change if nodes receive another block that follows consensus rules but has a higher difficulty than the one they possess. Even though the probability of a blockchain reorg 6 blocks deep is small, it's still theoretically possible.
For example, the March 2013 reorg was ~21 blocks deep.
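"Longest chain" here really means the chain with the most cumulative work, not the most blocks. Below is a Python sketch of how that comparison can be computed, following the approach Bitcoin Core uses: decode each header's compact `nBits` into a target, count 2^256 / (target + 1) as that block's work, and prefer the chain with the largest sum. Negative or overflowing compact encodings and all other validation are left out. The competing chains in the example are hypothetical.

```python
def bits_to_target(bits: int) -> int:
    """Expand the compact 'nBits' encoding into the full 256-bit target.
    Negative/overflowing encodings are not handled in this sketch."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def block_work(bits: int) -> int:
    """Expected number of hashes for one block at this target: 2**256 // (target + 1)."""
    return (1 << 256) // (bits_to_target(bits) + 1)


def chain_work(bits_per_block):
    """Cumulative work of a chain, given the nBits of each of its blocks."""
    return sum(block_work(bits) for bits in bits_per_block)


def best_chain(chains):
    """chains maps a label to the list of nBits values of that chain's blocks.
    The winner is the chain with the most cumulative work, not the most blocks."""
    return max(chains, key=lambda label: chain_work(chains[label]))


GENESIS_BITS = 0x1D00FFFF  # difficulty-1 target used by the genesis block
```

`block_work(GENESIS_BITS)` is 0x100010001 (4,295,032,833 hashes). In the hypothetical comparison `best_chain({"longer": [GENESIS_BITS] * 10, "shorter_but_harder": [0x1C00FFFF]})`, the single block with a 256-times-smaller target carries more work than the ten easy ones, so `"shorter_but_harder"` wins.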
bob123
Legendary
Activity: 1624
Merit: 2481
June 16, 2018, 02:23:08 PM
 #7

Quote from: Xynerise
There's no way to tell it's "no longer needed" though.
There's no central arbiter that decides whether a block is needed/valid or not.

Each node can decide for itself whether a block is needed (e.g. part of the longest chain) or valid (it has to be validated anyway).

Quote from: Xynerise
Nodes take the longest chain with the highest cumulative POW as the canonical blockchain.
The canonical blockchain could change if nodes receive another block that follows consensus rules but has a higher difficulty than the one they possess, even though the probability of a blockchain reorg 6 blocks deep is small, it's still theoretically possible.
For example the March 2013 reorg was ~21 blocks deep.

Theoretically, it still would not be necessary to store this block. In case of a reorganisation, the block(s) can easily be broadcast / received via the network.
Since nodes do choose the longest chain, why do they need to store an orphaned block from years ago?

Do most clients (Bitcoin Core?) really store those orphaned blocks (especially from years ago)? If so, is there a specific reason why it was decided to keep them? Or are they just kept because there is no real reason to delete them?

Xynerise
Sr. Member
Activity: 322
Merit: 363
June 16, 2018, 03:00:12 PM
Merited by LoyceV (1)
 #8

Quote from: bob123
Quote from: Xynerise
There's no way to tell it's "no longer needed" though.
There's no central arbiter that decides whether a block is needed/valid or not.

Each node can decide for itself whether a block is needed (e.g. part of the longest chain) or valid (it has to be validated anyway).

Okay, that came out wrong.
I meant that no node can tell whether the block it has received will later be orphaned in favour of a block with more PoW.

Quote
Theoretically it still would not be necessary to store this block. In case of a reorganisation the block(s) can easily be broadcasted / received via the network.
If no one stores blocks, where do nodes get blocks to bootstrap from?
Obviously, for a node to receive a block from a peer, another node has to have stored it and then sent it.
If you don't store it, you can't send it.

Quote
Do most clients (bitcoin core?) really store those orphaned blocks (especially from years ago)? If so, is there a specific reason why it has been decided to keep them ? Or are they just kept because there is no real reason to delete them?
AFAIK, orphaned blocks are stored in the node's blockchain forever, iff the node received them when they weren't orphaned yet.
No node would upload a block that wasn't part of the canonical chain (i.e. a new node in the network won't receive old orphaned blocks).
aleksej996
Sr. Member
Activity: 490
Merit: 389
June 16, 2018, 04:13:46 PM
 #9

Quote from: achow101
There are many reasons the reported sizes may be different. It depends on what they are actually measuring as "blockchain size".

A naive way of measuring would be to just take the size of the datadir for a bitcoind instance. However, this is going to include a bunch of extra data which is not actually the blockchain. This would include data like log files, wallet files, the UTXO set database, the block index, and the transaction index if that is enabled.

There may also be a discrepancy if one site does not run a segwit enabled node. Since a Segwit block is physically larger than the stripped block that a non-segwit node would receive, sites that support Segwit will report a larger blockchain.

Additionally, a site may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well which will cause the reported blockchain size to differ from sites that have received different orphan blocks.

I run the same version of Bitcoin Core on both of my nodes, and one has a significantly bigger blocks directory (~20 GB more).
Do you know where the transaction index data is stored? The node with the bigger blocks folder is also the one with the transaction index enabled.
bob123
Legendary
Activity: 1624
Merit: 2481
June 16, 2018, 05:48:48 PM
 #10

Quote from: Xynerise
Quote from: bob123
Quote from: Xynerise
There's no way to tell it's "no longer needed" though.
There's no central arbiter that decides whether a block is needed/valid or not.

Each node can decide for itself whether a block is needed (e.g. part of the longest chain) or valid (it has to be validated anyway).
Okay, that came out wrong.
I meant no node can tell if its block (the block it has received) is not going to be orphaned in the future for a block with a higher POW.

I understand that. But once the block has been orphaned, the node definitely knows that it is not part of the longest chain, so it can (theoretically) be discarded.
Quote from: Xynerise
Quote
Theoretically it still would not be necessary to store this block. In case of a reorganisation the block(s) can easily be broadcasted / received via the network.
If no one stores blocks where do nodes get blocks to bootstrap from?
Obviously for a node to receive a block from a peer, another node has to have stored it then sent it.
If you don't store it, you can't send it.

I was not referring to blocks which are included in the longest chain. I was referring to orphaned blocks.
In your statement you were talking about a reorganization of the latest X blocks.

Nodes do not need to store blocks which are currently not in the longest PoW chain.
If my node is in sync and someone else broadcasts a new block whose height is 1 above the 'current' height but with different 'previous' blocks, I do not need to already have those previous blocks (which weren't part of the blockchain until this newest block was broadcast). I could easily just request the block data from this node (which definitely has it, since a miner built upon it).
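The situation described above, a new tip whose ancestors differ from ours, comes down to finding the fork point and working out which blocks to disconnect and which to connect. Below is a minimal Python sketch. It assumes the node already knows every header's parent and height (as with headers-first sync) but may lack the full data for some blocks. The function and parameter names are hypothetical.

```python
def reorg_plan(parent, height, have_data, old_tip, new_tip):
    """Plan a switch of the active chain from old_tip to new_tip.

    parent:    block hash -> previous block hash (from headers)
    height:    block hash -> height
    have_data: set of hashes whose full block data is stored locally
    Returns (disconnect, connect, to_download):
      disconnect  - blocks leaving the active chain, tip first (their
                    transactions are candidates to return to the mempool)
      connect     - blocks joining the active chain, lowest first
      to_download - blocks in `connect` whose data must be fetched from peers
    """
    a, b = old_tip, new_tip
    disconnect, connect = [], []
    # Walk the higher tip back until both sides are at the same height.
    while height[a] > height[b]:
        disconnect.append(a)
        a = parent[a]
    while height[b] > height[a]:
        connect.append(b)
        b = parent[b]
    # Step both sides back together until they meet at the fork point.
    while a != b:
        disconnect.append(a)
        a = parent[a]
        connect.append(b)
        b = parent[b]
    connect.reverse()
    to_download = [h for h in connect if h not in have_data]
    return disconnect, connect, to_download
```

Take headers g <- a1 <- a2 (current tip) and g <- b1 <- b2 <- b3 (new tip), with full data only for g, a1, a2 and b1. In that case `reorg_plan` returns `(["a2", "a1"], ["b1", "b2", "b3"], ["b2", "b3"])`. Only the blocks on the new branch have to be requested from peers.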



Quote from: Xynerise
Quote
Do most clients (bitcoin core?) really store those orphaned blocks (especially from years ago)? If so, is there a specific reason why it has been decided to keep them ? Or are they just kept because there is no real reason to delete them?
AFAIK, orphaned blocks are stored in the node's blockchain forever, iff it received it when it wasn't orphaned yet.
No node would upload a block that wasn't part of the canonical chain (ie a new node in the network won't receive old orphaned blocks)

If by "the node's blockchain" you mean the block files (blk*.dat), then this answers my question. Thanks.

jnano (OP)
Member
Activity: 301
Merit: 74
June 18, 2018, 10:49:42 PM
 #11

Quote from: aleksej996
So I would assume it has to do with the file system as well. I guess not all systems keep data written in the same way.
I don't think anyone would count file system overhead as blockchain size. And anyway, file systems wouldn't differ that much in overhead in this case.

Quote from: Thirdspace
I'm not sure about this but maybe it's related to segwit stuff?
The real blockchain includes all the segwit data. Excluding that would be very wrong, and by now segwit has existed for 10 months. I don't think these sites would be that negligent.

Quote
maybe this is the cause of the discrepancies, byte vs vbyte?
vbytes should just be the number used for fee calculations (weight/4), not the actual byte size.

Quote from: achow101
A naive way of measuring would be to just take the size of the datadir for a bitcoind instance.
Sounds too crude to be likely?

Quote
may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well
That sounds possible.


So there is no site known to be fully accurate in its reporting, or at least more accurate than the rest?
achow101
Moderator
Legendary
Activity: 3388
Merit: 6578
June 18, 2018, 11:24:02 PM
 #12

Quote from: aleksej996
I run the same version of Bitcoin Core on both of my nodes and one has a significantly bigger blocks directory (~20 GB more).
Do you maybe know where these transaction index data is being stored, since this node with a bigger blocks folder is also the one with transaction index enabled?
The block and transaction indexes are stored in the blocks folder.

monkeydominicorobin
Full Member
Activity: 294
Merit: 104
June 23, 2018, 01:37:52 PM
 #13

Quote from: jnano
Different sites report different blockchain sizes. Some of that might be explained by base-2 or base-10 sizes, but that's not enough to explain the large difference between the extremes.

Is there a site that accurately reports in bytes both the raw size and Core's on-disk size (without the UTXO set)?

201.57 GB (= 187.72 GB2 if it's GB10) - BitInfoCharts (block 527,582)
178.99 GB (= 192.18 GB10 if it's GB2) - Coin Dance
171,292 MB (= 179.61 GB10 if it's MB2) - Blockchain.info (updated once per day)
161.20 GiB (= 173.08 GB10) - Bitcoin.com


There are disparities like this since they are using fast synchronizations. The other web applications in your list with larger blockchain sizes are using the full option. It is just a matter of the parameters they use when syncing with the Bitcoin blockchain. The disparities you are examining are not intended to modify the blockchain itself, nor are they intended to create doubt. Those sites are just avoiding the lag they would experience if they downloaded the whole blockchain into their blockchain data folder.

DannyHamilton
Legendary
Activity: 3388
Merit: 4615
June 24, 2018, 04:24:48 PM
 #14

Quote from: jnano
Quote from: aleksej996
So I would assume it has to do with the file system as well. I guess not all systems keep data written in the same way.
I don't think anyone would count file system overhead as blockchain size.

People will do lots of things that you wouldn't think they'd do.

Quote from: jnano
Quote from: Thirdspace
I'm not sure about this but maybe it's related to segwit stuff?
The real blockchain includes all the segwit data. Excluding that would be very wrong, and by now segwit has existed for 10 months. I don't think these sites would be that negligent.

There are plenty of sites that are negligent in many ways.  I wouldn't be surprised if some of the sites you are looking at are "that negligent".

Quote from: jnano
Quote
maybe this is the cause of the discrepancies, byte vs vbyte?
vbyte should be just the number used for fee calculations instead of the actual byte size (weight/4).

"Should be" doesn't mean "is by this site".  Unless a site has given you undeniable proof that they are doing something a particular way, you should consider the possibility that they are doing things in a way that you don't think they should.

Quote from: jnano
Quote from: achow101
A naive way of measuring would be to just take the size of the datadir for a bitcoind instance.
Sounds too crude to be likely?

There are plenty of sites that are crude in many ways.  I wouldn't be surprised if some of the sites you are looking at do something that crude.

Quote from: jnano
Quote
may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well
That sounds possible.

Correct.  There is no such thing as THE blockchain.  Everyone has their own blockchain.  Each individual's blockchain is the result of that individual's experiences while connected to the bitcoin network.

Quote from: jnano
So there is no site known to be fully accurate in its reporting, or at least more accurate than the rest?

They are all accurately reporting exactly what they've chosen to report.  You'll need to talk to the programmers of each site to understand exactly what process they are using, how they are generating their calculations, and what decisions they've made.

Then, once you've collected all that information from them (if you can find them and they are even willing to share it), then you can decide for yourself whose count is the closest to the way that YOU would want the size counted.
jnano (OP)
Member
Activity: 301
Merit: 74
June 25, 2018, 02:06:07 AM
 #15

Besides the tip area, surely there's only one canonical blockchain. That's the whole idea, no?

My most basic interpretation of "blockchain size" would be the cumulative size of the blocks in the best chain, with block data serialized the same way the network tallies block sizes. Anything else would have to be explained in detail. Would you have another interpretation?
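That interpretation can be computed directly from a Bitcoin Core node through its RPC interface: `getblockcount`, `getblockhash <height>` and `getblock <hash> 1`. The `getblock` result includes `size` (total serialized size with witness data), `strippedsize` (without witness data) and `weight`. The Python sketch below takes a caller-supplied `rpc(method, *params)` function (hypothetical; any JSON-RPC client wrapper would do) and sums those fields over the best chain. The on-disk figure is only an estimate. It assumes each block in `blk*.dat` is stored with an 8-byte prefix (4-byte network magic plus 4-byte length) and ignores stale blocks, undo data and indexes.

```python
from typing import Any, Callable, Dict, Optional


def best_chain_size(rpc: Callable[..., Any], tip_height: Optional[int] = None) -> Dict[str, int]:
    """Sum per-block sizes over the best chain, from genesis through tip_height."""
    if tip_height is None:
        tip_height = rpc("getblockcount")
    totals = {"blocks": 0, "size": 0, "strippedsize": 0, "weight": 0}
    for h in range(tip_height + 1):
        block = rpc("getblock", rpc("getblockhash", h), 1)  # verbosity 1: JSON summary
        totals["blocks"] += 1
        for field in ("size", "strippedsize", "weight"):
            totals[field] += block[field]
    # Rough blk*.dat estimate: each stored block is prefixed by 4-byte magic + 4-byte length.
    totals["blk_estimate"] = totals["size"] + 8 * totals["blocks"]
    return totals
```

Calling the RPC twice per block for every height up to the tip will take a while. The point is only that `size` summed over the best chain is one well-defined number that a site could publish, with `strippedsize` and `weight / 4` giving the non-SegWit and vbyte variants.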

Quote from: DannyHamilton
They are all accurately reporting exactly what they've chosen to report.
They could be buggy. :)

Quote from: monkeydominicorobin
There are disparities like this since they are using fast synchronizations.
What's fast synchronization?