Bitcoin Forum
Author Topic: Data compression  (Read 582 times)
mda (OP)
Member
**
Offline Offline

Activity: 144
Merit: 13


View Profile
November 09, 2017, 01:49:36 AM
Merited by ABCbits (1)
 #1

Is anybody working on this https://bitcointalk.org/index.php?topic=1533714.0 ?
A compression ratio of 50% offsets about 3.3 years of traffic growth at 23% CAGR (since 1.23^3.35 ≈ 2): https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html .
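
The growth arithmetic can be checked with a short calculation. This is only a sketch: the function name `years_offset` is made up for illustration, and it assumes traffic compounds once a year at a fixed rate.

```python
import math


def years_offset(compressed_fraction: float, annual_growth: float) -> float:
    """Years of compound growth needed to cancel out a size reduction.

    compressed_fraction: compressed size / original size (0.5 means halved).
    annual_growth: yearly growth as a fraction (0.23 means 23% CAGR).
    """
    # Solve (1 + annual_growth) ** n == 1 / compressed_fraction for n.
    return math.log(1.0 / compressed_fraction) / math.log(1.0 + annual_growth)


# Halving the data at 23% yearly growth.
print(round(years_offset(0.5, 0.23), 2))  # 3.35
```

Under these assumptions, halving the data buys a little over three years of growth, which is in the same range as the figure quoted in the post.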
haltingprobability
Member
**
Offline Offline

Activity: 98
Merit: 26


View Profile
November 09, 2017, 04:19:36 AM
 #2


Most of the data in Bitcoin blocks and mempool transactions is incompressible, while some of it is trivially compressible (certain format fields, etc.). I am guessing that the Core devs would be interested in more aggressive compression using Merkle hashes. Since the entire blockchain is essentially an immutable, read-only structure (except when a new block arrives), the only time you need to transmit raw data is for new transactions and the latest block. I doubt that the bandwidth for these is a bottleneck, so that 25% is probably not worth the cost of optimization. For other kinds of synchronization between nodes, all you need to transmit are the hashes.
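
To illustrate why some fields compress and others do not, here is a small sketch. It does not use real block data: random bytes stand in for hashes and signatures, and a repeated 8-byte pattern stands in for predictable format fields. Both stand-ins and the helper name `compressed_fraction` are assumptions made for the example.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (zlib, level 9)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in for hashes and signatures: uniformly random bytes.
random_like = os.urandom(1_000_000)

# Stand-in for repetitive format fields: a short pattern repeated many times.
structured = bytes.fromhex("01000000ffffffff") * 125_000

print(f"random-like: {compressed_fraction(random_like):.3f}")
print(f"structured:  {compressed_fraction(structured):.3f}")
```

The random-like input stays at roughly its original size, while the repetitive input shrinks to a tiny fraction. Real blocks are a mix of both kinds of data, so results should fall somewhere in between.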
elbandi
Hero Member
*****
Offline Offline

Activity: 525
Merit: 529


View Profile
November 09, 2017, 04:27:46 PM
 #3

I moved the blk*.dat and rev*.dat files into squashfs images and mounted them back for Bitcoin Core. Here are the stats:

Code:
# du -hs bitcoin-blocks?.squashfs
53G     bitcoin-blocks0.squashfs
56G     bitcoin-blocks1.squashfs
# du -hs blocks-ro?
71G     blocks-ro0
71G     blocks-ro1

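Each image contains 500 blk*.dat files. The compressed images are about 21% to 25% smaller than the originals (put the other way, the first original is 34% larger than its compressed image), so transaction data can be compressed.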
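
The percentages can be recomputed from the du output above. This is a rough sketch: du -h rounds to whole gigabytes, so the results are approximate, and the names `saving_percent` and `images` are only illustrative.

```python
def saving_percent(original: float, compressed: float) -> float:
    """Percentage by which the compressed size is smaller than the original."""
    return 100.0 * (1.0 - compressed / original)


# (uncompressed GB, compressed GB), taken from the du listing above.
images = {"blocks0": (71, 53), "blocks1": (71, 56)}

for name, (original, compressed) in images.items():
    print(name, round(saving_percent(original, compressed), 1))

total_original = sum(original for original, _ in images.values())
total_compressed = sum(compressed for _, compressed in images.values())
print("total", round(saving_percent(total_original, total_compressed), 1))
```

This gives savings of about 25.4% and 21.1% for the two images, or about 23.2% overall.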
ZipReg
Hero Member
*****
Offline Offline

Activity: 848
Merit: 640



View Profile WWW
November 10, 2017, 02:49:11 AM
 #4

I backed up the blockchain data recently (late October). It was 160GB uncompressed; I used 7zip with normal compression, and it is 112GB in 25 DVD-sized archive files.
sivagananathan
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile
November 10, 2017, 05:25:35 AM
 #5

I backed up the blockchain data recently (late October). It was 160GB uncompressed; I used 7zip with normal compression, and it is 112GB in 25 DVD-sized archive files.

Data compression:

When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical.
sivagananathan
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile
November 10, 2017, 05:50:38 AM
 #6

I backed up the blockchain data recently (late October). It was 160GB uncompressed; I used 7zip with normal compression, and it is 112GB in 25 DVD-sized archive files.

Data compression:

When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical.

As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986; one of the adaptive schemes to be discussed in Section 5. An adaptive Huffman coding technique is the basis for the compact command of the UNIX operating system.
One could expect to see even greater use of variable-length coding in the future.
mda (OP)
Member
**
Offline Offline

Activity: 144
Merit: 13


View Profile
November 10, 2017, 09:43:26 AM
 #7

As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986; one of the adaptive schemes to be discussed in Section 5. An adaptive Huffman coding technique is the basis for the compact command of the UNIX operating system.
One could expect to see even greater use of variable-length coding in the future.

Really, isn't it amazing?
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16638


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
November 10, 2017, 03:44:59 PM
 #8

Most of the data in Bitcoin blocks and mempool transactions is incompressible
This thread got me curious, so I've tested it for myself using bzip2 (options -z -9). Result (in kB):
Code:
130820  blk00400.dat
106864  blk00400.dat.bz
The compressed file is 18.3% smaller. Considering the current cost of disk space, and the complications it would give to read back data (for a wallet rescan), I see no reason to implement this.
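
For anyone who wants to repeat this kind of measurement without writing a compressed copy to disk, here is a minimal sketch using Python's standard bz2 module at level 9. The function name `bz2_saving_percent` is hypothetical, and the demo runs on a throwaway temporary file rather than a real blk*.dat file, so its output says nothing about actual block data.

```python
import bz2
import os
import tempfile


def bz2_saving_percent(path: str, chunk_size: int = 1 << 20) -> float:
    """Stream a file through bzip2 (level 9) and return the size saving in %.

    The file must not be empty.
    """
    compressor = bz2.BZ2Compressor(9)
    original = 0
    compressed = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            original += len(chunk)
            compressed += len(compressor.compress(chunk))
    # flush() returns whatever the compressor still holds in its buffer.
    compressed += len(compressor.flush())
    return 100.0 * (1.0 - compressed / original)


# Demo on a temporary file: half random bytes, half zeros.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(100_000) + bytes(100_000))
    demo_path = tmp.name
try:
    print(f"{bz2_saving_percent(demo_path):.1f}% smaller")
finally:
    os.remove(demo_path)
```

To test a real block file, pass its path to `bz2_saving_percent` instead of the temporary file. For the sizes reported above, the same formula gives 100 * (1 - 106864 / 130820), which is about 18.3%.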

I backed up the blockchain data recently (late October). It was 160GB uncompressed; I used 7zip with normal compression, and it is 112GB in 25 DVD-sized archive files.
Why would you do this? Unless you have a very slow and very expensive internet connection, loading 25 DVDs is much more work than just downloading the blockchain again.

ZipReg
Hero Member
*****
Offline Offline

Activity: 848
Merit: 640



View Profile WWW
November 10, 2017, 04:00:28 PM
 #9

As discussed in the Introduction, data compression has wide application in terms of information storage, including representation of the abstract data type string and file compression. Huffman coding is used for compression in several file archival systems [ARC 1986; one of the adaptive schemes to be discussed in Section 5. An adaptive Huffman coding technique is the basis for the compact command of the UNIX operating system.
One could expect to see even greater use of variable-length coding in the future.

Really, isn't it amazing?

lol bot users?

I'm sure bitcoin could benefit similarly to websites using gzip to deliver content -if- it can be applied. 40+GB is a pretty big difference, so just offering up some data.
ZipReg
Hero Member
*****
Offline Offline

Activity: 848
Merit: 640



View Profile WWW
November 10, 2017, 04:11:57 PM
 #10

Most of the data in Bitcoin blocks and mempool transactions is incompressible
This thread got me curious, so I've tested it for myself using bzip2 (options -z -9). Result (in kB):
Code:
130820  blk00400.dat
106864  blk00400.dat.bz
The compressed file is 18.3% smaller. Considering the current cost of disk space, and the complications it would give to read back data (for a wallet rescan), I see no reason to implement this.

I backed up the blockchain data recently (late October). It was 160GB uncompressed; I used 7zip with normal compression, and it is 112GB in 25 DVD-sized archive files.
Why would you do this? Unless you have a very slow and very expensive internet connection, loading 25 DVDs is much more work than just downloading the blockchain again.

Data retention. In case of loss or an unrecoverable error, you can restore from the backup instead of downloading the entire blockchain again. A backup lets you be back in sync within hours. The purpose of DVD-sized archives is to make file transfers easier and to protect integrity, not to actually use DVD media as storage. Cheers!