Bitcoin Forum
Author Topic: (non-ultimate) blockchain compression?  (Read 1805 times)
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 29, 2014, 09:46:35 AM
#1

Hi,

I'm a software engineer and I want to contribute to bitcoin[d] development. I'm a data compression expert, so I thought that a block compression library would be a good start.

I've read about the "ultimate" blockchain compression, where lite nodes keep only relevant leaves of the merkle tree. However, as I understand it, we are still far from having this implemented, and it will not help full nodes, nodes with lots of transactions, or anyone who needs to keep the full blockchain. So my view is that blockchain compression, while not critical now, is a nice-to-have feature.

The library I'm thinking of would be usable for wallets and daemons, both for storage and transmission. The algorithms would require modest computational resources and be friendly to hardware implementation. My quick investigation shows that considerable compression ratios (like 50%) may be achievable.
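To put a rough number behind that, here is a minimal sketch, assuming a local blocks/blk00000.dat from a Bitcoin Core data directory (the path is a placeholder), that measures what a generic compressor already achieves on raw block data:

Code:
import zlib

BLK_FILE = "blocks/blk00000.dat"  # placeholder path; point at your datadir

with open(BLK_FILE, "rb") as f:
    raw = f.read(16 * 1024 * 1024)  # sample the first 16 MiB

packed = zlib.compress(raw, 9)
print("raw:        %d bytes" % len(raw))
print("compressed: %d bytes" % len(packed))
print("saved:      %.1f%%" % (100.0 * (1 - len(packed) / float(len(raw)))))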

Is this something worth doing?

Thanks!
capsqrl
Sr. Member
Activity: 444
Merit: 250
May 29, 2014, 10:35:29 AM
#2

Quote
The algorithms would require modest computational resources and be friendly to hardware implementation. My quick investigation shows that considerable compression ratios (like 50%) may be achievable.

Could you go into more detail? Blockchain compression is not exactly a new field of study around here, yet it seems you have identified considerable low-hanging fruit. I don't know much about compression myself, but given that hashes, signatures, etc. closely resemble random noise, it's hard to imagine how to achieve a 50% compression ratio.
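That intuition is easy to demonstrate with any generic compressor; a minimal sketch, purely illustrative:

Code:
import os
import zlib

random_like = os.urandom(1 << 20)  # stands in for hashes and signatures
structured = b"\x00" * (1 << 20)   # highly redundant data

for name, data in [("random-like", random_like), ("structured", structured)]:
    out = zlib.compress(data, 9)
    print("%-12s %8d -> %8d bytes" % (name, len(data), len(out)))
# random-like data barely shrinks (it may even grow); structured data collapses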

Norwegian Bitcoin user? Come to /r/BitcoinNO on reddit!
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 29, 2014, 11:47:08 AM
#3

I haven't done a detailed study yet, but speaking of low-hanging fruit, what about an address dictionary, typical script patterns, and database index compression?
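To make the address-dictionary idea concrete, here is a toy sketch; the two-byte index encoding is an assumption for illustration, not a proposed format:

Code:
from collections import OrderedDict

def dict_encode(hashes):
    """Map each distinct 20-byte hash to a small index into a dictionary."""
    table = OrderedDict()
    indices = []
    for h in hashes:
        if h not in table:
            table[h] = len(table)
        indices.append(table[h])
    return list(table), indices

hashes = [b"A" * 20, b"B" * 20, b"A" * 20, b"A" * 20]
table, idx = dict_encode(hashes)
raw_size = 20 * len(hashes)                # 80 bytes stored verbatim
enc_size = 20 * len(table) + 2 * len(idx)  # dictionary + 2-byte indices
print("raw %d bytes -> dict-coded ~%d bytes" % (raw_size, enc_size))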

Once there's confirmed interest, I plan to start a detailed study and open a technical discussion. I'd also appreciate links to previous technical discussions (my googling skills apparently aren't sufficient to find anything technical besides that "ultimate" thing, which I consider orthogonal to ordinary compression).
apxu
Member
Activity: 229
Merit: 13
May 29, 2014, 07:12:01 PM
#4

Quote
I haven't done a detailed study yet, but speaking of low-hanging fruit, what about an address dictionary, typical script patterns, and database index compression?
Enable compression on the hard drive and put the bitcoin files on it.
And consider which is cheaper: disk space or processor time?
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 29, 2014, 09:36:33 PM
#5

I don't think common disk compression methods are efficient for the blockchain. Efficient compression means understanding the underlying structure.

Speaking of cheap vs. expensive: I think it's the user's time that's more expensive than processor time or disk space. We could significantly cut the time spent waiting for another wallet sync, or for a new wallet to initialize.
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 29, 2014, 10:07:43 PM
#6

...also, this would be an optional feature anyway
Kluge
Donator
Legendary
Activity: 1218
Merit: 1015
May 29, 2014, 10:10:18 PM
Last edit: May 30, 2014, 02:34:02 PM by Kluge
#7

Quote
I haven't done a detailed study yet, but speaking of low-hanging fruit, what about an address dictionary, typical script patterns, and database index compression?
Enable compression on the hard drive and put the bitcoin files on it.
And consider which is cheaper: disk space or processor time?
Disk space is fairly insignificant. Transmission is the real money/time sink. I bugged Jeff about compressing the bootstrap torrent he maintains. He's open to it... needs more prodding, I think. :) Even without anything built specifically for Bitcoin, >40% can be cut from the blockchain, and decompressing takes nowhere near as much time as the download time saved, unless you have a higher-end cable connection or fiber.
grau
Hero Member
Activity: 836
Merit: 1021
bits of proof
May 30, 2014, 04:06:15 AM
#8

Quote
I don't think common disk compression methods are efficient for the blockchain. Efficient compression means understanding the underlying structure.

Speaking of cheap vs. expensive: I think it's the user's time that's more expensive than processor time or disk space. We could significantly cut the time spent waiting for another wallet sync, or for a new wallet to initialize.

Block chain data does not compress impressively on a global scale, but indices on addresses and tx hashes do.

Bits of Proof stores both the block chain data and the supplementary indices in LevelDB and achieves high performance retrieving transactions that refer to an arbitrary HD master key, which is why it powers the myTREZOR web wallet.

I am sure it could be further optimized with your ideas, so let me know if you'd like to discuss them in that scope.
theymos
Administrator
Legendary
Activity: 5180
Merit: 12884
May 30, 2014, 05:56:43 AM
#9

blocks/blk*.dat could probably benefit from some sort of compression. These files are reasonably compressible and very large, but they're not accessed very frequently. It might also be good to compress blocks before sending them to peers, especially when they're downloading many blocks. The other files are already compressed about as well as they can be without hurting performance, or small enough that it doesn't matter.
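As a sketch of the "compress before sending" idea: the length-prefix framing and socket handling below are assumptions for illustration, not the real P2P protocol:

Code:
import struct
import zlib

def send_compressed(sock, block_bytes):
    """Deflate a serialized block and length-prefix it for the wire."""
    payload = zlib.compress(block_bytes, 6)
    sock.sendall(struct.pack("<I", len(payload)) + payload)

def recv_compressed(sock):
    """Read one length-prefixed, deflated block and inflate it."""
    (n,) = struct.unpack("<I", sock.recv(4))  # assumes the prefix arrives whole
    buf = b""
    while len(buf) < n:
        buf += sock.recv(n - len(buf))
    return zlib.decompress(buf)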

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
apxu
Member
Activity: 229
Merit: 13
May 30, 2014, 06:58:52 AM
#10

Quote
I don't think common disk compression methods are efficient for the blockchain. Efficient compression means understanding the underlying structure.

I disagree.
You will get benefits from compressing big chunks of data (such as blk files), not small pieces (individual transactions in blocks).

Quote
Speaking of cheap vs. expensive: I think it's the user's time that's more expensive than processor time or disk space. We could significantly cut the time spent waiting for another wallet sync, or for a new wallet to initialize.

The most time-expensive routine is verifying ECDSA signatures, not downloading.
But we cannot (?) eliminate this step on every node.

Hmm... maybe some checkpoints? Let's say we have bootstrap.dat and all index files up to the block of May 1, 2014, and the client has a hardcoded hash of this data. Then a new user only has to download the bootstrap and indexes, check the hash, and... not verify all signatures from the beginning of the bitcoin era.
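A minimal sketch of that check, assuming a local bootstrap.dat; the hash constant is a placeholder, not a real release hash:

Code:
import hashlib

EXPECTED_SHA256 = "00" * 32  # placeholder; would be hardcoded in the client

def verify_bootstrap(path):
    """Hash the snapshot in 1 MiB chunks and compare to the constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == EXPECTED_SHA256

print(verify_bootstrap("bootstrap.dat"))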
theymos
Administrator
Legendary
Activity: 5180
Merit: 12884
May 30, 2014, 07:18:32 AM
#11

Quote
Hmm... maybe some checkpoints? Let's say we have bootstrap.dat and all index files up to the block of May 1, 2014, and the client has a hardcoded hash of this data. Then a new user only has to download the bootstrap and indexes, check the hash, and... not verify all signatures from the beginning of the bitcoin era.

Bitcoin Core already skips signature verification on blocks before the latest checkpoint.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
apxu
Member
Activity: 229
Merit: 13
May 30, 2014, 08:11:21 AM
#12

Quote
Bitcoin Core already skips signature verification on blocks before the latest checkpoint.

But it still parses all blocks and transactions to create the indexes. Or am I wrong?
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 30, 2014, 08:28:08 AM
#13

Quote
I disagree.
You will get benefits from compressing big chunks of data (such as blk files), not small pieces (individual transactions in blocks).

I didn't say there's no benefit from compressing big chunks. However, in this thread I'd like to gauge interest and requirements rather than the benefits of a specific compression technique. I think it's pretty clear that disk-level compression is not a solution.
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 30, 2014, 08:40:38 AM
#14

Quote
I don't think common disk compression methods are efficient for the blockchain. Efficient compression means understanding the underlying structure.

Speaking of cheap vs. expensive: I think it's the user's time that's more expensive than processor time or disk space. We could significantly cut the time spent waiting for another wallet sync, or for a new wallet to initialize.

Block chain data does not compress impressively on a global scale, but indices on addresses and tx hashes do.

Bits of Proof stores both the block chain data and the supplementary indices in LevelDB and achieves high performance retrieving transactions that refer to an arbitrary HD master key, which is why it powers the myTREZOR web wallet.

I am sure it could be further optimized with your ideas, so let me know if you'd like to discuss them in that scope.

I'm definitely interested in this. I think optimizing network transmission, and storage on portable/battery-powered devices, are the major targets to consider. I agree with the comments that disk space on a PC is not an issue.
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
May 30, 2014, 08:52:34 AM
#15

Quote
The most time-expensive routine is verifying ECDSA signatures, not downloading.
But we cannot (?) eliminate this step on every node.

Hmm... maybe some checkpoints? Let's say we have bootstrap.dat and all index files up to the block of May 1, 2014, and the client has a hardcoded hash of this data. Then a new user only has to download the bootstrap and indexes, check the hash, and... not verify all signatures from the beginning of the bitcoin era.

ECDSA checking may be expensive, but I open my wallet every few days, and on every open I see considerable network traffic for a couple of minutes (I'm on ADSL). I wouldn't mind halving that time. Also, for bitcoin nodes with lots of downstream links this might be a bigger problem.
maaku
Legendary
expert
Activity: 905
Merit: 1011
May 30, 2014, 03:27:33 PM
#16

The issues there are not block chain data size, but rather the network sync algorithm. The bitcoind code that is deployed right now uses a rather stupid algorithm for fetching blocks from peers that results in downloading the same block multiple times. There is a developer working on that right now. Block chain size is currently rate limited to about 1MB per 10 minutes, or about 13kbps. It is not at all clear that compressing this data will result in faster syncs, but if that is what interests you then by all means give it a shot.

Note that there is already a Script compressor which is used by the ultraprune code to greatly reduce the size of common bitcoin scriptPubKeys.
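For flavor, a simplified illustration of that kind of template compression: a standard pay-to-pubkey-hash script is 25 bytes, but everything except the 20-byte hash is boilerplate. The tag bytes here are made up for the example; see the actual compressor in the Bitcoin Core source for the real rules:

Code:
P2PKH_PREFIX = b"\x76\xa9\x14"  # OP_DUP OP_HASH160 PUSH(20)
P2PKH_SUFFIX = b"\x88\xac"      # OP_EQUALVERIFY OP_CHECKSIG

def compress_script(script):
    """Standard 25-byte P2PKH -> 1-byte tag + 20-byte hash160."""
    if (len(script) == 25 and script.startswith(P2PKH_PREFIX)
            and script.endswith(P2PKH_SUFFIX)):
        return b"\x00" + script[3:23]
    return b"\xff" + script  # anything non-standard: tagged verbatim

def decompress_script(blob):
    if blob[:1] == b"\x00":
        return P2PKH_PREFIX + blob[1:21] + P2PKH_SUFFIX
    return blob[1:]

script = P2PKH_PREFIX + b"\x11" * 20 + P2PKH_SUFFIX
assert decompress_script(compress_script(script)) == script
print("%d -> %d bytes" % (len(script), len(compress_script(script))))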

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
picopicopico (OP)
Newbie
Activity: 14
Merit: 0
June 02, 2014, 11:17:09 AM
#17

Quote
Block chain size is currently rate limited to about 1MB per 10 minutes, or about 13kbps. It is not at all clear that compressing this data will result in faster syncs, but if that is what interests you then by all means give it a shot.

1. If I sync once a day, I need about 3 minutes just to fetch the data (250 KiB/s channel, roughly 200 KiB per block, 240 blocks).

2. As I understand it, clients propagate new transactions through the network instantly, and whenever there's a new block there's a spike in the amount of data (which may well exceed 13 kbps).

3. A new block will contain transactions previously propagated through the network, so some (all?) transactions get received by peers twice: when the transaction is created, and again as part of a block. Is this correct?
apxu
Member
Activity: 229
Merit: 13
June 02, 2014, 12:52:20 PM
#18

Quote
1. If I sync once a day, I need about 3 minutes just to fetch the data (250 KiB/s channel, roughly 200 KiB per block, 240 blocks).
~144 blocks
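With that correction, the same figures give: 6 blocks/hour × 24 hours = 144 blocks, and 144 × 200 KiB ÷ 250 KiB/s ≈ 115 s, i.e. roughly two minutes of raw download per daily sync.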

Quote
3. A new block will contain transactions previously propagated through the network, so some (all?) transactions get received by peers twice: when the transaction is created, and again as part of a block. Is this correct?
This is usually correct.
maaku
Legendary
expert
Activity: 905
Merit: 1011
June 02, 2014, 11:53:45 PM
#19

But the protocol layer can fix that. A block that is just header + coinbase + txid list would be pretty short.
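Back-of-envelope, with assumed numbers (2000 transactions per block, a 200-byte coinbase):

Code:
HEADER = 80     # serialized block header size in bytes
COINBASE = 200  # assumed size of the coinbase transaction
TXID = 32       # bytes per transaction id
N_TX = 2000     # assumed transactions per block

compact = HEADER + COINBASE + TXID * N_TX
print("compact relay: ~%d bytes vs. full block: ~1,000,000 bytes" % compact)
# roughly 64 KiB instead of ~1 MiB when peers already have the transactions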

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
apxu
Member
Activity: 229
Merit: 13
June 03, 2014, 03:36:47 AM
#20

Quote
But the protocol layer can fix that. A block that is just header + coinbase + txid list would be pretty short.
Yes, but what if I don't have one or more of the transactions in my mempool to assemble a block from this template?
This situation occurs when a transaction comes to a miner, the miner accepts it into a block, and solves the block immediately after. Or the miner takes a very old (a month or a year) transaction from its mempool. So no node on the network has this transaction in its mempool.

OK, so the first node receives the "template" and has to ask a peer for the missing transaction.

Quote
https://en.bitcoin.it/wiki/Protocol_specification#getdata
getdata is used in response to inv, to retrieve the content of a specific object, and is usually sent after receiving an inv packet, after filtering known elements. It can be used to retrieve transactions, but only if they are in the memory pool or relay set - arbitrary access to transactions in the chain is not allowed to avoid having clients start to depend on nodes having full transaction indexes (which modern nodes do not).

But the peer does not have this tx in its memory pool, because the tx is already in a block.
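As a sketch of the recovery path described above; every name here is illustrative, not a real P2P message:

Code:
def reconstruct_block(template, mempool, request_from_peer):
    """Fill in a header + txid-list 'template' from the local mempool,
    fetching any transaction we don't have from a peer."""
    txs = []
    for txid in template["txids"]:
        tx = mempool.get(txid)
        if tx is None:
            tx = request_from_peer(txid)  # costs an extra round trip
        txs.append(tx)
    return {"header": template["header"], "txs": txs}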