Bitcoin Forum
April 23, 2024, 09:25:08 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 5 6 7 8 9 10 11 [12] 13 14 15 16 17 18 19 20 21 22 »  All
  Print  
Author Topic: Ultimate blockchain compression w/ trust-free lite nodes  (Read 87867 times)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 07:48:48 PM
Last edit: November 04, 2012, 08:06:21 PM by casascius
 #221

As casascius proposed, it only keeps the currently-active UTXO state, and no effort is done to keep older trees available by sharing subtrees. Keeping full snapshots of older trees is a bad idea in my opinion (and re-rolling the entire history even more so), but on-disk files with data to 'undo' the applications of blocks to the UTXO set are an easy solution. If you see blocks as patches to the UTXO structure, these undo files are the reverse patches. They are only necessary for reorganisations, and as that happens far less frequently than accessing the currently active tree state, they don't need super-fast access anyway (simple rule in optimizing for cache effects: if two data sets have different access patterns, don't mix them).

My main thought when weighing this idea versus my "snapshots" idea, is that this idea is asking for a storage burden of little objects that grows O(n) with the block chain and only supports a reorg as long as the record goes, and the one I suggest is of bigger objects that grows only O(log n) and supports a reorg all the way back to the genesis block, without requiring development, support, testing, or specifying any new functionality or requirements for storing and transferring orphan blocks among peers.

EDIT: another concern is that any rollback scheme that depends on the ability to fetch orphan blocks from peers is not only at risk of failing if those peers do not have those orphan blocks, but is also an avenue for attack.  If a node ends up bootstrapping from an isolated segment of the network that has a large number of orphan blocks (or is forced into this situation by an attacker), never seeing any full blocks or tree snapshots from the real network... and then suddenly connects to the "real" network, the real network will have no way to provide the orphan blocks needed to roll his tree out of fantasy land and back to earth.  He must start over from some point in the past, he has no choice.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Bitcoin addresses contain a checksum, so it is very unlikely that mistyping an address will cause you to lose money.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713907508
Hero Member
*
Offline Offline

Posts: 1713907508

View Profile Personal Message (Offline)

Ignore
1713907508
Reply with quote  #2

1713907508
Report to moderator
1713907508
Hero Member
*
Offline Offline

Posts: 1713907508

View Profile Personal Message (Offline)

Ignore
1713907508
Reply with quote  #2

1713907508
Report to moderator
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1072
Merit: 1174


View Profile WWW
November 04, 2012, 08:03:26 PM
 #222

My main thought when weighing this idea versus my "snapshots" idea, is that this idea is asking for a storage burden of little objects that grows O(n) with the block chain and only supports a reorg as long as the record goes, and the one I suggest is of bigger objects that grows only O(log n) and supports a reorg all the way back to the genesis block, without requiring development, support, testing, or specifying any new functionality or requirements for storing and transferring orphan blocks among peers.

I'm not sure what you mean. Sure you only needs O(log n) separate objects, while I need O(n) separate objects. However, in total you store way way more data. Currently the "forward" block data is 3.75 GB, and the undo data all the way back to the genesis block is 0.48 GB. The UTXO set itself, is 0.13 GB. So just 4 snapshots requires more storage than all that is needed to store the entire history, without O(n) operations to make the copies. Sorry, but full snapshots is out of the question in my opinion. Both explicit reverse-deltas (like I use) or multiple trees with subtree-sharing have significantly better complexity. And yes, sure it may be somewhat harder to develop, but if Bitcoin is still around when the UTXO set is several gigabytes large, we'll be very thankful that we don't need to make fast copies of it continuously.

I do Bitcoin stuff.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 08:20:48 PM
 #223


I'm not sure what you mean. Sure you only needs O(log n) separate objects, while I need O(n) separate objects. However, in total you store way way more data. Currently the "forward" block data is 3.75 GB, and the undo data all the way back to the genesis block is 0.48 GB.

The way I understand it, this 0.48 GB of undo data is useless without part of the 3.75 GB of block data to go with it.  I have assumed this because in order to roll back a block, you need to replace elements of the UTXO set that have been cut out during forward motion, which clearly don't all fit in 0.48 GB.  And in order to roll a block back, you need to download orphaned data from a peer, data that is now no longer part of the block chain, isn't required storage for anybody, which nobody on the main network has if it originated from an attacker or isolated network, and that can't be assumed to be available from any given peer.

Meanwhile, while my snapshots likewise depend on external data to properly roll things forward, it is data that at least a) some nodes on the main network are guaranteed to have, and b) worthwhile downloading, since it is actual block chain data that is relevant to the network and worth storing and propagating.

The UTXO set itself, is 0.13 GB. So just 4 snapshots requires more storage than all that is needed to store the entire history, without O(n) operations to make the copies.

That assumes all four snapshots are the same size.  How big was the block chain around block 0x10000?  Tiny compared to today is my guess, its size is certainly a fraction of 0.13 GB.  The earlier snapshots are likely to always be much smaller than the later ones.

That also assumes that they can't be stored on disk without any sort of differential coding.  Surely that won't help much for storing the snapshots for block 0x8000 and block 0x10000, but the snapshots between blocks in close succession (especially when no rebalance has been done) are easily written as "here's a snapshot" and "here's a snapshot-diff" (and "snapshot-diff" can easily just be defined as the block itself, something the client is probably already saving)

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1072
Merit: 1174


View Profile WWW
November 04, 2012, 08:36:50 PM
 #224

The 0.48 GB does include all data ever removed from the txout set in fact, as only a fraction of the block data is actual txouts - most is signatures which never make it to the txout set. However, you are right that the undo data does depend on the forward block data, not for the data itself but for the txids being spent. Compared to data being removed, this is "local" data: to disconnect a block N you need the block data for N, and the undo data for N (and especially not the block data of the transactions that were spent). If the txids would be stored in the undo data as well (to make it fully standalone), it would be around 1.5 GB (yes, seriously, the majority of undo data is not the "values" of the UTXO map, but the "keys").

Still, I think we're moving away from the real discussion. We're not talking about today's storage requirements - if we were nothing complex would be needed, the UTXO set is only 130 MB, and is completely scanned in seconds. I do forefee the UTXO set to grow significantly the next years, and the problem is not how you store it: people should have the ability to store it in a small compact way, or a larger way, by choosing different priorities of cpu/bandwidth/storage.

I do Bitcoin stuff.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
November 05, 2012, 01:26:51 AM
 #225

Again, the claim "this is not possible with BSTs" about impossibility of parallelism in b-trees is false. I wonder what is going on here?

I should've stated that differently -- it's not impossible to parallelize general BSTs, etc, but since we are dependent on the underlying structure of the BST (which is not a normal requirement of BSTs in other applications), insert order must be strictly adhered to, and thus each insert operation must be completed in order before the next can be applied.   By definition, that is a serial process.  Along with that, sub-branches of the BST are not independent, so it's more complicated to even have the storage space split up, since rebalance operations may shift nodes from one thread's storage space to another.  It's not an impossible task, it's just an unnecessarily complex one.

And I have a tough time believing that no one in the future will ever benefit from sub-tree independence:  with the BST, lite nodes can't even maintain their own little sub-trees for their addresses, because the structure of that subtree could change due to unrelated inserts nearby.  Or, the subtree could induce a rebalance itself, and change the root of the subtree that has to be tracked which may include other nodes it doesn't care about.

With the trie structure, sorted first by address then by OutPoint, a lite node can maintain sub-trees of each of its own addresses, insert and delete elements in any order, and roll back its subtree on reorgs without having to care about anyone else's OutPoints.  And all using simple, standard insert and delete ops, found in any textbook on the subject.

But, I have to agree that the first implementer of this stuff is likely to set this standard.  I was hoping to persuade the folks who are working on it now, that the trie structure is not only the simplest, but the most flexible and robust.  Apparently, I'm not succeeding.    I better get back to fixing Armory so that maybe one day soon I can work on this problem, too Smiley

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
jl2012
Legendary
*
Offline Offline

Activity: 1792
Merit: 1092


View Profile
November 05, 2012, 04:34:03 AM
 #226

I have a temporary solution. Currently the satoshi client has hard code of historical block hash (currently at about height 190000). Could we also hard code all the unspent output to the satoshi client up to a certain block height? For each output, only the transaction id, output order, script, and the block height are needed. That would be all information needed for verifying any future transactions and spending the coins (block height is needed to calculate the priority. If these unspent outputs are deep enough in the chain, the block height may be omitted as well).

Since the users can verify the hard-coded data with the blockchain, they don't really need to trust the development team. If the hard-coded data is widely distributed and independently verified by many people, normal users may simply accept it as-is.

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
November 05, 2012, 06:25:03 PM
 #227

I have a temporary solution. Currently the satoshi client has hard code of historical block hash (currently at about height 190000). Could we also hard code all the unspent output to the satoshi client up to a certain block height?

Yes.

Quote
For each output, only the transaction id, output order, script, and the block height are needed.

You just need a hash of the UTXO set at a given height, really.  Then later UTXO sets may be provably traced to the checkpoint hash.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
December 12, 2012, 05:26:08 AM
Last edit: December 18, 2012, 09:19:48 PM by d'aniel
 #228


The "values" of each leaf is just the root of the sub tree, and the value of each branch is the skip-string concatenated with all its children's values.





I was thinking that to reduce the number of hashes a lightweight client would need to download when proving inclusion, you could replace the concatenation of the children's values in the formula for the value of branch node B with their Merkle root.  Then a lightweight client would only need log2(256) = 8 hashes 8 + (8 - 1) = 15 hashes per node to prove inclusion in the worst case scenario, instead of 256.

The downside is having to store more hashes (twice as many, worst case), but it actually appears to make updates faster, I guess due to fewer hashes needing to be accessed from memory.  This is because all node values updated during insertion/deletion/leaf update, except the one being attached to or removed from, have only a single leaf in the Merkle tree updated => no unbalancing issues, and only a single Merkle branch needs to be updated.

Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing' seems to offer an alternative solution than trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 12, 2012, 06:26:00 AM
 #229


I was thinking that to reduce the number of hashes a lightweight client would need to download when proving inclusion, you could replace the concatenation of the children's values in the formula for the value of branch node B with their Merkle root.  Then a lightweight client would only need log2(256) = 8 hashes per node to prove inclusion in the worst case scenario, instead of 256.

The downside is having to store more hashes (twice as many, worst case), but it actually appears to make updates faster, I guess due to fewer hashes needing to be accessed from memory.  This is because all node values updated during insertion/deletion/leaf update, except the one being attached to or removed from, have only a single leaf in the Merkle tree updated => no unbalancing issues, and only a single Merkle branch needs to be updated.

That's a very good idea, as I've been concerned about how to reduce the number of hashes that need to be transferred for a lite-node to get its balance.  I had spent some time looking for "associative" hashing functions.  If they existed, the transfers would be tiny:  if a node is known to have Hash(A|B|C|D|E|F|G|H|I|J), and you want to prove inclusion of G, you simply supply {Hash(A|B|C|D|E|F), G, Hash(H|I|J)}.   So you would need to transfer at most 3 hashes per level.  Unfortunately, the only associative hash function I found was based on matrix multiplications, and was most definitely not cryptographically secure Sad

So, I welcome the idea of per-node merkle trees.  I wonder if the trees really need to be stored, though.  Hashing is so fast that recomputing the merkle tree on-demand may be fine, especially for the sparse, lower-level nodes.  Of course, the top couple levels could/should be cached since they'll get hit all the time and are pretty dense.

However, I'm not sure if I agree/understand the point about updating only Merkle branches.  Because the linked-lists at each node in the tree are sorted, the deletion/insertion of a node is likely to occur in the middle of the list and reverse the parity/pairings -- i.e.  you started with {A,C, D,H, J,M, Q,Y} -- the bottom level pairs (A,C), (D,H), (J,M) and (Q,Y).  Now, you insert E and the list now looks like: {A,C  D,E, H,J, M,Q, Y,Y} which means that all branches to the right of the insertion or deletion need to be recalculated.  

On the other hand, you may be recomputing sparse nodes all the time, anyway (meaning this recomputation will happen regardless), and the dense higher-level nodes could be batch-updated -- you know that a given block is going to modify 77 of your 256 branches at the top level, so you don't need to recompute the tree until you have all 77 children are complete.


Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing' seems to offer an alternative solution than trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf

That's an interesting paper, though I have trouble envisioning how they could be made deterministic for the purposes of authenticated structures.  Maybe I just need to read the paper a little closer.  Unfortunately, it's late so I will have to come back to this, later.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
December 12, 2012, 01:58:33 PM
Last edit: December 12, 2012, 02:32:10 PM by d'aniel
 #230

That's a very good idea, as I've been concerned about how to reduce the number of hashes that need to be transferred for a lite-node to get its balance.  I had spent some time looking for "associative" hashing functions.  If they existed, the transfers would be tiny:  if a node is known to have Hash(A|B|C|D|E|F|G|H|I|J), and you want to prove inclusion of G, you simply supply {Hash(A|B|C|D|E|F), G, Hash(H|I|J)}.   So you would need to transfer at most 3 hashes per level.  Unfortunately, the only associative hash function I found was based on matrix multiplications, and was most definitely not cryptographically secure Sad
Funny, I was looking for an associative hashing function as well.  I gave up after thinking they would seem to defeat the purpose of Merkle trees altogether, and thus likely don't exist (yet?).

Quote
However, I'm not sure if I agree/understand the point about updating only Merkle branches.  Because the linked-lists at each node in the tree are sorted, the deletion/insertion of a node is likely to occur in the middle of the list and reverse the parity/pairings -- i.e.  you started with {A,C, D,H, J,M, Q,Y} -- the bottom level pairs (A,C), (D,H), (J,M) and (Q,Y).  Now, you insert E and the list now looks like: {A,C  D,E, H,J, M,Q, Y,Y} which means that all branches to the right of the insertion or deletion need to be recalculated.  
My point was that for insertion/deletion, only the trie node being attached to/removed from requires an insertion/deletion in the leaves of its Merkle tree.  All of this node's parents simply have to update the value of one of the leaves in the Merkle trees, i.e. no insertions/deletions, and this requires only updating a single branch in each parent's Merkle tree.  The node being attached to/removed from does require an insertion/deletion in its Merkle tree, but this would usually happen lower down the trie, where nodes branch less, and Merkle tree rebalancing is faster.

Quote
Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing' seems to offer an alternative solution than trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf

That's an interesting paper, though I have trouble envisioning how they could be made deterministic for the purposes of authenticated structures.  Maybe I just need to read the paper a little closer.  Unfortunately, it's late so I will have to come back to this, later.
To make it deterministic, assuming the hash of each node's element is uniformly distributed, this hash value could be used as a "random" number for determining the tower height.  Though, this provides an avenue for attack by users purposely building lots of "tall towers".  To avoid this, each UTxO could also get randomness from the Merkle root of the tx tree that the UTxO showed up in, something a non-miner attacker has no control over.  It might also be hard enough for even a miner to perform the attack, since he has only one lever to control many degrees of freedom with.  This is just off the cuff, though, so I'm sure there are better ways to do it, or perhaps I'm not understanding it correctly.

I think the thing in this paper is patented, though.

Edit: You should name this beautiful thing, cause "hybrid PATRICIA/de la Brandais trie with authenticating Merkle trees" doesn't exactly roll off the tongue Smiley
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 13, 2012, 05:54:15 AM
 #231

My point was that for insertion/deletion, only the trie node being attached to/removed from requires an insertion/deletion in the leaves of its Merkle tree.  All of this node's parents simply have to update the value of one of the leaves in the Merkle trees, i.e. no insertions/deletions, and this requires only updating a single branch in each parent's Merkle tree.  The node being attached to/removed from does require an insertion/deletion in its Merkle tree, but this would usually happen lower down the trie, where nodes branch less, and Merkle tree rebalancing is faster.

Ahh, excellent point.   On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?  Because you need not only the straight branch from root to leaf, but also the brothers of each of those nodes so you can confirm.  Though, it's still dramatically less than sending all 256, and makes the bandwidth requirements of this structure much more pleasant.


Edit: You should name this beautiful thing, cause "hybrid PATRICIA/de la Brandais trie with authenticating Merkle trees" doesn't exactly roll off the tongue Smiley

How about the "Reiner Tree"?   Tongue  It's funny you asked that, because the other day my dad was asking me if I have any part of Bitcoin named after me, yet...

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
December 13, 2012, 06:28:11 AM
 #232

How about the "Reiner Tree"?   Tongue  It's funny you asked that, because the other day my dad was asking me if I have any part of Bitcoin named after me, yet...

I get stuff like this too, for being the ultimate bitcoin freak in my circles.

I talk about Bitcoins all the time, am everybody's Bitcoin broker, give away Casascius Coins and paper bitcoins for every occasion, I now have the Utah "BITCOIN" vanity license plate.  People think I'm Satoshi and just not telling, and gifts I have received for birthdays and Christmas are often bitcoin themed, one included a book from the bookstore about "the history of money" spoofed with my coins being part of history: content from my website was mocked up into the style of the book and facetiously glued into the middle of it.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 15, 2012, 02:34:10 PM
 #233

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree in at least once while the node is downloading.   This means that even if a node can start supplying the UTXO set at block X, within an hour that node will be on X+3, or something like that.  Branches and leaves of the tree will have changed, and that node will not recognize the branches and leaves of the previous state (well, most will be the same, but you get the point).

This is resolved in the main network by having persistent, authenticated information that can be downloaded in chunks (i.e. blocks in the blockchain), which are still valid, no matter how long it's been since that block was created.  Each block can be downloaded quickly, and checked directly against the header chain.  However, in this case, the entire tree is to be authenticated against a block header, and you pretty much have to download the entire tree before you can confirm any part of it.  Seems like this could be a problem...

One idea, which doesn't seem ideal, is that the node simply stores a "snapshot" at every retarget event.  Every new node will have a two week window to download the UTXO set from the latest retarget, and then download the block history that has occurred since then.  If the new node also has a wallet, it can use the absolute latest meta-header to get its own balance and UTXO list and let the user manage their wallet using the still-secure address branches, it just won't be able to fully sync and start full validation until it's done downloading the latest retarget tree.

That's not an ideal solution, but it's food for thought...

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
December 15, 2012, 04:32:15 PM
 #234

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree in at least once while the node is downloading. 

The prevailing idea is that you download the block chain from "archive nodes", which are nodes that retain the full blockchain.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 16, 2012, 02:32:01 AM
 #235

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree in at least once while the node is downloading. 

The prevailing idea is that you download the block chain from "archive nodes", which are nodes that retain the full blockchain.

I'm not talking about downloading the entire blockchain.  I'm talking about downloading just the pruned UTXO-tree.  Every pruned-full node knows what the UTXO-tree looks like at a given time, but before it can finish telling you what it looks like, it's updating itself with new tx and blocks and changing it.  I'm not seeing a clean way to transfer pruned-blockchain data from a node that is constantly updating its own meta tree.    Ultimately, it's because knowing what the network looked like at block X (from the pruned blockchain perspective), does not mean it's easy/possible to tell me what it looked like at block X-30.  And it's probably a lot of data and complexity to accommodate taking a "snapshot" just for helping other full nodes catch up.

You could say:  well they should just download the entire transaction history and build the UTXO set themselves.  I'm sure some users will do that, either out of paranoia, or for historic reasons, etc.  But it most definitely shouldn't be a requirement to download 100 GB of history just to get 2 GB worth of UTXOs.   The data needed to become a pruned-but-still-full-validation node is a fraction of the entire-and-ever-growing history, and it would be a pretty significant waste of resources to not be able to download the raw UTXO list to get up and running.

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropraitely.

This could affect pruned-lite nodes, too.  If there are updated meta-chain hashes every block, then even transferring small "address branches" to lite nodes could be "interrupted" by new tx/blocks coming in that changes the seeder's UTXO-tree before it's finished.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
December 18, 2012, 09:33:12 PM
 #236

On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?
Yeah, whoops.

Regarding the issue of synching ones Reiner tree: Smiley is it really a problem this proposal needs to solve?  Couldn't the client just wait to build/update it til after he's caught up with the network in the usual way?
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
December 18, 2012, 10:16:31 PM
 #237

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropraitely.

I had come up with a scheme for deciding how long to keep each snapshot I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 18, 2012, 10:43:03 PM
Last edit: March 03, 2013, 05:24:09 PM by etotheipi
 #238

Is it possible to read somewhere exactly what is stored in a block in pseudocode?

A block, as it is stored on disk is very straightforward and parsed very easily:

Code:
[MagicBytes(4) || BlockSize(4) || RawHeader(80) || NumTx(var_int) || RawTx0 || RawTx1 || ... || RawTxN] 

The blk*.dat files are just a list of binary sequences like this

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
December 18, 2012, 10:51:08 PM
 #239

On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?
Yeah, whoops.

Regarding the issue of synching ones Reiner tree: Smiley is it really a problem this proposal needs to solve?  Couldn't the client just wait to build/update it til after he's caught up with the network in the usual way?

Well, I'm hoping that it will be possible to not need to "catch up with the network" in the current sense.  Certain types of nodes will only care about having the final UTXO set, not replaying 100 GB of blockchain history just to get their 2 GB of UTXO data.  I'd like it if such nodes had a way of sharing these UTXO trees without using too much resources, and without too much complication around the fact that the tree is changing as you are downloading. 

One core benefit of the trie structure is that nodes can simply send a raw list of UTXOs, since insertion order doesn't matter (and thus deleted UTXOs don't need to be transferred).  Sipa tells me there's currently about 3 million UTXOs, so at 36 bytes each, that's about 100 MB to transfer.  There is, of course, the raw transactions with any remaining UTXO that need to be transferred, too -- currently 1.3 million out of about 10million total tx in the blockchain.  So that's probably another few hundred MB.  But still only a fraction of the 4.5 GB blockchain.   


As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropraitely.

I had come up with a scheme for deciding how long to keep each snapshot I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.

That's a generalization of what I proposed:  if(blockheight mod 2016 == 0) {storeSnapshotFor2016Blocks}.  Clearly, the modulus needs to be calibrated...  The problem is these snapshots are very expensive, so we would prefer not to do snapshots at all.  But one may be necessary.  Hopefully not more than that.  Although it would be great if I just overlooked something and we could do this without snapshots at all.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
December 18, 2012, 11:31:59 PM
 #240

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropraitely.

I had come up with a scheme for deciding how long to keep each snapshot I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.

That's a generalization of what I proposed:  if(blockheight mod 2016 == 0) {storeSnapshotFor2016Blocks}.  Clearly, the modulus needs to be calibrated...  The problem is these snapshots are very expensive, so we would prefer not to do snapshots at all.  But one may be necessary.  Hopefully not more than that.  Although it would be great if I just overlooked something and we could do this without snapshots at all.

Yes, other than what you're proposing stores snapshots that are equally balanced for the entirety of the blockchain, and I'm proposing one that is biased towards having more options that are all recent.  My reason for having done so is it seems plausible that someone using a client that tracks only the UTXO set might be given a choice as to how far back in the blockchain history they want to request from peers (e.g. 1 hour? 8 hours? 1day? 1week? 1month? 3 months? 6months? 1year? 2years? 4years? everything?) and it would make sense to put the resources in being able to accommodate selections that looked like that, versus ones like 2weeks? 4weeks? 6weeks? ... 1028weeks? 1030weeks? 1032weeks?  etc.

Of course, if snapshots end up not being the best solution, then I'm all for that as well.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Pages: « 1 2 3 4 5 6 7 8 9 10 11 [12] 13 14 15 16 17 18 19 20 21 22 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!