Topic: Ultimate blockchain compression w/ trust-free lite nodes
etotheipi (OP)
December 18, 2012, 11:52:04 PM  #241

Quote
Of course, if snapshots end up not being the best solution, then I'm all for that as well.

Well, I am not seeing a way around using snapshots.  I was hoping someone more insightful than myself would point out something simpler, but it hasn't happened yet...

Also, as mentioned earlier, I think snapshots are wildly expensive to store.  I think if a node wants the UTXO set at block X and the nearest snapshot is at block X+/-100, he can get that snapshot plus the blocks in between, and rewind or fast-forward the UTXO tree on his own.  The rewinding and fast-forwarding should be extremely fast once you have the block data.

Although this does open the question of how nodes intend to use this data.  If it turns out they will want to understand how the blockchain looked at multiple points in time, then perhaps it's worth the effort to store all these snapshots.  If that never happens, then the fast-forward/rewind would be better (sketched below, after this list).  My thoughts on this are:

(1) The gap between snapshots should be chosen relative to the size of a snapshot.  My guess is that 100 blocks of data is smaller than a snapshot, so snapshots never need to be spaced more closely than that.
(2) Snapshots at various points in time actually won't be that useful, other than helping other nodes download.  These kinda-full-nodes only care about the latest state of the UTXO tree, nothing else.  If you think there are other reasons, please point them out.
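
In Python-ish pseudocode, the fast-forward/rewind might look like this (a rough sketch only; the block/tx field names and the `undo` map are illustrative assumptions, since raw block data alone doesn't tell you which txouts a block's inputs consumed):

Code:
# Fast-forward applies one block's spends and new outputs to the UTXO set;
# rewind undoes them.  The `undo` map (spent outpoint -> old txout) is an
# assumed side-structure; raw blocks don't carry the consumed txouts.

def fast_forward(utxo: dict, block) -> None:
    for tx in block.txs:
        for outpoint in tx.inputs:              # spent outputs disappear
            utxo.pop(outpoint, None)
        for n, txout in enumerate(tx.outputs):  # new outputs appear
            utxo[(tx.txid, n)] = txout

def rewind(utxo: dict, block, undo: dict) -> None:
    for tx in reversed(block.txs):
        for n in range(len(tx.outputs)):        # this block's outputs vanish
            utxo.pop((tx.txid, n), None)
        for outpoint in tx.inputs:              # spent outputs reappear
            utxo[outpoint] = undo[outpoint]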


casascius (Mike Caldwell)
December 18, 2012, 11:58:25 PM  #242

Quote from: etotheipi on December 18, 2012, 11:52:04 PM
[...] (2) Snapshots at various points in time actually won't be that useful, other than helping other nodes download.  These kinda-full-nodes only care about the latest state of the UTXO tree, nothing else.  If you think there are other reasons, please point them out.



The operators of the other nodes may care about holding on to a certain amount of history, just as a greater assurance that they're really dealing with the real block chain.  One could argue that an hour's or a day's worth of history is enough, but if I'm about to start mining and want to do my part to help the network, I might feel more comfortable making it a month or a year, especially if I had the bandwidth to spare.  The more information I hold, the more history I'm "seeding" in case it's needed in the event of a reorg.

d'aniel
December 19, 2012, 12:47:21 AM  #243

How about this: download whatever non-synchronized UTxO set you can from peers, then start downloading blocks backwards, adding any missing new txouts and removing any that were spent during the download.  Once you've worked back to a few blocks before the time you started the download, you can build the tree and make sure it hashes properly.
etotheipi (OP)
December 19, 2012, 02:41:34 AM  #244

Quote from: d'aniel on December 19, 2012, 12:47:21 AM
How about this: download whatever non-synchronized UTxO set you can from peers, then start downloading blocks backwards, adding any missing new txouts and removing any that were spent during the download.  Once you've worked back to a few blocks before the time you started the download, you can build the tree and make sure it hashes properly.

@d'aniel:  You might be right that it's possible to reconstruct the tree from an amalgamation of closely related states.  Though, I'm concerned that there are too many ways for that to go wrong.  Let's try a thought experiment:  I have a fresh slate, with no tx and no UTXOs.  I then execute 65,536 requests for data and download each one from a different peer (each request is for a different 2-byte prefix branch).  I will assume for a moment that all requests execute successfully, and we end up with something like the following:

[trie diagram omitted: the 2-byte-prefix subtrees, mostly served at block 1000, a few at blocks 993-999 and 1001-1002]

A couple of notes/assumptions about my drawing:

  • (1) I have drawn this as a raw trie, but the discussion is the same (or very close to the same) when you transition to the Patricia/Brandais hybrid.  Let me know if you think that's a bad assumption.
  • (2) We have headers up to block 1000.  So we ask one peer that is at block 1000 for all 65,536 trie-node hashes.  We verify it against the meta-chain header.
  • (3) We make attempts to download all 65,536 subtrees from a bunch of peers, and end up mostly with those for block 1000, but a few for 995-999, and a couple have given us block 1001-1002 because that was what they had by the time we asked them to send us that branch.  We assume that peers tell us what block they are serving from.
  • (4) Some branches don't exist.  Even though on the main network the second layer will always be at 100% density, there may be various optimization-related reasons to do this operation at a lower branch level, where density is below 100%.
  • (4a) I've used green to highlight four situations that I don't think are difficult, but we need to be aware of them.  Branch \x0202 is where the node hashes at block 1000 say it's an empty node, but it is reported as having data by the peer serving us from block 1001.  \x0203 is the same, but with a peer serving block 993 telling us there is data there.  \x0302 and \x0303 are the inverse:  block 1000 has hashes for those trie-nodes, but peers serving other points in time report them empty.
  • (5) Downloading the transactions-with-any-unspent-txouts from sources at different blocks also needs to be looked at.  We do eventually need to end up with a complete list of tx for the tree at block 1000 (or 1002?).  I'm expecting that any gaps can be filled with subsequent requests to other nodes.

So, as a starter algorithm, we acquire all this data and an almost-full UTXO tree.  We also acquire all of the blocks between 993 and 1002.  One branch at a time, we fast-forward or rewind that branch based on the tx in blocks 993-1002.  It is possible we will be missing needed block data (due to #5), but I assume we will be able to acquire that info from someone; perhaps this warrants keeping tx in the node's database for some number of blocks after it is depleted, to make sure it can still be served to other nodes catching up (among other reasons).
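
The per-branch catch-up might look roughly like this in Python (a sketch only: `key_of`, the tx field names, and dict-based subtrees are illustrative assumptions, not a spec; rewinding is the mirror image, as in the earlier sketch):

Code:
# Each downloaded subtree remembers the height its peer served it from;
# replay only the UTXO changes whose trie key falls under its prefix.

def key_of(txid: bytes, n: int) -> bytes:
    return txid                    # placeholder keying; see the later posts

def changes_under(block, prefix: bytes):
    """Yield ('add'/'del', key, txout) changes in `block` touching `prefix`."""
    for tx in block.txs:
        for (txid, n) in tx.inputs:                    # spends remove entries
            if key_of(txid, n).startswith(prefix):
                yield ('del', (txid, n), None)
        for n, txout in enumerate(tx.outputs):         # new outputs add entries
            if key_of(tx.txid, n).startswith(prefix):
                yield ('add', (tx.txid, n), txout)

def fast_forward_branch(subtree: dict, prefix: bytes, have: int,
                        target: int, blocks: dict) -> None:
    """Advance one subtree from height `have` up to `target`."""
    for h in range(have + 1, target + 1):
        for op, key, txout in changes_under(blocks[h], prefix):
            if op == 'add':
                subtree[key] = txout
            else:
                subtree.pop(key, None)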

On the surface, this looks workable and actually not terribly complicated.  And no snapshots required!  Just ask peers for their data, and make sure you know what block their UTXO tree is at.  But my brain is at saturation, and I'm going to have to look at this with a fresh set of eyes later this week, to make sure I'm not neglecting something stupid.

d'aniel
December 19, 2012, 04:25:13 AM  #245

I love your Inkscape graphics.  I downloaded it because of your mention :)

Doesn't it make more sense to start downloading from the bottom of the tree instead of the top?  Say, partition the address space up, and request all UTxOs that lie in a given partition - along with the full bounding branches - and then compute the missing node hashes up to the root.  Inclusion of each partition in some known block is verified, and then we'd just have to catch up the delayed partitions separately using full tx data.  The deterministic property of the tree makes the partition syncing trivial, and I assume tx data will be available within some relatively large time window for reorgs and serving, etc.
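
In Python, computing the missing hashes up to the root might look like this (a sketch; the concatenate-and-SHA256 node hash is just a stand-in for whatever per-node scheme gets settled on):

Code:
import hashlib

def node_hash(child_hashes) -> bytes:
    """Parent hash over the ordered child hashes (stand-in scheme)."""
    return hashlib.sha256(b''.join(child_hashes)).digest()

def verify_partition(partition_hash: bytes, path, root_hash: bytes) -> bool:
    """Walk from a partition's subtree hash up to the root.  `path` is a
    list of (sibling_hashes, index) pairs, leaf level first: the bounding
    hashes at that level and the slot our computed hash occupies."""
    h = partition_hash
    for siblings, i in path:
        children = siblings[:i] + [h] + siblings[i:]
        h = node_hash(children)
    return h == root_hash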

My brain is fried right now too, I'll have a closer look at what you wrote after some sleep.  Maybe I'm oversimplifying it...
etotheipi (OP)
December 19, 2012, 04:55:23 AM  #246

Quote from: d'aniel on December 19, 2012, 04:25:13 AM
I love your Inkscape graphics.  I downloaded it because of your mention :)

It's like vim:  there's a bit of a learning curve to using it efficiently, but there are so many shortcuts and hotkeys that you can really fly once you have some experience (and yes, I do 100% of my code development in vim :))

Quote from: d'aniel on December 19, 2012, 04:25:13 AM
Doesn't it make more sense to start downloading from the bottom of the tree instead of the top? [...]

I think we're saying the same thing:  I showed partitioning at the second level, but it really would be any level low enough to meet some kind of criterion (though I'm not sure what that criterion is, if you don't have any of the data yet).  It was intended to be a "partitioning from the bottom" and then filling up to the root once you have it all.

I imagine there would be a P2P command that says "RequestHashes | HeaderHash | Prefix".  If you give it an empty prefix, that means start at the root:  it will give you the root hash, followed by the 256 child hashes.  If you give it the prefix "\x01", it gives you the hash of the node starting at '\x01' and the hashes of its 256 children.  This is important, because I think for this to work you have to have a baseline for what the tree is going to look like for your particular target block.  I think it gets significantly more complicated if you are aiming for partitions that are from different blocks...
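
An illustrative handler for that command (a sketch only: the node layout and the all-zero sentinel for empty children are my assumptions, and it assumes the requested prefix exists in the tree):

Code:
EMPTY_HASH = b'\x00' * 32            # assumed sentinel for absent children

class TrieNode:
    def __init__(self):
        self.children = [None] * 256             # one slot per next byte
        self.hash = EMPTY_HASH

def handle_request_hashes(root: TrieNode, prefix: bytes):
    """Serve 'RequestHashes': the node's own hash, then its 256 child hashes."""
    node = root
    for b in prefix:                             # descend one byte per level
        node = node.children[b]
    return [node.hash] + [c.hash if c else EMPTY_HASH for c in node.children]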

Then there'd be another command that says "RequestBranch | HeaderHash | Prefix | StartNode".  The header hash/height would be included only so that peers that are significantly detached from your state won't start feeding you their data, e.g. because they don't recognize your hash, or because they are more than 100 blocks from the state you are requesting.  If the peer's state is within 100 blocks, they start feeding you that partition, ordered lexicographically.  The nodes would probably be transferred in chunks of 1000, and then you put in the next request using the 1000th node as the start node to get the next chunk.  Since we have branch independence and insert-order independence, the transfer should be stupid simple.
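
The requesting side of that chunking might look like this (a sketch; `request_branch` and the `.key` field stand in for the actual P2P round trip, which is only outlined here):

Code:
CHUNK = 1000                          # nodes per reply, per the idea above

def download_branch(peer, header_hash, prefix: bytes):
    """Pull one partition in lexicographic order, 1000 nodes at a time."""
    entries, start = [], prefix       # begin at the branch's first key
    while True:
        chunk = peer.request_branch(header_hash, prefix, start)  # hypothetical
        entries.extend(chunk)
        if len(chunk) < CHUNK:        # short reply: branch exhausted
            return entries
        start = chunk[-1].key         # last node seen becomes next StartNode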

Also something to note:  I think that the raw TxOut script should be the key for this tree.  Sure, a lot of these will have a common prefix, but PATRICIA trees will compress those anyway.  What I'm concerned about is multiple variations of the same address, such as a TxOut using the hash160 vs. the full public key.  That can lead to stupid side-effects if you are only requesting by address.

etotheipi (OP)
January 02, 2013, 11:37:44 PM  #247

Just a random update:  I was updating the top post and want a check on my math for the question:

"In a post-Reiner-Tree world, how much data must a lite-node download to know its balance and spend its coins, from scratch?"

Okay, so, I will assume a wallet has 100 addresses each holding 2 UTXOs and no information/history. 

Pre-req:  We need the headers for the main chain and the meta-chain.  The meta-chain doesn't exist yet, but let's assume this is two years after its implementation (starting now).  There will be about 24 MB of main-chain headers and 8 MB of meta-chain headers.  Total:  about 30 MB.  (NOTE: if this is eventually integrated into the main-chain coinbase headers, at the risk of invalid blocks if it's wrong, then there is no extra data to download.)

Also note, this data may simply be bundled with the app, so it doesn't really feel like a "download" to the user.  The app will take longer to download, but it will start up basically instantly.  For the remainder of this discussion I assume that we've already downloaded the headers.

Balance verification info:  Before you even download your own balance and UTXO list, you need to collect and verify the tree information pointing to your addresses.  Once you have that, you can download the data itself and know you got the right thing.

Here I assume d'aniel's idea about Merkle trees at each level.  That means that for a single address, I need to download 15 hashes at each tree level to verify that tree-node:  480 bytes per node.  For efficiency, I'll always just download all 256 hashes at the top level (8 kB) and the 480 bytes at each subsequent level.

I assume that the network is "big", with trillions of UTXOs, thus having to go down 6 levels to get to my address.
I assume that all 100 addresses have a different starting byte, maximizing the data I have to download by not sharing any tree-nodes below the first level. 
So the amount of data I download is Top Level (8kB) + 5 more levels (480 bytes * 5 levels * 100 addresses).

Total verification data:  ~250 kB

UTXO data itself:  After you get the verification data, you need the actual UTXOs.  We assume 100 addr * 2 UTXOs = 200 UTXOs.  Each UTXO consists of its 36-byte OutPoint followed by its value (8 B) and TxOut script (~26 B).  So ~70 bytes/UTXO * 200 UTXOs = 14 kB.

Total:  30 MB of header data, downloaded once.  After that, you can throw away all wallet information and re-download it on every application restart for ~250 kB per 100 addresses.
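
Checking that arithmetic (32-byte hashes assumed throughout):

Code:
# Back-of-the-envelope check: 100 addresses, 2 UTXOs each, 6 trie levels,
# 15 sibling hashes per level below the fully downloaded top level.
HASH = 32
top_level    = 256 * HASH                       # 8,192 B: all 256 top hashes
per_node     = 15 * HASH                        # 480 B per tree-node proof
verification = top_level + 100 * 5 * per_node   # 248,192 B ~= 250 kB
utxos        = 200 * (36 + 8 + 26)              # outpoint + value + script
print(verification, utxos)                      # 248192  14000 (~14 kB)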

As I look at this, I realize it is pretty conservative:  since the individual tree-nodes are compressed, the last 2-3 levels will require only 3-7 hashes each, not all 15.  So probably more like 150-200 kB for 200 UTXOs.

Am I neglecting anything?  This seems pretty darned reasonable!

d'aniel
January 03, 2013, 12:48:18 AM  #248

Quote from: etotheipi on January 02, 2013, 11:37:44 PM
Am I neglecting anything?
Maybe just that, for privacy reasons, instead of requesting the txouts for a specific address, a lightweight client would probably want to submit a bloom filter to its peer that yields enough false positives to obfuscate its ownership of the address.
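
A bare-bones Bloom filter, for reference (a sketch; the sizes and the counter-salted SHA256 indexing are arbitrary illustrative choices, not a proposal):

Code:
import hashlib

class BloomFilter:
    """Insert your addresses plus enough slack that peers matching txs
    against the filter also return false positives alongside real hits."""
    def __init__(self, size_bits=1024, n_hashes=4):
        self.size, self.n_hashes = size_bits, n_hashes
        self.bits = bytearray(size_bits // 8)

    def _indices(self, item: bytes):
        for i in range(self.n_hashes):           # counter-salted hashing
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], 'big') % self.size

    def add(self, item: bytes):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in self._indices(item))
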
blueadept
January 03, 2013, 02:36:22 PM  #249

I may have missed you saying it, but for downloading the UTXO data, would it be better to download the entire transaction instead of just the OutPoint?  That way the client can verify that the transaction actually matches the hash, and in combination with the header/PoW info and the Merkle branch, that fully authenticates the OutPoint.

Edit: never mind, I see it now. Under the assumption that the validity of the meta chain is enforced as part of the main chain's block validation rules, the meta chain's authentication is enough to verify the validity of the OutPoint.

d'aniel
January 17, 2013, 11:25:02 AM  #250

I was thinking rather than use a Merkle tree at each trie node for verifying the presence of a child, we could use a bitwise trie instead, since it would have faster updates due to always being balanced and not having to sort.  This would also speed up lookups, since we wouldn't have to traverse through a linked list of pointers at each node.

But then we've just arrived in a roundabout sort of way at the whole trie being essentially a bitwise trie, since the nodes in the original trie with 256 bit keysize overlap exactly with subtries of the bitwise trie.

Was there a reason for not proposing a single bit keysize from the start?  I thought about it a while back, but disregarded it for some reason I can't recall.
etotheipi (OP)
January 17, 2013, 09:04:07 PM  #251

Quote from: d'aniel on January 17, 2013, 11:25:02 AM
[...] Was there a reason for not proposing a single bit keysize from the start?  I thought about it a while back, but disregarded it for some reason I can't recall.

I thought about the same thing and disregarded it for some reason as well, though I don't remember why.  Working with raw bits is a bit more complicated than bytes, especially when you have skip strings that are, say, 13 bits.  I do see the benefit that you don't even really need linked lists at each node, only two pointers.  But you end up with a lot more total overhead, since you have the trie-node overhead for so many more nodes...

I'll have to think about this one.  I think there is clearly a benefit to a higher branching factor:  if you consider a 2^32-way trie, there is a single root node and just N trie-nodes, one for each entry (so it's really just a lookup table, and lookup is fast).  If you have a 2-way (bitwise) trie, you still have N leaf nodes, but you have a ton of other intermediate nodes and all the data that comes with them, plus a lot more pointers and "hops" between nodes to get to your leaf.  It leads me to believe that you want a higher branching factor, but you have to balance that against the overhead each extra child entry adds (i.e. in the case of using linked lists between entries, it would obviously be bad to have a branching factor of 2**32).
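
To put rough numbers on that trade-off, here is a small experiment counting nodes in uncompressed tries over random keys (illustrative only; no Patricia/skip compression, so the low-branching counts are upper bounds):

Code:
import os

def count_trie_nodes(keys, bits_per_level):
    """Insert fixed-length random keys into an uncompressed trie taking
    `bits_per_level` bits per hop; return total nodes created."""
    root, nodes = {}, 1
    mask = (1 << bits_per_level) - 1
    for k in keys:
        node, val, nbits = root, int.from_bytes(k, 'big'), len(k) * 8
        for shift in range(nbits - bits_per_level, -1, -bits_per_level):
            sym = (val >> shift) & mask
            if sym not in node:
                node[sym] = {}
                nodes += 1
            node = node[sym]
    return nodes

keys = [os.urandom(8) for _ in range(10_000)]
for bits in (1, 4, 8):                     # branching factors 2, 16, 256
    print(2 ** bits, count_trie_nodes(keys, bits))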


d'aniel
January 18, 2013, 02:51:23 AM  #252

Quote from: etotheipi on January 17, 2013, 09:04:07 PM
I do see the benefit that you don't even really need linked lists at each node, only two pointers.  But you end up with a lot more total overhead, since you have the trie-node overhead for so many more nodes...
More specifically, it will need somewhat less than double the number of nodes, since every node in a 2-way trie has two children, i.e. the same number of extra nodes as the Merkle tree, which we were going to need to include anyway.

Quote
I'll have to think about this one.  I think there is clearly a benefit to a higher branching factor: [...] If you have a 2-way (bitwise) trie, you still have N leaf nodes, but you have a ton of other intermediate nodes and all the data that comes with them, plus a lot more pointers and "hops" between nodes to get to your leaf. [...]
Wouldn't the bitwise trie actually require fewer hops, since it doesn't need to traverse through the linked list?  You seem to be saying this and at the same time saying it requires more :)

Assuming we were going to be serializing the original nodes and keying them individually in a database, going bitwise shouldn't affect this at all, since we would just prune off a bunch of crap to "zoom out" to a 256-way "macro-node", and conversely build in the details of a macro-node grabbed from storage to "zoom back in" to the bitwise subtrie.

Estimate of the amount of extra overhead this proposal will impose on existing clients

Since the network is switching to using a leveldb store of utxos now, we can easily make this estimate.  I'll ignore for now the fact that we're actually using a tree of tries.

Assume we're storing 256-way macro-nodes and that the trie is well-balanced (lots of hashes, so it should be).  Then for a set of N = 256^L utxos, a branch will have L macro-nodes plus the root to retrieve from/update to disk, instead of just the utxo like we do now.  But it makes sense to cache the root and first level of macro-nodes, and to have a "write-back cache policy" where we only do periodic writes, so that the root and first-level writes are done only once per batch.  So for a batch size >> 256, we have more like L - 1 disk operations per utxo retrieval/update, or L - 2 more than we do now.  That's only one extra disk operation per retrieval/update with L = 3, i.e. N = ~17M utxos total.

Due to the extreme sparsity, it probably doesn't make much sense to permanently cache beyond the first level of macro-nodes (16 levels of bitwise nodes is ~2*2^16 nodes at (2*32 + 2*8) bytes each, with 32 bytes per hash and 8 bytes per pointer, so ~10 MB of permanently cached data), since any macro-node in the next level can be expected to be accessed during a batch of, say, 10,000 writes only about 1/256^2 * 10,000 ~ 0.15 times.  So additional memory requirements should stay pretty low, depending on how often writes are done.
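
Plugging in the numbers from the estimate above:

Code:
N = 256 ** 3                       # 16,777,216: the "~17M utxos" case
L = 3                              # macro-node levels for this N
extra_disk_ops = L - 2             # with root + first level cached: 1

first_level_nodes = 2 * 2 ** 16    # ~two bitwise nodes per level-1 entry
bytes_per_node = 2 * 32 + 2 * 8    # two child hashes + two pointers
print(extra_disk_ops, first_level_nodes * bytes_per_node)  # 1, 10485760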

I guess to do it properly and not assume the trie is perfectly balanced, we would have an "LFU cache policy", where we track the frequency with which macro-nodes are accessed, and throw out during writes the least frequently used ones, keeping some desired number of those most frequently used.

Disk operations are almost certainly the bottleneck in this proposal, but they aren't in the main client, so it's possible that this wouldn't add much in the way of noticeable overhead.

Is my math correct?  (There was a small mistake I fixed in an edit.)
d'aniel
January 18, 2013, 06:36:02 AM  #253

Am I correct in thinking that the skip string and child keys don't really need to be hashed in at each step running up a Patricia trie, since the hash stored at the end of a given branch is of the full word formed by that branch, and inclusion of that hash in the root is proved regardless?

This would simplify the authentication structure a fair bit, allowing us to avoid having to hash weird numbers of bits, and make it not depend on whether or not we're compressing strings of single child nodes with skip strings (I don't know if this would ever be an advantage).
TierNolan
February 12, 2013, 05:49:51 PM  #254

Have you considered that, since the hashes are well distributed, you could mostly skip tree balancing?

Basically a PATRICIA-Brandais tree without the skip function.  The odds of it helping are pretty low, and it adds complexity.

In your tree examples, the benefit is that the right-hand nodes end up with lower depth.  However, collisions on a sequence of bytes are pretty unlikely unless you are near the root, and it will be dense there anyway, so you don't get the benefit.

You could also drop the linked list and replace it with a hashmap.  However, most nodes (near the root) will be fully loaded, so it may be better to just use a PATRICIA tree directly.

A compromise would be to have 2 node types.  A node could use an (expanding) hashmap, until it gets to 50% full and then switch to a flat byte array.  However, that is an efficiency thing, and not part of the tree structure.

So: insert a node, and use its bits to determine where to place it, until you get to a bit that doesn't match any other node.
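
That insertion rule, sketched in Python (dict-based nodes, no skip strings or balancing; it assumes keys are distinct hashes):

Code:
def bit(key: bytes, i: int) -> int:
    return (key[i // 8] >> (7 - i % 8)) & 1

def insert(node: dict, key: bytes, value, depth: int = 0) -> None:
    """Leaves are ('leaf', key, value) tuples; internal nodes are dicts."""
    b = bit(key, depth)
    child = node.get(b)
    if child is None:
        node[b] = ('leaf', key, value)        # first unshared bit: stop here
    elif isinstance(child, dict):
        insert(child, key, value, depth + 1)  # keep following shared bits
    else:                                     # collision with an old leaf:
        _, old_key, old_value = child         # push it one level down and
        node[b] = {}                          # retry alongside the new key
        insert(node[b], old_key, old_value, depth + 1)
        insert(node[b], key, value, depth + 1)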

Peter Todd
February 12, 2013, 07:46:25 PM  #255

Quote from: TierNolan on February 12, 2013, 05:49:51 PM
Have you considered that, since the hashes are well distributed, you could mostly skip tree balancing?

You can't make that assumption, because an attacker might create a bunch of transactions by brute-force search that just happen to create an unbalanced tree.  If you have a system where hash values determine the balance of the tree, you have no choice but to measure how far the tree is out of balance (which might be difficult if not all nodes know the full state of the tree) and to prohibit the creation of unbalanced trees.  You also need to be careful to ensure that when a tree is just at the threshold of being too out-of-balance to be legal, there are legal operations that make it balanced again, so honest miners can fix the problem.  Finally, fixing an unbalanced tree has to always be cheap.

TierNolan
February 12, 2013, 08:00:05 PM  #256

Quote from: Peter Todd on February 12, 2013, 07:46:25 PM
You can't make that assumption, because an attacker might create a bunch of transactions by brute-force search that just happen to create an unbalanced tree.

It would only affect the transactions in question.  Also, creating unbalancing transactions requires generating hashes with the same starts, which is hard to do.

Quote
Finally fixing an unbalanced tree has to always be cheap.

If you don't do balancing, then it doesn't require constant merkle tree rehashes.

Peter Todd
February 12, 2013, 08:20:15 PM  #257

Quote from: TierNolan on February 12, 2013, 08:00:05 PM
Quote from: Peter Todd
You can't make that assumption, because an attacker might create a bunch of transactions by brute-force search that just happen to create an unbalanced tree.
It would only affect the transactions in question.  Also, creating unbalancing transactions requires generating hashes with the same starts, which is hard to do.

There is nothing else in existence that has put as much towards finding statistically unlikely hashes as Bitcoin has! Heck, the Bitcoin network has found hashes with IIRC 66 zero bits.

You just cannot assume that hashes calculated directly from data an attacker has any control of will be uniformly distributed, under any circumstance.  Someone will get some GPUs together and write a program to find the right hashes to unbalance your tree, just because they can.

TierNolan
February 12, 2013, 08:47:18 PM  #258

Quote from: Peter Todd on February 12, 2013, 08:20:15 PM
You just cannot assume that hashes calculated directly from data an attacker has any control of will be uniformly distributed, under any circumstance.  Someone will get some GPUs together and write a program to find the right hashes to unbalance your tree, just because they can.

The effect doesn't really matter though.  It will just slow down accesses for those particular transactions.

Peter Todd
February 12, 2013, 09:05:39 PM  #259

Actually, here is an idea that could work: rather than making the generated UTXO set be the state of the UTXO tree for the current block, make it the state of the UTXO tree for the previous block.  Then, for the purposes of putting the tx in the tree, essentially define the tx hash not as H(d) but as H(b | d), with b equal to the hash of the block where the tx was confirmed.  This works because finding a block hash is difficult, and once you have found a valid block hash, not using it represents a huge financial hit.

Of course, this doesn't actually work directly, because you usually don't know the block number when a txout was created. So we'll actually do something a bit different:

For every node in your radix tree, store the block number at which that part of the tree was last changed.  The 0th node, the empty prefix, gets block 0.  For the purposes of the radix tree, define the radix hash of the transaction as rh = H(hn | h), where hn is the hash of that block number and h is the transaction's actual hash.  Ignoring the fact that the transaction hash prefix changed, we can treat the rest of the bytes of the radix tx hash normally to determine which prefix edge is matched, and thus which node to follow down the tree.
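
Concretely, the key derivation might look like this (a sketch; SHA256 stands in for H, and the 4-byte block-number encoding is arbitrary):

Code:
import hashlib

def radix_hash(node_block_number: int, txid: bytes) -> bytes:
    """rh = H(hn | h): hn is the hash of the block number stored on the
    tree node, h is the transaction's own hash."""
    hn = hashlib.sha256(node_block_number.to_bytes(4, 'big')).digest()
    return hashlib.sha256(hn + txid).digest()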

Other than having to provide the block hash, proving that a transaction is in the UTXO set is not really any different than before:  the proof is still a Merkle path, and the security is still based on the infeasibility of reversing a hash function.  I think it would be a good idea to add in some of the additional ideas I outlined in https://bitcointalk.org/index.php?topic=137933.msg1470730#msg1470730, but that is true of any UTXO idea.

If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.

TierNolan
February 12, 2013, 10:30:28 PM  #260

Quote from: Peter Todd on February 12, 2013, 09:05:39 PM
Actually, here is an idea that could work: rather than making the generated UTXO set be the state of the UTXO tree for the current block, make it the state of the UTXO tree for the previous block. [...]

This requires the tree to be completely regenerated for each block, since the hash of every transaction would change.

The attack on the system I suggested would require having lots of transactions that have the same initial bits in their hash.

If you had 1 million of them and added them to the tree, it would become quite deep.  Each time you add one, the nodes would have to recalculate the entire path from root to that node and redo all the hashing.  The total effort would be proportional to N squared, where N is the number of collisions.

A protection against that would be to ignore some transactions.

For example, if you have a binary tree, then the expected depth is log2(transactions).

With random hashes, it becomes very unlikely that any nodes would be much deeper than that.

A depth cutoff could be added.  If a branch is more than log2(transactions) * 4 deep, then it hashes to 0, so all nodes below it can be ignored.

This means that they cannot be verified as valid.
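
The cutoff rule, sketched over a dict-based binary trie (SHA256 and the all-zeros hash are placeholders for whatever the real scheme uses):

Code:
import hashlib, math

def depth_cutoff(n_tx: int) -> int:
    """Branches deeper than 4*log2(n_tx) are written off."""
    return 4 * math.ceil(math.log2(n_tx))

def branch_hash(node, depth: int, cutoff: int) -> bytes:
    """Hash a binary trie, zeroing out any branch past the cutoff."""
    if depth > cutoff:
        return b'\x00' * 32                    # too deep: hashes to 0
    if isinstance(node, bytes):                # leaf: a tx hash
        return node
    child_hashes = []
    for b in (0, 1):
        child = node.get(b)
        child_hashes.append(b'\x00' * 32 if child is None
                            else branch_hash(child, depth + 1, cutoff))
    return hashlib.sha256(b''.join(child_hashes)).digest()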

Quote
Of course, this doesn't actually work directly, because you usually don't know the block number when a txout was created. So we'll actually do something a bit different

You have to ask for proof anyway; the node could say in which block the tx was added to the tree.

So, the hash used would be hash(hashBlock, hashTransaction), where block is the block where the tx was added to the chain.

All unspent txs are <block, tx> pairs anyway, since you need to scan back to the block to get the sig script to check for spending.

You won't know what the block hash is until you generate the transactions, so I think this protects it?

Quote
For every node in your radix tree, store the block number at which that part of the tree was last changed.

Ideally, the tree should just depend on what transactions are contained.

Quote
If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.

You will probably have to do something like that anyway.  The plan was to merge mine, so some bitcoin blocks won't have matching tree root node hashes.  So, prove it wasn't spent up to 10 blocks previous and then check the last 10 blocks manually.
