Topic: Ultimate blockchain compression w/ trust-free lite nodes  (Read 87947 times)
hazek | Legendary | Activity: 1078 | Merit: 1003
April 20, 2013, 06:44:27 PM  #281

And wouldn't this also remove traceability, since other nodes would now only have a fingerprint of the transaction(s) with which you received your coins, and not the entire history anymore? I like this idea a lot, on the surface.

My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)

If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
April 20, 2013, 06:48:45 PM  #282

Right.  Just use the transaction hash directly as the key and accept that there might be imbalances.  However, the imbalances are not really going to happen, because you are using a hash (an effectively random value) as the key.  So the law of large numbers does the tree balancing for you.

Just to clarify:  tries/PATRICIA trees/de la Brandais trees do not have balancing issues.  They are all tightly bounded to a maximum number of operations for queries, inserts, and deletes.  It's just that the optimizations of PATRICIA/Brandais bring you far below that constant upper bound.  Thus, "unbalancing" simply removes optimization, but you're still operating well within the confines of constant time, no matter what the tree structure looks like.

The distinction only matters if there were reason to believe that those optimizations are necessary to make this idea feasible.  I do not believe that is the case here.  In terms of access times, I believe even a regular old trie (forcing full traversal of each path) would still work.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan | Legendary | Activity: 1232 | Merit: 1104
April 21, 2013, 12:45:46 AM  #283

Another feature (or disadvantage) is that it allows dropping of extra info added into the blockchain.

For the system to work, all you need is lots of sha(sha(value)) -> value mappings.  The values are always 2 x 36 bytes: "hash(child1);coins(child1);hash(child2);coins(child2)".

This means that there is no bloat.  It is up to the coin owner to keep the full transaction data, and they only submit it when spending.

You can still timestamp documents, but not add data to the blockchain as a permanent record.
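
For concreteness, here is a minimal Python sketch of that node encoding, assuming 32-byte hashes and 4-byte coin fields (one way to read the 2 x 36 bytes; the field widths and helper names are illustrative, not part of the proposal):

Code:
import hashlib
import struct

def pack_node(hash1, coins1, hash2, coins2):
    # "hash(child1);coins(child1);hash(child2);coins(child2)" -- 2 x 36 bytes
    assert len(hash1) == 32 and len(hash2) == 32
    return hash1 + struct.pack(">I", coins1) + hash2 + struct.pack(">I", coins2)

def node_key(value):
    # the mapping key is sha(sha(value))
    return hashlib.sha256(hashlib.sha256(value).digest()).digest()

value = pack_node(b"\x11" * 32, 5000, b"\x22" * 32, 7500)
assert len(value) == 72   # 2 x 36 bytes
print(node_key(value).hex())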

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
April 22, 2013, 04:58:00 PM  #284

gmaxwell pointed out the obvious flaw in this proposal:  you can supply the input branches to prove that the TxOuts you are spending exist, but you can't supply the destination branches, so full nodes have no idea how to update the sub-branches of the target address.  Even if they know that this is the first UTXO for that address, there may be lots of other branches on the way down to that node which are unknown to them.

There's not a way around this, other than just having full nodes store the entire trees.  Which means we're back to square one :(

TierNolan | Legendary | Activity: 1232 | Merit: 1104
April 22, 2013, 05:26:47 PM  #285

Quote from: etotheipi
There's not a way around this, other than just having full nodes store the entire trees.  Which means we're back to square one :(

That info has to be stored.  The way I see it, the live tree (and maybe 50-100 blocks of history) needs to be stored.

However, you could do it in a distributed fashion.  Every node in the tree has to be stored somewhere.

The spender could provide the old path and a new path that was correct within the last 50 steps.  The top of the tree, which would change every block, would be live for all full nodes anyway.

You only have to look at transactions that start with the same prefix as yours to see if the hashes on the path to the root change.

etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
May 02, 2013, 09:48:23 PM  #286

I had a little revelation last night, while thinking about this proposal.  In hindsight, it seems so simple.  But hindsight is always 20/20, right?  My thought process was:  I've implemented RAM-based PATRICIA trees before, but what's a good way to do this on disk?  For instance, I want to implement this in LevelDB, so I need some way to make LevelDB behave like a memory space.

One of the other issues with the PATRICIA/hybrid approach is that there's a lot of data needed to store pointer lists, etc.  It does have quite a bit of overhead.  And you don't want to optimize it in such a way that limits the generic-ness of the structure.  I'd prefer to maintain the textbook generic-ness of this data structure, and let implementations do their own optimizations, as long as they can convert and reproduce the same calculations.

The revelation was that you don't need to replicate a memory space with abstract pointers to each trie-node and leaf.  You can store them based on their node-prefix value, and the DB will auto-sort the values in depth-first-search order.  For instance, let's take this structure:



All you need to do is store everything by its prefix.  Here's what the DB entries would look like:

Quote
Key -> Value
""     -> RootHash, SumValue, 3, "1", "3", "6"
"1"    -> NodeHash, SumValue, 2, "1", "3"
"11"   -> NodeHash, SumValue, 2, "2", "3"
"1122" -> LeafHash, Value
"1137" -> LeafHash, Value
"1342" -> LeafHash, Value
"3333" -> LeafHash, Value
"678"  -> NodeHash, SumValue, 3, "0", "5", "9"
"6780" -> LeafHash, Value
"6785" -> LeafHash, Value
"6789" -> LeafHash, Value

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".


Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.key().size().
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which is where I originally thought the bulk of the data was going to end up.

Even better, you don't really have to implement the minutiae of the PATRICIA tree, because it's kind of done automatically by the nature of a key-sorted database.  The database inserts everything in the correct place for you, and it just so happens that tries and PATRICIA trees get iterated the same way, without having to store structure information.  By contrast, a depth-first search on a BST will also be sorted this way, but you have to store data at each node about the local structure of the tree, and update all the nearby nodes if there's a rebalance.  Since the PATRICIA tree has a deterministic structure based solely on the inclusive set, you can insert and remove nodes without any extra seeks/updates, and natural iteration over the dataset will give the right answer, as if you had implemented a full PATRICIA tree.
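
To illustrate that last point, a single pass over the key-sorted entries recovers the whole parent/child structure from nothing but the startswith() rule (same toy keys as the listing above; just a sketch):

Code:
keys = sorted(["", "1", "11", "1122", "1137", "1342", "3333", "678", "6780", "6785", "6789"])

children = {k: [] for k in keys}
stack = []                       # open ancestors of the current position, root first
for k in keys:
    # if k does not extend the key on top of the stack, that branch is finished
    while stack and not k.startswith(stack[-1]):
        stack.pop()
    if stack:
        children[stack[-1]].append(k)    # k is a direct child of the deepest open ancestor
    stack.append(k)

print(children[""])      # -> ['1', '3333', '678']
print(children["11"])    # -> ['1122', '1137']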

ThomasV | Legendary | Activity: 1896 | Merit: 1353
May 03, 2013, 09:50:20 AM  #287

subscribing

Electrum: the convenience of a web wallet, without the risks
TierNolan | Legendary | Activity: 1232 | Merit: 1104
May 05, 2013, 11:48:09 AM  #288

Quote from: etotheipi
I had a little revelation last night, while thinking about this proposal.  In hindsight, it seems so simple.  But hindsight is always 20/20, right?  My thought process was:  I've implemented RAM-based PATRICIA trees before, but what's a good way to do this on disk?  For instance, I want to implement this in LevelDB, so I need some way to make LevelDB behave like a memory space.

Assuming there are 4 billion UTXOs, that means that the tree will be dense for the first 32 bits on average.  All leaf nodes will have 256 - 32 = 224 bits of data each.

If you just store all the transaction hashes in the tree in full, then you need 32 bytes per entry, instead of 28 bytes, so you aren't really saving much.

Having a fixed 32 bytes per entry would mean that the file has fixed width entries, which would make seeking easier.

The only exception is multiple outputs of the same transaction.  Each leaf could have a list of outputs and how much coin is in each.  This breaks the fixed field length, though.

The UTXO-id would be {tx-hash, out-index, value}.

You effectively save

{tx-hash, total-value}, {out-index, value}, .... {out-index, value}, {end-delimiter}
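
A quick sketch of that leaf layout; the 8-byte values, 4-byte out-indices and the particular end-delimiter are assumptions made for illustration, only the overall {tx-hash, total-value}, {out-index, value}, ..., {end-delimiter} shape comes from the description above:

Code:
import struct

END = b"\xff\xff\xff\xff"        # arbitrary end-delimiter, chosen for illustration only

def pack_leaf(tx_hash, outputs):
    # outputs is a list of (out_index, value) pairs for this transaction
    total = sum(v for _, v in outputs)
    rec = tx_hash + struct.pack("<Q", total)          # {tx-hash, total-value}
    for idx, val in outputs:
        rec += struct.pack("<IQ", idx, val)           # {out-index, value}
    return rec + END                                  # {end-delimiter}

rec = pack_leaf(b"\xab" * 32, [(0, 10000), (2, 250000)])
print(len(rec), rec[:8].hex())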

hazek | Legendary | Activity: 1078 | Merit: 1003
May 05, 2013, 01:48:48 PM  #289

For me, this is the most exciting thread on this forum.

etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
May 05, 2013, 10:45:12 PM  #290

Quote from: hazek
For me, this is the most exciting thread on this forum.

:) I've actually received some pressure to start implementing this myself, with some urgency.  I have resisted solely because I'm totally swamped with other things, and expect I'll get into it in about 6 months.  I felt guilty about that, but I have some personal/selfish reasons for it.

But now I don't feel so bad.  It seems like, once every month, I have some revelation about how this could be improved, or solve some aspect of it that I wasn't sure how to solve earlier.  Now I am comfortable with downloading from unsynchronized peers and/or having multiple blocks generated while downloading that data, and I feel like I have a really good way to encode this with high space efficiency.  This is making it all the easier for me to imagine implementing this, when I finally have time.  Or maybe someone else will.



Speaking of the non-sync'd downloading (link in the previous paragraph), I just wanted to add a comment:  I noticed that LevelDB has read snapshots, and it looks like other DB engines do, too (do most of them?).  It certainly would simplify this even further.  For instance, consider that I ask a node to send me some branch of the tree, and two new blocks come in after the download starts, causing that peer to update the tree while it is in the process of sending it to me.  In a completely naive system, I would end up with internally inconsistent data, and no good way to avoid it short of getting lucky enough to have no new blocks arrive while downloading.

However, if you are using a read snapshot, you can essentially freeze the DB state in time so that you can read its contents without worrying about any updates since it was frozen.  You just throw away the snapshot when you're done.  I assume it does this efficiently, essentially by storing the difference data since the snapshot was taken, and rewinding those differences when you retrieve data from the tree.  This makes everything even more feasible.
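
For reference, this is roughly what that looks like against LevelDB, sketched with the third-party plyvel Python bindings (the database path and keys here are made up):

Code:
import plyvel

db = plyvel.DB("/tmp/utxo-trie-demo", create_if_missing=True)
db.put(b"11", b"node-hash-A")

snap = db.snapshot()            # freeze a consistent read-only view of the DB

db.put(b"11", b"node-hash-B")   # e.g. a new block updates the tree mid-download

print(db.get(b"11"))            # b'node-hash-B'  -- live view
print(snap.get(b"11"))          # b'node-hash-A'  -- frozen view, safe to stream to a peer

db.close()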

Come-from-Beyond | Legendary | Newbie | Activity: 2142 | Merit: 1010
May 10, 2013, 07:03:23 AM  #291

Could anyone let us know the current progress in implementation of this idea?
TierNolan | Legendary | Activity: 1232 | Merit: 1104
May 10, 2013, 09:22:51 AM  #292

Quote from: Come-from-Beyond
Could anyone let us know the current progress in implementation of this idea?

I was thinking of looking into it "soon", but I have lots of other stuff going on.  My thoughts are that it should be a distributed verification system and a distributed hash table.

The official client is going down the path of not allowing random transaction lookup, so the DHT is needed to support that.

Each node would randomly select transactions to verify.  You might set your node to verify only 1% of all transactions (p = 0.01).  When you get a new block with N transactions, you would attempt to verify only p * N of them (though it would be random, so you might verify more or less than that).

My thoughts are that all nodes would verify all branches that they are aware of.  Orphans within say 10k of the end of the chain would be verified.

It just marks blocks as valid and invalid.

The distributed hash table needs to store all transactions, and also all internal nodes of all trees that are in use.  It maps hash -> child nodes.

When you connect to a node, you tell it what you think is the end of the main chain.  You also give the last 10 blocks and nodes along the way.

For each location, you give

- hash of main chain header
- hash of the UTXO tree root

You would also be monitoring the main chain so you can find the chain with the longest POW.

You can then try to find the fork points (since the power-of-2 increase is relative to the start, all nodes would give the same values).  POW disagreements can be fixed by proving to one of the nodes that it isn't on the longest branch.

This should leave all nodes either agreeing or disagreeing based purely on validation.  You ask both nodes to prove the other node's block is invalid.  If a node won't switch to the longest POW branch, then you ask it to prove why.

This means that all nodes should keep a record of valid block headers (i.e. ones that meet POW) along with the proof that the corresponding blocks are actually invalid.  This shouldn't happen often, since creating a valid block header for an invalid block is expensive.

This means that it doesn't even need to be an alt chain.  It is just a system where proof about invalid blocks is stored and shared.

Evan | Hero Member | Activity: 507 | Merit: 500
May 10, 2013, 07:51:23 PM  #293

Quote from: etotheipi
This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.  I believe this relieves two major problems with the network at once -- compression/pruning, and lightweight node security -- and does so in a non-disruptive way.  I am not positive that this is the right way to go, but it definitely warrants discussion.



Summary:  [SEE ILLUSTRATIONS BELOW]

Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes.  The leaves of this tree actually correspond to addresses/scripts, and the data at the leaf is actually a root of the unspent-TxOut list for that address/script.  To maintain security of the tree signatures, it will be included in the header of an alternate blockchain, which will be secured by merged mining.  

This provides the same compression as the simpler unspent-TxOut merkle tree, but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders.  Therefore, even lightweight nodes can get full address information, from any untrusted peer, and with only a tiny amount of downloaded data (a few kB).  

(NOTE:  I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section, a variant of the merkle-tree will have to be used that guarantees efficient updating of the tree.)


(1) Major Benefits:
  • (1a) Near-optimal blockchain compression:  theoretically, the size of the pruned blockchain would be proportional to the transaction volume (thus could go up or down), instead of the entire global history which always increases in size.  In practice, it wouldn't be so clean, but you really won't do any better than this.
  • (1b) Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.
  • (1c) Perfectly non-disruptive:  There are no main-network protocol or blockchain changes at all.  All the balance-tree information is maintained and verified in a separate blockchain through merged mining.  In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement)
  • (1d) Efficient tree querying&updating:  The full-but-pruned nodes of the network will be able to maintain this data structure efficiently.  New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is)
  • (1e) No user-setup or options:  Unlike overlay networks, achieving full trust does not require finding a trusted node, or subscribing to a service.  Just like the main blockchain -- you find a bunch of random peers and get the longest chain.  This could be bootstrapped in a similar fashion as the main network.

(2) Downsides and Uncertainties:
  • (2a) Complexity of concept:  This is not simple.  It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height).  This is entirely feasible, but it could be a bear to implement it.
  • (2b) Uncertainties about lite-node bootstrap data:  Depending on how the data is structured, there may still be a fair bit of data for a lite node to download to get the full security of a full node.  It will, undoubtedly, be much less than downloading the entire chain.  But there are obvious implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4GB, as of this writing).  UPDATE: My initial estimate based on the "Hybrid PATRICIA/Brandais Tree" (aka Reiner-Tree) is that a wallet with 100 addresses could verify its own balance with about 250 kB.
  • (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree.  But it should be possible to create an alternative with the following properties:
    • Commutative computation:  a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
    • O(log(N)) updating: removing or adding a single leaf node should be able to be done in O(log(N)) time.  With a vanilla merkle tree, this is true only if you remove a node and add a node to the same leaf location.

(3)  Assumptions:
  • (3a) Need verifiable tree roots:  I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network.  If there's enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable.  I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
  • (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.  
  • (3c) Efficient sorting:  Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer.  However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.



Alt-Chain Merkle Tree construction:

-- For each address/script, collect all unspent-TxOuts
-- Compute merkle root of each TxOut tree
-- Sort roots, use as leaf nodes for a master-merkle-tree.  
-- Include merkle-root of master tree in alternate chain header.
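
As a rough sketch of those four steps with plain (vanilla) merkle trees -- which, per item (2c), would eventually be swapped for an efficiently-updatable variant -- the construction is something like:

Code:
import hashlib

def dsha(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(hashes):
    # plain merkle root, duplicating the last element on odd-sized levels
    if not hashes:
        return dsha(b"")
    layer = list(hashes)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [dsha(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# toy data: {scriptPubKey: [hashes of its unspent TxOuts]}
utxos_by_script = {
    b"script-A": [dsha(b"txout-1"), dsha(b"txout-2")],
    b"script-B": [dsha(b"txout-3")],
}
address_roots = sorted(merkle_root(h) for h in utxos_by_script.values())
master_root = merkle_root(address_roots)   # what would go in the alt-chain header
print(master_root.hex())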





Getting your balance:

-- Download headers of both chains
-- Request unspent-TxOut-hash list.  
-- Compute sub-merkle root for this address
-- Request secondary-branch nodes  (O(log(N)))
-- Compute master root; compare to block header
-- Request the full TxOuts for each unspent-TxOut-hash above





Alternate Chain:
All data is included on the alternate blockchain, which is maintained through merged mining on the main chain.  This is only one extra tx per block on the main chain.  That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain.





Yes, this is a huge undertaking.  Yes, there's a lot of uncertainties. Yes, I need a new merkle tree structure.
But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?)

Alright, tear it apart!




UPDATE:

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.  Others have expressed interest in a binary-search tree (BST).  Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc.  My preference is the creme-de-la-creme of tries -- a hybrid PATRICIA tree (level-compressed trie) and de la Brandais tree (node-compressed).  It looks something like this:



The structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it.  The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying all transactions, in order, to have a well-defined internal structure).  For education on trie structures, see my pretty pictures in this post.

Have you seen my topic?  https://bitcointalk.org/index.php?topic=194471.0;topicseen  We should talk.

I am poor, but i do work for Coin Smiley
1PtHcavXoakgNkQfEQdvnvEksEY2NvwaLM
ThomasV | Legendary | Activity: 1896 | Merit: 1353
May 11, 2013, 11:33:49 AM  #294

I have started to experiment with this idea.
My goal is to add this hash tree to Electrum.

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".

Pointers can also be encoded as bits, using a fixed-size 32-byte vector (assuming 256 possible pointers).
Of course, variable-length storage would be more efficient, because most nodes will have sparse children, but I don't know if it is really worth the effort.
Indeed, keys will take up to 20 bytes, and node hashes will take 32 bytes anyway, so we're not adding an order of magnitude by using 32 bytes.
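
Concretely, the bit-vector encoding would look something like this (a sketch; the child byte values are arbitrary):

Code:
def set_child(bitmap, byte_value):
    # mark that a child exists for the given next-byte value (0-255)
    bitmap[byte_value // 8] |= 1 << (byte_value % 8)

def has_child(bitmap, byte_value):
    return bool(bitmap[byte_value // 8] & (1 << (byte_value % 8)))

bitmap = bytearray(32)           # 256 bits, one per possible child
for b in (0x30, 0x35, 0x39):     # e.g. children "0", "5", "9"
    set_child(bitmap, b)
print(has_child(bitmap, 0x35), has_child(bitmap, 0x36))   # True False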


Quote
Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.size()
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which was originally where I thought the bulk of the data was originally going to end up.

You can indeed do it without pointers, but iterating to find the children of a node can be very long.
And you will need to find the children of a node every time you update its hash.



Come-from-Beyond | Legendary | Newbie | Activity: 2142 | Merit: 1010
May 11, 2013, 11:39:28 AM  #295

Quote from: etotheipi
After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.

It's possible to create a transaction that has no address at all. What is considered the address in this case?
etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
May 11, 2013, 04:54:36 PM  #296

Quote from: ThomasV
I have started to experiment with this idea.
My goal is to add this hash tree to Electrum.

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".

Pointers can also be encoded as bits, using a fixed-size 32 bytes vector (assuming 256 pointers).
Of course variable-length storage would be more efficient, because most nodes will have sparse children, but I don't know if it is really worth the effort.
Indeed, keys will take up to 20 bytes, and node hashes will take 32 bytes anyway, so we're not adding an order of magnitude by using 32 bytes.

Quote
Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.size()
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which was originally where I thought the bulk of the data was originally going to end up.

You can indeed do it without pointers, but iterating to find the children of a node can be very long.
And you will need to find the children of a node everytime you update its hash.

My point was you don't need any pointers at all, and finding the children isn't actually that long, since the database is efficient at these kinds of operations.  If you are at node "ABCD" and want to go to child P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first element equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may well have your next target in cache already.

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated. 

On the other hand, I haven't thought this through thoroughly.  I only know that it seems like you can avoid the pointers altogether, which is what I was expecting to make up the bulk of the storage overhead.  I.e., each node currently will only hold a sum (8 bytes) and its own hash (32 bytes).  If you need the pointers, you could end up with 256 8-byte pointers per node in addition to that, which is actually quite heavy at the higher, denser levels.

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.

Quote from: Come-from-Beyond
It's possible to create a transaction that has no address at all. What is considered the address in this case?

There's a little room for negotiation on this topic, but ultimately an "address" is a TxOut script.  In a totally naive world, your "addresses" would just be the exact serialization of the TxOut script -- so a 25-byte "address" for each standard Pay2Hash160 script, 35 or 67 bytes for pay-to-public-key scripts, and 23 bytes for a P2SH script.  Anything that is non-standard would simply be serialized raw.

However, I don't like this, because a single address ends up with multiple equivalent representations.  Even though pay-to-public-key scripts are rare, there are addresses that use both (such as multi-use addresses that were used for mining and regular transactions).  Even though it's rare, you'd have to ask your peers for 2 different scripts per address (the Pay2Hash160 and PayToPubKey scripts).  I'd almost prefer making special cases for these addresses, given that they are so standard and fundamental to Bitcoin transactions.

So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 
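
A sketch of that rule as code (the byte patterns are the standard P2PKH/P2PK/P2SH forms; hash160 is RIPEMD160(SHA256(x)), and note that ripemd160 support in hashlib depends on the local OpenSSL build):

Code:
import hashlib

def hash160(b):
    return hashlib.new("ripemd160", hashlib.sha256(b).digest()).digest()

def db_key(script):
    # Pay2Hash160: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if len(script) == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return b"\x00" + script[3:23]
    # Pay2PubKey33 / Pay2PubKey65: <push pubkey> OP_CHECKSIG -- bundled under the same key
    if len(script) in (35, 67) and script[0] in (33, 65) and script[-1] == 0xAC:
        return b"\x00" + hash160(script[1:-1])
    # P2SH: OP_HASH160 <20 bytes> OP_EQUAL
    if len(script) == 23 and script[:2] == b"\xa9\x14" and script[-1] == 0x87:
        return b"\x05" + script[2:22]
    # everything else: just the raw script
    return script

print(db_key(bytes.fromhex("76a914" + "00" * 20 + "88ac")).hex())   # -> "00" + 20 zero bytes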

One problem I see with this is that it doesn't make it clean to adopt new standard scripts without reconstructing the database in the future.  I suppose it wouldn't be the end of the world, but we also don't want to make an inflexible protocol decision.  This isn't just personal preference for storing addresses/scripts, it's actually describing the authenticated structure of the Reiner-tree.  So if we were to add a new standard script type, and we wanted a short form of it to store in the DB, we'd have to update the "protocol".  If this had been adopted already, that would be a hard fork.  If we just do raw scripts all around, this isn't really a problem, except that we may have to ask for extra branches to make sure we get all possible variants of a single public key.


@ ThomasV

I noticed you asked something about "SumValue" before.  I don't know if you got the question answered, but the idea was to recursively store the sum-of-value of each sub-branch, and have it authenticated along with the hashes.  Quite a few users, including gmaxwell (who was originally only lukewarm on this whole idea), determined that this was an extremely valuable addition to the spec, to deal with miners who lie about their reward, knowing that the network is made up almost entirely of lite nodes that have no way to determine otherwise.  But those lite nodes know what the total coins in circulation should be, and thus would only have to look at the root sum-value to determine if someone cheated.

I don't know if I completely captured that concept.  I'm sure someone like gmaxwell or d'aniel can jump in and explain it better.  But it is an easy add-on to the original idea.  And also makes it possible to simply query your balance without downloading all the raw TxOuts (though, if you are using each address once, that doesn't actually save you a lot).
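
Roughly, in code, the idea is just that every interior node carries (and hashes in) the sum of its children's sums, so a lite node can check the root sum against the coin supply it computes independently.  The serialization below is illustrative only, not a proposed format:

Code:
import hashlib

def dsha(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def node_digest(node):
    # a leaf is (txout_hash, value); an interior node is a list of children
    if isinstance(node, tuple):
        h, v = node
        return dsha(h + v.to_bytes(8, "little")), v
    child_digests = [node_digest(c) for c in node]
    total = sum(v for _, v in child_digests)
    blob = total.to_bytes(8, "little") + b"".join(h for h, _ in child_digests)
    return dsha(blob), total

tree = [[(dsha(b"a"), 50), (dsha(b"b"), 25)], [(dsha(b"c"), 25)]]
root_hash, root_sum = node_digest(tree)
expected_supply = 100                 # what a lite node knows independently
assert root_sum == expected_supply    # catches a miner lying about the reward
print(root_hash.hex(), root_sum)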


maaku | Legendary | Activity: 905 | Merit: 1012
May 12, 2013, 03:10:44 PM  #297

Quote from: etotheipi
So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 

One problem I see with this is that it doesn't make it clean to adopt new standard scripts, without reconstructing the database in the future...

Why not hash160(txout.scriptPubKey)? I had assumed from the beginning that's what we'd be doing. "Addresses" are a UI issue - the protocol should only concern itself with scripts.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
ThomasV | Legendary | Activity: 1896 | Merit: 1353
May 12, 2013, 04:16:44 PM  #298

Quote from: etotheipi
So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 

One problem I see with this is that it doesn't make it clean to adopt new standard scripts, without reconstructing the database in the future...

Quote from: maaku
Why not hash160(txout.scriptPubKey)? I had assumed from the beginning that's what we'd be doing. "Addresses" are a UI issue - the protocol should only concern itself with scripts.

+1
this is also what I have assumed

ThomasV | Legendary | Activity: 1896 | Merit: 1353
May 12, 2013, 04:22:40 PM  #299

Quote from: etotheipi
My point was you don't need any pointers at all, and finding the children isn't actually that long since the database is efficient at these kinds of operations.  If you are node "ABCD" and want to go to pointer P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first elemtent equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may already have your next target in cache already.

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated.  

No, you need to know the list of children in order to compute the hash of a node.
If you don't store pointers at all, you'll need to perform 256 iter.seek() and iter.next() operations per node, just to know its list of children.

Quote
On the other hand, I haven't thought this through thoroughly.  I only know that it seems like you can avoid the pointers altogether which I was expecting to make up the bulk of the storage overhead.  i.e. each node currently will only hold a sum (8 bytes) and its own hash (32 bytes).  If you need the pointers, you could end up 256, 8-byte pointers per node in addition to it, which is actually quite heavy at the higher, denser levels.  

You only need 1 bit per pointer (true iff a child node exists); that's 32 bytes.

etotheipi (OP) | Legendary | Core Armory Developer | Activity: 1428 | Merit: 1093
May 12, 2013, 04:33:49 PM (last edit: May 12, 2013, 05:00:14 PM)  #300

Quote from: etotheipi
My point was you don't need any pointers at all, and finding the children isn't actually that long since the database is efficient at these kinds of operations.  If you are node "ABCD" and want to go to pointer P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first elemtent equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may already have your next target in cache already.

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated.  

Quote from: ThomasV
no, you need to know the list of children in order to compute the hash of a node.
if you don't store pointers at all, you'll need to perform 256 iter.seek() and iter.next() operations per node, only to know its list of children

I don't think so.  If you are at ABCD and it has only 3 children, "ABCDE", "ABCDP", and "ABCDZ", there are still only 3 seeks.  You seek for "ABCDA", and the iterator ends up at "ABCDE" (which is the first element equal to or greater than your seek value).  So you know that's the first child, and that there's no point in seeking for "ABCDB", "ABCDC", etc.  Then your next seek is "ABCDF", which puts you at "ABCDP".  Rinse and repeat.
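
In code, that seek-skip enumeration looks something like this (again faking the sorted keyspace with bisect); it costs one seek per existing child plus one final seek that falls off the prefix, rather than 256:

Code:
import bisect

keys = sorted(["ABCD", "ABCDE", "ABCDP", "ABCDZ", "ABCE", "ABXY"])

def seek(target):
    # like iter->Seek(target): first key >= target
    i = bisect.bisect_left(keys, target)
    return keys[i] if i < len(keys) else None

def children(prefix):
    found = []
    probe = prefix + "\x00"                   # just past the parent node's own key
    while True:
        k = seek(probe)
        if k is None or not k.startswith(prefix):
            break
        found.append(k)
        probe = k[:len(prefix) + 1] + "\xff"  # skip everything under this child byte
    return found

print(children("ABCD"))   # -> ['ABCDE', 'ABCDP', 'ABCDZ']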

